Frontier Atlas / 2026

Distilled is an independent project inspired by the explanatory craft of Distill.pub.

Thirty papers for entering the frontier.

An editorial research map for machine learning, biology, and neuroscience: what to read, what concepts to master, and where the frontier lies.

Papers that changed the world

Machine Learning / arXiv

Attention Is All You Need

Vaswani et al.

Attention makes sequence modeling a content-addressable memory problem.

Why it matters

Replaced recurrence with self-attention and made scalable sequence modeling the default substrate for modern AI.

Mental model

Imagine every token writing a small query to the rest of the sentence: who has information I need right now? Keys answer that query, values carry the payload, and the weighted sum becomes the token's updated state.

Core concept

Tokens route information to each other through learned query-key-value matching.

Mechanism

Embed tokens into vectors that carry identity, position, and local context.
Project each vector into query, key, and value spaces so matching and payload are separated.
Take query-key dot products, normalize them with softmax, and use the weights to blend values.
Repeat this in multiple heads so syntax, reference, locality, and long-range dependencies can be represented in parallel.
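
The steps above can be sketched for a single head in plain Python. This is a minimal illustration, not the paper's implementation: the toy query, key, and value vectors are hand-picked (a real model learns the projections that produce them), and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Match this query against every key, then normalize the scores.
        row = softmax([dot(q, k) / scale for k in keys])
        weights.append(row)
        # Blend the value vectors using the attention weights.
        outputs.append([sum(w * v[d] for w, v in zip(row, values))
                        for d in range(width)])
    return outputs, weights

# Hypothetical toy vectors standing in for learned projections.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = attention(queries, keys, values)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's routing distribution over the sequence. Multi-head attention would repeat this with separate projections per head and concatenate the outputs.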

Frontier move

Modern frontier work asks how far this content-addressable memory can stretch: longer context, cheaper attention, multimodal tokens, and reasoning traces that use attention over intermediate work.

query, key, value, head, softmax, residual stream

Learning tools / not decorations

Interactive explainers are here to make the papers operational.

Every full blog post uses inline labs like these to turn the paper's core idea into something you can manipulate. The goal is not to add visual garnish. It is to expose the object, signal, and intervention: what the paper measures, what makes it improve, and what knob a researcher can turn next.

1. Make the object visible. Attention rows, genome context, residue contacts, spike traces, and connectome paths become concrete.

2. Move the critical variable. Sliders and toggles show which assumption carries the result and where the mechanism breaks.

3. Return to the paper sharper. The demos give you a mental model before the math, methods, and ablations get dense.

Machine learning / attention

The routing table

A transformer layer turns every token into a query and asks which keys should be read. The heatmap is the softmax of the query-key scores: a lower temperature sharpens the distribution so a head behaves like a hard pointer; a higher temperature flattens it so the head blends evidence across the sentence.
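
The effect of temperature on one routing row can be sketched as follows. The scores are hypothetical, and dividing scores by a temperature before the softmax is one common convention, assumed here rather than taken from the demo's code.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Softmax of scores / temperature; low temperature sharpens, high flattens."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a more pointer-like row."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical query-key scores for a single attention row.
scores = [2.0, 1.0, 0.5, 0.1]

sharp = softmax_with_temperature(scores, 0.1)  # near one-hot
warm = softmax_with_temperature(scores, 5.0)   # near uniform

print([round(p, 4) for p in sharp], round(entropy(sharp), 4))
print([round(p, 4) for p in warm], round(entropy(warm), 4))
```

Entropy gives a single number for how "pointer-like" a row is, which is a convenient way to compare heads or temperatures.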

Selected routing row: RNA -> RNA. Strongest value source: enzyme, then mixed with the rest of the row.

Heatmap tokens (rows and columns): the, enzyme, binds, RNA, then, folds

Softmax temperature

Attention head 2

Machine learning / scaling

The frontier optimizer

Scaling papers made capability forecastable. The useful mental model is not "bigger is better"; it is a constrained optimization over parameters, tokens, and compute. At a fixed compute budget, moving away from the Chinchilla-like ridge in either direction (too many parameters for the tokens, or too few) makes loss worse.
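
A minimal sketch of that constrained optimization, under stated assumptions: the loss has the parametric form L(N, D) = E + A / N^alpha + B / D^beta, the constants are approximately the fit reported by Hoffmann et al. (2022), and compute follows the common C ≈ 6·N·D rule of thumb. None of these values come from this page, and the demo may use different ones.

```python
# Assumed constants, approximately the Hoffmann et al. (2022) parametric fit.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(params, tokens):
    """Parametric loss: irreducible term plus model-size and data-size terms."""
    return E + A / params**ALPHA + B / tokens**BETA

def tokens_for_budget(compute, params):
    """Tokens affordable at a fixed budget, assuming C is roughly 6 * N * D."""
    return compute / (6.0 * params)

def sweep(compute, num_points=200):
    """Scan model sizes on a log grid; return (loss, params, tokens) per point."""
    results = []
    for i in range(num_points):
        params = 10 ** (7 + 5 * i / (num_points - 1))  # 1e7 to 1e12 parameters
        tokens = tokens_for_budget(compute, params)
        results.append((loss(params, tokens), params, tokens))
    return results

compute = 1e21  # FLOPs; a hypothetical budget
results = sweep(compute)
best_loss, best_params, best_tokens = min(results)

print(f"smallest model: loss {results[0][0]:.3f}")
print(f"best on grid:   loss {best_loss:.3f} at {best_params:.2e} params, "
      f"{best_tokens:.2e} tokens")
print(f"largest model:  loss {results[-1][0]:.3f}")
```

The minimum sits in the interior of the sweep: a model that is too small is starved of capacity, and one that is too large is starved of tokens.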

Balance: 98%. Estimated loss 2.03. Bottleneck: balanced.

Parameters

Training tokens

Compute budget

Machine learning / diffusion

The score field

Diffusion models learn a vector field that points noisy samples back toward data. The paper's core trick is turning generation into many small denoising decisions instead of one impossible jump.
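
The "many small denoising decisions" idea can be sketched in one dimension. Assumptions, all hypothetical: the data distribution is a Gaussian with mean MU and standard deviation S, so the score of the noised distribution is known in closed form and stands in for a learned network; the sampler is a deterministic Euler discretization of the probability-flow ODE in its variance-exploding form, not any particular paper's sampler.

```python
MU, S = 2.0, 0.5  # assumed toy data distribution: Gaussian(mean MU, std S)

def score(x, sigma):
    """Exact score of the noised distribution N(MU, S^2 + sigma^2)."""
    return (MU - x) / (S**2 + sigma**2)

def noise_schedule(sigma_max=10.0, sigma_min=0.01, steps=100):
    """Geometric noise levels from sigma_max down to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (steps - 1))
    return [sigma_max * ratio**i for i in range(steps)]

def denoise(x, sigmas):
    """Follow the score field with one small Euler step per noise level."""
    path = [x]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        # Step size is half the drop in noise variance between levels.
        x = x + 0.5 * (sigma**2 - sigma_next**2) * score(x, sigma)
        path.append(x)
    return path

path = denoise(12.0, noise_schedule())  # start from a very noisy sample
print(f"start {path[0]:.3f} -> end {path[-1]:.3f} (data mean {MU})")
```

No single step moves the sample far; the trajectory reaches the data region only because every step points the same way along the score field.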

At step 42, iterative denoising is still broad and exploratory.

Denoising step

Biology / structure

Distance constraints

AlphaFold-style models are easiest to understand as constraint machines: evolutionary statistics predict which residues want to be near each other, then geometry searches for a fold that satisfies the map.
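
The "geometry searches for a fold that satisfies the map" half of that picture can be sketched as gradient descent on coordinates. This is classical stress minimization in 2D on a hypothetical four-point target, offered only as an analogy; it is not AlphaFold's structure module, and all names and values are illustrative.

```python
import math

def distance(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def stress(coords, target):
    """Sum of squared gaps between current and target pairwise distances."""
    n = len(coords)
    return sum((distance(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def fit_coordinates(target, start, step=0.05, iterations=500):
    """Gradient descent on coordinates so pairwise distances match the target map."""
    coords = [list(p) for p in start]
    n = len(coords)
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = distance(coords[i], coords[j])
                if d == 0.0:
                    continue  # skip coincident points to avoid dividing by zero
                gap = d - target[i][j]
                for axis in range(2):
                    grads[i][axis] += 2.0 * gap * (coords[i][axis] - coords[j][axis]) / d
        for i in range(n):
            for axis in range(2):
                coords[i][axis] -= step * grads[i][axis]
    return coords

# Hypothetical "true fold": a unit square, known only through its distance map.
truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target = [[distance(a, b) for b in truth] for a in truth]

start = [(0.1, 0.2), (0.7, -0.3), (1.4, 0.8), (-0.2, 0.6)]  # distorted guess
initial_stress = stress(start, target)
final_stress = stress(fit_coordinates(target, start), target)
print(f"stress before {initial_stress:.4f}, after {final_stress:.6f}")
```

The recovered shape matches the target only up to rotation, translation, and reflection, since a distance map does not fix the coordinate frame.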

Binding-site perturbation changes the local pocket while leaving far contacts mostly stable.

Binding-site perturbation

Biology / genomics

Regulatory grammar

Frontier genomic models treat DNA like a long program: a mutation can matter because it changes a motif locally, or because it changes which distant enhancer can talk to a promoter.

Context window: 58 kb. Longer context exposes distal regulatory arcs instead of isolated letters.

Context window

Neuroscience / connectomics

Graph propagation

Connectomics turns anatomy into an executable hypothesis: if this cell fires, where should activity go next? The frontier is linking wiring diagrams to measured dynamics and behavior.
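
The "if this cell fires, where should activity go next?" question can be sketched as linear propagation over a weight matrix. The four-cell ring circuit and the gain values are hypothetical, and real circuits are nonlinear; this only illustrates why a gain below one makes recurrent activity decay.

```python
def propagate(weights, activity, gain, steps):
    """Linear propagation: each step, activity flows along edges, scaled by gain."""
    n = len(activity)
    history = [list(activity)]
    for _ in range(steps):
        activity = [gain * sum(weights[src][dst] * activity[src] for src in range(n))
                    for dst in range(n)]
        history.append(activity)
    return history

# Hypothetical recurrent loop: cell 0 -> 1 -> 2 -> 3 -> 0.
# weights[src][dst] is the connection strength from src to dst.
weights = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]
start = [1.0, 0.0, 0.0, 0.0]  # only cell 0 fires initially

decaying = propagate(weights, start, gain=0.52, steps=8)
growing = propagate(weights, start, gain=1.2, steps=8)

print("gain 0.52, total activity per step:", [round(sum(a), 4) for a in decaying])
print("gain 1.20, total activity per step:", [round(sum(a), 4) for a in growing])
```

With a gain of 0.52 each trip around the loop shrinks the signal, while a gain above one lets the same wiring amplify it.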

Synaptic gain 52% makes recurrent paths decay through the circuit.

Synaptic gain

Neuroscience / spikes

Hodgkin-Huxley engine

Hodgkin and Huxley made the action potential a dynamical system. Sodium conductance creates positive feedback; potassium conductance restores the membrane so the spike is brief instead of explosive.
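
A minimal sketch of that dynamical system, integrated with forward Euler. The parameters are the standard textbook squid-axon values with rest near -65 mV; they are assumed here, not taken from this page, and the demo's own implementation may differ.

```python
import math

# Assumed textbook parameters: capacitance (uF/cm^2), conductances (mS/cm^2),
# reversal potentials (mV).
C_M = 1.0
G_NA, G_K, G_L = 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387

def _ratio(x):
    """x / (1 - exp(-x)), using its limit value 1 at x = 0."""
    return 1.0 if abs(x) < 1e-9 else x / (1.0 - math.exp(-x))

def rates(v):
    """Opening (alpha) and closing (beta) rates for gates m, h, n at voltage v."""
    a_m = _ratio((v + 40.0) / 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.1 * _ratio((v + 55.0) / 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n

def simulate(stimulus, duration=50.0, dt=0.01):
    """Membrane voltage trace (mV) for a constant stimulus current (uA/cm^2)."""
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    # Start each gate at its steady state for the resting voltage.
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    trace = []
    for _ in range(int(duration / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_na = G_NA * m**3 * h * (v - E_NA)  # sodium: fast positive feedback
        i_k = G_K * n**4 * (v - E_K)         # potassium: delayed restoring current
        i_l = G_L * (v - E_L)                # leak
        v += dt * (stimulus - i_na - i_k - i_l) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        trace.append(v)
    return trace

resting = simulate(0.0)
spiking = simulate(10.0)
print(f"no stimulus: peak {max(resting):.1f} mV")
print(f"stimulus 10: peak {max(spiking):.1f} mV")
```

Scaling G_NA or G_K in this sketch is a rough analogue of moving the conductance sliders: less sodium weakens the positive feedback, and less potassium slows repolarization.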

Subthreshold: leak and potassium recovery keep the cell near resting potential.

Sodium conductance

Potassium conductance

Stimulus current

Neuroscience / population codes

Manifold and surprise

Modern neuroscience papers often trade single-cell stories for population geometry: activity is a point moving through a low-dimensional manifold, and learning reshapes where that point wants to go.

Stimulus manifolds sharpen in early sensory cortex, then spread into task-aligned subspaces.

Sensory state: early sensory evidence sharpens before later regions attach value and action.

Sensory evidence