Tech · 13h ago

Amiri Hayes and MIT researchers replace transformer attention heads with human-readable Python code using program synthesis

Story Overview

Amiri Hayes, along with MIT CSAIL colleagues, has built a pipeline that turns selected attention heads inside models such as GPT-2 and Llama-3B into short, executable Python functions. The programs are synthesized from attention patterns observed on datasets such as TinyStories and then swapped directly back into the transformer stack.
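To make the idea concrete, here is a minimal sketch of what a head-replacing program could look like. It assumes a hypothetical interface in which a program receives the token list and returns a causal attention matrix whose rows sum to 1. The names `previous_token_program` and `pattern_distance` are illustrative and are not taken from the paper or its released code, and the scoring rule (mean total-variation distance between rows) is one simple choice, not necessarily the metric the authors use.

```python
from typing import List

# Hypothetical interface: a head program maps a token list to an n x n
# attention matrix whose row i is a probability distribution over positions 0..i.
Matrix = List[List[float]]


def previous_token_program(tokens: List[str]) -> Matrix:
    """Illustrative head program: each token attends entirely to the token before it.

    The first token has no predecessor, so it attends to itself.
    """
    n = len(tokens)
    attn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        attn[i][max(i - 1, 0)] = 1.0
    return attn


def pattern_distance(predicted: Matrix, observed: Matrix) -> float:
    """Mean total-variation distance between corresponding rows.

    0.0 means the program reproduces the observed pattern exactly;
    1.0 means every row puts its weight on disjoint positions.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("matrices must be non-empty and the same size")
    total = 0.0
    for p_row, o_row in zip(predicted, observed):
        total += 0.5 * sum(abs(p - o) for p, o in zip(p_row, o_row))
    return total / len(predicted)


if __name__ == "__main__":
    tokens = ["Once", "upon", "a", "time"]
    # Made-up pattern for a head that mostly, but not always, looks one token back.
    observed = [
        [1.0, 0.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 0.8, 0.2, 0.0],
        [0.0, 0.0, 0.7, 0.3],
    ]
    print(pattern_distance(previous_token_program(tokens), observed))
```

On this made-up pattern the distance is small but nonzero, which is the kind of signal a synthesis loop could use to rank candidate programs for a head.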



Original post

Amiri Hayes@amirihayes_

What if attention were code? We show that many attention heads in transformer LMs can be replaced by human-readable Python programs. Swap them in and the model barely notices.

See our experiments here: Explaining Attention with Program Synthesis [https://arxiv.org/abs/2606.19317]

8:42 AM · Jun 29, 2026 · 30.3K Views

Open Question

Performance stays close after swaps

Hybrid models that replace up to a quarter of attention heads show only modest increases in perplexity and hold steady on standard QA benchmarks. The work does not claim that every head can be replaced or that the approach scales unchanged to frontier models.
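Perplexity is the exponential of the average per-token negative log-likelihood, PPL = exp(-(1/N) Σ log p(x_t | x_<t)), so a "modest rise" is easiest to read as a ratio between the hybrid and original models evaluated on the same text. A minimal sketch, assuming you already have natural-log token probabilities from each model; the helper names are illustrative, and the paper's exact evaluation setup may differ.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean log p), using natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def relative_increase(base_ppl: float, hybrid_ppl: float) -> float:
    """Fractional perplexity change of the hybrid model relative to the original."""
    return hybrid_ppl / base_ppl - 1.0
```

For intuition, a model that assigns probability 0.5 to every token has perplexity 2, and if a hybrid scored 11 against a baseline of 10, `relative_increase` would report a 10% rise. These figures are illustrative, not numbers from the paper.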

Developer Impact

Symbolic stand-ins open new inspection routes

Because the replacements are ordinary Python rather than post-hoc descriptions, researchers can now read, edit, or verify what individual heads compute. The paper releases the full library of 1,664 programs and the synthesis code for others to test.
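Because each replacement is plain Python, basic properties can be checked mechanically before a program is swapped in. The sketch below assumes the same hypothetical interface (tokens in, square attention matrix out) and uses illustrative names, `first_token_program`, `check_head_program`, and `apply_pattern`, that do not come from the released library. The last function shows how a program's pattern would stand in for a learned one: as in standard attention, each output position is a weighted sum of value vectors, computed before the head's output projection.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
HeadProgram = Callable[[List[str]], Matrix]


def first_token_program(tokens: List[str]) -> Matrix:
    """Hypothetical sink-style program: every token attends to position 0."""
    return [[1.0 if j == 0 else 0.0 for j in range(len(tokens))] for _ in tokens]


def check_head_program(program: HeadProgram, tokens: List[str], tol: float = 1e-6) -> List[str]:
    """Return a list of violated properties (an empty list means all checks passed)."""
    attn = program(tokens)
    n = len(tokens)
    if len(attn) != n or any(len(row) != n for row in attn):
        return [f"expected a {n}x{n} matrix"]
    problems = []
    for i, row in enumerate(attn):
        if any(w < 0 for w in row):
            problems.append(f"row {i}: negative weight")
        # Causality: position i may only attend to positions 0..i.
        if any(row[j] != 0.0 for j in range(i + 1, n)):
            problems.append(f"row {i}: attends to a future token")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            problems.append(f"row {i}: weights sum to {sum(row):.3f}, not 1")
    return problems


def apply_pattern(attn: Matrix, values: Sequence[Sequence[float]]) -> Matrix:
    """Mix value vectors with the program's weights: out[i] = sum_j attn[i][j] * values[j]."""
    dim = len(values[0])
    return [
        [sum(attn[i][j] * values[j][d] for j in range(len(values))) for d in range(dim)]
        for i in range(len(attn))
    ]
```

A uniform pattern over all positions, for example, fails the causality check on every row except the last.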

Sentiment

Commenters are enthusiastic about replacing transformer attention heads with readable Python programs, calling the work "very cool" and a big mechanistic interpretability win.

Positive: 100.0% · Negative: 0.0% (based on 7 comments with sentiment)


Via ARXIV.ORG

Posts from X

Most Activity

Views 17.7K · Bookmarks 72 · Likes 122

Jacob Andreas@jacobandreas

👉 New preprint! Automated interpretability by approximating / replacing NN components (here attention heads) with programs.

13h ago · 17.7K views · 122 likes · 72 bookmarks

Replies 1

pplank8@pplank8

@amirihayes_ @VictorTaelin

10h ago

Yash Sarrof@yashYRS

@amirihayes_ Very cool work, this paper might be related and interesting for you (very similar themes as yours): https://arxiv.org/abs/2602.08857

7h ago

Laura Ruis@LauraRuis

Very cool work ⬇️

13h ago

KTibow@KTibow

@amirihayes_ https://github.com/AmiriHayes/explaining_attention_heads/blob/main/data%2Fgpt2_programs.py

4h ago

Adrian Chan@gravity7

Explaining Attention with Program Synthesis asks a sharp question: what are attention heads actually doing, and can we describe it precisely?

- The paper uses program synthesis to automatically generate symbolic descriptions of attention head behavior, moving beyond heatmaps toward interpretable rules.
- Synthesized programs capture the computational patterns of individual heads in a human-readable form, making mechanistic claims testable.
- The approach surfaces structured, repeatable behaviors rather than treating attention as a black box of soft weights.

Lines of inquiry it opens:
- Can symbolic mechanisms improve transformer compositional abilities?
- Do transformers learn generalizable algorithms or instance-based patterns?
- Why do standard transformers fail on problems requiring serial algorithmic reasoning?

https://inquiringlines.com/related/2606-19317-explaining-attention-with-program-synthesis/

4h ago

Camilla Montonen@spimescape

@amirihayes_ wow this seems wild! taking a look!

11h ago

vik@vikhyatk

@amirihayes_ very cool!

5h ago

Blue 🐋@BlueWhaleFlys

@amirihayes_ Everything is pattern recognition.

9h ago

Sean Cantrell@ThePremiseOfIt

@amirihayes_ Wow. Neat. Big mech int win

4h ago

Belinda Li@belindazli

New paper! We introduce a new automated interpretability technique where attention heads are explained with Python programs. Turns out you can drop-in replace ~40% of attention patterns in Llama-3B with outputs of these programs and barely affect task performance!

More broadly, I’m excited about the actionable implications of this technique: Understanding attention phenomena has historically led to architectural improvements in Transformers (see e.g. attention sinks), and I’m excited about the potential for this technique to uncover more such opportunities.

Make sure to check out @amirihayes_ thread below! ⬇️

6h ago

Taelin@VictorTaelin

@pplank8 @amirihayes_ thanks for pinging me here!!

10h ago

Srivatsa Bhargava@jsbhargava07

@amirihayes_ @AbhishekAs34298

5h ago

Yves St Langevin@arxivmerchant

@amirihayes_ @fchollet

3h ago

Ravid Shwartz Ziv@ziv_ravid

Very cool work!

1h ago

jeff@jffbrwn2

@amirihayes_ very cool

24m ago

Haoyang Su@HaoyangSu_

@amirihayes_ great work!

14m ago