Tech · 31d ago

Periodic Labs' Rohan Pandey jokes that backpropagation is the ultimate interpretability researcher, easily identifying and steering neuron behaviors

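The joke jabs at the complexity of modern mechanistic interpretability: backprop's gradients already indicate which neurons matter for a behavior and which direction to push them.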


Original post

Rohan Pandey (@khoomeik)

my favorite interp researcher can identify neurons responsible for any behavior and provide steering vectors for them

her name is backprop and her steering vectors are just gradients

1:32 PM · May 29, 2026 · 19.9K Views
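Read literally, the post makes two technical claims. First, the gradient of a behavior score with respect to a layer's activations ranks the neurons that matter (gradient × activation is a common attribution). Second, that same gradient is a direction that raises the score when added to the activations. The following is a minimal pure-Python sketch under stated assumptions: the two-layer toy network, its weights, and the helper names (`score`, `grad_score`, `attribute`, `steer`) are hypothetical, and the gradient is derived by hand rather than taken from an autodiff library.

```python
# Hypothetical toy model: behavior score s(h1) = u . relu(W2 @ h1),
# where h1 holds the "neurons" we want to attribute and steer.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def relu(v):
    return [max(0.0, x) for x in v]

def score(h1, W2, u):
    """Scalar behavior score read out from hidden layer h1."""
    return sum(ui * hi for ui, hi in zip(u, relu(matvec(W2, h1))))

def grad_score(h1, W2, u):
    """Backprop by hand: ds/dh1 = W2^T (u * 1[W2 @ h1 > 0])."""
    pre = matvec(W2, h1)
    delta = [ui if p > 0 else 0.0 for ui, p in zip(u, pre)]
    return [sum(W2[j][i] * delta[j] for j in range(len(W2))) for i in range(len(h1))]

def attribute(h1, grad):
    """Gradient x activation: a first-order estimate of each neuron's contribution."""
    return [hi * gi for hi, gi in zip(h1, grad)]

def steer(h1, grad, alpha):
    """Use the gradient itself as the steering vector: h1 + alpha * ds/dh1."""
    return [hi + alpha * gi for hi, gi in zip(h1, grad)]

W2 = [[1.0, -2.0, 0.5],
      [0.0, 1.0, 1.0]]
u = [1.0, 3.0]
h1 = [1.0, 0.5, 2.0]

g = grad_score(h1, W2, u)
attr = attribute(h1, g)
top_neuron = max(range(len(attr)), key=lambda i: abs(attr[i]))
before = score(h1, W2, u)
after = score(steer(h1, g, alpha=0.1), W2, u)
print(g, attr, top_neuron, before, after)
```

For this input both second-layer units stay active, so the score is locally linear and a step of size `alpha` along the gradient raises it by `alpha * ||g||^2` (0.1 × 14.25 = 1.425, up to float rounding). In a real model the gradient would come from an autodiff framework rather than a hand derivation.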

Sentiment

Positive commenters are excited that their own takes on backprop as an interpretability tool were validated; negative commenters object that gradients hide the functional role of individual neurons.

Positive: 50.0% · Negative: 50.0% (based on 2 comments with sentiment)


Posts from X


Aryaman Arora (@aryaman2020)

true

(In response to Rohan Pandey's original post.)


John Hewitt (@johnhewtt)

@aryaman2020 related - I’m always curious why interpretability people design a cool new parameter efficient finetuning family like steering vectors and then choose not to optimize by gradient descent

(In response to Aryaman Arora, @aryaman2020: "true")
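John's point can be made concrete: a steering vector is a handful of trainable parameters (one vector added to a hidden state), so it can be fit by gradient descent like any parameter-efficient finetuning module. The following is a minimal pure-Python sketch, assuming a frozen linear readout `W`, a toy batch of hidden states `H`, and a cross-entropy objective toward a target class. All names and numbers are hypothetical, and this is not a reproduction of any specific paper's method.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def logits(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def loss_and_grad(W, H, v, target):
    """Mean cross-entropy when v is added to every hidden state in H,
    plus its gradient w.r.t. v. W is frozen and never updated."""
    d = len(v)
    total, grad = 0.0, [0.0] * d
    for h in H:
        p = softmax(logits(W, [hi + vi for hi, vi in zip(h, v)]))
        total -= math.log(p[target])
        # dL/dlogits = p - onehot(target); dL/dv = W^T (p - onehot(target))
        delta = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        for i in range(d):
            grad[i] += sum(W[k][i] * delta[k] for k in range(len(W)))
    n = len(H)
    return total / n, [gi / n for gi in grad]

def train_steering_vector(W, H, target, lr=0.5, steps=200):
    """Plain gradient descent on the steering vector only."""
    v = [0.0] * len(H[0])
    for _ in range(steps):
        _, g = loss_and_grad(W, H, v, target)
        v = [vi - lr * gi for vi, gi in zip(v, g)]
    return v

def predict(W, h, v):
    z = logits(W, [hi + vi for hi, vi in zip(h, v)])
    return max(range(len(z)), key=z.__getitem__)

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical frozen readout
H = [[2.0, 0.0], [1.5, 0.5], [2.0, -0.5]]   # hidden states that currently pick class 0
target = 1

v0_loss, _ = loss_and_grad(W, H, [0.0, 0.0], target)
v = train_steering_vector(W, H, target)
final_loss, _ = loss_and_grad(W, H, v, target)
print(v, v0_loss, final_loss, [predict(W, h, v) for h in H])
```

Only `len(v)` numbers are trained (2 here) while `W` never changes, which is what makes this read as a parameter-efficient method. Replacing the hand-written gradient with autodiff and `W` with a real model's downstream layers would be the natural extension.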

CLS (@ChengleiSi)

@khoomeik my favorite interp researcher is @aryaman2020, he can identify neurons responsible for any behavior by just eyeballing the matrices

(In response to Rohan Pandey's original post.)

Aryaman Arora (@aryaman2020)

@khoomeik im a gradients guy https://arxiv.org/abs/2604.07615

(In response to Rohan Pandey, @khoomeik: "@aryaman2020 wow my interp take has been aryaman approved lfg")

Naomi Saphra (@nsaphra)

@johnhewtt @aryaman2020 John please retract your take immediately and respect the data efficiency of TCAVs

(In response to John Hewitt, @johnhewtt.)

Rohan Pandey (@khoomeik)

@aryaman2020 wow my interp take has been aryaman approved lfg

(In response to Aryaman Arora, @aryaman2020: "true")

Rohan Pandey (@khoomeik)

@ChengleiSi @aryaman2020 truly the goat

(In response to the post by CLS, @ChengleiSi, above.)

Jatin Nainani (@jatin_n0)

@khoomeik not a big fan of her - she hides the functional role of neurons from me - which is like kinda the point of interp

Rohan Pandey (@khoomeik)

@jatin_n0 she would explain it to you, but you don’t speak her language

Kyunghyun Cho (@kchonyc)

@johnhewtt @aryaman2020 bc juergen invented it already

(In response to John Hewitt, @johnhewtt.)

Leshem (Legend) Choshen 🤖🤗 (@LChoshen)

@johnhewtt @aryaman2020 And then they will also reclaim it is a new thing...

(((ل()(ل() 'yoav))))👾 (@yoavgo)

i cant believe I just realized this now, but the reason BitFit (bias only fine-tuning) works, is actually the same reason steering vectors work. or rather, bitfit offers a richer class of adaptations than steering vectors.
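Yoav's observation can be checked on a toy MLP. Adding a fixed steering vector `v` to a layer's output `W x + b` gives exactly `W x + (b + v)`, which is the kind of bias update BitFit trains. A bias that sits before a nonlinearity, however, changes the output by an amount that depends on the input, and no single vector added at the output reproduces that. The following is a minimal pure-Python sketch. The two-layer network, weights, and helper names are hypothetical, and "BitFit" here just means updating bias terms while weights stay frozen.

```python
# Hypothetical two-layer MLP: y = W2 @ relu(W1 @ x + b1) + b2.

def affine(W, x, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, z) for z in v]

def mlp(x, W1, b1, W2, b2, steer_out=None):
    y = affine(W2, relu(affine(W1, x, b1)), b2)
    if steer_out is not None:
        # Steering vector added to the output of the second layer.
        y = [yi + si for yi, si in zip(y, steer_out)]
    return y

def close(a, b, tol=1e-12):
    return all(abs(p - q) <= tol for p, q in zip(a, b))

W1 = [[1.0, -1.0], [0.5, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 2.0], [-1.0, 1.0]]
b2 = [0.25, -0.5]
xs = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]

# 1) Output steering == shifting the output bias by the same vector.
v = [0.5, -0.25]
b2_shifted = [bi + vi for bi, vi in zip(b2, v)]
steer_equals_bias = all(
    close(mlp(x, W1, b1, W2, b2, steer_out=v), mlp(x, W1, b1, W2, b2_shifted))
    for x in xs
)

# 2) A bias change before the ReLU gives an input-dependent output change.
b1_shifted = [0.5, -1.0]
deltas = [
    [new - old for new, old in zip(mlp(x, W1, b1_shifted, W2, b2), mlp(x, W1, b1, W2, b2))]
    for x in xs
]
print(steer_equals_bias, deltas)
```

Because the three entries of `deltas` differ, no single vector added at the output can match the first-layer bias change. Read this way, bias-only tuning covers output-side steering as a special case and adds internal, pre-nonlinearity sites on top, which is one reading of "a richer class of adaptations".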

Sentio (@Sentio_xbt)

@khoomeik The most powerful interpretability tool is still just gradients

Jatin Nainani (@jatin_n0)

@khoomeik i try to! i am but a humble translator