Periodic Labs' Rohan Pandey jokes that backpropagation is the ultimate interpretability researcher, easily identifying and steering neuron behaviors.
The joke critiques the complexity of modern mechanistic interpretability.
@aryaman2020 wow my interp take has been aryaman approved lfg
true
@ChengleiSi @aryaman2020 truly the goat
@khoomeik my favorite interp researcher is @aryaman2020, he can identify neurons responsible for any behavior by just eyeballing the matrices
my favorite interp researcher can identify neurons responsible for any behavior and provide steering vectors for them her name is backprop and her steering vectors are just gradients
@aryaman2020 related - I’m always curious why interpretability people design a cool new parameter efficient finetuning family like steering vectors and then choose not to optimize by gradient descent
@khoomeik im a gradients guy https://arxiv.org/abs/2604.07615