22h ago
Periodic Labs' Rohan Pandey jokes that backpropagation is the ultimate interpretability researcher, using standard gradients to find and steer critical neurons
The joke contrasts ordinary gradient-based optimization with the deliberate neuron-hunting of mechanistic interpretability.