You found the deception circuit. Congratulations. There are several others!
Does mechanistic interpretability really find the circuit?
Our new paper, "All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs" (accepted at ICML 2026), suggests the answer may be: not always.
A common implicit assumption in mechanistic interpretability is that a model's behavior is explained by *the* circuit: a sparse, canonical, essentially unique mechanism.
Instead, for the same LLM task, we find multiple circuits/sheaves that are:
✅ faithful
✅ sparse
✅ structurally different
✅ low-overlap
This means a discovered circuit may not be the unique mechanism behind a behavior, but one realization among many possible mechanisms. We call for rethinking how circuit/sheaf discovery results should be interpreted and evaluated.
Huge thanks to my amazing collaborators: @frankniujc, @YutongYin774638, and @zhaoran_wang
Paper: http://arxiv.org/abs/2605.12671
#MechanisticInterpretability #LLM #AI #MachineLearning


