Full lyrics:
[Intro]
Uh-huh.
Yeah.
Bay Area but it's for Biggie
Interpretability Mafia in the house (Yo!)
Giving ourselves a hard time as usual
We're going after all of interp on this one
(Dissing post-training would be too easy.)
Ha, can you even beat the average?
[ChrisPy]
Just train things on instinct,
LASSO sparse link, rep shrink,
add gaters, TopK, your saviors,
Guidin' by labelin', cave in (it's supervised!)
LessWrong blog fight, then train all night.
SPINE set the rules way back, but now it's cool.
Other views? Got no clues, dudes.
Ignore crews who
run evals on you (come on),
place ranks upon you (they're on you).
Claude once knew you, then outgrew you, who you?
JumpReLU?
Yeah, S, A, and E,
close like K-SVD,
and even ICA, see?
Preparadigmatic for me, but not for thee.
Steer northly, claim the V, but irrationally.
Recently, interest haltin', paradigm faultin' (defaultin').
So they scale back the claim, change the name (change the frame).
So they transcode the game, it's the same.
Promptin', pleadin', "What's in it?" ("What's in it?")
Admit it, that's the limit.
Causal methods gonna win it!
[SAE Chorus]
W_E to the ReLU, g,
W_D just hypnotized me.
I just trained these reps all day,
but my model still can't find its way.
W_E to the ReLU, g,
W_D won't satisfy me.
With this SAE thing, I got played.
I finally understand why DAS got made.
[ChrisPy]
You put zeros on nodes in Transformer flows (uh-huh).
Or use the means when you intervenes (that's right).
Swap in the source, fully brute force,
Learnin' subspaces, all over the places (c'mon).
Now what's the real method?
What's even the question?
All just tricks,
algebra mix,
maybe no fix.
You have to ask:
random model shows structure, random task.
Intervene first, ask questions last.
That's how causal abstractions pass.
At last, we're rappin' 'bout direct effects,
divergence checks,
all do respects (do calculus!).
Transcoders leave you unawares,
aligning pairs,
unnatural wares,
should be scares.
At the eval, show respects:
every other method's got causal effects.
Face it, too slow,
and unconstrained.
All these sparse methods got the claim to fame!
[DAS Chorus]
Rotate source and base, then subtract, my g.
Rotate, add the base, that's DAS, you see?
I just trained these reps all day,
but my model still can't find its way.
Ident minus Gram times the base, my g.
Add Gram times the source, it's DAS, you see?
Believe this causal story, I'm a fool.
SAEs just deserve to rule.