Stanford linguistics chair Christopher Potts releases an AI-generated parody rap song critiquing Sparse Autoencoders and Distributed Alignment Search

CLS @ChengleiSi

this is legendary

Christopher Potts @ChrisGPotts

Full lyrics:

[Intro] Uh-huh. Yeah.
Bay Area but it's for Biggie
Interpretability Mafia in the house (Yo!)
Giving ourselves a hard time as usual
We're going after all of interp on this one
(Dissing post-training would be too easy.)
Ha, can you even beat the average?

[ChrisPy] Just train things on instinct, LASSO sparse link, rep shrink, add gaters, TopK, your saviors, Guidin' by labelin', cave in (it's supervised!) LessWrong blog fight, then train all night. SPINE set the rules way back, but now it's cool. Other views? Got no clues, dudes. Ignore crews who run evals on you (come on), place ranks upon you (they're on you). Claude once knew you, then outgrew you, who you? JumpReLU? Yeah, S, A, and E, close like K-SVD, and even ICA, see? Preparadigmatic for me, but not for thee. Steer northly, claim the V, but irrationally. Recently, interest haltin', paradigm faultin' (defaultin'). So they scale back the claim, change the name (change the frame). So they transcode the game, it's the same. Promptin', pleadin', "What's in it?" ("What's in it?") Admit it, that's the limit. Causal methods gonna win it!

[SAE Chorus] W_E to the ReLU, g, W_D just hypnotized me. I just trained these reps all day, but my model still can't find its way. W_E to the ReLU, g, W_D won't satisfy me. With this SAE thing, I got played. I finally understand why DAS got made.

[ChrisPy] You put zeros on nodes in Transformer flows (uh-huh). Or use the means when you intervenes (that's right). Swap in the source, fully brute force, Learnin' subspaces, all over the places (c'mon). Now what's the real method? What's even the question? All just tricks, algebra mix, maybe no fix. You have to ask: random model shows structure, random task. Intervene first, ask questions last. That's how causal abstractions pass. At last, we're rappin' 'bout direct effects, divergence checks, all do respects (do calculus!). Transcoders leave you unawares, aligning pairs, unnatural wares, should be scares. At the eval, show respects: every other method's got causal effects. Face it, too slow, and unconstrained. All these sparse methods got the claim to fame!

[DAS Chorus] Rotate source and base, then subtract, my g. Rotate, add the base, that's DAS, you see? I just trained these reps all day, but my model still can't find its way. Ident minus Gram times the base, my g. Add Gram times the source, it's DAS, you see? Believe this causal story, I'm a fool. SAEs just deserve to rule.

Christopher Potts @ChrisGPotts

Like any good advisor, I felt duty-bound to defend @aryaman2020 and @ZhengxuanZenWu in this rap battle. However, the SAE diss track I wrote was so devastating as to be unanswerable, so I decided to graciously balance things out with a second verse dissing causal interp.

Christopher Potts @ChrisGPotts

I do feel like I should enlist my students to do Genius-style annotations complete with full citations and an explanation for the two (equivalent?) expressions of DAS in the second chorus.
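
One possible reading of the two DAS expressions in the second chorus, sketched in NumPy. Everything here is an assumption layered on the lyrics rather than something the thread spells out: `Q` is a hypothetical d x k matrix with orthonormal columns standing in for the learned directions, "Gram" is read as the matrix `Q @ Q.T`, and `base` and `source` are random stand-ins for hidden representations. Under those assumptions, "rotate source and base, subtract, rotate back, add the base" and "(identity minus Gram) times the base plus Gram times the source" come out as the same vector.

```python
import numpy as np


def das_rotate_form(base, source, Q):
    """Rotate both vectors into the subspace, subtract, rotate back, add the base."""
    diff = Q.T @ source - Q.T @ base  # difference in subspace coordinates
    return base + Q @ diff


def das_projection_form(base, source, Q):
    """(I - P) @ base + P @ source, with P = Q @ Q.T (the assumed "Gram" matrix)."""
    P = Q @ Q.T
    identity = np.eye(len(base))
    return (identity - P) @ base + P @ source


rng = np.random.default_rng(0)
d, k = 8, 3  # hypothetical hidden size and subspace size

# Reduced QR of a random matrix gives a d x k matrix with orthonormal columns.
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
base = rng.normal(size=d)
source = rng.normal(size=d)

out_a = das_rotate_form(base, source, Q)
out_b = das_projection_form(base, source, Q)

# The two forms agree up to floating-point error.
print(np.allclose(out_a, out_b))
```

The equivalence follows from expanding the first form: `base + Q @ (Q.T @ source - Q.T @ base)` is `(I - Q @ Q.T) @ base + Q @ Q.T @ source`. It relies on `Q` having orthonormal columns; with a non-orthogonal `Q` this sketch would not describe an interchange on a subspace.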

Christopher Potts @ChrisGPotts

I wrote the lyrics as an homage to "Hypnotize" by The Notorious B.I.G., but Biggie apparently cannot be imitated, so the @suno version uses a totally different style.

Christopher Potts @ChrisGPotts

Shout out to @SPThole for helping me see the possibilities:

Sidhant Thole @SPThole

@ChrisGPotts @aryaman2020 lol, couldn’t resist trying this, sorry. Hear this banger made on @suno:

https://suno.com/s/6Eeltlld7js81Cxw

Sidhant Thole @SPThole

@ChrisGPotts Dayumm, this is super cool. The pronunciations are mostly right now!
