Stanford’s Christopher Potts releases a Suno-generated rap diss track responding to DTU researchers’ critique of AxBench

Like any good advisor, I felt duty-bound to defend @aryaman2020 and @ZhengxuanZenWu in this rap battle. However, the SAE diss track I wrote was so devastating as to be unanswerable, so I decided to graciously balance things out with a second verse dissing causal interp.

Aryaman Arora@aryaman2020

has anyone ever written a diss track of your paper

Petar Veličković@PetarV_93

yep! and that follow-up paper is still one of my favourite papers :)

diss tracks are one of the best ways science progresses

Christopher Potts@ChrisGPotts

I wrote the lyrics as an homage to "Hypnotize" by The Notorious B.I.G., but Biggie apparently cannot be imitated, so the @suno version uses a totally different style.

Christopher Potts@ChrisGPotts

Full lyrics:

[Intro] Uh-huh. Yeah. Bay Area but it's for Biggie Interpretability Mafia in the house (Yo!) Giving ourselves a hard time as usual We're going after all of interp on this one (Dissing post-training would be too easy.) Ha, can you even beat the average?

[ChrisPy] Just train things on instinct, LASSO sparse link, rep shrink, add gaters, TopK, your saviors, Guidin' by labelin', cave in (it's supervised!) LessWrong blog fight, then train all night. SPINE set the rules way back, but now it's cool. Other views? Got no clues, dudes. Ignore crews who run evals on you (come on), place ranks upon you (they're on you). Claude once knew you, then outgrew you, who you? JumpReLU? Yeah, S, A, and E, close like K-SVD, and even ICA, see? Preparadigmatic for me, but not for thee. Steer northly, claim the V, but irrationally. Recently, interest haltin', paradigm faultin' (defaultin'). So they scale back the claim, change the name (change the frame). So they transcode the game, it's the same. Promptin', pleadin', "What's in it?" ("What's in it?") Admit it, that's the limit. Causal methods gonna win it!

[SAE Chorus] W_E to the ReLU, g, W_D just hypnotized me. I just trained these reps all day, but my model still can't find its way. W_E to the ReLU, g, W_D won't satisfy me. With this SAE thing, I got played. I finally understand why DAS got made.

[ChrisPy] You put zeros on nodes in Transformer flows (uh-huh). Or use the means when you intervenes (that's right). Swap in the source, fully brute force, Learnin' subspaces, all over the places (c'mon). Now what's the real method? What's even the question? All just tricks, algebra mix, maybe no fix. You have to ask: random model shows structure, random task. Intervene first, ask questions last. That's how causal abstractions pass. At last, we're rappin' 'bout direct effects, divergence checks, all do respects (do calculus!). Transcoders leave you unawares, aligning pairs, unnatural wares, should be scares. At the eval, show respects: every other method's got causal effects. Face it, too slow, and unconstrained. All these sparse methods got the claim to fame!

[DAS Chorus] Rotate source and base, then subtract, my g. Rotate, add the base, that's DAS, you see? I just trained these reps all day, but my model still can't find its way. Ident minus Gram times the base, my g. Add Gram times the source, it's DAS, you see? Believe this causal story, I'm a fool. SAEs just deserve to rule.

Christopher Potts@ChrisGPotts

@aryaman2020 @lateinteraction In any case, no AI allowed. As Kendrick Lamar presciently said back in 2015, "I can dig rapping, but a rapper just prompting? What the f*ck happened?"

Christopher Potts@ChrisGPotts

@aryaman2020 @lateinteraction I think if Jørgensen and Hansen challenge us to a rap battle, we should absolutely accept. Or are we supposed to challenge them first? I am not sure of the etiquette.

Christopher Potts@ChrisGPotts

Here is a chorus for an homage to Biggie's "Hypnotize" (the part where the girls are singing to Biggie):

W_e to the ReLU, g, W_d just hypnotized me. I just trained these reps all day, but my model still can't find its way.

W_e to the ReLU, g, W_d won't satisfy me. With this SAE thing, I got played. I finally understand why ReFT got made.

CLS@ChengleiSi

this is legendary

Christopher Potts@ChrisGPotts

@aryaman2020 Is "can" as in "I can make a lay-up" or as in "I can make a shot from center court"?

Aryaman Arora@aryaman2020

@lateinteraction i will let @ChrisGPotts post the steering vector-themed parody of Eminem's "Without Me" but at least i can reply with this

Dhruv π@dhruv31415

@aryaman2020 Don’t worry guys, it only took a year to get SAEs to work on AxBench, maybe in another year we could get them to work on real tasks. :/

Jatin Nainani@jatin_n0

@aryaman2020 "perform close to on par...when features are selected" am i going crazy? i swear there was a paper that showed this before

Jatin Nainani@jatin_n0

@aryaman2020 found it - https://aclanthology.org/2025.emnlp-main.519.pdf they seem to build on arad et al? but couldnt easily find if they discussed difference

Neil Chowdhury@ChowdhuryNeil

@aryaman2020 it's a good sign when a paper is important enough to get dissed!
