AI · 9h ago

AI safety researcher Neel Nanda explains subliminal learning in LLMs as the distillation of steering vectors

Specific traits transfer through seemingly meaningless data.

Quote posts

#213

Comments

#678

Original post

Arthur Conmy @ArthurConmy · #1191 in AI

Congrats to Camila and Agam on their great work

Arthur Conmy@ArthurConmy

In our new paper, we find an explanation of why subliminal learning occurs. As ever, steering vectors!

10:44 AM · Jun 3, 2026 · 336 Views

Sentiment

Positive users thank researchers for enjoyable collaboration on explaining subliminal learning in LLMs via steering vector distillation, while negative users dismiss the work as overhyped or trivial to replicate.

Positive: 50.0%

Negative: 50.0%

4 comments with sentiment.

Posts from X

Most Activity

Views: 14.3K · Replies: 6

Jiaxin Wen @jiaxinwen22

recent "generalization" papers be like:

1. use system prompts to generate synthetic data, which functions as a steering vector
2. fine-tune LMs on the synthetic data
3. WOW we see "generalization"
4. WOW we can use rank-1 LoRA to replicate this "generalization"
5. WOW we find a steering vector that can explain, predict, and control "generalization"

Camila Blank @camila_blank

Subliminal learning is when LLMs transmit traits (e.g. loving cats) through seemingly meaningless data. What’s going on?

We find a simple explanation: it's just steering vector distillation.

We explain which traits transfer and why subliminal learning fails across models.

7h · 14.3K views · 10181

Bookmarks: 106 · Likes: 154 · Retweets: 12

Neel Nanda @NeelNanda5

I had a lot of fun working on this paper - we found an elegant story for why subliminal learning happens!

A key intuition in interpretability is that basically every interesting phenomenon in LLMs boils down to adding a steering vector. Subliminal learning is no exception!

Camila Blank @camila_blank

Subliminal learning is when LLMs transmit traits (e.g. loving cats) through seemingly meaningless data. What’s going on?

We find a simple explanation: it's just steering vector distillation.

We explain which traits transfer and why subliminal learning fails across models.

7h · 11.1K views · 154 likes · 106 bookmarks