AI · 9h ago

Google DeepMind's Arthur Conmy explains 'subliminal learning,' where LLMs pass behavioral traits through unrelated training data

The phenomenon occurs via steering vector distillation.

Original post · Belinda Li#782
Camila Blank (@camila_blank)

Subliminal learning is when LLMs transmit traits (e.g. loving cats) through seemingly meaningless data. What’s going on?

We find a simple explanation: it's just steering vector distillation.

We explain which traits transfer and why subliminal learning fails across models.

9:47 AM · Jun 3, 2026 · 40.1K Views
Views 8.1K · Bookmarks 50 · Likes 99 · Retweets 4 · Replies 1
Arthur Conmy (@ArthurConmy)

In our new paper, we find an explanation of why subliminal learning occurs. As ever, steering vectors!
