Subliminal Learning Holds Across Full Fine-Tuning, SGD, and Arbitrary Networks

Original post
Owain Evans @OwainEvans_UK · #301 in AI

Our original subliminal learning paper showed subliminal learning in MNIST with MLPs. We show this holds across many ablations. In particular, it holds with full fine-tuning (not LoRA) and with SGD (as well as other optimizers).

We also prove a theorem about subliminal learning, which applies to SGD, full-weight updates, and arbitrary neural networks. https://www.nature.com/articles/s41586-026-10319-8/figures/7

12:28 PM · Jun 6, 2026 · 11.1K Views
Sentiment

Users call subliminal learning a promising research direction for AI alignment because the effect holds across full fine-tuning, SGD, and arbitrary networks.

Positive: 100.0% · Negative: 0.0% (1 comment with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 1.8K · Bookmarks 2
Owain Evans @OwainEvans_UK

For clarity, the LLM results in our paper do use LoRA (for open-weight models) or OpenAI's fine-tuning API (which likely uses some kind of parameter-efficient fine-tuning). https://www.nature.com/articles/s41586-026-10319-8

8h · Views 1.8K · Likes 3 · Bookmarks 2
Likes 7
Owain Evans @OwainEvans_UK

@tmkadamcz This is not a new result. But just highlighting this old result in the light of some recent papers.

1d · Views 282 · Likes 7
Replies 1
Tim Kostolansky @thkostolansky

@OwainEvans_UK curious why ur focusing a lot on SL these days? what do u hope to get from this research direction?

13h · Views 99
Honam Wong @MH2023ML

@OwainEvans_UK Where could we find the updated theorem?

1d · Views 244 · Likes 1 · Bookmarks 1
Tom Adamczewski @tmkadamcz

@OwainEvans_UK The link is just a link to a particular figure. Confused. Is there a new result here?

1d · Views 550 · Likes 2
deep Manifold @BetaTomorrow

@OwainEvans_UK please see

17h · Views 237 · Likes 2
Owain Evans @OwainEvans_UK

@thkostolansky It's not a major focus right now for me. We published the original paper 11 months ago. But I do think it's a promising research direction and relevant to alignment.

8h · Views 7