/AI · 1d ago

Subliminal Learning Holds Across Full Fine-Tuning, SGD, and Arbitrary Networks

Original post

Owain Evans@OwainEvans_UK · #301 in AI

Our original subliminal learning paper showed subliminal learning in MNIST with MLPs. We show this holds across many ablations. In particular, it holds with full fine-tuning (not LoRA) and with SGD (as well as other optimizers).

We also prove a theorem about subliminal learning, which applies to SGD, full-weight updates, and arbitrary neural networks. https://www.nature.com/articles/s41586-026-10319-8/figures/7

12:28 PM · Jun 6, 2026 · 11.1K Views

Sentiment

In the thread, subliminal learning is described as a promising research direction relevant to alignment. The post itself reports that the effect holds with full fine-tuning and SGD, and that the accompanying theorem applies to arbitrary neural networks.

Pos

100.0%

Neg

0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 1.8K · BOOKMARKS 2

Owain Evans@OwainEvans_UK

For clarity, the LLM results in our paper do use LoRA (for open-weight models) or OpenAI's fine-tuning API (which likely uses some kind of parameter-efficient fine-tuning). https://www.nature.com/articles/s41586-026-10319-8

Owain Evans@OwainEvans_UK

We also prove a theorem about subliminal learning, which applies to SGD, full-weight updates, and arbitrary neural networks. https://www.nature.com/articles/s41586-026-10319-8/figures/7

8h1.8K32

LIKES 7

Owain Evans@OwainEvans_UK

@tmkadamcz This is not a new result. But just highlighting this old result in the light of some recent papers.

1d2827

REPLIES 1

Tim Kostolansky@thkostolansky

@OwainEvans_UK curious why ur focusing a lot on SL these days? what do u hope to get from this research direction?

13h99

Honam Wong@MH2023ML

@OwainEvans_UK Where could we find the updated theorem?

1d24411

Tom Adamczewski@tmkadamcz

@OwainEvans_UK The link is just a link to a particular figure. Confused. Is there a new result here?

1d5502

deep Manifold@BetaTomorrow

@OwainEvans_UK please see

17h2372

Owain Evans@OwainEvans_UK

@thkostolansky It's not a major focus right now for me. We published the original paper 11 months ago. But I do think it's a promising research direction and relevant to alignment.

8h7