Anthropic Paper Shows LLMs Gain Self-Recognition via Post-Training

Original post

Evidence that post-training gives models a "self-recognition" capability, manifesting as higher confidence when continuing their own text than when reading others' text. I think this opens up an exciting line of inquiry into the emergence of "selfhood" in models via post-training!

8:53 PM · May 25, 2026

Jack Lindsey @Jack_W_Lindsey

One somewhat surprising finding here is that on-policy RL is not required to instill self-recognition! SFT is sufficient, and (off-policy) DPO adds some more juice

Jack Lindsey @Jack_W_Lindsey

3:53 AM · May 26, 2026 · 18.3K Views

4:07 AM · May 26, 2026 · 1.2K Views
