Anthropic Paper Shows LLMs Gain Self-Recognition via Post-Training

Original post

Evidence that post-training gives models a "self-recognition" capability, manifesting as higher confidence when continuing their own text than reading others' text. I think this opens up an exciting line of inquiry into the emergence of "selfhood" in models via post-training!

8:53 PM · May 25, 2026

One somewhat surprising finding here is that on-policy RL is not required to instill self-recognition! SFT is sufficient, and (off-policy) DPO adds some more juice

Jack Lindsey (@Jack_W_Lindsey)


3:53 AM · May 26, 2026 · 18.3K Views
4:07 AM · May 26, 2026 · 1.2K Views
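
The posts describe self-recognition as "higher confidence when continuing their own text than reading others' text" but do not say how confidence is measured. One plausible reading, offered here only as an assumption and not as the paper's method, is to compare the model's average per-token log-probability on text it generated itself against text from another source. The function names below (`mean_logprob`, `self_recognition_gap`, `perplexity`) are hypothetical, and the per-token log-probabilities are assumed to come from whatever scoring interface the model exposes.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-probability; higher means the model is more confident."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is exp of the negative mean log-probability; lower means more confident."""
    return math.exp(-mean_logprob(token_logprobs))


def self_recognition_gap(
    own_logprobs: Sequence[float], other_logprobs: Sequence[float]
) -> float:
    """Hypothetical score: confidence on own text minus confidence on others' text.

    A positive value means the model assigns higher average log-probability
    to its own text than to the other text.
    """
    return mean_logprob(own_logprobs) - mean_logprob(other_logprobs)
```

Under this reading, comparing the gap for a base model with the gap after SFT, and again after DPO, would be one way to probe the claim that SFT alone is sufficient and DPO adds more. A real comparison would also need to control for length, topic, and style differences between the two text sources.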