
NYU's Tal Linzen and collaborators argue it is too early to credit LLMs with genuine introspection, explaining their self-reports as anomaly detection and confabulation

A separate study found AI introspection is content-agnostic.

Original post

1/ Can LLMs introspect, i.e., reason about their internal states? Recent work claims LLMs notice when their "thoughts" get tampered with, and can report their content. We looked closely and we think it's too early to say that. Work led by @shashwat_s19, with @tallinzen and me.

6:16 AM · May 28, 2026

Great work! See also https://arxiv.org/abs/2603.05414 from @LedermanHarvey & @kmahowald

This is a nice cautionary tale about Morgan's canon in interpretability: "introspection" here is closer to anomaly detection with confabulation than to direct/privileged access to injected content.

Shauli Ravfogel @ravfogel


1:16 PM · May 28, 2026 · 2.5K Views
4:36 PM · May 28, 2026 · 999 Views