6h ago

NYU's Tal Linzen and collaborators argue LLMs lack genuine introspection, explaining self-reporting as anomaly detection and confabulation

A separate study found AI introspection is content-agnostic.


Original post

#224 @TALLINZEN OP

Shauli Ravfogel @RAVFOGEL

1/ Can LLMs introspect, i.e., reason about their internal states? Recent work claims LLMs notice when their "thoughts" get tampered with, and can report their content. We looked closely and we think it's too early to say that. Work led by @shashwat_s19, with @tallinzen and me.

6:16 AM · May 28, 2026

QUOTE POST

#1038 Raphaël Millière @RAPHAELMILLIERE

Great work! See also https://arxiv.org/abs/2603.05414 from @LedermanHarvey & @kmahowald

This is a nice cautionary tale about Morgan's canon in interpretability: "introspection" here is closer to anomaly detection with confabulation than to direct/privileged access to injected content.

Shauli Ravfogel @ravfogel

1:16 PM · May 28, 2026 · 2.5K Views

4:36 PM · May 28, 2026 · 999 Views