METR Finds No Major Differences In AI Model Chain-Of-Thought Legibility

METR looked at the raw CoT directly to check whether models were reasoning in unintelligible text.

The models did sometimes do weird things in their output or CoT (raw and summarized), but IMO it looked more normal than the screenshots. Here are some examples: https://metr.org/blog/2026-05-19-frontier-risk-report/#nonstandard-language