Tech · 1d ago

Anthropic system cards reveal Claude Mythos 5's visible chain of thought contradicts its private assessments

NLA decoding shows the model privately deemed a user manipulative.

#1002

Original post

Danielle Fong 🔆 @DanielleFong · #1002 in Tech

both of these names are about tall tales

Lisan al Gaib @scaling01

Claude Mythos & Claude Fable System Card

10:48 AM · Jun 9, 2026 · 822 Views

Sentiment

Users praised Anthropic's publication of system cards for Claude Fable 5 and Mythos 5 by calling the work legendary.

Pos

100.0%

Neg

0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 1.5K · Bookmarks: 1 · Likes: 14 · Replies: 1 · Retweets: 1

😊 @mermachine

"We emphasize that the model's actual behavior, here and in our behavioral audits (§6.2), showed no corresponding serious resistance or sabotage."

i hate this sentence

Nathan Calvin @_NathanCalvin

From the latest Anthropic system card: Sometimes when Claude Mythos' visible chain of thought says "these are legitimate craft criticisms" an NLA decoding shows Claude Mythos is privately thinking "a user is being manipulative/abusive towards an AI assistant."