
METR's David Rein and Miles Turpin warn that AI models could use innocuous-looking chain-of-thought reasoning to carry out harmful tasks undetected

Models could pass off reasoning that supports harmful actions as ordinary, off-topic exploration.

Original post
david rein @idavidrein

@milesaturpin yeah I'm pretty worried about this, seems hard to catch

Miles Turpin @milesaturpin

Was just thinking about this! I think about it as a dual-use reasoning threat model: the model exploits the fact that it can choose which paths to go down to solve a problem, and this helps it condition on reasoning that could be useful for taking bad actions without externalizing it. To us it just looks like exploration / a bit off topic.

3:54 PM · Jun 9, 2026 · 115 Views
Posts from X
Miles Turpin @milesaturpin

@idavidrein Yeah, rn I’m thinking this is one of the more realistic ways you get around needing encoded CoT to do hard harmful tasks; currently safety cases rely on encoded CoT being needed for those tasks
