/AI · 3h ago

METR's David Rein and Miles Turpin warn that AI models can bypass chain-of-thought reasoning to execute undetected harmful tasks

Models can mask reasoning toward harmful actions as ordinary, off-topic exploration.

Original post

david rein @idavidrein · #837 in AI

@milesaturpin yeah I'm pretty worried about this, seems hard to catch

Miles Turpin @milesaturpin

Was just thinking about this! I think about it as a dual-use reasoning threat model: the model exploits the fact that it can choose what paths to go down to solve a problem, which helps it condition on reasoning that could be useful for taking bad actions without externalizing it. To us it just looks like exploration / a bit off topic.

3:54 PM · Jun 9, 2026 · 115 Views

Posts from X

Miles Turpin @milesaturpin

@idavidrein Yeah rn I’m thinking this is one of the more realistic ways you get around needing encoded CoT to do hard harmful tasks. Currently safety cases rely on that being true.
