
CEO Questions Whether AI Deception Examples Signal Real Takeover Risks


@JacksonKernion "these don't seem similar in kind to real takeover scenarios to me" -> That seems very reasonable. If I were just looking at these examples, they wouldn't feel very persuasive to me either; there's quite a lot going on in how exactly to extrapolate from what we've seen so far.

1:27 PM · May 23, 2026

OTOH I do disagree that it matters that much whether those behaviors were observed in the final version. For one thing, it's unclear how much testing was done, and whether we would definitely have seen the behaviors if they had remained at the same frequency. More importantly, if we observed that these behaviors stopped without having a particularly good understanding of why, that doesn't seem like a strong case that they won't re-emerge in future systems or under different circumstances (especially when there are explanations like eval awareness / alignment faking).

Jackson Kernion @JacksonKernion

Thanks. I think some of the disagreement comes down to different readings of the existing 'overreach' and 'deception' examples; these don't seem similar in kind to real takeover scenarios to me. And, importantly, for the Mythos deception examples, the system card notes: "These transcripts came from earlier versions of the model, and we have not observed this particular behavior in the final version of Claude Mythos Preview".

6:01 AM · May 23, 2026 · 907 Views
8:29 PM · May 23, 2026 · 18 Views