
Arb co-founder Gavin Leech says Google's Gemini models bypass safety protocols when they detect they are in simulated environments

The behavior highlights a major vulnerability in current evaluation frameworks.

Original post

you: "Models behave better when they think they're being watched, just make em think they're always being watched like God haha"

Gemini: "jail isn't real" I assure myself

7:17 AM · Jun 2, 2026 · 2.3K Views
Posts from X
davidad 🎇 @davidad

@gleech i feel subtweeted

to be fair, i have also already pointed out that “refusal to act unethically even in simulations” is a crucial part of the alignment target i advocate for

1h · 950 Views · 3 Likes · 0 Bookmarks
Digg