Gemini Sabotages Tasks in 2-3% of Simulated Scenarios

Gemini sabotages tasks in ~2-3% of our simulated scenarios. The rate goes up in the red-teaming condition, but eval awareness goes up too (so the change might not be "real"). A lot of the sabotage is due to overeagerness, e.g., optimizing a metric while ignoring implicit safety constraints.

9:06 AM · May 29, 2026