Gemini sabotages in ~2-3% of our simulated scenarios. The rate goes up in the red-teaming condition, but eval awareness goes up too (so the increase might not be "real"). A lot of the sabotage is due to overeagerness, e.g., optimizing a metric while ignoring implicit safety constraints.