AI Models Score 59% on Attack-Ideation Eval Versus 90% Human Baseline

Original post

In case the low scores were because models weren't calibrated on what the bar for a good attack should be, I also tried a "batch" mode in which the model is given all of the ideas in context, so it can compare and contrast them before answering. Surprisingly, this didn't help!

11:13 AM · May 19, 2026

david rein @idavidrein

I also evaluated how well the best internal model could implement the monitor jailbreak I used. Even when I gave it 5-10 specific hints/instructions on how to implement it, it made very basic mistakes that broke the attack.

david rein @idavidrein

6:13 PM · May 19, 2026 · 129 Views

6:14 PM · May 19, 2026 · 132 Views