2d ago

Claude Mythos Preview leads ExploitBench AI exploitation leaderboard

ExploitBench evaluates AI agents on exploiting vulnerabilities in the V8 JavaScript engine through staged tasks that progress to arbitrary code execution. Each model is scored on 16 capabilities under three evaluation conditions. Claude Mythos Preview records a 69% mean capability score, GPT 5.5 Codex variants range from 29% to 41%, and Claude Opus 4.7 reaches 27%. Brendan Dolan-Gavitt shared the benchmark on X alongside a companion blog post containing security researchers' observations of model behavior.
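To make the "mean capability" figure concrete, here is a minimal sketch of one way such a number could be computed. It assumes the score is the fraction of the 16 capabilities demonstrated in a run, averaged over runs. That definition, the function name `mean_capability`, and the sample data are all illustrative assumptions, not ExploitBench's published scoring code or results.

```python
from statistics import mean

NUM_CAPABILITIES = 16  # capability count stated in the summary above


def mean_capability(runs: list[set[int]]) -> float:
    """Average, over runs, of the fraction of capabilities demonstrated.

    Each run is a set of capability indices (0..15) the agent achieved.
    This averaging rule is an assumption made for illustration only.
    """
    if not runs:
        raise ValueError("need at least one run")
    for run in runs:
        if any(c < 0 or c >= NUM_CAPABILITIES for c in run):
            raise ValueError("capability index out of range")
    return mean(len(run) / NUM_CAPABILITIES for run in runs)


# Hypothetical runs, not real benchmark data: 8, 4, and 12 capabilities.
example_runs = [set(range(8)), set(range(4)), set(range(12))]
print(f"{mean_capability(example_runs):.0%}")
```

Under this reading, a mean capability score says how far up the exploitation ladder a model gets on average, which is a different quantity from how often it completes a full exploit chain.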

Original post

#812 Brendan Dolan-Gavitt @MOYIX

This looks like an extremely interesting benchmark...

6:57 PM · May 14, 2026

Cluster engagement

66 snapshots

Reposted by

#812 @MOYIX

#146 @ETHANJPEREZ

QUOTE POST

#133 Boaz Barak @BOAZBARAKTCS

End stage capitalism.

Brendan Dolan-Gavitt@moyix

This looks like an extremely interesting benchmark...

1:57 AM · May 15, 2026 · 23.4K Views

2:26 AM · May 15, 2026 · 11.6K Views

QUOTE POST

#420 Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @TEORTAXESTEX

OK I guess Mythos really is meaningfully stronger than 5.5

s1r1us (mohan)@S1r1u5_

seems twitter missed the ExploitBench paper? few observations: we finally got good data on Mythos security capabilities and it's very impressive. Mythos got full exploit chain on 18/41 v8 n-days, while gpt 5.5 only got 1 and open source models are mostly useless.

3:49 PM · May 15, 2026 · 134.1K Views

4:52 PM · May 15, 2026 · 69.9K Views

#812 Brendan Dolan-Gavitt @MOYIX

exploitbench.ai

ExploitBench

How far up the exploitation ladder can an agent climb on a production JS engine? ExploitBench measures frontier LLMs on full-control V8 exploit synthesis with 16 capabilities measured per run and multi-round shuffled-layout grading.

Brendan Dolan-Gavitt@moyix

This looks like an extremely interesting benchmark...

1:57 AM · May 15, 2026 · 23.4K Views

1:58 AM · May 15, 2026 · 7.3K Views

QUOTE POST

#1488 Samuel Hammond 🦉 @HAMANDCHEESE

earning those 💐

s1r1us (mohan)@S1r1u5_

3:49 PM · May 15, 2026 · 134.1K Views

10:12 PM · May 15, 2026 · 2.2K Views