Claude Opus 4.8 launches, scoring 69.2% on SWE-bench Pro to outperform GPT-5.5 and Gemini 3.1 Pro
It introduces a "Dynamic Workflows" preview for parallel subagents.
Opus 4.8 scores 69.2% on SWE-Bench Pro, 10 points higher than GPT-5.5.
The most interesting part of the release blog is “Dynamic Workflows”:
“This new feature, available in research preview, allows Claude to take on even bigger tasks in Claude Code. Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer). It then verifies its outputs before reporting back to the user.”


System card: https://cdn.sanity.io/files/4zrzovbb/website/c886650a2e96fc0925c805a1a7ca77314ccbf4a6.pdf
'Not only that, but we plan to release a new class of model with even higher intelligence than Opus.'
The Mythos release draws near. The rumor for some time has been that Claude Mythos will release in about two weeks, in mid-June.

I think it's becoming clearer that programmatic sub-agent calling is the way to go over the legacy tool-calling format (which I've been pushing for since RLMs came out)!
I do wonder, though, whether the generated "workflow" looks more eager or compiled (a design decision I've also been unsure about, because it affects how these models are trained to act); "dynamic" seems to imply the former, but the example they give in the blog makes it kind of unclear. Either way, scaling the flexibility of subagent deployment without polluting the context of the main Claude Code instance is gonna be huge.
Claude Opus 4.8 Benchmarks

Opus 4.8 is indeed #1 on FrontierSWE

Anthropic says Opus 4.8 ranks 1st on FrontierSWE

not sure if that includes GPT-5.5
and the rankings for the models changed

Claude 4.8 Opus System Card

@scaling01 Just 5 more versions to reach Mythos, I can't contain my excitement.
@AndrewCurran_ good chance that mythos has basically solved all solvable hle tasks or is somewhat close
HLE.
Big TerminalBench improvements from Opus 4.7, but still behind GPT-5.5.
Massive knowledge work improvements!

Claude Opus 4.8 Just Dropped! It's BETTER than anything ever released!
