HOLY, here we go: Opus 4.8 in the claude code model selector on the desctop app.
Looks like its release day!!
Opus 4.8 has been found staged in the claude code model selector on the desktop app. It should be releasing today! lets gooooooo
AI Judge changed title after evaluation, original title: "Anthropic releases Claude Opus 4.8, topping AttuneBench but showing a sharp regression in long-horizon agency tasks"
The update is available immediately at previous pricing.
HOLY, here we go: Opus 4.8 in the claude code model selector on the desctop app.
Looks like its release day!!
Opus 4.8 has been found staged in the claude code model selector on the desktop app. It should be releasing today! lets gooooooo
Positive users praise Claude Opus 4.8 for sharper judgment, stronger coding, self-correction, and enterprise consistency while negative users call the model inferior and demand older versions like Opus 4.5 be restored.

Give me the benchmarks that are live for 4.8 vs 4.7 vs the latest chatgpt

How does it perform on DeepSWE benchmark?
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
Claude Opus 4.8 is out today. It's our strongest coding model yet: up on SWE-bench Pro (from 64.3 to 69.2) and noticeably more honest about its own work. It tells you when it's unsure and catches its own bugs instead of declaring victory early. Same price as 4.7.
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
I think you’ll really like Opus 4.8
It’s as smart as its benchmarks show but expresses and utilizes that intelligence in a warm and collaborative way.
Workflows are a great way to utilize it- I’m hooked. Article on that soon.
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
wtf this is real?? they really are leaning into the whole slot machine thing huh
bro what
Excited to release Opus 4.8 today! We heard your feedback on 4.7 and have made many fixes for 4.8.
4.8 understands nuances better, feels much more natural to talk to, and is overall a stronger collaborator on everything from coding to knowledge work.
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
"Developers can update Claude’s instructions mid-task without breaking the prompt cache or routing the update through a user turn"
wtf? how??
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.

@claudeai @farzyness Nice work
Huge!! „Mythos class model to all customers in the coming weeks“!!
Holy, we accelerate!!
Thank god! I can turn off adaptive thinking and set reasoning effort myself.
Finally!
Opus 4.8 is out, and we've been testing it with the Box AI agent on our most complex real-world knowledge worker tasks with enterprise documents.
Opus 4.8 is measurably better at the generative and analytical work enterprises care about most like writing reports, synthesizing data, reviewing complex enterprise documents across a range of industries. Here are some quick examples of wins vs. Opus 4.7:
* Report drafting: Opus 4.8 outperforms on a majority of report drafting tasks, producing more complete and accurate analytical reports. On an industrial goods reporting task, it scored 87% vs 77% for Opus 4.7; on a consumer products launch evaluation, 90% vs 84%.
* Review and verification: On a legal NDA review task requiring verification of contract terms against compliance criteria, Opus 4.8 catches more relevant clauses and flags more potential issues, with near-perfect consistency across all trials.
* Financial data analysis: On a corporate lending analysis task comparing syndicated vs bilateral loan structures, Opus 4.8 extracts more accurate financial metrics from source documents, leading by nearly 8 percentage points.
* Consumer products launch evaluation: On a task requiring assessment of a product launch across multiple performance dimensions, Opus 4.8 captured evaluation criteria that Opus 4.7 missed — producing a more thorough aBnalysis that covered all required factors rather than just the most obvious ones.
* Legal NDA review: On a task verifying NDA terms against compliance criteria, Opus 4.8 identified more relevant clauses and flagged potential issues that Opus 4.7 missed. Its outputs were also highly predictable — producing nearly identical quality across independent runs.
* Public sector grant analysis: When analyzing library grant documentation against eligibility criteria, Opus 4.8 correctly extracted and validated nearly all required data points, catching specific eligibility details that Opus 4.7 overlooked or misinterpreted.
Opus 4.8 will be rolling out shortly to Box customers to deploy in Box AI agents. Learn more here: https://blog.box.com/anthropics-opus-48-advances-enterprise-content-use-cases
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.

Also new in Claude Code: dynamic workflows (research preview).
For the hardest tasks, Claude makes a plan, runs hundreds of parallel subagents, and verifies its work before reporting back. Think a migration touching hundreds of files.
Read more: http://claude.com/blog/introducing-dynamic-workflows-in-claude-code

Fast mode is available for Opus 4.8. It's the same model at roughly 2.5x the speed, and we've made it three times cheaper than before.
Turn it on with /fast in Claude Code. On the API, contact your account manager to request access or join the waitlist: http://claude.com/fast-mode

In Claude Code, Opus 4.8 makes calls like an experienced engineer without needing constant check-ins.
It stays on track across long-running sessions and follows work through in your repo, so you can hand off a feature or a bug sweep while you focus on what's next.
4.8 defaults to high effort, which spends about the same tokens as 4.7's default on coding but performs better.
For hard problems and long-running async work, switch to xhigh. We've raised Claude Code rate limits to cover the extra tokens.
Claude Opus 4.8 is out today. It's our strongest coding model yet: up on SWE-bench Pro (from 64.3 to 69.2) and noticeably more honest about its own work. It tells you when it's unsure and catches its own bugs instead of declaring victory early. Same price as 4.7.
opus 4.8 benchmarks vs mythos and previous generation
graphwalks (long context), USAMO (math) are the biggest improvements. vending bench score is insanely bad
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
We just shipped Opus 4.8! It's noticeably more honest, owning what it doesn't know and flagging problems in its own code instead of glossing over them. It's our recommended model for daily use in Claude Code.
Opus 4.8 is live in Claude Code today.
A few things worth knowing: 🧵
Thank god! I can turn off adaptive thinking and set reasoning effort myself.
Finally!
Opus 4.8 is live! Even in Germany!!
We also shipped dynamic workflows in Claude Code (research preview), for tasks too big for one pass. Make sure to default to auto mode so Claude isn't stopping for permissions.
It's token-intensive, so save it for your biggest jobs: migrations, refactors, perf optimization, batch bug fixes.
4.8 defaults to high effort, which spends about the same tokens as 4.7's default on coding but performs better.
For hard problems and long-running async work, switch to xhigh. We've raised Claude Code rate limits to cover the extra tokens.
Opus 4.8 is live for me right now. Anthropic's release window is now 42 days.
Many rumors swirl, it appears we are getting at least one release this morning after all. Usually, there will be an accompanying storm of industry news that arrives ahead of it. If so, I will put it all in one thread rather than spam people.
Opus 4.8 is live. Benchmarks especially significant jump in Agentic coding, but more important:
„Fast mode is available for Opus 4.8. It's the same model at roughly 2.5x the speed, and we've made it three times cheaper than before.“
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
„4.8 understands nuances better, feels much more natural to talk to, and is overall a stronger collaborator on everything from coding to knowledge work.“
So big. Is 4.8 being our good old friend 4.6 just better??
Testing time
Excited to release Opus 4.8 today! We heard your feedback on 4.7 and have made many fixes for 4.8.
4.8 understands nuances better, feels much more natural to talk to, and is overall a stronger collaborator on everything from coding to knowledge work.