AI Judge changed title after evaluation, original title: "Claude Fable 5 launches, setting a record 72.9% on CursorBench and scoring 80.3% on SWE-Bench Pro"
Claude Fable 5 scored 80.3% on SWE-Bench Pro.
Positive users praise Claude Fable 5's record benchmark scores in coding and research while negative users complain about high costs, hype, and usability problems.
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes out on tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision.
The longer and more complex the task, the larger Fable 5’s lead over our other models.
Claude Fable 5 is by far the most ridiculous model that makes me genuinely afraid for the future of software engineering.
I compiled the top 10 most unbelievable things I've seen Claude Fable 5 do today:
- Migrate a 50M-line codebase at Stripe in a day (humans take 2 months)
- Draw amazing 3D graphics: a) Boeing 747, b) space simulations with >5000 objects, c) Minecraft roller coasters, d) full photorealistic forest scenes, e) NYC skyline, f) stormy clouds
- One-shot the game Pokemon FireRed
- Optimize a real-world proprietary interaction net evaluator 10x more than the next best model, GPT 5.5
AND it's about the same price as GPT 5.5 ($10/M input, $45/M output) vs Fable 5 ($10/M input, $50/M output) and 6x cheaper than GPT 5.5 Pro.
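To make the price comparison concrete, here is a minimal sketch of the arithmetic using only the per-million-token prices quoted in the post. The workload size (2M input tokens, 500K output tokens) and the `request_cost` helper are assumptions for illustration, not figures or APIs from any vendor.

```python
# Prices in USD per million tokens, as quoted in the post above.
PRICES = {
    "Fable 5": {"input": 10.0, "output": 50.0},
    "GPT 5.5": {"input": 10.0, "output": 45.0},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload (an assumption, not a figure from the post).
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

fable_cost = request_cost("Fable 5", INPUT_TOKENS, OUTPUT_TOKENS)
gpt_cost = request_cost("GPT 5.5", INPUT_TOKENS, OUTPUT_TOKENS)

# Prints: Fable 5: $45.00, GPT 5.5: $42.50
print(f"Fable 5: ${fable_cost:.2f}, GPT 5.5: ${gpt_cost:.2f}")
```

Since the quoted input prices are identical, the gap comes entirely from output tokens, so it widens for output-heavy workloads and shrinks for input-heavy ones.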
Claude Fable 5 is now available in Cursor.
It sets a new state of the art on CursorBench at 72.9%, 8 points above the previous best.
Claude 5 Fable tl;dr
- It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research
- The longer and more complex the task, the larger Fable 5’s lead over our other models
- It's more token-efficient than past Claude models
- Fable 5 stays focused across millions of tokens in long-running tasks and improves its outputs using its own notes
Fable 5 is more than just better benchmarks. It's more efficient, allows for longer work periods, offers better context management, and so much more.
GPT-5.6 is just around the corner.
I'm a huge Codex fan, but Fable/Mythos is in a league of its own. I'm curious to see if OpenAI will release its own Mythos.
"During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand."
Claude 5 Fable Benchmarks!
Holy moly, significant jump even to Mythos
this is absolutely incredible.
Holy chart crime
I never thought we would get another GPT-4 moment
It's already June 9th, and Gemini 3.5 Pro and GPT-5.6 are nearing release (Google even already announced 3.5 Pro during I/O)
Rumor has it that GPT-5.6 will be released as early as next week.
So far, it's safe to say that - guardrails aside - Anthropic is truly the frontier lab that's entering a new league with Mythos/Fable.
Gemini 3.5 Pro and GPT-5.6 have a lot to deliver and are now under pressure.
This release has certainly boosted Anthropic's upcoming IPO. Anthropic has proven that they are still capable of making significant leaps in performance and efficiency. There's no end in sight.
But the pressure on the competition is mounting.
And remember that Claude Mythos was (and probably is) still the leader in long-horizon software tasks
The guardrails are way too strict. Even the simplest questions get cut off immediately.
And it's only on the schedule until June 22nd.
Damn, Anthropic really thinks the model is too powerful.
Fable 5 (aka baby Mythos) just dropped. Is it as scary (or scary good) as they claim?
My thoughts after some early testing:
- smart smart smart (crushed SWE bench)
- but do you always need hyper intelligence?
- faceplanted on one-shot design in a way that shocked me
- i'm not sure about dynamic workflows + complex subagents. they work, but at what cost?
- def knocked out technical work well
- ootb bad at making technical docs + specs for humans. probably really good docs for agents. but nearly impossible to parse prose.
- A++ vision and document formatting. this was my favorite part
NOT a daily driver, wouldn't put this model in a meeting, but def will keep it back in the server rack, churning out code.
Full take on YT: https://www.youtube.com/watch?v=IREnr4I89Ho
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
I joined anthropic a ~month ago and have written ~no code myself. I went from getting quite frustrated with coding agents even 6 months ago and giving up and writing some of the code myself to a big part of my role now being agent management.
fable (well, mythos) has been transformational to my day to day work. I always felt Opus 4.5 could barely code; 4.6 was just-about-useful, but I have barely written a line of code since fable.
Results from Internal Coding Evals For Claude Fable
- For 98% of tasks, it simply does the same thing as GPT 5.5 or Opus 4.8 and costs 2x
- For 2% of hard coding tasks, it does make sense if you are willing to pay double and get some quality gains
So ideally, you want to ROUTE VERY hard tasks to Fable
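The routing idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the model labels, the `Task.difficulty` score, and the 0.9 threshold are all hypothetical, and the post does not say how difficulty would be estimated. The 2% share of hard tasks and the 2x cost multiplier are the figures from the post.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration, not real API model identifiers.
STRONG_MODEL = "fable-5"
DEFAULT_MODEL = "cheaper-model"


@dataclass
class Task:
    name: str
    difficulty: float  # hypothetical score in [0, 1] from some estimator


def route(task, threshold=0.9):
    """Send only tasks at or above the difficulty threshold to the strong model."""
    return STRONG_MODEL if task.difficulty >= threshold else DEFAULT_MODEL


def blended_relative_cost(hard_fraction, strong_multiplier=2.0):
    """Average cost per task, relative to the cheaper model, when only the
    hard fraction of tasks is routed to the more expensive model."""
    return (1.0 - hard_fraction) * 1.0 + hard_fraction * strong_multiplier


tasks = [Task("rename a variable", 0.1), Task("cross-service migration", 0.95)]
assignments = {task.name: route(task) for task in tasks}

# With 2% hard tasks at 2x cost, routing costs about 1.02x the cheaper
# model on average, versus 2x if every task went to the strong model.
routed_cost = blended_relative_cost(0.02)
```

The hard part in practice is the difficulty estimate itself; a threshold router is only as good as the signal it is given.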
Claude Fable 5 / Mythos 5 wins everywhere.
I thought Fable 5 was just a nerfed Mythos Preview, but it’s literally better. SWE-Bench Pro: Fable 5: 80.3%, GPT-5.5: 58.6%.
And the price is only 2x Opus 4.8: $10/input MTok, $50/output MTok.
I don't think GPT 5.6 can beat this...
you're totally right open-source is going to catch up in 4 months
go try out fable in cursor, it's an incredible but expensive model!
Anthropic has a coding MOAT
welcome to the world, Claude Fable 5!
Fire everyone now
I’m incredibly excited that Fable is now available for everyone! I’ve been blown away by how smart it is - it one-shots entire PRs for me, finds obscure bugs and has written all my code since I started using it.