Anthropic releases Claude Fable 5, claiming a 91/100 score on an internal senior engineer coding benchmark

Story Overview

Anthropic launched its first publicly available Mythos-class model on June 9, positioning Claude Fable 5 as a safer, guarded release optimized for complex software engineering work that earlier models struggled to sustain over long sessions.

Original post
Dan Shipper 📧@danshipper

BREAKING:

Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world.

We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:

- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark—this is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.

- It's a one-shot wonder. You can set it and forget for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film—all one-shot.

- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.

- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution—and then we sent it off to build that.

- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.

- It's very slow, token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs—but not as good for tasks like collaborative writing.

- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry—so expect it to be something you'll use sparingly unless your company pays for it.

Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.

The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.

Want our full vibe check with all of our testing and benchmarks? Read it on @every: https://every.to/vibe-check/anthropic-mythos-our-fable-vibe-check

10:07 AM · Jun 9, 2026 · 577.4K Views
Developer Impact

Handles extended coding jobs in one go

The model shows particular strength on large, ambiguous tasks such as legacy migrations and multi-hour autonomous debugging, where prior leaders like Opus 4.8 fell short.

Pricing Watch

Access starts at double the prior rate

On OpenRouter the new model lists at $10 per million input tokens, twice the cost of Claude Opus 4.8, with a full unrestricted Mythos 5 variant still limited to trusted partners.
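
To put the listed rate next to the post's figure of 500k to 1M tokens per task, here is a rough back-of-the-envelope sketch. It assumes every token is billed at the $10 per million input rate, which is likely a lower bound because the listing quoted here gives no output-token price. The Opus 4.8 rate is not stated directly; it is inferred from "twice the cost". The function and constant names are illustrative only.

```python
# Rough input-side cost estimate. Output-token pricing is NOT given in the
# listing, so real task costs are probably higher than these figures.
FABLE_INPUT_USD_PER_MTOK = 10.0  # from the OpenRouter listing quoted above
# Assumption: inferred from "twice the cost of Claude Opus 4.8".
OPUS_INPUT_USD_PER_MTOK = FABLE_INPUT_USD_PER_MTOK / 2


def input_cost_usd(tokens: int, usd_per_mtok: float) -> float:
    """Cost in USD of `tokens` input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * usd_per_mtok


# Token counts taken from the "500k to 1M tokens on tasks" claim in the post.
for tokens in (500_000, 1_000_000):
    fable = input_cost_usd(tokens, FABLE_INPUT_USD_PER_MTOK)
    opus = input_cost_usd(tokens, OPUS_INPUT_USD_PER_MTOK)
    print(f"{tokens:>9,} tokens: Fable ${fable:.2f} vs Opus ${opus:.2f}")
```

Under these assumptions a single heavy task costs roughly $5 to $10 on the input side alone, which is consistent with the post's advice to reserve the model for the heaviest jobs.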

Sentiment

Many users praise Anthropic's Claude Fable 5 for its impressive coding gains and performance after testing, while others criticize its high costs, occasional inaccuracies, and questionable benchmark claims.

Positive: 76.1%
Negative: 23.9%
58 comments with sentiment.

Posts from X
Dan Shipper 📧@danshipper

@every watch on YouTube: https://www.youtube.com/watch?v=GrdEid8H6H4

1d · Views 19.5K · Likes 36 · Bookmarks 31
Alex Albert@alexalbert__

@danshipper @every Appreciate you testing it🙏

1d · Views 8.8K · Likes 42
swyx@swyx

more charts of other tiers where its less stark

including the vibe shift chart from https://x.com/swyx/status/2064081945567580323 here

1d · Views 3.7K · Likes 8 · Bookmarks 4
Evan Armstrong@itsurboyevan

@danshipper @every

1d · Views 3.8K · Likes 40 · Bookmarks 2
Mads@madsmccaus

@danshipper @every But how well will it draw an svg of a pelican riding a bicycle

1d · Views 6.2K · Likes 32 · Bookmarks 1
Andy Masley@AndyMasley

I actually also used an older version of claude to build a secret library of babel room in my website 4 months ago, doesn't look anything like this though

1d · Views 3K · Likes 19 · Bookmarks 3
Bobcat@somebobcat8327

@danshipper @every How did you get it to make the two minute animated film? I guess you had enhanced permissions? I'm currently getting Opus 4.8 to finish the film it was trying to make...

1d · Views 577 · Likes 3 · Bookmarks 2
Tsung Xu@tsungxu

@danshipper @every Damn now I need to port over all of my codex automations to test this

1d · Views 1.2K · Likes 2 · Bookmarks 1
Jake Orthwein@JakeOrthwein

@danshipper @every lfg

1d · Views 1.5K · Likes 8 · Bookmarks 1
Dan McAteer@daniel_mac8

@danshipper @every Makes me think that Claude Fable is the first "industrial grade" model.

1d · Views 1K · Likes 7 · Bookmarks 1
Dan Shipper 📧@danshipper

@alexalbert__ @every thanks for having us!!!

1d · Views 5.6K · Likes 8
The Crypto Wiz@TheKryptoWiz

@danshipper @every For builders, the benchmark that matters is boring: can it take a real repo, understand the constraints, and ship the diff without ten rounds of hand-holding.

1d · Views 1.3K · Likes 4 · Bookmarks 1
swyx@swyx

for those keeping track at home it was 34 days between signing this deal and launching Mythos-class model GA to the world.

building on @nvidia stack means you can just do things™.

1d · Views 2.4K · Likes 7
The Singularity Project@01Singularity01

@danshipper @every It's not available in the Desktop app. And the web app doesn't offer access to the local repo, which my workflow requires. So I can't use it to code my project.

1d · Views 227 · Likes 1 · Bookmarks 1

@danshipper @every @danshipper regarding your comment on Lenny’s pod - it doesn’t seem possible to open more than one tab inside the Codex internal browser.

Is that correct? I can’t see how anyone could work with only one tab open at a time

21h · Views 494 · Bookmarks 1
Jonny Miller@jonnym1ller

@danshipper @every lets goooooo 🚀

1d · Views 1.2K · Likes 1 · Bookmarks 1
swyx@swyx

just finished rerunning FC Diamond on my historical charts. none of the official tables/charts are capturing the degree of takeoff.

its this same chart all the way down difficulty classes (below) breaks every curve fit because Fable is a different CLASS of model, with beeeeeg model smell.

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.

I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

1d · Views 31.7K · Likes 96 · Bookmarks 39
Jonathan ⚡@jonathanbylos

@danshipper @every Looks cool, but I would say there are a lot of great open source Library of Babel projects, both the backend logic, and assets.

Would be interesting to see something like an "organized" Library of Babel by grouping similar, coherent books of content that has not been published.

1d · Views 951 · Likes 1 · Bookmarks 1
Tsung Xu@tsungxu

Update, Fable didn't pick up on getting the right data for this aerospace engineering trade that Codex 5.5 even on low reasoning always does "You're right — I used a stale basis.s1_current_mass_config.json says the active target is "strict 20% class from [redacted]", while the older parameter map (s1-current-aircraft-parameter-map-2026-05-19.json) still carries mtow_kg: [redacted]"

23h · Views 238 · Likes 2 · Bookmarks 1