AI · 2h ago

Anthropic releases Claude Fable 5, claiming a 91/100 score on an internal senior engineer coding benchmark

OpenRouter pricing starts at $10 per million input tokens.


Original post

Andy Masley (@AndyMasley) · #1693 in AI

I actually also used an older version of Claude to build a secret Library of Babel room on my website 4 months ago. It didn't look anything like this, though.

Dan Shipper 📧 (@danshipper)

BREAKING:

Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world.

We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:

- It broke our benchmarks. Fable scored 91/100 on our Senior Engineer benchmark, which is human senior engineer level. The previous high score was Opus 4.8 at 63; GPT-5.5 scored 62.

- It's a one-shot wonder. You can set it and forget it for hours, or overnight, on huge coding tasks and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film, all one-shot.

- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.

- Great use of context. We set it loose analyzing customer feedback surveys and our website data, and it came back with a crisp, clean report that identified (a) our biggest problem and (b) a concrete, testable solution. Then we sent it off to build that.

- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.

- It's very slow and token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs, but not as good for tasks like collaborative writing.

- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token-hungry, so expect to use it sparingly unless your company pays for it.

Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.

The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.

Want our full vibe check with all of our testing and benchmarks? Read it on @every: https://every.to/vibe-check/anthropic-mythos-our-fable-vibe-check

12:00 PM · Jun 9, 2026 · 42 Views


Sentiment

Many users expressed excitement about Anthropic's Claude Fable 5 coding model after quick tests showed strong performance, while some criticized its high cost and occasional inaccuracies.

Positive: 82.1% · Negative: 17.9% (28 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 13.7K · Bookmarks: 24

Dan Shipper 📧 (@danshipper)

@every Watch on YouTube: https://www.youtube.com/watch?v=GrdEid8H6H4

4h ago · 13.7K views

Likes: 37

Alex Albert (@alexalbert__)

@danshipper @every Appreciate you testing it🙏

4h ago · 7.1K views


Replies: 2

Mads (@madsmccaus)

@danshipper @every But how well will it draw an SVG of a pelican riding a bicycle?

3h ago · 4.1K views

Evan Armstrong (@itsurboyevan)

@danshipper @every

4h ago · 3.1K views

Andy Masley (@AndyMasley)

I actually also used an older version of Claude to build a secret Library of Babel room on my website 4 months ago. It doesn't look anything like this, though.


2h ago · 2.3K views

Dan McAteer (@daniel_mac8)

@danshipper @every Makes me think that Claude Fable is the first "industrial grade" model.

4h ago

Bobcat (@somebobcat8327)

@danshipper @every How did you get it to make the two-minute animated film? I guess you had enhanced permissions? I'm currently getting Opus 4.8 to finish the film it was trying to make...

3h ago

Dan Shipper 📧 (@danshipper)

@madsmccaus @every @simonw inquiring minds want to know

3h ago · 3.7K views

Dan Shipper 📧 (@danshipper)

@alexalbert__ @every thanks for having us!!!

4h ago · 4.3K views

The Crypto Wiz (@TheKryptoWiz)

@danshipper @every For builders, the benchmark that matters is boring: can it take a real repo, understand the constraints, and ship the diff without ten rounds of hand-holding.

4h ago · 1.1K views

Jonathan 🇺🇲 (@thaonlyjonathan)

@danshipper @every "It's best for power users" no, it's best for users who have a lot of money. It's incredibly expensive.

3h ago

Jonny Miller (@jonnym1ller)

@danshipper @every lets goooooo 🚀

4h ago

Jonathan ⚡ (@jonathanbylos)

@danshipper @every Looks cool, but I'd point out that there are a lot of great open-source Library of Babel projects, covering both the backend logic and the assets.

It would be interesting to see something like an "organized" Library of Babel that groups similar, coherent books of unpublished content.

3h ago

Matt (@matthew_hartman)

@danshipper @every Dang, that's big news. Can't wait to build with it.

4h ago

mathias coudert (@Mcoudert)

@danshipper @every "If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you."

So for knowledge workers, you recommend staying on Codex?

3h ago

Ondřej Tesárek (@bratricek)

@danshipper @every After watching AI do completely useless shit for 3 years straight, this one example beats them all. Thank you for your service.

2h ago

Nyel Bangash (@nyelbangash)

@madsmccaus @danshipper @every just tried

3h ago

Christian Darnton (@CMDarnton0)

@danshipper @every @nikoliasgoninus

3h ago

Tsung Xu (@tsungxu)

@danshipper @every Damn, now I need to port over all of my Codex automations to test this.

2h ago