AI · 2h ago

Anthropic releases Claude Fable 5; Every reports a 91/100 score on its internal Senior Engineer coding benchmark

OpenRouter pricing starts at $10 per million input tokens.
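For a rough sense of scale, the quoted post reports that tasks routinely use 500k to 1M tokens. The sketch below multiplies that range by the $10 per million input-token price. It assumes every token is billed at the input rate, since output-token pricing is not given here, so treat the result as an approximate floor and not a quote. The function name `input_cost_usd` is hypothetical.

```python
# Price quoted above: $10 per million input tokens (OpenRouter starting price).
INPUT_PRICE_PER_MILLION_USD = 10.0


def input_cost_usd(tokens: int, price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Return the cost in USD of `tokens` tokens billed at the input rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * price_per_million


# Assumption: every token in a task is billed at the input rate.
# Output-token pricing is not stated in the source, so this is a rough floor.
low = input_cost_usd(500_000)
high = input_cost_usd(1_000_000)
print(f"Estimated cost per task: ${low:.2f} to ${high:.2f}")
```

Under that assumption a single heavy task lands at roughly $5 to $10 in input tokens alone.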

Original post
Andy Masley@AndyMasley · #1693 in AI

I actually also used an older version of Claude to build a secret Library of Babel room on my website 4 months ago, didn't look anything like this though

Dan Shipper 📧@danshipper

BREAKING:

Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world.

We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:

- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark—this is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.

- It's a one-shot wonder. You can set it and forget it for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film, all one-shot.

- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.

- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution—and then we sent it off to build that.

- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.

- It's very slow, token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs, but not as good for tasks like collaborative writing.

- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry, so expect it to be something you'll use sparingly unless your company pays for it.

Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.

The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.

Want our full vibe check with all of our testing and benchmarks? Read it on @every: https://every.to/vibe-check/anthropic-mythos-our-fable-vibe-check

12:00 PM · Jun 9, 2026 · 42 Views
Sentiment

Many users expressed excitement about Anthropic's Claude Fable 5 coding model after quick tests showed strong performance, while some criticized its high cost and occasional inaccuracies.

Positive: 82.1%
Negative: 17.9%
28 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
VIEWS 13.7K · BOOKMARKS 24
Dan Shipper 📧@danshipper

@every watch on YouTube: https://www.youtube.com/watch?v=GrdEid8H6H4

4h · Views 13.7K · Likes 24 · Bookmarks 24
LIKES 37
Alex Albert@alexalbert__

@danshipper @every Appreciate you testing it🙏

4h · Views 7.1K · Likes 37
RETWEETS 174
Dan Shipper 📧@danshipper

[Original post, quoted in full above.]

4h · Views 361.8K · Likes 2.5K · Bookmarks 1.6K
REPLIES 2
Mads@madsmccaus

@danshipper @every But how well will it draw an svg of a pelican riding a bicycle

3h · Views 4.1K · Likes 18
Evan Armstrong@itsurboyevan

@danshipper @every

4h · Views 3.1K · Likes 33 · Bookmarks 1
Andy Masley@AndyMasley

I actually also used an older version of Claude to build a secret Library of Babel room on my website 4 months ago, doesn't look anything like this though

Dan Shipper 📧@danshipper

BREAKING:

Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world.

We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:

- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark—this is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.

- It's a one-shot wonder. You can set it and forget for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film—all one-shot.

- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.

- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution—and then we sent it off to build that.

- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.

- It's very slow, token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs—but not as good for tasks like collaborative writing. - It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry—so expect it to be something you'll use sparingly unless your company pays for it.

Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.

The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.

Want our full vibe check with all of our testing and benchmarks? Read it on @every: https://every.to/vibe-check/anthropic-mythos-our-fable-vibe-check

2h · Views 2.3K · Likes 15 · Bookmarks 2
Dan McAteer@daniel_mac8

@danshipper @every Makes me think that Claude Fable is the first "industrial grade" model.

4h · Views 870 · Likes 6 · Bookmarks 1
Bobcat@somebobcat8327

@danshipper @every How did you get it to make the two minute animated film? I guess you had enhanced permissions? I'm currently getting Opus 4.8 to finish the film it was trying to make...

3h · Views 412 · Likes 2 · Bookmarks 1
Dan Shipper 📧@danshipper

@madsmccaus @every @simonw inquiring minds want to know

3h · Views 3.7K · Likes 12
Dan Shipper 📧@danshipper

@alexalbert__ @every thanks for having us!!!

4h · Views 4.3K · Likes 7
The Crypto Wiz@TheKryptoWiz

@danshipper @every For builders, the benchmark that matters is boring: can it take a real repo, understand the constraints, and ship the diff without ten rounds of hand-holding.

4h · Views 1.1K · Likes 3 · Bookmarks 1
Jonathan 🇺🇲@thaonlyjonathan

@danshipper @every "It's best for power users" no, it's best for users who have a lot of money. It's incredibly expensive.

3h · Views 455 · Likes 11
Jonny Miller@jonnym1ller

@danshipper @every lets goooooo 🚀

4h · Views 856 · Likes 1 · Bookmarks 1
Jonathan ⚡@jonathanbylos

@danshipper @every Looks cool, but I would say there are a lot of great open source Library of Babel projects, both the backend logic, and assets.

Would be interesting to see something like an "organized" Library of Babel by grouping similar, coherent books of content that has not been published.

3h · Views 789 · Likes 1 · Bookmarks 1
Matt@matthew_hartman

@danshipper @every Dang, that's big news. Can't wait to build with it.

4h · Views 684 · Bookmarks 1

@danshipper @every "If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you."

So for knowledge workers, you recommend staying on Codex?

3h · Views 384 · Bookmarks 1

@danshipper @every After watching AI do completely useless shit for 3 years straight, this one example beats them all. Thank you for your service.

2h · Views 819 · Likes 7
Nyel Bangash@nyelbangash

@madsmccaus @danshipper @every just tried

3h · Views 62 · Likes 4
Tsung Xu@tsungxu

@danshipper @every Damn now I need to port over all of my codex automations to test this

2h · Views 531 · Likes 1