
Anthropic releases Claude Fable 5, topping SWE-bench but struggling with one-shot design and technical documentation

Ethan Mollick built a data tool in 9.5 hours.

Original post
Alex Volkov @altryne · #1245 in AI

@karpathy Andrej do you still think that humans reading code will be a requirement for production at the end of this year?

http://thursdai.news/zl

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.

I feel a lot of things changing as working software increasingly comes out on tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

11:13 AM · Jun 9, 2026 · 934 Views
Sentiment

Positive users are excited about Claude Fable 5's SOTA benchmarks and qualitative leaps on tasks like SWE-Bench, while negative users criticize its high token use, flawed design outputs, and incorrect biology risk flagging.

Pos 57.1% · Neg 42.9% (7 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 33.8K · Bookmarks 76 · Likes 128 · Retweets 6 · Replies 18

Fable 5 (aka baby Mythos) just dropped. Is it as scary (or scary good) as they claim?

My thoughts after some early testing:
- smart smart smart (crushed SWE bench)
- but do you always need hyper intelligence?
- faceplanted on one-shot design in a way that shocked me
- i'm not sure about dynamic workflows + complex subagents. they work, but at what cost?
- def knocked out technical work well
- ootb bad at making technical docs + specs for humans. probably really good docs for agents. but nearly impossible to parse prose.
- A++ vision and document formatting. this was my favorite part

NOT a daily driver, wouldn't put this model in a meeting, but def will keep it back in the server rack, churning out code.

Full take on YT: https://www.youtube.com/watch?v=IREnr4I89Ho

Claude @claudeai

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.

Its capabilities exceed those of any model we’ve ever made generally available.

3h · Views 33.8K · Likes 128 · Bookmarks 76
snow @snowclipsed

>You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

i am sorry, but it literally is refusing half of these.


1h · Views 1.6K · Likes 51 · Bookmarks 2

@clairevo This is really helpful. I have til the 22nd to use this and it eats up my usage. I have to prioritize and will use it to solve a couple lingering buggies.

2h · Views 36 · Likes 1 · Bookmarks 1
Conor Bronsdon @ConorBronsdon

@clairevo Re subagents: I recommend having Fable orchestrate a bunch of Sonnet/Opus sub agents to cut token costs.

Fable is excellent at orchestrating sub agents and this significantly reduces your overall costs while still getting most of the benefits of the ultra-smart lead model.

2h · Views 154 · Likes 2
Greyson MacAlpine @GreysonSofia

@clairevo Queued the full YT take to watch this afternoon! 🚀

2h · Views 185 · Likes 1

@ConorBronsdon Yeah but the subagents just aren’t adding a lot to the quality of output. What are your favorite use cases

2h · Views 112 · Likes 1
Sojunky @sojunky

@clairevo Fable apparently thinks oncology genes like KRAS and EGFR pose a "biology" risk --> punted me back to 4.8!

2h · Views 104 · Likes 1
sarang @EsPeeKid

@clairevo been patiently waiting for the day of review!

2h · Views 100 · Likes 1
Bachi @julibachi

@clairevo baby mythos lol

2h · Views 55 · Likes 1

@clairevo was the deck in the video made by Fable?

2h · Views 151
Alex McD @alexqmcd

@clairevo surprising to see those horrid design results. we've been getting pretty solid artifacts, eg: https://hyperagent.com/s/SwNlqxNVOhxSfr9qle_-dQ

hard agree about the density of its writing, still reaching for Sonnet there

2h · Views 146
Salma @Salmaaboukarr

@clairevo this is going to be good!

2h · Views 1.3K · Likes 1
Olamide @PM_Mide

@clairevo Incredibly smart. But not as scary as we thought it'd be

2h · Views 117 · Likes 2

@sdrth Hah I temp didn’t have access so it was good ol opus

2h · Views 132 · Likes 1

@alexqmcd It was prompted w a pretty technical spec and I just wonder if it over rotated

2h · Views 120 · Likes 1
C.NASIR @CNASIR2_0

@clairevo I thought I would be scared. I’m just excited.

2h · Views 87 · Likes 1

@GreysonSofia Lmk what you think of what I think

2h · Views 119