Tech · 12h ago

Maithra Raghu says Claude Fable 5 migrated a 50-million-line Ruby codebase in one day during Cognition's evaluation

A human team would need over two months.

Original post
François Fleuret @francoisfleuret · #352 in Tech

No slowing down in sight, this is so weird.

Claude@claudeai

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision.

The longer and more complex the task, the larger Fable 5’s lead over our other models.

12:26 AM · Jun 10, 2026 · 15.4K Views
Sentiment

Some users are excited that Fable 5 tops the benchmarks and say progress feels unreal, while others doubt the results, citing unverifiable figures, silent steering of the model, and possible fabrication.

Positive: 37.5%
Negative: 62.5%
8 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
alyxya @_alyxya

@francoisfleuret I think of this as the point of inflection where it'll slow down because we'll stop being able to differentiate or measure most improvements to models

bigger models should in theory have more potential capability but the difficulty of the training data is bounded by us

12h · 630 views · 3 likes · 1 bookmark

@francoisfleuret It's now impossible to verify these figures because the model is silently steered

11h · 458 views · 10 likes
Florian S @airesearch12

@francoisfleuret the wall

9h · 189 views · 1 like

@francoisfleuret No one can prove that biology/cybersecurity bench wasn't fabricated since you can't use it for biology/cybersecurity. It's bullshit.

11h · 520 views · 7 likes
Maithra Raghu @maithra_raghu

Controversies on model nerfing aside, Fable seeing its biggest performance gains on more complex tasks is a clear indicator of large AI capabilities jumps ahead.

(We’re far from saturating the frontier and far from defining / measuring the next wave of frontier tasks)

5h · 327 views · 2 likes · 0 bookmarks
Min @min_aws

@francoisfleuret Look at the price per token then try to normalize it for benchmark scores, and then you realize this has slowed down significantly since the end of last year.

9h · 195 views · 1 like
Mayz @lunan_ai

@francoisfleuret the "weird" part is actually the respectful one in this whole timeline

11h · 321 views

@francoisfleuret There’s a wall, but transformers did not hit it, wallets did

11h · 274 views
Draven @notdrvx

@francoisfleuret not every week opens like this

must feel like riding lightning and sand at the same time

9h · 206 views
Asher @ashergmi

@francoisfleuret acting like youre not impressed but you definitely reloaded the benchmarks after reading this

10h · 131 views
Cheehk @Cheehk2

@francoisfleuret agents behind the curtain.

10h · 109 views
anon @_thetanon_

@airesearch12 @francoisfleuret this wall is crazy

9h · 12 views
Invincible @InvincibleEdge

@francoisfleuret weird is a generous word for it

most people dont even use half the features they add per update

12h
Blissy @BlissyOnX

@francoisfleuret first time in a while progress actually feels unreal

12h
Rugbist @rugbist_

@francoisfleuret had to check if i was reading about a language model or the gym guy from mass effect?

12h