/Tech · 12h ago

Maithra Raghu says Claude Fable 5 migrated a 50-million-line Ruby codebase in one day during Cognition's evaluation

A human team would need over two months.

#195

Original post

François Fleuret @francoisfleuret · #352 in Tech

No slowing down in sight, this is so weird.

Claude @claudeai

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision.

The longer and more complex the task, the larger Fable 5’s lead over our other models.

12:26 AM · Jun 10, 2026 · 15.4K Views

Sentiment

Some users express excitement over Fable 5 topping benchmarks as progress feels unreal, while others doubt the results due to unverifiable figures, silent steering, and possible fabrication.

Positive: 37.5%

Negative: 62.5%

8 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 630 · Bookmarks: 1

alyxya@_alyxya

@francoisfleuret I think of this as the point of inflection where it'll slow down because we'll stop being able to differentiate or measure most improvements to models

bigger models should in theory have more potential capability but the difficulty of the training data is bounded by us

12h63031

Likes: 10

Fergus Meiklejohn@airuyi

@francoisfleuret It's now impossible to verify these figures because the model is silently steered

11h · 458 views · 10 likes

Replies: 1

Florian S@airesearch12

@francoisfleuret the wall

9h · 189 views · 1 reply

Kamil (๑ت๑)ﾉ🛩@KamStaszewski

@francoisfleuret No one can prove that biology/cybersecurity bench wasn't fabricated since you can't use it for biology/cybersecurity. It's bullshit.

11h5207

Maithra Raghu@maithra_raghu

Controversies on model nerfing aside, Fable seeing its biggest performance gains on more complex tasks is a clear indicator of large AI capabilities jumps ahead.

(We’re far from saturating the frontier and far from defining / measuring the next wave of frontier tasks)

5h32720

Min@min_aws

@francoisfleuret Look at the price per token then try to normalize it for benchmark scores, and then you realize this has slowed down significantly since the end of last year.

9h1951