Fable 5 Model Shows Calibrated Self-Assessments in Agent Benchmarks

Original post

rohit (@krishnanrohit)

🚨 Fable benchmark update for hub vs spoke!

Fable 5 is the first model I've tested whose self-assessments are genuinely calibrated! But even a simply designed market seems to do better.

- Solo Fable 5 beat *every* topology on quality (8.1 avg, 87% pass vs the market's 7.2/76%), at 2.6x the market's cost!
- A pricier market of frontier coding agents (Opus 4.7 and GPT-5.5) bought no quality at 4x the cost
- Routing based on track record (give Fable the tasks the cheap pool has failed) hits 93%, beating solo Fable on quality and cost!
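
The track-record routing idea in the last bullet can be sketched roughly as below. This is only an illustration under stated assumptions, not the harness behind the numbers in the post: the `Attempt` record, the `route_with_fallback` and `summarize` helpers, the boolean solver interface, and the placeholder costs (`cheap_cost`, `strong_cost`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Attempt:
    task: str
    solver: str    # "cheap" or "strong" (hypothetical labels)
    passed: bool
    cost: float


# Hypothetical interface: a solver takes a task and reports pass/fail.
Solver = Callable[[str], bool]


def route_with_fallback(
    tasks: List[str],
    cheap: Solver,
    strong: Solver,
    cheap_cost: float = 1.0,   # placeholder cost units, not from the post
    strong_cost: float = 4.0,  # placeholder cost units, not from the post
) -> List[Attempt]:
    """Try the cheap pool first; escalate only the tasks it fails."""
    log: List[Attempt] = []
    for task in tasks:
        ok = cheap(task)
        log.append(Attempt(task, "cheap", ok, cheap_cost))
        if not ok:
            # The cheap pool's failure is the track record that triggers escalation.
            ok = strong(task)
            log.append(Attempt(task, "strong", ok, strong_cost))
    return log


def summarize(log: List[Attempt]) -> Dict[str, float]:
    """Pass rate over distinct tasks and total cost over all attempts."""
    tasks = {a.task for a in log}
    passed = {a.task for a in log if a.passed}
    pass_rate = len(passed) / len(tasks) if tasks else 0.0
    return {"pass_rate": pass_rate, "total_cost": sum(a.cost for a in log)}
```

Under these assumptions the strong model is only paid for on tasks the cheap pool has already failed, which is one way a routed setup could end up cheaper than running the strong model on everything.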

What this means is that we really can make much better topologies that actually manage both cost and effort, even with older models that aren't as good at being calibrated!

11:37 AM · Jun 10, 2026 · 1.3K Views

rohit (@krishnanrohit)

The previous essay: https://www.strangeloopcanon.com/p/why-smart-planners-lose-to-simple

It's interesting that as models get smarter, the ways to route them also get more complicated.
