🚨 Fable benchmark update for hub vs spoke!
Fable 5 is the first model I've tested whose self-assessments are genuinely calibrated! But even a simply designed market seems to do better.
- Solo Fable 5 beat *every* topology on quality (8.1 avg, 87% pass vs the market's 7.2/76%), at 2.6x the market's cost!
- A pricier market of frontier coding agents (Opus 4.7 and GPT-5.5) bought no quality at 4x the cost
- Routing based on track record (give Fable the tasks the cheap pool has failed) hits 93%, beating solo Fable on quality and cost!
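
For anyone wondering what track-record routing could look like, here is a minimal sketch. It is not the benchmark's actual code. The class name, the `task_kind` key, the thresholds, and the `"cheap_pool"` / `"strong_model"` labels are all assumptions made up for illustration. The idea is just to count how often the cheap pool has failed a kind of task and escalate once that record looks bad.

```python
from collections import defaultdict


class TrackRecordRouter:
    """Hypothetical router: escalate task kinds the cheap pool keeps failing."""

    def __init__(self, min_attempts=3, fail_threshold=0.5):
        # Both thresholds are illustrative assumptions, not benchmark settings.
        self.min_attempts = min_attempts
        self.fail_threshold = fail_threshold
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, task_kind, passed):
        """Log one cheap-pool attempt and whether it passed."""
        self.attempts[task_kind] += 1
        if not passed:
            self.failures[task_kind] += 1

    def route(self, task_kind):
        """Return which worker should get the next task of this kind."""
        n = self.attempts.get(task_kind, 0)
        if n < self.min_attempts:
            # Not enough history yet: default to the cheap pool.
            return "cheap_pool"
        fail_rate = self.failures.get(task_kind, 0) / n
        if fail_rate >= self.fail_threshold:
            return "strong_model"
        return "cheap_pool"


router = TrackRecordRouter(min_attempts=2, fail_threshold=0.5)
router.record("refactor", passed=False)
router.record("refactor", passed=False)
print(router.route("refactor"))  # escalated after two cheap-pool failures
print(router.route("docs"))      # no history, stays with the cheap pool
```

Keying on a task kind rather than a single task ID is a design choice in this sketch. It lets past failures inform routing for new tasks, but it assumes tasks can be grouped sensibly.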
This means we really can build much better topologies that manage both cost and effort, even with older models that are less well calibrated!