Claude Fable 5 achieves a new high score of 161 on the Epoch Capabilities Index! This beats out GPT-5.5 Pro by 1 point, and is the first time Anthropic has taken the lead on the ECI in over a year.
Claude Fable 5 scores a record 161 on the Epoch Capabilities Index, beating GPT-5.5 Pro by one point
Story Overview
Anthropic's Claude Fable 5 has claimed the top position on the Epoch Capabilities Index for the first time in over a year, landing at 161 to edge GPT-5.5 Pro by a single point with a reported 90% confidence interval spanning 156 to 169.
Benchmark work stops for now
Epoch researchers have halted additional testing on the model while awaiting future U.S. developments, so the published score could still shift.
Other tests paint a mixed picture
Fable 5 sits near the front on certain program-reconstruction tasks, yet GPT-5.5 leads on accuracy-per-cost and token metrics in separate evaluations such as BrokenArXiv.
Supporters celebrate Claude Fable 5's record Epoch index score as exciting progress and a sign of intensifying competition, while critics dismiss the one-point edge over GPT-5.5 Pro as trivial and overhyped.
Most Activity
Fable is slightly higher rated than GPT-5.5-Pro on Epoch's ECI
I suspect as we get more benchmark results its ECI should improve to ~163
don't worry guys, Mythos is not that good at coding
insane score on ProgramBench
even when you truncate the x axis at 150, exaggerating the gains, this looks like gradual change rather than a whole new order.
trump and anthropic freaking out together, over what turns out to be a 1 point benchmark gain 🤦‍♂️

There isn't yet enough data to say confidently whether Fable 5 also outperforms on software. Its performance on WeirdML v2 alone would suggest a SWE-specific ECI of 169.
Fable being a proxy for Mythos here
Mythos should be at least as strong as Fable
https://www.vals.ai/benchmarks/programbench
@scaling01 I think it's a fair rating. GPT 5.5 has very strong post-training, and we see on MathArena that Fable isn't uniformly the strongest model, even against normal 5.5. Pro can genuinely lead in some dimensions.

Historically Anthropic models have been slightly behind the frontier on the ECI, with outsized performance on software benchmarks compared to many other tasks. Fable 5 bucked this trend, taking the lead on math benchmarks:

@GaryMarcus I think this is the graph you want if you're interested in the longer term trend:
recursive self improvement! 🤯🤯🤯🤯

You can read more about the performance of Claude Fable 5 on our website! https://epoch.ai/models/claude-fable-5
@teortaxesTex it's so much better at everything code related
and relative to prior Anthropic models it also performs really well on math

@GaryMarcus Yep! Overall trend seems stable atm.

@EpochAIResearch @AndrewCurran_ Rerun that benchmark now, I bet the results will differ quite a bit.

@GaryMarcus We don't have points further back, though we might add more in the future. A big part of why we started as high as we did was to have room for that.

@EpochAIResearch probably means you could get another point out of mythos by doing a parallel generation & consensus strategy (assuming this is what gpt-5.5-pro does)

@YafahEdelman i see progress but not discontinuities

@scaling01 Imagine if you worked at Anthropic and had unlimited access to uncensored Fable for free.

@GaryMarcus Have you actually used it? Because benchmarks mean almost nothing in my opinion. Using it was actually insane: it fixed persistent issues in a way that 5.5 doesn't get at, and no other model even comes close.