
Fable 5 Tops EQ-Bench and Creative Writing Evaluations

Sam Paech@sam_paech

Fable 5 tops EQ-Bench and both creative writing evals!

Personal take from reading some outputs: It has tics and tells, and isn't compelling in the way human writing is. But I think it earned its spot, in the sense that it's *relatively* excellent & hasn't reward-hacked the eval.

3:47 AM · Jun 10, 2026 · 4.9K Views
Sentiment

Some users showed excitement about the new EQ-Bench for Claude-Fable-5 while others complained about its high running costs from using multiple judges.

Positive: 0.0% · Negative: 100.0% (2 comments with sentiment)
VioP@AcousimHss

@sam_paech how do u set a model to top ?? how does this work ? so now is fable 5 the new judge ? if we r to do local reproduction tests for our models do we have to use fable 5 then?

4h · Views 133 · Likes 2 · Replies 2
Sam Paech@sam_paech

@AcousimHss Yes, that's correct. It compares the responses in head-to-head matchups and picks the winner/loser. So with eqbench3, the judge thinks those 3 models are better than its own outputs. In eqbench4 (releasing soon) 3x judges are used to mitigate self-bias.

4h · Views 6 · Likes 2
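
The head-to-head judging described in the reply above can be illustrated with a small sketch. This is not EQ-Bench's actual code: the judge functions are toy stand-ins for LLM judges, the names `majority_verdict` and `tally_wins` are hypothetical, and ranking by plain win count is an assumption (the leaderboard's real scoring may differ). It only shows how an odd panel of judges can vote on each matchup so that no single judge decides the outcome alone.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List

# A judge takes two responses and returns "A" or "B".
# Hypothetical stand-in for an LLM judge call.
Judge = Callable[[str, str], str]


def majority_verdict(judges: List[Judge], resp_a: str, resp_b: str) -> str:
    """Return the verdict ("A" or "B") picked by most judges.

    Assumes an odd number of judges so a tie cannot occur.
    """
    votes = Counter(judge(resp_a, resp_b) for judge in judges)
    return votes.most_common(1)[0][0]


def tally_wins(responses: Dict[str, str], judges: List[Judge]) -> Dict[str, int]:
    """Run every head-to-head matchup once and count wins per model."""
    wins = {name: 0 for name in responses}
    for model_a, model_b in combinations(responses, 2):
        verdict = majority_verdict(judges, responses[model_a], responses[model_b])
        wins[model_a if verdict == "A" else model_b] += 1
    return wins


# Toy judges with deliberately different preferences (illustration only).
def prefers_longer(a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"


def prefers_shorter(a: str, b: str) -> str:
    return "A" if len(a) <= len(b) else "B"


def prefers_more_words(a: str, b: str) -> str:
    return "A" if len(a.split()) >= len(b.split()) else "B"


if __name__ == "__main__":
    panel = [prefers_longer, prefers_shorter, prefers_more_words]
    outputs = {
        "m1": "short",
        "m2": "a much longer reply here",
        "m3": "medium reply",
    }
    print(tally_wins(outputs, panel))
```

With three judges, one judge's bias (for example, a preference for its own outputs) can be outvoted by the other two. The trade-off is that every matchup costs three judge calls instead of one.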
Sam Paech@sam_paech

@AcousimHss It's llm-judged, but the judges are kept constant. To reproduce the results, use the same judge as the leaderboard (noted in the about page & on repo readme). Lmk if you run into any issues reproducing results, I'm happy to help.

4h · Views 92 · Likes 1
VioP@AcousimHss

@sam_paech oh no i understand that this is llm judged , my question was how do top 1 model get placed , so like did opus 4.6 place fable over itself and 4.7,4.8?? or was there any other way im jst curious to learn is all!

4h · Views 17 · Likes 1
VioP@AcousimHss

@sam_paech but god damn the benchmark is so costly to run , if its 3x judges 🥲 its so over

3h · Views 5
VioP@AcousimHss

@sam_paech ooo new bench👀

3h · Views 1