/AI · 5h ago

Fable 5 Tops EQ-Bench and Creative Writing Evaluations

Original post

Lisan al Gaib#975

Sam Paech@sam_paech

Fable 5 tops EQ-Bench and both creative writing evals!

Personal take from reading some outputs: It has tics and tells, and isn't compelling in the way human writing is. But I think it earned its spot, in the sense that it's *relatively* excellent & hasn't reward-hacked the eval.

3:47 AM · Jun 10, 2026 · 4.9K Views

Sentiment

Some users showed excitement about the new EQ-Bench for Claude-Fable-5, while others complained about the benchmark's high running costs from its use of multiple judges.

Pos: 0.0%

Neg: 100.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VioP@AcousimHss

@sam_paech how do u set a model to top ?? how does this work ? so now is fable 5 the new judge ? if we r to do local reproduction tests for our models do we have to use fable 5 then?

4h1332

Likes: 2 · Replies: 2

Sam Paech@sam_paech

@AcousimHss Yes, that's correct. It compares the responses in head-to-head matchups and picks the winner/loser. So with eqbench3, the judge thinks those 3 models are better than its own outputs. In eqbench4 (releasing soon) 3x judges are used to mitigate self-bias.

4h62
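
The reply above describes head-to-head matchups decided by a judge, with three judges planned to mitigate self-bias. A minimal sketch of that idea follows; it is not EQ-Bench's actual code. The names `majority_winner` and `rank_by_wins`, the toy judges, and the simple win-count ranking are all assumptions made for illustration, and real judges would be LLM calls rather than the string heuristics used here.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical judge signature: given two responses, return "a" or "b".
Judge = Callable[[str, str], str]


def majority_winner(resp_a: str, resp_b: str, judges: List[Judge]) -> str:
    """Return "a" or "b" by majority vote (assumes an odd number of judges)."""
    votes = Counter(judge(resp_a, resp_b) for judge in judges)
    return "a" if votes["a"] > votes["b"] else "b"


def rank_by_wins(responses: Dict[str, str], judges: List[Judge]) -> List[str]:
    """Rank model names by head-to-head wins; ties are broken by name."""
    wins = {name: 0 for name in responses}
    for m1, m2 in combinations(sorted(responses), 2):
        winner = majority_winner(responses[m1], responses[m2], judges)
        wins[m1 if winner == "a" else m2] += 1
    return sorted(wins, key=lambda name: (-wins[name], name))


# Toy stand-ins for LLM judges (assumptions, for illustration only).
def prefers_longer(a: str, b: str) -> str:
    return "a" if len(a) >= len(b) else "b"


def prefers_more_words(a: str, b: str) -> str:
    return "a" if len(a.split()) >= len(b.split()) else "b"


def always_first(a: str, b: str) -> str:
    # A deliberately biased judge; the other two can outvote it.
    return "a"


demo_judges = [prefers_longer, prefers_more_words, always_first]
demo_responses = {
    "model_x": "short",
    "model_y": "a much longer reply",
    "model_z": "medium reply",
}

print(rank_by_wins(demo_responses, demo_judges))
# ['model_y', 'model_z', 'model_x']
```

In this sketch every matchup costs one call per judge, so moving from one judge to three triples the number of judge calls, which is the running-cost concern raised later in the thread.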

Sam Paech@sam_paech

@AcousimHss It's llm-judged, but the judges are kept constant. To reproduce the results, use the same judge as the leaderboard (noted in the about page & on repo readme). Lmk if you run into any issues reproducing results, I'm happy to help.

4h921

VioP@AcousimHss

@sam_paech oh no i understand that this is llm judged , my question was how do top 1 model get placed , so like did opus 4.6 place fable over itself and 4.7,4.8?? or was there any other way im jst curious to learn is all!

4h171

VioP@AcousimHss

@sam_paech but god damn the benchmark is so costly to run , if its 3x judges 🥲 its so over

3h5

VioP@AcousimHss

@sam_paech ooo new bench👀

3h1