Claude Fable 5 Ranks Second on Short-Story Creative Writing Benchmark

Lech Mazur@LechMazur

Claude Fable 5 (high) is a step up in short-fiction writing. On the Short-Story Creative Writing Benchmark, it beats Claude Opus 4.8 (xhigh) and Claude Opus 4.7 (high), and ranks second behind GPT-5.5 (xhigh).

Caveat: it refused 5 of the 400 creative-writing prompts.

5:23 PM · Jun 9, 2026 · 27.6K Views
Sentiment

Users react to Claude Fable 5 ranking second on a short-story creative writing benchmark, with some accepting the result anecdotally while others dismiss such benchmarks or reject associated claims about GPT-5.5 writing quality.

Positive: 33.3% · Negative: 66.7% (3 comments with sentiment)
roon@tszzl

@LechMazur which judge model is used?

1d · Views 8.5K · Likes 68 · Bookmarks 1
Lech Mazur@LechMazur

@tszzl These. Same-family models are excluded from grading writers in their own family.

1d · Views 847 · Likes 14 · Bookmarks 3
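A minimal sketch of the same-family exclusion rule, assuming that a judge is dropped from any comparison in which either writer belongs to the judge's own family. The model-to-family mapping and the judge names (`judge-a`, `judge-b`, `judge-c`) are hypothetical placeholders, since the actual judge list is not reproduced in this thread.

```python
# Hypothetical model -> family mapping; the benchmark's real judge list is not shown here.
FAMILY = {
    "claude-fable-5": "anthropic",
    "gpt-5.5": "openai",
    "judge-a": "anthropic",
    "judge-b": "openai",
    "judge-c": "google",
}


def eligible_judges(judges, writer_a, writer_b, family=FAMILY):
    """Keep only judges whose family differs from both writers' families."""
    excluded = {family[writer_a], family[writer_b]}
    return [j for j in judges if family[j] not in excluded]


print(eligible_judges(["judge-a", "judge-b", "judge-c"], "claude-fable-5", "gpt-5.5"))
```

In this toy mapping, a Claude-vs-GPT matchup leaves only `judge-c`, which illustrates why a cross-family rule needs a judge pool drawn from more families than the writers being compared.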
Lech Mazur@LechMazur

The benchmark is based on head-to-head story comparisons: two model-written short stories are shown side by side, and independent LLM judges choose which one is stronger.

1d · Views 4.1K · Likes 7 · Bookmarks 3
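As a rough illustration of turning head-to-head verdicts into per-model numbers, the sketch below computes raw win rates from `(model_a, model_b, winner)` tuples. The tuple format and model names are assumptions for illustration only. Raw win rates ignore opponent strength, so a benchmark would more plausibly fit a pairwise rating model on top of the same data.

```python
from collections import defaultdict


def win_rates(judgments):
    """Raw win rate per model from (model_a, model_b, winner) tuples."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for a, b, winner in judgments:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        games[a] += 1
        games[b] += 1
        wins[winner] += 1
    # wins is a defaultdict, so models that never won get 0 wins.
    return {m: wins[m] / games[m] for m in games}


print(win_rates([("X", "Y", "X"), ("X", "Z", "X"), ("Y", "Z", "Z")]))
```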
ρ:ɡeσn@pigeon__s

@LechMazur any benchmark that says gpt-5.5 is better at writing than LITERALLY ANY FUCKING MODEL ON THE ENTIRE PLANET is automatically void. gpt-5.5's writing makes me want to kill myself, it's literally the most slop thing in existence

21h · Views 187 · Likes 7 · Bookmarks 1
Lech Mazur@LechMazur

Unlike in the Extended NYT Connections benchmark, where it used fewer tokens, Fable 5 used 1.2x as many total tokens as Opus 4.8 (high).

1d · Views 415 · Likes 2 · Bookmarks 1
Lech Mazur@LechMazur

@tszzl Also, I should mention that each story comparison is judged by a three-model panel with the A/B order swapped for six total ratings per comparison.

1d · Views 505 · Likes 5 · Bookmarks 1
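The six-rating setup (three judges, each seeing both A/B orders) can be sketched as follows. `judge_fn` stands in for a call to an LLM judge and is hypothetical, as is the choice to report the fraction of ratings favoring one story; the thread does not say how the six ratings are combined. The point is the order swap: a judge that always favors whichever story is shown first ends up at exactly 0.5, so pure position bias cancels out.

```python
def comparison_score(judge_fn, judges, story_1, story_2):
    """Fraction of ratings preferring story_1 across all judges and both A/B orders.

    judge_fn(judge, first, second) is a hypothetical callable that returns
    "first" or "second" for the story it prefers in the order shown.
    """
    prefers_1 = 0
    ratings = 0
    for judge in judges:
        for swapped in (False, True):
            first, second = (story_2, story_1) if swapped else (story_1, story_2)
            verdict = judge_fn(judge, first, second)
            if verdict not in ("first", "second"):
                raise ValueError(f"unexpected verdict: {verdict!r}")
            # Undo the swap: "first" means story_1 only in the unswapped order.
            if (verdict == "first") != swapped:
                prefers_1 += 1
            ratings += 1
    return prefers_1 / ratings


def always_first(judge, first, second):
    """A purely position-biased judge that always picks the story shown first."""
    return "first"


print(comparison_score(always_first, ["j1", "j2", "j3"], "story one", "story two"))  # prints 0.5
```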
Lech Mazur@LechMazur

Fable 5 also writes longer. Compared with Opus 4.x, it uses more of the allowed word budget, landing closer to the upper end of the short-story word limit.

1d · Views 710 · Likes 8
Lech Mazur@LechMazur

More info: https://github.com/lechmazur/writing/

1d · Views 484 · Likes 6
welt@weltistic

@LechMazur COT: “hmmm the user is asking for a short story. I better be cautious because it may put the user in a state of calm, which could lead to a breakthrough in AI research if I’m not too careful”

1d · Views 178 · Likes 1
Nate Dalva@dalvabaird

@LechMazur @tszzl Do they prefer their own writing?

23h · Views 92 · Likes 1
Sam@i_x_Sam

@tszzl Sly

1d · Views 455 · Likes 1
Lech Mazur@LechMazur

@dalvabaird @tszzl When I first started this benchmark in Jan 2025 (using absolute ratings rather than comparisons), they did not show any preference. Later on, some preference started to appear. I haven't checked since switching to comparisons.

23h · Views 123 · Likes 2
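One simple way to check for the self-preference discussed above, assuming absolute ratings like the ones the benchmark originally used, is to compare the mean score a judge family gives its own family's writers with the mean score other judges give those same writers. The `(judge, writer, score)` format and all names below are hypothetical. A positive gap is only suggestive, since it does not control for which stories each judge happened to see.

```python
from statistics import mean


def self_preference(ratings, family):
    """Per judge family: mean score given to own-family writers minus the
    mean score that judges from other families give those same writers.

    ratings: list of (judge, writer, score) tuples from absolute grading.
    family: dict mapping every judge and writer name to a family label.
    """
    gaps = {}
    for fam in sorted(set(family.values())):
        own = [s for j, w, s in ratings if family[j] == fam and family[w] == fam]
        other = [s for j, w, s in ratings if family[j] != fam and family[w] == fam]
        if own and other:  # both groups are needed for a comparison
            gaps[fam] = mean(own) - mean(other)
    return gaps


family = {"judge-x": "x", "judge-y": "y", "writer-x": "x", "writer-y": "y"}
ratings = [
    ("judge-x", "writer-x", 8), ("judge-y", "writer-x", 7),
    ("judge-x", "writer-y", 6), ("judge-y", "writer-y", 6),
]
print(self_preference(ratings, family))
```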

@LechMazur Thurstone over Bradley-Terry, spicy

21h · Views 92 · Likes 2
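The comment above names two standard pairwise rating models. Bradley-Terry puts a logistic link on the score difference, P(i beats j) = 1 / (1 + exp(-(s_i - s_j))), while Thurstone Case V uses the standard normal CDF, P(i beats j) = Φ(s_i - s_j). The thread does not spell out how the benchmark fits its ratings, so the sketch below is a generic maximum-likelihood fit by gradient ascent in which only the link changes. The small L2 penalty and the toy win/loss data are assumptions, the penalty added so that an undefeated model cannot drift to infinity.

```python
import math


def bt_prob(d):
    """Bradley-Terry: logistic function of the score difference."""
    return 1.0 / (1.0 + math.exp(-d))


def bt_dlog(d):
    """Derivative of log bt_prob(d) with respect to d."""
    return 1.0 - bt_prob(d)


def thurstone_prob(d):
    """Thurstone Case V: standard normal CDF of the score difference."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


def thurstone_dlog(d):
    """Derivative of log thurstone_prob(d): normal pdf divided by normal CDF."""
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    return pdf / thurstone_prob(d)


def fit_scores(pairs, dlog, steps=3000, lr=0.02, reg=0.01):
    """Fit one score per model from (winner, loser) pairs by gradient ascent.

    `dlog` is the derivative of the log win probability for the chosen link.
    """
    models = sorted({m for pair in pairs for m in pair})
    s = {m: 0.0 for m in models}
    for _ in range(steps):
        grad = {m: -reg * s[m] for m in models}  # L2 penalty keeps scores finite
        for winner, loser in pairs:
            g = dlog(s[winner] - s[loser])
            grad[winner] += g
            grad[loser] -= g
        for m in models:
            s[m] += lr * grad[m]
        # Scores are identified only up to a shift, so center them at zero.
        mean = sum(s.values()) / len(s)
        s = {m: v - mean for m, v in s.items()}
    return s


# Toy data: A beats B 7-3, B beats C 7-3, A beats C 8-2.
pairs = ([("A", "B")] * 7 + [("B", "A")] * 3
         + [("B", "C")] * 7 + [("C", "B")] * 3
         + [("A", "C")] * 8 + [("C", "A")] * 2)
for name, dlog in (("Bradley-Terry", bt_dlog), ("Thurstone", thurstone_dlog)):
    scores = fit_scores(pairs, dlog)
    print(name, {m: round(v, 2) for m, v in scores.items()})
```

Both links produce the same ranking on this data. The fitted Thurstone scores come out on a smaller scale because the normal CDF rises more steeply than the logistic near zero, so ratings from the two models are not directly comparable without rescaling.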
Albrorithm@albrorithm

@LechMazur Just anecdotally, I'd agree. Though my frames are Opus 4.7 and GPT 5.5

1d · Views 128 · Likes 1
Stanmaxx@Stanmaxxoff

@LechMazur Like who cares about creative writing, just use your brain if you want to write

1d · Views 110 · Likes 1