
Claude Fable 5 Ranks Second on Short-Story Creative Writing Benchmark

Lech Mazur @LechMazur

Claude Fable 5 (high) is a step up in short-fiction writing. On the Short-Story Creative Writing Benchmark, it beats Claude Opus 4.8 (xhigh) and Claude Opus 4.7 (high), and ranks second behind GPT-5.5 (xhigh).

Caveat: it refused 5 of the 400 creative-writing prompts.

5:23 PM · Jun 9, 2026 · 15.4K Views
Sentiment

Users react to Claude Fable 5 ranking second on a short-story creative writing benchmark, with some accepting the result anecdotally while others dismiss such benchmarks or reject associated claims about GPT-5.5 writing quality.

Positive: 33.3% · Negative: 66.7% (3 comments with sentiment)
roon @tszzl

@LechMazur which judge model is used?

6h · Views 4.9K · Likes 53 · Bookmarks 1
Lech Mazur @LechMazur

@tszzl These. Same-family models are excluded from grading writers in their own family.

6h · Views 461 · Likes 13 · Bookmarks 2
Lech Mazur @LechMazur

The benchmark is based on head-to-head story comparisons: two model-written short stories are shown side by side, and independent LLM judges choose which one is stronger.

6h · Views 319 · Likes 4 · Bookmarks 2
Lech Mazur @LechMazur

@tszzl Also, I should mention that each story comparison is judged by a three-model panel, with the A/B order swapped, for six total ratings per comparison.

5h · Views 269 · Likes 4 · Bookmarks 1
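
The judging setup described in the posts above (same-family judges excluded, a three-model panel, each pair shown in both A/B orders) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the benchmark's actual code: the judge names, family labels, and the helper functions `eligible_judges` and `rating_slots` are all hypothetical, and it assumes the panel is simply the first three judges left after the family filter.

```python
from itertools import product

# Hypothetical judge pool as (judge name, model family) pairs.
# These names are placeholders, not the judges used by the benchmark.
JUDGES = [
    ("judge_a", "family_a"),
    ("judge_b", "family_b"),
    ("judge_c", "family_c"),
    ("judge_d", "family_d"),
]


def eligible_judges(writer_family_1, writer_family_2, judges=JUDGES, panel_size=3):
    """Drop judges sharing a family with either writer, then take a panel.

    Assumption: the panel is the first `panel_size` remaining judges.
    """
    excluded = {writer_family_1, writer_family_2}
    pool = [name for name, family in judges if family not in excluded]
    if len(pool) < panel_size:
        raise ValueError("not enough independent judges for this pair")
    return pool[:panel_size]


def rating_slots(story_1, story_2, panel):
    """List one rating slot per (judge, presentation order).

    With a three-judge panel and two orders this yields six slots.
    """
    orders = [(story_1, story_2), (story_2, story_1)]
    return [(judge, first, second) for judge, (first, second) in product(panel, orders)]


panel = eligible_judges("family_a", "family_x")
slots = rating_slots("story_1", "story_2", panel)
print(len(slots))
```

Showing each pair in both orders is a common way to average out any position bias a judge might have; whether the benchmark aggregates the six ratings by majority vote or feeds them individually into a rating model is not stated in the thread.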
Lech Mazur @LechMazur

Fable 5 also writes longer. Compared with Opus 4.x, it uses more of the allowed word budget, landing closer to the upper end of the short-story word limit.

6h · Views 313 · Likes 6
Lech Mazur @LechMazur

Unlike in the Extended NYT Connections benchmark, where it used fewer tokens, Fable 5 used 1.2x as many total tokens as Opus 4.8 (high).

5h · Views 187 · Likes 1 · Bookmarks 1
Lech Mazur @LechMazur

More info: https://github.com/lechmazur/writing/

6h · Views 223 · Likes 5
welt @weltistic

@LechMazur COT: “hmmm the user is asking for a short story. I better be cautious because it may put the user in a state of calm, which could lead to a breakthrough in AI research if I’m not too careful”

6h · Views 79 · Likes 1
Nate Dalva @dalvabaird

@LechMazur @tszzl Do they prefer their own writing?

5h · Views 39 · Likes 1
Sam @i_x_Sam

@tszzl Sly

6h · Views 284 · Likes 1
Lech Mazur @LechMazur

@dalvabaird @tszzl When I first started this benchmark in Jan 2025 (using absolute ratings rather than comparisons), they did not show any preference. Later on, some preference started to appear. I haven't checked since switching to comparisons.

4h · Views 44 · Likes 2

@LechMazur Thurstone over Bradley-Terry, spicy

2h · Views 19 · Likes 2
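
For readers unfamiliar with the two names in the reply above: Thurstone (Case V) and Bradley-Terry are both standard models that map a difference in latent skill to a probability of winning a pairwise comparison. The sketch below only contrasts their link functions (normal CDF versus logistic). The thread does not spell out how the benchmark fits its ratings, so the function names, the unit variance, and the parameterization here are illustrative assumptions.

```python
import math


def bradley_terry_prob(theta_i, theta_j):
    """Bradley-Terry: logistic function of the skill difference."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))


def thurstone_prob(mu_i, mu_j, sigma=1.0):
    """Thurstone Case V: normal CDF of the standardized skill difference.

    Each item's perceived quality is assumed to be N(mu, sigma^2) and
    independent, so the difference has standard deviation sigma * sqrt(2).
    """
    z = (mu_i - mu_j) / (sigma * math.sqrt(2.0))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Equal skills give a coin flip under both models.
print(bradley_terry_prob(0.0, 0.0), thurstone_prob(0.0, 0.0))
```

The two curves have a similar S-shape but different tails, so fitted scores from one model are not on the same scale as scores from the other.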
Albrorithm @albrorithm

@LechMazur Just anecdotally, I'd agree. Though my frames are Opus 4.7 and GPT 5.5

5h · Views 61 · Likes 1
Stanmaxx @Stanmaxxoff

@LechMazur Like, who cares about creative writing? Just use your brain if you want to write.

6h · Views 61 · Likes 1
ρ:ɡeσn @pigeon__s

@LechMazur any benchmark that says gpt-5.5 is better at writing than LITERALLY ANY FUCKING MODEL ON THE ENTIRE PLANET is automatically void. gpt-5.5's writing makes me want to kill myself; it's literally the most slop thing in existence

2h · Views 22 · Likes 1