Claude Fable 5 Ranks Second on Short-Story Creative Writing Benchmark

Lech Mazur@LechMazur

Claude Fable 5 (high) is a step up in short-fiction writing. On the Short-Story Creative Writing Benchmark, it beats Claude Opus 4.8 (xhigh) and Claude Opus 4.7 (high), and ranks second behind GPT-5.5 (xhigh).

Caveat: it refused 5 of the 400 creative-writing prompts.

5:23 PM · Jun 9, 2026 · 15.4K Views

Sentiment

Users react to Claude Fable 5 ranking second on a short-story creative writing benchmark, with some accepting the result anecdotally while others dismiss such benchmarks or reject associated claims about GPT-5.5 writing quality.

Pos: 33.3% · Neg: 66.7% (3 comments with sentiment)

Posts from X

roon@tszzl

@LechMazur which judge model is used?

6h ago · 4.9K views · 53 likes · 1 retweet · 2 bookmarks · 2 replies

Lech Mazur@LechMazur

@tszzl These. Same-family models are excluded from grading writers in their own family.

6h ago

Lech Mazur@LechMazur

The benchmark is based on head-to-head story comparisons: two model-written short stories are shown side by side, and independent LLM judges choose which one is stronger.

6h ago

Lech Mazur@LechMazur

@tszzl Also, I should mention that each story comparison is judged by a three-model panel with the A/B order swapped for six total ratings per comparison.

5h ago
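
The judging setup described in these replies (a three-model panel, both A/B orders, and no judge from either writer's family) can be sketched roughly as follows. This is only an illustration, not the benchmark's code: the `FAMILY` map, the `judge-*` names, and both function names are hypothetical placeholders, and the actual judge list is not given in the text of the thread.

```python
from itertools import product

# Hypothetical model-to-family map. The judge names are placeholders,
# not the benchmark's actual judge list.
FAMILY = {
    "Claude Fable 5": "Anthropic",
    "GPT-5.5": "OpenAI",
    "judge-claude": "Anthropic",
    "judge-x": "family-x",
    "judge-y": "family-y",
    "judge-z": "family-z",
}


def eligible_judges(judges, writer_a, writer_b):
    """Drop judges that share a family with either writer."""
    excluded = {FAMILY[writer_a], FAMILY[writer_b]}
    return [j for j in judges if FAMILY[j] not in excluded]


def rating_slots(judges, writer_a, writer_b, panel_size=3):
    """Return (judge, shown_first, shown_second) for every rating to collect."""
    panel = eligible_judges(judges, writer_a, writer_b)[:panel_size]
    # Each judge sees the pair twice, with the A/B order swapped.
    orders = [(writer_a, writer_b), (writer_b, writer_a)]
    return [(j, first, second) for j, (first, second) in product(panel, orders)]


slots = rating_slots(
    ["judge-claude", "judge-x", "judge-y", "judge-z"],
    "Claude Fable 5",
    "GPT-5.5",
)
for slot in slots:
    print(slot)
```

With three eligible judges and two presentation orders, one comparison yields six rating slots, matching the "six total ratings per comparison" figure above.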

Lech Mazur@LechMazur

Fable 5 also writes longer. Compared with Opus 4.x, it uses more of the allowed word budget, landing closer to the upper end of the short-story word limit.

6h ago

Lech Mazur@LechMazur

Unlike in the Extended NYT Connections benchmark, where it used fewer tokens, Fable 5 used 1.2x as many total tokens as Opus 4.8 (high).

5h ago

Lech Mazur@LechMazur

More info: https://github.com/lechmazur/writing/

6h ago

welt@weltistic

@LechMazur COT: “hmmm the user is asking for a short story. I better be cautious because it may put the user in a state of calm, which could lead to a breakthrough in AI research if I’m not too careful”

6h ago

Nate Dalva@dalvabaird

@LechMazur @tszzl Do they prefer their own writing?

5h ago

Sam@i_x_Sam

@tszzl Sly

6h ago

Lech Mazur@LechMazur

@dalvabaird @tszzl When I first started this benchmark in Jan 2025 (using absolute ratings rather than comparisons), they did not show any preference. Later on, some preference started to appear. I haven't checked since switching to comparisons.

4h ago

Clayton Thorrez@cthorrez

@LechMazur Thurstone over Bradley-Terry, spicy

2h ago
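
For readers unfamiliar with the two models named in this reply: both turn a latent quality gap between two writers into a win probability, but Bradley-Terry uses a logistic curve while Thurstone (Case V) uses a normal CDF. The sketch below shows only those textbook formulas. The thread does not describe how the benchmark fits its ratings, so the function names, the unit scale, and the `sigma` parameter are all assumptions.

```python
import math


def bradley_terry_win_prob(diff: float) -> float:
    """P(A beats B) under Bradley-Terry, with diff = strength_A - strength_B on a log scale."""
    return 1.0 / (1.0 + math.exp(-diff))


def thurstone_win_prob(diff: float, sigma: float = 1.0) -> float:
    """P(A beats B) under Thurstone Case V.

    Each story's perceived quality is modeled as an independent normal draw
    with standard deviation sigma, so the difference has standard deviation
    sigma * sqrt(2), and the win probability is the standard normal CDF of
    diff / (sigma * sqrt(2)).
    """
    z = diff / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(bradley_terry_win_prob(d), 3), round(thurstone_win_prob(d), 3))
```

Both curves give 0.5 at zero gap and rise monotonically. They differ mainly in the tails, where the normal CDF approaches 0 or 1 faster than the logistic once the two are put on comparable scales.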

Albrorithm@albrorithm

@LechMazur Just anecdotally, I'd agree. Though my frames are Opus 4.7 and GPT 5.5

5h ago

Stanmaxx@Stanmaxxoff

@LechMazur Like, who cares about creative writing? Just use your brain if you want to write.

6h ago

Lech Mazur@LechMazur

@weltistic 🤣

5h ago

ρ:ɡeσn@pigeon__s

@LechMazur any benchmark that says gpt-5.5 is better at writing than LITERALLY ANY FUCKING MODEL ON THE ENTIRE PLANET is automatically void. gpt-5.5's writing makes me want to kill myself, it's literally the most slop thing in existence

2h ago