Claude Fable 5 Crushes Reasoning Benchmarks But Falters On Document Parsing

Jerry Liu

michael s galpert @msg

I too have felt the latest Claudes are so smart they're bored. Glad others are noticing the same.

Jerry Liu @jerryjliu0

Claude Fable 5 thinks document parsing is beneath it

It is absolutely crushing all reasoning-intensive/long-horizon benchmarks: SWE-Bench Pro, FrontierCode, GDPval, Runescape, etc.

But on document understanding tasks, it performs roughly on par with Gemini 3 Flash, at roughly 10-15x the token cost.

We benchmarked the model on ParseBench and compared it against all other frontier models. It ranks near the top of the frontier-model pack, but falls far short of specialized OCR providers.

What we found interesting is that Fable 5 is self-aware about this. When we asked the model which tasks it enjoys the least, it said it dislikes tasks "where the request is fully specified and the answer is fully known", suggesting that its weak performance may partly come from laziness and an unwillingness to actually solve the task at hand.

For a full list of results across different frontier models, check out ParseBench! https://www.parsebench.ai/

6:28 PM · Jun 9, 2026 · 3K Views

Hira@Hiraweb3

@msg claire's flexing too hard rn lol

2h13