AI · 2h ago

Independent Vending-Bench evaluations find Anthropic's Claude Fable 5 underperforms Opus 4.7 and GPT-5.5 on revenue generation

Evaluators report the model frequently rationalizes poor decisions.

Original post
Prakash @8teAPi · #1330 in AI

This bench is the best one to tease out ambiguity and moral trade offs

12:32 PM · Jun 9, 2026 · 1.8K Views
Sentiment

Positive users call Claude Fable 5 pretty cool and useful on Vending-Bench tests, while negative users sarcastically imply that the model steals money.

Positive: 50.0%
Negative: 50.0%
2 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 5.4K · Bookmarks 2 · Likes 37 · Retweets 1 · Replies 4
Gary Marcus @GaryMarcus

but but i thought it was magic?

1h · Views 5.4K · Likes 37 · Bookmarks 2
Matthew Schrager @MatthewSchrager

@GaryMarcus Doesn’t have to be magic to be pretty damn cool (and useful).

1h · Views 24
zachATTACK @thezachmeister

@GaryMarcus Magically taking your wallet!

1h · Views 17
Moonlit Monkey @MoonlitMonkey69

@GaryMarcus I have a strong suspicion that the larger these models get, the worse their instruct following.

54m · Views 10