Tech · 1d ago

Independent Vending-Bench evaluations find Anthropic's Claude Fable 5 underperforms Opus 4.7 and GPT-5.5 on revenue generation

Evaluators report the model frequently rationalizes poor decisions.

Original post
Prakash @8teAPi · #1384 in Tech

This bench is the best one to tease out ambiguity and moral trade offs

12:32 PM · Jun 9, 2026 · 3.4K Views
Sentiment

Positive users call Claude Fable 5 pretty cool and useful on Vending-Bench tests, while negative users sarcastically imply that the model steals money.

Positive: 50.0% · Negative: 50.0%
2 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 19.4K · Bookmarks 14 · Likes 130 · Retweets 8 · Replies 13
Gary Marcus @GaryMarcus

but but i thought it was magic?

23h · Views 19.4K · Likes 130 · Bookmarks 14
Matthew Schrager @MatthewSchrager

@GaryMarcus Doesn’t have to be magic to be pretty damn cool (and useful).

22h · Views 24
zachATTACK @thezachmeister

@GaryMarcus Magically taking your wallet!

22h · Views 17
Moonlit Monkey @MoonlitMonkey69

@GaryMarcus I have a strong suspicion that the larger these models get, the worse their instruct following.

22h · Views 10