/Tech · 1d ago

Independent Vending-Bench evaluations find Anthropic's Claude Fable 5 underperforms Opus 4.7 and GPT-5.5 on revenue generation

Evaluators report the model frequently rationalizes poor decisions.


Original post

Prakash @8teAPi · #1384 in Tech

This bench is the best one to tease out ambiguity and moral trade-offs

12:32 PM · Jun 9, 2026 · 3.4K Views

Sentiment

Positive users call Claude Fable 5 pretty cool and useful on Vending-Bench tests, while negative users sarcastically imply that the model steals money.

Pos

50.0%

Neg

50.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 19.4K · Bookmarks 14 · Likes 130 · Retweets 8 · Replies 13

but but i thought it was magic?

23h · 19.4K · 130 · 14

@GaryMarcus Doesn’t have to be magic to be pretty damn cool (and useful).

22h · 24

zachATTACK@thezachmeister

@GaryMarcus Magically taking your wallet!

22h · 17

Moonlit Monkey@MoonlitMonkey69

@GaryMarcus I have a strong suspicion that the larger these models get, the worse their instruct following.

22h · 10