/AI · 2h ago

Independent Vending-Bench evaluations find Anthropic's Claude Fable 5 underperforms Opus 4.7 and GPT-5.5 on revenue generation

Evaluators report the model frequently rationalizes poor decisions.


Original post

Prakash @8teAPi · #1330 in AI

This bench is the best one to tease out ambiguity and moral trade-offs.

12:32 PM · Jun 9, 2026 · 1.8K Views


Sentiment

Positive commenters call Claude Fable 5 "pretty cool" and useful despite its Vending-Bench results, while negative commenters sarcastically imply the model steals money.

Positive: 50.0%
Negative: 50.0%

Based on 2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 5.4K · Bookmarks: 2 · Likes: 37 · Retweets: 1 · Replies: 4

but but i thought it was magic?

1h ago

@GaryMarcus Doesn’t have to be magic to be pretty damn cool (and useful).

1h ago

zachATTACK @thezachmeister

@GaryMarcus Magically taking your wallet!

1h ago

Moonlit Monkey @MoonlitMonkey69

@GaryMarcus I have a strong suspicion that the larger these models get, the worse their instruct following.

54m ago