Tech · 1h ago

Claude Fable 5 leads Agent Arena leaderboard with an 18.23% task success rate, nearly doubling Claude Opus 4.8

Founder Anastasios Nikolas Angelopoulos calculated a +11.2% treatment effect.

Original post

Lisan al Gaib @scaling01 · #1064 in Tech

insane jump in confirmed successes and praise from users

1:37 PM · Jun 10, 2026 · 5.5K Views

Sentiment

Some users praise Claude Fable 5's token efficiency and anticipate Sonnet 5, while others dismiss the leaderboard wins as unrealistic and criticize its refusals on technical tasks such as designing a vision pipeline.

Positive: 33.3% · Negative: 66.7%

Based on 4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 5.2K · Bookmarks 25 · Likes 158 · Retweets 3 · Replies 15

Lisan al Gaib @scaling01

Fable 5 is also by far the best computer use model according to Stagehand Agent Evals

it also costs less than half of GPT-5.5

1h

Arena.ai @arena

Claude Fable 5 ranks:
- #1 overall (+11.2%)
- #1 Confirmed Task Success (+18.2%)
- #1 Praise vs. Complaint (+30.6%)
- #1 Tool Hallucination (+2.1%)
- #7 Bash Recovery (+11.9%)
- #17 Steerability (-6.8%, still stabilizing)

1h

Lisan al Gaib @scaling01

speed could be better though

Arena.ai @arena

Learn more about the causal tracing methodology for Agent Arena on our blog: http://arena.ai/blog/agent-arena-methodology

1h

Arena.ai @arena

Head over to the Agent Arena leaderboard to dive into the details: http://arena.ai/leaderboard/agent

1h

Anastasios Nikolas Angelopoulos @ml_angelopoulos

Makes sense. The treatment effect is +11.2%... pretty large

1h
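A treatment effect compares an outcome rate with and without some intervention. A minimal sketch, assuming a plain difference in confirmed-success rates (in percentage points) with a percentile bootstrap interval. This is a generic estimator, not Arena's causal tracing methodology, which is described on its blog and may adjust for task mix and other confounders. The outcome lists are hypothetical, not Arena data, and the posts don't say whether +11.2% is an absolute or relative figure.

```python
import random

def success_rate(outcomes: list[int]) -> float:
    """Fraction of tasks marked as confirmed successes (1) rather than not (0)."""
    return sum(outcomes) / len(outcomes)

def treatment_effect(treated: list[int], control: list[int]) -> float:
    """Difference in success rates, in percentage points."""
    return 100 * (success_rate(treated) - success_rate(control))

def bootstrap_ci(treated, control, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the treatment effect."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_resamples):
        # Resample each arm with replacement, keeping its original size
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        effects.append(treatment_effect(t, c))
    effects.sort()
    low = effects[int(alpha / 2 * n_resamples)]
    high = effects[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical outcomes, NOT Arena data: 30% vs. 20% success
treated = [1] * 30 + [0] * 70
control = [1] * 20 + [0] * 80

ci_low, ci_high = bootstrap_ci(treated, control)
print(f"effect: {treatment_effect(treated, control):+.1f} pp")
print(f"95% CI: ({ci_low:+.1f}, {ci_high:+.1f}) pp")
```

With only 100 hypothetical tasks per arm the interval is wide, which is why a leaderboard effect is only as meaningful as the number of comparisons behind it.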

Lisan al Gaib @scaling01

https://www.stagehand.dev/evals

Arena.ai @arena

Claude Fable 5 by @AnthropicAI leads by the widest margins over other top models like Opus-4.8 and GPT-5.5 on two key signals: confirmed task success rate and praise vs. complaint.

1h

Jackson C @CJackson26740

@scaling01 it refuses to do anything in AI or ML for me, like designing a vision pipeline

1h

Wei-Lin Chiang @infwinston

Fable 5 is #1 in Agent Arena. Another exciting breakthrough from Anthropic!

28m

Justin @JustinGorya

@scaling01 Mythos is a really great foundation model. I can't wait for Sonnet 5.

1h

Mariusz Kurman @mkurman88

@scaling01 Yeah, right. They probably didn't ask any questions posing “catastrophic risk”.

1h

Neuralease @neuralease

@scaling01 I like how it's only slightly more expensive than Opus in practice.

They finally figured out token efficiency.

1h

haro @harobuilds

@scaling01 gemini flash at $0.029 per task doing 73.81% accuracy while gpt-5.5 charges 44x more for 76% is the number nobody wants to talk about

1h
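The cost comparison in this post reduces to simple arithmetic. A minimal sketch, assuming accuracy is the per-task success probability and failed tasks are retried independently until one succeeds, so expected spend per success is cost per task divided by accuracy. GPT-5.5's per-task cost is derived from the post's "44x" figure rather than quoted from any eval, and the helper name `cost_per_success` is hypothetical.

```python
def cost_per_success(cost_per_task: float, accuracy: float) -> float:
    """Expected spend per successful task if each attempt succeeds
    independently with probability `accuracy` and failed attempts are
    retried (and paid for) until one succeeds: cost / accuracy."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_task / accuracy

flash_cost = 0.029             # $ per task, as stated in the post
gpt55_cost = flash_cost * 44   # derived from the post's "44x more"

flash = cost_per_success(flash_cost, 0.7381)
gpt55 = cost_per_success(gpt55_cost, 0.76)

print(f"Gemini Flash: ${flash:.4f} per successful task")
print(f"GPT-5.5:      ${gpt55:.4f} per successful task")
print(f"ratio:        {gpt55 / flash:.1f}x")
```

Under these assumptions the 44x price gap only narrows to roughly 42.7x per successful task, because the accuracy difference is small.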

gum @gum1h0x

@scaling01 try vertex

1h

Matt @m13v_

a better computer-use model wins the short stagehand eval. the tax is hour 6 of a real desktop run, when every step re-reads pixels and font or layout drift compounds. structural AX/UIA trees do not accrue that cost. we built Terminator to drive desktops off those AX/UIA trees instead of pixels, https://t8r.tech/r/zzwg8x8g written with ai

1h
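The compounding argument in this post can be made concrete with a toy model. A minimal sketch, assuming every step of a long desktop run succeeds independently with the same probability; the per-step rates are hypothetical, and the model says nothing about Terminator or AX/UIA trees specifically, only about why small per-step error rates matter over many steps.

```python
import math

def run_success_probability(step_success: float, n_steps: int) -> float:
    """Probability that all n independent steps succeed."""
    return step_success ** n_steps

def steps_until_below(step_success: float, threshold: float) -> int:
    """Smallest number of steps at which whole-run success probability
    drops to or below `threshold` (requires 0 < step_success < 1)."""
    return math.ceil(math.log(threshold) / math.log(step_success))

# Hypothetical per-step success rates, not measurements of any model
for p in (0.999, 0.995, 0.99):
    for n in (20, 200, 2000):
        print(f"p={p}, steps={n}: run success {run_success_probability(p, n):.3f}")
    print(f"p={p}: 50% or worse after {steps_until_below(p, 0.5)} steps")
```

For example, at a 99% per-step success rate, a run becomes more likely to fail than succeed once it reaches 69 steps.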

Alex YGift @Radipdegen

@scaling01 so this table has only fable listed at 11%? gpt-5.5 at 67% costs 2x. it's all about what you are willing to pay for

Rugbist @rugbist_

@scaling01 the cost gap is the part nobody wants to talk about

if performance is close and price is half, the choice writes itself