insane jump in confirmed successes and praises by users
Claude Fable 5 leads Agent Arena leaderboard with an 18.23% task success rate, nearly doubling Claude Opus 4.8
Founder Anastasios Nikolas Angelopoulos calculated a +11.2% treatment effect.
Some users praise Claude Fable 5's token efficiency and anticipate Sonnet 5 while others dismiss the leaderboard wins as unrealistic and criticize its refusals on technical tasks like vision pipelines.
Fable 5 is also by far the best computer use model according to Stagehand Agent Evals
it also costs less than half of GPT-5.5

Claude Fable 5 ranks #1 overall (+11.2%)
- #1 Confirmed Task Success (+18.2%)
- #1 Praise vs. Complaint (+30.6%)
- #1 Tool Hallucination (+2.1%)
- #7 Bash Recovery (+11.9%)
- #17 Steerability (-6.8%, still stabilizing)
speed could be better tho

Learn more about the causal tracing methodology for Agent Arena on our blog: http://arena.ai/blog/agent-arena-methodology

Head over to the Agent Arena leaderboard to dive into the details: http://arena.ai/leaderboard/agent
Makes sense. The treatment effect is +11.2%... pretty large
https://www.stagehand.dev/evals
Claude Fable 5 by @AnthropicAI leads by the widest margins over other top models like Opus-4.8 and GPT-5.5 on two key signals: confirmed task success rate and praise vs. complaint.

@scaling01 it refuses to do anything in AI or ML for me - like designing a vision pipeline
Fable 5 is #1 in Agent Arena. Another exciting breakthrough from Anthropic!

@scaling01 Mythos is a really great foundation model. i cant wait for Sonnet 5.

@scaling01 Yeah, right. They probably didn't ask any questions posing “catastrophic risk”.

@scaling01 I like how it's only slightly more expensive than Opus in practice.
They finally figured out token efficiency.

@scaling01 gemini flash at $0.029 per task doing 73.81% accuracy while gpt-5.5 charges 44x more for 76% is the number nobody wants to talk about
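
The arithmetic behind that reply can be sketched as follows. This is a rough illustration using only the figures quoted in the reply; the GPT-5.5 per-task cost is not stated there, so it is derived from the "44x more" claim and should be treated as an assumption, and the variable names are hypothetical.

```python
# Figures as quoted in the reply above (not independently verified).
flash_cost = 0.029      # USD per task, Gemini Flash
flash_acc = 73.81       # accuracy in percent, Gemini Flash
gpt_multiplier = 44     # "44x more", per the reply
gpt_acc = 76.0          # accuracy in percent, GPT-5.5

# Assumption: "44x more" means 44 times the Flash per-task cost.
gpt_cost = flash_cost * gpt_multiplier

# Extra spend per task and the accuracy it buys.
extra_cost = gpt_cost - flash_cost
extra_points = gpt_acc - flash_acc

# Marginal cost of each additional accuracy point, per task.
marginal_cost_per_point = extra_cost / extra_points

print(f"Implied GPT-5.5 cost per task: ${gpt_cost:.3f}")
print(f"Extra accuracy: {extra_points:.2f} points")
print(f"Marginal cost per extra point: ${marginal_cost_per_point:.2f} per task")
```

Under these assumptions, the gap the reply points at is roughly two accuracy points bought at a bit over a dollar more per task.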

@scaling01 try vertex

a better computer-use model wins the short stagehand eval. the tax is hour 6 of a real desktop run, when every step re-reads pixels and font or layout drift compounds. structural AX/UIA trees do not accrue that cost. we built Terminator to drive desktops off those AX/UIA trees instead of pixels, https://t8r.tech/r/zzwg8x8g written with ai

@scaling01 so this table has only fable listed at 11%? gpt-5.5 at 67% costs 2x. its all about what you are willing to pay for

@scaling01 the cost gap is the part nobody wants to talk about
if performance is close and price is half, the choice writes itself