Tech · 1d ago

Claude Fable 5 hits a record 65.5% on the APEX-SWE benchmark, outperforming Claude Opus 4.8 by about 18 percentage points

Nathan Lambert says the performance jump justifies enterprise token costs

Original post
Nathan Lambert@natolambert · #70 in Tech

A crazy jump. The price of the tokens will be worth it to a vast number of enterprises.

Mercor@mercor_ai

Claude Fable 5 takes #1 on APEX-SWE: 65.5% Pass@1 overall. It scores ~18pp higher than Opus 4.8.

We tested @claudeai Fable 5 on APEX-SWE which measures whether AI models can do real software engineering work.

Fable 5 tops our two APEX-SWE categories:
- Integration: 61.3%
- Observability: 69.7%

The standout is Observability at 69.7%, 26pp ahead of Claude Opus 4.8. It is the first model to clear 50% on the category, and the only one that scores higher on Observability than on Integration. Every other model shows the reverse.

Observability has been the bottleneck for every model we have measured. Fable 5 is the first to break it.

Congrats to the @AnthropicAI team.

10:56 AM · Jun 9, 2026 · 16.4K Views
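
The percentage-point gaps quoted in the post can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the Opus 4.8 scores it prints are back-calculated from the quoted gaps (the post does not report them directly, and the overall gap is given as approximate), and the function names are hypothetical. It also shows that the overall score happens to equal the unweighted mean of the two category scores, which is an observation and not a statement of how Mercor aggregates results.

```python
# Figures quoted in the Mercor post (Pass@1, in percent).
FABLE_OVERALL = 65.5
FABLE_INTEGRATION = 61.3
FABLE_OBSERVABILITY = 69.7

# Gaps over Claude Opus 4.8 quoted in the post, in percentage points.
# The overall gap is stated as "~18pp", so treat it as approximate.
OVERALL_GAP_PP = 18.0
OBSERVABILITY_GAP_PP = 26.0


def implied_baseline(score: float, gap_pp: float) -> float:
    """Score the comparison model would have if the quoted gap is exact."""
    return round(score - gap_pp, 1)


def relative_gain_percent(score: float, gap_pp: float) -> float:
    """Gap expressed relative to the implied baseline, in percent (not pp)."""
    baseline = score - gap_pp
    return round(100 * gap_pp / baseline, 1)


# Unweighted mean of the two category scores.
category_mean = round((FABLE_INTEGRATION + FABLE_OBSERVABILITY) / 2, 1)

if __name__ == "__main__":
    print("Implied Opus 4.8 overall:", implied_baseline(FABLE_OVERALL, OVERALL_GAP_PP))
    print("Implied Opus 4.8 Observability:",
          implied_baseline(FABLE_OBSERVABILITY, OBSERVABILITY_GAP_PP))
    print("Mean of category scores:", category_mean)
    print("Overall gap as relative gain (%):",
          relative_gain_percent(FABLE_OVERALL, OVERALL_GAP_PP))
```

An implied Observability score below 50% for Opus 4.8 is consistent with the post's claim that Fable 5 is the first model to clear 50% on that category. Note the difference between percentage points and percent: an 18pp gap over a baseline in the high 40s is a relative improvement of well over 18%.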
Sentiment

Positive users praise Claude Fable 5's benchmark gains on APEX-SWE as senior-level and worth the cost to enterprises, while negative users doubt that most companies need frontier models or that Mythos-level advances will arrive soon.

Positive: 55.6%
Negative: 44.4%
5 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views: 78.8K · Bookmarks: 47 · Likes: 525 · Retweets: 12 · Replies: 22
Lisan al Gaib@scaling01

you're totally right open-source is going to catch up in 4 months

1d · Views 78.8K · Likes 525 · Bookmarks 47
Tom Greenwald@tomgreenwald

@natolambert Will it though? For what sort of tasks?

1d · Views 38
satik@SatikVFX_

@scaling01 They haven't released anything in a while

1d · Views 150

@scaling01 Even next year we won’t have anything near mythos

1d · Views 145

Mythos-level smarts with training wheels, but the market is paying for a super-powered future. $50 per million tokens is less than half the old preview price, yet the public version still dodges cyber, bio, and chem questions on your dime. Meanwhile, the $10.9B in projected Q2 revenue and a first-ever operating profit of $559M show investors are betting on the uncaged version, not the handcuffed one.

#Anthropic #ClaudeFable5 #AI #IPO

1d · Views 102
haro@harobuilds

@natolambert 20pp over opus 4.8 is not a marginal improvement. enterprises will pay whatever anthropic asks for that gap on real swe tasks

1d · Views 31 · Likes 1
maxwell@1slimewell

@natolambert Enterprise?

1d · Views 99
Natalia Salcedo@NataliaSalcedoF

@BrendanFoody It's actually doing senior-level work now ahah

1d · Views 81

@natolambert For which tasks exactly?

1d · Views 67
Gregor@bygregorr

@BrendanFoody hit this with pennywise last week: supabase logs plus a linear ticket, and claude started losing track of which error mapped to which after 3 exchanges. does the multi-source coherence actually hold past 4-5 context switches in your testing?

1d · Views 62
Matthew Brooker@mbrookerhk

@BrendanFoody Hey Brendan, dropped you a quick DM as I wasn't too sure how best to contact you! Many thanks

1d · Views 31
Cristiano ❁@BeastSlay3r16

@natolambert Lol, do you really believe that? The vast number of enterprises do not require frontier models for a large part of their work.

1d · Views 31
Dante@thedntx

@natolambert "worth it" is doing a lot of heavy lifting here

you're bullish on enterprise adoption or just the token price?

1d · Views 27
china232332@gigantictur

@natolambert https://x.com/mercor_ai/status/2064399136007589994?s=20 Is it just trained to do tool calls better? But the way it reviews code is definitely very AGI/neuralese pilled

1d · Views 25
A War@AWar1586398

@scaling01 Up until RSL happens. The first one there will then have an insurmountable lead.

1d · Views 17
Ted Spare@TedSpare

@tomgreenwald @natolambert Scoring high on benchmarks

1d · Views 5