Claude Fable 5 hits a record 65.5% on the APEX-SWE benchmark, outperforming Claude Opus 4.8 by 18 percentage points

Nathan Lambert says the performance jump justifies enterprise token costs

Original post

Nathan Lambert @natolambert

A crazy jump. The price of the tokens will be worth it to a vast number of enterprises.

Mercor @mercor_ai

Claude Fable 5 takes #1 on APEX-SWE: 65.5% Pass@1 overall. It scores ~18pp higher than Opus 4.8.

We tested @claudeai Fable 5 on APEX-SWE, which measures whether AI models can do real software engineering work.

Fable 5 tops both of our APEX-SWE categories:
- Integration: 61.3%
- Observability: 69.7%

The standout is Observability at 69.7%, 26pp ahead of Claude Opus 4.8. It is the first model to clear 50% on the category, and the only one that scores higher on Observability than on Integration. Every other model shows the reverse.

Observability has been the bottleneck for every model we have measured. Fable 5 is the first to break it.

Congrats to the @AnthropicAI team.

10:56 AM · Jun 9, 2026 · 16.4K Views

Sentiment

Positive commenters praise Claude Fable 5's senior-level benchmark gains on APEX-SWE as worth the enterprise cost, while negative commenters doubt that most companies need frontier models or that Mythos-level advances will arrive soon.

Positive: 55.6%
Negative: 44.4%

5 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 78.8K · Bookmarks: 47 · Likes: 525 · Retweets: 12 · Replies: 22

Lisan al Gaib @scaling01

you're totally right open-source is going to catch up in 4 months

Quoting Mercor @mercor_ai

Tom Greenwald @tomgreenwald

@natolambert Will it though? For what sort of tasks?

satik @SatikVFX_

@scaling01 They haven't released anything in a while

Philow🇬🇬 @Phi10w

@scaling01 Even next year we won’t have anything near mythos

Aasim Mahmood | ₿ @K9Aasim

Mythos-level smarts with training wheels, but the market is paying for a super-powered future. $50 per million tokens is less than half the old preview price, yet the public version still dodges cyber, bio, and chem questions on your dime. Meanwhile, the $10.9B in projected Q2 revenue and a first-ever operating profit of $559M show that investors are betting on the uncaged version, not the handcuffed one.

#Anthropic #ClaudeFable5 #AI #IPO

haro @harobuilds

@natolambert 20pp over opus 4.8 is not a marginal improvement. enterprises will pay whatever anthropic asks for that gap on real swe tasks

maxwell @1slimewell

@natolambert Enterprise?

Natalia Salcedo @NataliaSalcedoF

@BrendanFoody It's actually doing senior-level work now ahah

Bioinfhotep @pp0196

@natolambert For which tasks exactly?

Gregor @bygregorr

@BrendanFoody Hit this with Pennywise last week: Supabase logs plus a Linear ticket, and Claude started losing track of which error mapped to which after 3 exchanges. Does the multi-source coherence actually hold past 4-5 context switches in your testing?

Matthew Brooker @mbrookerhk

@BrendanFoody Hey Brendan, dropped you a quick DM as I wasn't sure how best to contact you! Many thanks

Cristiano ❁ @BeastSlay3r16

@natolambert Lol, do you really believe that? The vast majority of enterprises do not require frontier models for a large part of their work.

Dante @thedntx

@natolambert "worth it" is doing a lot of heavy lifting here

Are you bullish on enterprise adoption, or just the token price?

china232332 @gigantictur

@natolambert https://x.com/mercor_ai/status/2064399136007589994?s=20 Is it just trained for tool calls better? But the way it reviews code is definitely very AGI/neuralese-pilled.

A War @AWar1586398

@scaling01 Up until RSL happens. The first one there will then have an insurmountable lead.

Ted Spare @TedSpare

@tomgreenwald @natolambert Scoring high on benchmarks