/Tech · 21d ago

Oxford's Toby Ord disputes Anthropic's claim that all AI capabilities scale uniformly, arguing non-verifiable tasks plateau lower

Story Overview

Toby Ord is questioning a key assumption in Anthropic's recent write-up on recursive self-improvement, in which the company treats every tracked AI skill, from hard benchmarks to fuzzier judgments like code quality, as following the same upward curve. Ord counters that reinforcement learning with verifiable rewards (RLVR) drives clear gains on checkable tasks but transfers notably less to non-verifiable ones, and he guesses that those areas follow a different curve with a lower plateau.

Original post

Toby Ord @tobyordoxford · #1474 in Tech

I'd have said that the verifiable tasks have been improving impressively (via RLVR) with some improvement transferring to non-verifiable tasks, but notably less. And I'd have guessed the latter are following a different curve (with a lower plateau).

Toby Ord @tobyordoxford

Anthropic's recent post 'When AI Builds Itself' included the following claim that I thought was crucial, yet unsupported. Are all measurable capabilities really improving on the same curve? What is the best evidence for this?

4:15 AM · Jun 9, 2026 · 937 Views

Open Question

Verifiable work pulls ahead while the rest lags

Ord suggests that tasks with clear right answers benefit most from current RLVR techniques, while open-ended or subjective ones receive only partial lift and could hit a lower ceiling. This split matters because many real-world uses of AI, from novel research to ambiguous decision-making, fall into the harder-to-measure bucket.

FYI

Fresh dispute with little data yet

The exchange is only hours old, and neither side has posted detailed counter-benchmarks or charts. Without more evidence on how non-verifiable performance actually bends, the debate stays at the level of competing expectations rather than settled measurements.

Posts from X

Most Activity

Views: 465 · Likes: 3

Seán Ó hÉigeartaigh @S_OhEigeartaigh

@tobyordoxford Yeah, that also struck me. Thanks for highlighting it, would be great to get more insight.