/AI · 14h ago

Cambridge PhD researcher Herbie Bradley argues LLMs will maintain jagged capabilities as they scale instead of achieving uniform intelligence

RLVR training widens the gap by prioritizing verifiable skills.


Original post

Herbie Bradley @herbiebradley · #1012 in AI

@bayeslord disagree on the non-verifiable thing. many people predicted 2 years ago that jagged capabilities would be less of a thing, but they're not. I expect jagged superintelligence

bayes @bayeslord

AI models, especially the frontier, will keep getting better. The only true wall is physics. Models are increasingly autonomous, smart, and are getting better all the time. Math and code are falling to scale+RL, everything else is up next. Verifiable vs. non-verifiable as a meaningful distinction will fade. Automated AI research and AI learning are going to look more and more related as we go forward. Training models well is closely related to models learning well in general. Sample efficiency, creativity, and all other limitations will be solved and then start approaching algorithmic optimality at whatever scale.

12:29 PM · Jun 4, 2026 · 30 Views



Posts from X

Most Activity

Views 630 · Bookmarks 1 · Likes 2

Herbie Bradley @herbiebradley

I still think this frame doesn't deal with how AIs work today at all. AIs have jagged capabilities and are quite poor at judgement, largely because judgement is inherently hard to get data for. Therefore, we should expect superintelligence to also be jagged a priori (although of course with some weight on a smoother capabilities profile). This looks like being able to code any software, but being poor at articulating the long-term research direction which will produce the greatest GDP growth from automation.

I also put significant weight on the idea that at a high enough level of abstraction, AI research is similar to a product development exercise, and is significantly about human preferences for what shape of AI people will want or enjoy using. I think it's impractical to design a coherent boundary around AI R&D that excludes these qualitative questions and reduces it to a pure optimization exercise. I think quite a strange model of AI R&D is required to posit that, e.g., Amanda Askell can be automated.

Eli Lifland @eli_lifland

@herbiebradley I'm imagining that AIs have better judgment than humans. I agree things will be bottlenecked on compute and potentially data. But I don't think humans will be adding any value once AIs are superintelligent. Maybe they won't hurt either, depending on how strongly they intervene.

13h ago

Replies: 1

Herbie Bradley @herbiebradley

how do you mean less interesting? it seems directly predictive of what gets automated, or how much productivity uplift different things get?

half the neolabs right now exist on the thesis that we'll need some breakthrough to solve some fundamental limitations, of which RLVR's weaknesses are one. meanwhile the main labs seem to be just pushing RLVR. so if we think jaggedness is an artefact of RLVR we shouldn't necessarily expect the frontier labs to suddenly produce models with the ability to write Nobel-winning novels (or pick any other hard-to-verify task)

bayes @bayeslord

To be clear with this I was largely saying that "the models are bad at writing because writing isn't verifiable" is going to be a less and less interesting point as we go forward

I agree ofc that the models are still uneven (it's not clear that evenness is a property we should expect from a randomly sampled model). Verifiable capabilities are making the absolute delta between worst skills and best skills increase. This could be an artifact of RLVR being one key way we're getting better capabilities right now (and just the general situation with how today's best models work), but I think in any case we will have continued evening out of capabilities up to the theoretical and utility limits, and that the rate of that evening will increase.

14h ago
