There will be an extreme irony if these models really are bound by human-generated training data. RL doesn't generalize and is only useful in a handful of areas. And we all lose our skills to something that'll forever be a B+ player.
a16z's Martín Casado argues AI models are structurally capped by the limits of human training data
Story Overview
a16z's Martín Casado flags a potential ceiling for today's AI systems: they stay tethered to the finite pool of human-generated training data, with reinforcement learning offering only narrow wins rather than broad leaps. The result, he suggests, could leave models stuck as perpetual B+ performers even as people lose ground on the skills those models were meant to augment.
Reinforcement learning hits domain walls fast
Casado notes that RL techniques rarely transfer beyond a handful of tightly scoped tasks, leaving the field without a clear path to wider generalization from current methods.
Replies float verifiers and oversight as next bets
Thread participants highlight scalable oversight work and AI-built verifiers as possible ways forward, though no fresh benchmarks confirm whether these close the gap Casado describes.
Supportive replies praise analogies that frame a B+ AI as a "country of geniuses" and point to rapid iteration and skill gains, while hostile replies insult participants in the debate over training-data limits.
Most Activity
Also, those who focus on using AI to help improve the skills of themselves and their teams will be diamonds in great demand, since they will be the rare A++ players in a sea of mediocrity.
I think AI is bound not by what humans can generate as training data but by what humans can write verifiers for. And that ceiling is one that we're still very far away from reaching. Once we have AI that can write FFmpeg in JavaScript I'll rethink this.
@martin_casado Time to read scalable oversight papers; they’re fun!

@martin_casado i doubt it given my own subjective experience and the original research it's done... but even if true, a B+ player that's orders of magnitude cheaper and orders of magnitude faster is worth it

@martin_casado Having someone at the head of the distribution managing an infinitely scalable team of B+ players can still do quite a bit.
If one pushes the convex hull of "in distribution" out and then has AIs fill in behind, there is still quite a lot of abundance to be had.

@martin_casado Yes!! And in that world, the unit of competition shifts from the A+ human to the A+ team...
One elite human may still beat AI on one axis: IQ, EQ, agency, taste, physical-world judgment. But to outperform overall, humans need to combine their spikes.

@martin_casado Hey, I'll take a C+ player that can give me results in seconds over an A+ player that takes days. Iteration is more powerful than sheer accuracy, because it gives you more time to think and interact with what you're making.

@jeremyphoward underrated use case is setting up an agent to act as a tutor. have it prompt you to do tasks on something you need to learn, call out your mistakes, nudge you etc. works very well

@martin_casado B+ to the smartest human in every field is still basically “country of geniuses in a data center”

@martin_casado synthetic data is already past the human ceiling on math

@martin_casado idk man, i shipped an entire feature last week where the model found a better architecture than what i had in mind. RL might not generalize but in practice it doesn't need to, it just needs to be better than me at the specific thing i'm building. and it already is

@benrayfield @martin_casado Is there a proof that Turing completeness is enough to generate human-level and human-like AGI?

@martin_casado on the bright side, as every job turns into a "communicate-your-intentions-to-AI" problem, everyone will get really good at communicating with each other (which many people, especially engineers, suck at today) :)

@log_npierce Yeah, clearly code RLs well.

@RealAstropulse Oh yeah for sure. Just don't lose your skill!

@martin_casado i always think about things where test and failure is cheap (eg coding) vs expensive (eg enterprise sales). I wonder if that will drive where RL-trained models become great in the near future.
Also once we do get to that point (I think in 1-2 years) we'll think of verifiers for tasks which are much harder, so I don't think we'll even need a new paradigm then.

@martin_casado Isn't that Yann's whole argument

@martin_casado Dude we already know this isn't true

@empathyx100 @RealAstropulse Dude, don't hate on Astropulse. He does some of the best work on retro pixel stuff on the planet.