Narration: the data efficiency black hole.
00:00:00 – What is really driving AI progress? 00:03:11 – Comparing human vs AI sample efficiency 00:08:46 – Does sample efficiency matter?
Also on pod and YouTube feed.
Dwarkesh Patel argues that intelligence can be viewed as the ability to master skills from minimal examples. Yet frontier models still demand orders of magnitude more tokens than the roughly 200 million a human encounters by adulthood, and gains so far have come mainly from broader data and heavier compute rather than genuine efficiency leaps.
Even optimistic projections show infinite parameters trimming data needs by only about 10x, leaving the thousands-to-millions-fold human gap largely untouched and forcing reliance on massive RL-generated synthetic data.
While inefficiency may not block routine white-collar automation, the same bottleneck could stall efforts to automate AI research itself, where new skills still require fresh torrents of expert trajectories.
Users praise discussions on AI data inefficiency versus humans because they agree optimizing sample use and scaling compute are crucial next steps for LLMs and RL.
Current LLMs are outrageously data inefficient (and hence also compute inefficient) - this will be the next frontier https://www.dwarkesh.com/p/the-sample-efficiency-black-hole-2
"AI needs 1,000,000 more data than us."
Appreciate the @mercor_ai shoutout 🚀
@dwarkesh_sp Indeed. Have people sent Machine Studying and its definitions of expertise and intelligence your way yet?
Continual learning is widely discussed right now, but mostly as improving on the job or avoiding catastrophic forgetting. But it has a different, difficult, and already urgent form:
Given nothing but a corpus of documents, how should AI systems develop expertise in a new, unfamiliar domain? We call this problem Machine Studying.

At some point, you would need to scale compute so much more that not squeezing out more signal from each rollout becomes untenable. That's the regime we might be entering with agents that use compaction, where one rollout can easily pass 1m tokens.

GRPO then is just taking the compute scaling even further and not bothering to even re-use information from past rollouts by learning a value model for finer-grained credit assignment.
This has, surprisingly, worked quite well, but it may be coming to an end soon.
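
To make the contrast with a learned value model concrete, here is a minimal sketch of the group-relative advantage that GRPO-style methods use, assuming one scalar reward per rollout and a group of rollouts sampled for the same prompt. The function name and the choice of population standard deviation are assumptions for illustration; real implementations differ in details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout against the other rollouts for the same prompt.

    No value model is involved: the baseline is just the group mean, and
    every token in a rollout would share that rollout's single advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt, two of which earned reward.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Note that a group where every rollout gets the same reward yields all-zero advantages, so those rollouts contribute no gradient signal, which is one place the compute goes unused.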

Spot on. Learning a value model indeed re-uses past rollouts, but there's an even more direct way that's been absent in LLM RL: replay. Some recent work (https://arxiv.org/abs/2604.08706) showed it can work well. Imo, figuring out what to store and how to sample from the buffer holds the key to many open questions.
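
As a rough illustration of what "replay" means here, the following is a minimal sketch of a fixed-capacity buffer of past rollouts with uniform sampling. It is not the method from the linked paper; the class name, the `(prompt, response, reward)` tuple layout, and the uniform sampling rule are all assumptions, and the open questions in the post are exactly about replacing those two choices (what to store, how to sample).

```python
import random
from collections import deque


class RolloutReplayBuffer:
    """Hypothetical fixed-size store of past rollouts (oldest evicted first)."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, prompt, response, reward):
        # Storage policy: keep everything until capacity forces eviction.
        self._items.append((prompt, response, reward))

    def sample(self, k):
        # Sampling policy: uniform without replacement, capped at buffer size.
        k = min(k, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)


buffer = RolloutReplayBuffer(capacity=2, seed=0)
buffer.add("p1", "r1", 0.0)
buffer.add("p2", "r2", 1.0)
buffer.add("p3", "r3", 0.5)  # evicts the oldest entry ("p1")
batch = buffer.sample(5)     # returns at most len(buffer) items
```

Mixing such a batch with fresh on-policy rollouts would also require an off-policy correction, which this sketch leaves out.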

@dwarkesh_sp If rubrics are the path to agi we’re in trouble

Incidentally, @dwarkesh_sp also just uploaded something about sample efficiency. I think having a mental model about AI that is heavily informed by information theory helps here.
https://www.youtube.com/watch?v=4pG3SJQPAwk

As a matter of fact, using SFT to bootstrap the base model instead of doing RL directly on top of the base model is already using a more sample efficient method that is heavily biased to cut down on the variance.

@hallerite my first real RL nerdsnipe was adversarial bandits

So while we can indeed scale compute and probably mostly continue doing so for a bit, it's definitely worth it to already do research on how we can get more signal out of our rollouts.

In any case, it seems to me like now is a good time to think about sample efficiency again, and for that it probably makes sense to develop a first-principles understanding of what makes a method more or less sample efficient.

Further, as models improve at judging or assigning credit, and as other interesting auxiliary methods mature, one big research problem will probably be finding a good allocation of one's compute budget. GRPO then is just one extreme, but likely not optimal, allocation strategy.
When you take a class on RL, you usually learn that different algorithms have variable sample efficiency. Naturally, you can overcome sample inefficiency by scaling compute. We have been doing this for quite some time - PPO itself being a great example.

@hallerite sample efficiency was exactly one of our motivations for echo. even if a rollout “fails” there is a lot to learn from its environment interactions. and “failure” is relative too — we can change the definition of any task and convert failure to success.
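
The idea of redefining the task so that a failed rollout counts as a success can be sketched as hindsight relabeling, in the spirit of Hindsight Experience Replay. This is a generic illustration and not a description of how echo works; the `Rollout` fields and the rule of setting the relabeled reward to 1.0 are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rollout:
    goal: str       # what the agent was asked to do
    achieved: str   # what the environment shows it actually did
    reward: float   # 0.0 means the original goal was not met


def relabel_with_hindsight(rollout):
    """Turn a failed rollout into a successful example of a different task."""
    if rollout.reward > 0:
        return rollout  # already a success; nothing to relabel
    # Pretend the achieved outcome was the goal all along.
    return replace(rollout, goal=rollout.achieved, reward=1.0)


failed = Rollout(goal="open file A", achieved="opened file B", reward=0.0)
relabeled = relabel_with_hindsight(failed)
```

The relabeled rollout is only useful if the achieved outcome can be expressed as a task the policy could plausibly be asked to do, which is the hard part in open-ended language environments.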

@abhijaymrana @dwarkesh_sp Do you think a 55 year old accountant likes writing the 223rd criterion?

@hallerite when i took a class on RL i learned Value Iteration, TD, Q-Learning, and Bandits

@GarrettLord @dwarkesh_sp why?

@carnot_cyclist yeah this is also very interesting, for sure :)