Dwarkesh Patel argues AI progress is bottlenecked by extreme sample inefficiency compared to human learning

Story Overview

Dwarkesh Patel frames intelligence as the ability to master skills from minimal examples. By that measure, frontier models fall short: they still require orders of magnitude more tokens than the roughly 200 million a human encounters by adulthood, and their gains so far have come mainly from broader data and heavier compute rather than genuine leaps in efficiency.

Original post

Dwarkesh Patel @dwarkesh_sp

Narration: the data efficiency black hole.

00:00:00 – What is really driving AI progress?
00:03:11 – Comparing human vs AI sample efficiency
00:08:46 – Does sample efficiency matter?

Also on pod and YouTube feed.

10:14 AM · Jun 19, 2026 · 54.5K Views

Open Question

Why scaling laws fall short here

Even optimistic scaling-law projections suggest that growing parameter count without limit would cut data requirements by only about 10x. That leaves the thousands- to millions-fold gap with humans largely untouched and forces reliance on massive amounts of RL-generated synthetic data.
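To see why model size alone cannot close a gap this large, it helps to plug numbers into a scaling law. The sketch below is an illustration, not the projection the post relies on: it assumes the Chinchilla parametric form L(N, D) = E + A/N^α + B/D^β with the constants commonly quoted from Hoffmann et al. (2022), and the function name, target loss, and 70B reference model are hypothetical. It solves for the training tokens D needed to reach a target loss at parameter count N, including the infinite-parameter limit where the A/N^α term vanishes.

```python
import math

# Assumed constants: the parametric loss fit commonly quoted from the
# Chinchilla paper (Hoffmann et al., 2022). Illustrative only.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def tokens_needed(target_loss: float, params: float = math.inf) -> float:
    """Training tokens D that solve E + A/N**ALPHA + B/D**BETA = target_loss."""
    # With infinitely many parameters the model-size term drops out entirely.
    param_term = 0.0 if math.isinf(params) else A / params**ALPHA
    data_budget = target_loss - E - param_term  # loss the data term may contribute
    if data_budget <= 0:
        raise ValueError("target loss is unreachable at this model size")
    return (B / data_budget) ** (1.0 / BETA)


target = 2.0  # hypothetical target loss
finite = tokens_needed(target, params=70e9)  # hypothetical 70B-parameter model
infinite = tokens_needed(target)
print(f"70B params:      {finite:.3g} tokens")
print(f"infinite params: {infinite:.3g} tokens")
print(f"data saved by infinite params: {finite / infinite:.1f}x")
```

Because extra parameters can only remove the A/N^α share of the loss, the data requirement keeps a floor set by the B/D^β term alone. The size of the saving depends on the chosen target loss and reference model, so this sketch is not meant to reproduce the ~10x figure.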

Developer Impact

Limits on automating the next leap

While inefficiency may not block routine white-collar automation, the same bottleneck could stall efforts to automate AI research itself, where new skills still require fresh torrents of expert trajectories.

Sentiment

Users praise the discussion of AI data inefficiency relative to humans, agreeing that optimizing sample use and scaling compute are crucial next steps for LLMs and RL.

Positive: 66.7% · Negative: 33.3%

Based on 17 comments with sentiment.

Related links

The data black hole at the center of AI

dwarkesh.com

Posts from X

Kevin Patrick Murphy @sirbayes

Current LLMs are outrageously data inefficient (and hence also compute inefficient) - this will be the next frontier https://www.dwarkesh.com/p/the-sample-efficiency-black-hole-2

Brendan (can/do) @BrendanFoody

"AI needs 1,000,000 more data than us."

Appreciate the @mercor_ai shoutout 🚀

Omar Khattab @lateinteraction

@dwarkesh_sp Indeed. Have people sent Machine Studying and its definitions of expertise and intelligence your way yet?

Jacob X. Li @jacobli99

Continual learning is widely discussed right now, but mostly as improving on the job or avoiding catastrophic forgetting. But it has a different, difficult, and already urgent form:

Given nothing but a corpus of documents, how should AI systems develop expertise in a new, unfamiliar domain? We call this problem Machine Studying.

hallerite @hallerite

At some point, you would need to scale compute so much more that not squeezing out more signal from each rollout becomes untenable. That's the regime we might be entering with agents that use compaction, where one rollout can easily pass 1m tokens.

hallerite @hallerite

GRPO then is just taking the compute scaling even further and not bothering to even re-use information from past rollouts by learning a value model for finer-grained credit assignment.

This has, surprisingly, worked quite well, but it may be coming to an end soon.
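The contrast hallerite draws can be made concrete. In GRPO, each prompt gets a group of sampled rollouts, and each rollout's advantage is its reward normalized against the group's mean and standard deviation, so no value model is trained. A minimal sketch, assuming one scalar reward per rollout and a population standard deviation (implementations differ on this detail); the function name and `eps` are illustrative:

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each rollout's reward within its group.

    The group mean acts as the baseline in place of a learned value model,
    and every token of a rollout shares that rollout's single scalar advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt: two judged correct (reward 1), two not.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

If every rollout in a group earns the same reward, every advantage is zero and the group contributes no reward-driven policy-gradient signal, which is one concrete way this allocation spends compute on rollouts without extracting information from them.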

Ilija Lichkovski @carnot_cyclist

Spot on. Learning a value model indeed re-uses past rollouts, but there's an even more direct way that's been absent in LLM RL: replay. Some recent work (https://arxiv.org/abs/2604.08706) showed it can work well. Imo, figuring out what to store and how to sample from the buffer holds the key for many open questions
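A replay buffer in its simplest form keeps past rollouts around and samples training batches from them. The sketch below is a generic illustration of that idea, not the method in the linked paper; the class names, the FIFO eviction, and the rule of favoring rollouts whose reward sits far from the buffer mean are all hypothetical stand-ins for the open "what to store and how to sample" question:

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    prompt: str
    completion: str
    reward: float


class ReplayBuffer:
    """Hypothetical rollout buffer: FIFO eviction plus priority-weighted sampling."""

    def __init__(self, capacity: int, seed: Optional[int] = None):
        self.items = deque(maxlen=capacity)  # oldest rollout is evicted first
        self.rng = random.Random(seed)

    def add(self, rollout: Rollout) -> None:
        self.items.append(rollout)

    def sample(self, k: int) -> List[Rollout]:
        """Draw k rollouts with replacement, favoring unusually good or bad ones."""
        if not self.items:
            return []
        mean = sum(r.reward for r in self.items) / len(self.items)
        # The small floor keeps every rollout sampleable even if all rewards match.
        weights = [abs(r.reward - mean) + 1e-3 for r in self.items]
        return self.rng.choices(list(self.items), weights=weights, k=k)


buffer = ReplayBuffer(capacity=3, seed=0)
for i, reward in enumerate([0.0, 1.0, 0.0, 1.0]):
    buffer.add(Rollout(prompt="p", completion=f"c{i}", reward=reward))
batch = buffer.sample(2)  # c0 has already been evicted by the capacity limit
```

Both the eviction rule and the sampling weights are the knobs the reply identifies as open; recency-based eviction and reward-deviation priorities are just two simple choices among many.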

Garrett Lord @GarrettLord

@dwarkesh_sp If rubrics are the path to agi we’re in trouble

hallerite @hallerite

Incidentally, @dwarkesh_sp also just uploaded something about sample efficiency. I think having a mental model about AI that is heavily informed by information theory helps here.

https://www.youtube.com/watch?v=4pG3SJQPAwk

hallerite @hallerite

As a matter of fact, using SFT to bootstrap the base model instead of doing RL directly on top of the base model is already using a more sample efficient method that is heavily biased to cut down on the variance.

will brown @willccbb

@hallerite my first real RL nerdsnipe was adversarial bandits

hallerite @hallerite

So while we can indeed scale compute and probably mostly continue doing so for a bit, it's definitely worth it to already do research on how we can get more signal out of our rollouts.

hallerite @hallerite

In any case, it seems to me like now is a good time to think about sample efficiency again, and for that it probably makes sense to develop a first-principles understanding of what makes a method more or less sample efficient.

hallerite @hallerite

Further, as models improve on judging or assigning credit and other interesting auxiliary methods, one big research problem will probably be about finding a good allocation for one's compute budget. GRPO then is just one extreme, but likely not optimal allocation strategy.

hallerite @hallerite

When you take a class on RL, you usually learn that different algorithms have variable sample efficiency. Naturally, you can overcome sample inefficiency by scaling compute. We have been doing this for quite some time - PPO itself being a great example.
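For context on the PPO example: PPO is on-policy, so each batch of rollouts is used for only a few epochs of updates before being discarded, and its clipped surrogate objective is what keeps that limited reuse from moving the policy too far from the one that collected the data. A minimal per-sample sketch, assuming log-probabilities under the new and old policies are already computed; `clip_eps=0.2` is a common default rather than a value from the thread:

```python
import math


def ppo_clipped_objective(logp_new: float, logp_old: float, advantage: float,
                          clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate, to be maximized.

    The probability ratio pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - clip_eps, 1 + clip_eps], and the smaller (more pessimistic) of the
    clipped and unclipped terms is kept.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# The new policy makes this action 1.5x as likely as the old policy did.
gain = ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0)   # capped near 1.2
loss = ppo_clipped_objective(math.log(1.5), 0.0, advantage=-1.0)  # stays near -1.5
```

The asymmetry is the point: a good action stops earning extra credit once the ratio passes 1 + clip_eps, while a bad action keeps its full penalty, which limits how far a few epochs of reuse can push the policy.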

Vaish Shrivastava @VaishShrivas

@hallerite sample efficiency was exactly one of our motivations for echo. even if a rollout “fails” there is a lot to learn from its environment interactions. and “failure” is relative too — we can change the definition of any task and convert failure to success.
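Redefining a task so that a "failed" rollout counts as a success resembles hindsight relabeling, as popularized by Hindsight Experience Replay. The post does not describe how echo works, so the sketch below shows only that general technique under hypothetical assumptions: each step records the outcome it actually achieved, the reward is 1 when that outcome equals the goal, and a failed episode is relabeled with its final achieved outcome as the goal.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    state: str
    action: str
    achieved: str  # outcome the environment actually reached after this step


def goal_reward(achieved: str, goal: str) -> float:
    """Hypothetical goal-conditioned reward: 1 when the outcome matches the goal."""
    return 1.0 if achieved == goal else 0.0


def relabel_with_hindsight(episode: List[Step], goal: str) -> List[Tuple[Step, str, float]]:
    """Return training tuples for the original goal plus a hindsight copy.

    The copy pretends the final achieved outcome was the goal all along
    (the 'final' relabeling strategy), so a failed episode still yields a success.
    """
    original = [(s, goal, goal_reward(s.achieved, goal)) for s in episode]
    hindsight_goal = episode[-1].achieved
    relabeled = [(s, hindsight_goal, goal_reward(s.achieved, hindsight_goal)) for s in episode]
    return original + relabeled


episode = [Step("s0", "a", "s1"), Step("s1", "b", "s2")]
data = relabel_with_hindsight(episode, goal="s9")  # the original goal was never reached
```

The original goal yields only zero rewards, while the relabeled copy contains one success, so the same environment interactions now carry a learning signal.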

Garrett Lord @GarrettLord

@abhijaymrana @dwarkesh_sp Do you think a 55 year old accountant likes writing the 223rd criterion?

will brown @willccbb

@hallerite when i took a class on RL i learned Value Iteration, TD, Q-Learning, and Bandits
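Each item on that syllabus fits in a few lines. As a minimal illustration of the TD-style update that Q-learning uses (not a method proposed in the thread), here is tabular Q-learning on a hypothetical five-state chain where only reaching the rightmost state pays a reward; the environment, hyperparameters, and function name are all assumptions:

```python
import random
from typing import List


def q_learning(n_states: int = 5, episodes: int = 2000, alpha: float = 0.1,
               gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning on a toy chain (hypothetical environment).

    Actions: 0 = left (floored at state 0), 1 = right. Reaching the last
    state ends the episode with reward 1; every other transition pays 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # TD target: bootstrap from the best next action unless terminal.
            target = r if s_next == terminal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])  # TD error scaled by step size
            s = s_next
    return q


q = q_learning()
greedy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
```

Even this toy needs thousands of episodes to propagate a single reward back along five states, a small-scale version of the sample-efficiency concern running through the thread.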

Abhijay Rana @abhijaymrana

@GarrettLord @dwarkesh_sp why?

hallerite @hallerite

@carnot_cyclist yeah this is also very interesting, for sure :)
