
Dwarkesh Patel, host of the Dwarkesh Podcast, argues that the million-fold sample-efficiency gap between AIs and humans makes data the primary driver of AI progress

Because data can be distilled from public APIs, he argues, open-source developers can catch up to within months of frontier models

Original post
Dwarkesh Patel@dwarkesh_sp · #70 in AI

New blog post: on the million-x sample efficiency gap between AIs and humans, and whether it matters:

"The reason it is relatively easy for open source and previous laggards to catch up to within months of the frontier is that data is the real driver of progress.

And data can be easily distilled from public APIs, whereas hyper-parameters and training tricks and architectural micro-optimizations cannot - if the latter were driving most of progress, then catching up would be harder than we are observing it to be.

It is easy to forget how much data these models are trained on, and how much more it is than what we humans see in our lifetimes.

We see these AIs as a galaxy glittering with capabilities, but at their center, invisible to the naked eye, holding all the constellations together, is an unimaginably massive black hole of data."

Post in link below

11:10 AM · Jun 8, 2026 · 52K Views
Sentiment

Positive users praised articles on data's role in AI progress for insightful takes on scale and evolution, while negative users criticized links to underpaid labelers and flawed human brain analogies.

Positive: 40.0%
Negative: 60.0%
7 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 13.1K · Bookmarks 80 · Likes 134 · Retweets 10 · Replies 11
Nathan Lambert@natolambert

I feel like the obsession with continual learning / sample efficiency leads the field in the wrong direction. It's the bad career strategy of focusing on addressing your weaknesses instead of maximizing your strengths.

Yes, there is an existence proof in the human brain, but it doesn't by any means guarantee that that'll be the most interesting AI. It may require $100T of R&D on chips and AI methods to get that unlock.

On the other side of things, it's obvious that the coming models are extremely transformative and built on technologies that we already have. There's great reason to focus on just maximizing this. In reality, this is what the frontier labs are doing. They're going as fast as possible down the current development tree. This is good for progress and mixed for safety/geopolitics.

Things like "automate white-collar work" and "replace the AI researcher job" are the guesses of labs because it's super hard to imagine futures for what these dramatic technologies will be. Don't take the labs too seriously about this being the exact goal. The exact goal is to push the frontier and monetize later.

Solving continual learning, sample efficiency, etc. would be great, but it's trying to predict when a scientific breakthrough will come instead of trying to grapple with how the 100% sure thing coming technological revolution will change our lives.

This isn't to say the Dwarkesh post is bad, it addresses some reasonable critiques, but it is the least bitter lesson pilled thing to be obsessed with human intelligence and how that can inform AI.

We are in the AGI era of research. This is about embracing the unknown, scaling resources, and seeing what is enabled by making a series of magical tweaks to complex recipes that build frontier models. Lean into the alchemy.

(it should be pretty clear that I personally, investing in open research agree we need fundamental science -- just not agreeing that this is what the "cutting edge of the frontier" is governed by)

1h · Views 13.1K · Likes 134 · Bookmarks 80
Dwarkesh Patel@dwarkesh_sp

https://www.dwarkesh.com/p/the-sample-efficiency-black-hole

7h · Views 9K · Likes 59 · Bookmarks 49
Nathan Lambert@natolambert

the crux of my ick is the link to the human brain. Just saying the products aren't good enough is fine.

1h · Views 1.2K · Likes 11 · Bookmarks 1

Here is a thought: LLMs are super sample efficient at learning things in context. It's just that ICL is limited as of now. What we need are mechanisms to somehow absorb the information stored "in-context" into the weights. This is, in a sense, what fast weights do in GDNs, to an extent.

Then the question becomes how do we train LLMs to learn to efficiently pack information into fast weights (and eventually slow weights) over very long horizons. This is a problem that can be solved if trained on the right reward functions and tasks imo. Maybe just a matter of time.

In that case, we could still think of pre-training as rolling up a base arch, similar to how evolution produced the human brain, with the difference that our pre-training and post-training still have not produced an arch that can do long-range in-context -> fast weights -> slow weights effectively.

Wrote a bit about it here: https://www.aravindjayendran.com/writing/few-shot-learners-cant-remember

5h · Views 186 · Likes 2 · Bookmarks 3
Nathan Lambert@natolambert

@___Harald___ I agree -- but the labs are focused on the most useful technology quickly, else they financially implode

1h · Views 281 · Likes 6 · Bookmarks 1
Asuka Zheng🎀@VoidAsuka

@dwarkesh_sp good writing, thank u! i like this take - Many billions of years of evolution is our pre-training, so it’s unfair to compare how little data we see simply within our lifetime to what these cold-started LLMs have to learn from.

3h · Views 378 · Likes 6
Harald Schäfer@___Harald___

@natolambert I would like to see a lot more open research inspired by biological intelligence. Specifically on things like sample efficiency, sparse/delayed rewards, and digital evolution.

They may not lead to the most useful technology most quickly, but they can answer interesting questions.

1h · Views 182 · Likes 1

@dwarkesh_sp Counterpoint: techniques aren’t a bottleneck because knowledge diffuses very easily due to distillation, publications, staff movements, and open sourcing. Consider that Llama 3 open sourcing brought everyone to frontier level overnight.

7h · Views 434 · Likes 3
Artem Kukharenko@aikukharenko

@dwarkesh_sp the data black hole analogy is actually perfect everyone obsesses over architecture when it's really just about who can hoover up the most tokens

5h · Views 325 · Likes 1

@natolambert Interesting how the frontier progress and product progress aren’t the same game. Labs can optimize for the next jump, founders have to optimize for usable systems in the present.

14m · Views 8 · Likes 1

@dwarkesh_sp the black hole of data at the center holding all the capabilities together is the framing that makes the open source catch-up story make sense, if data is the real driver and data can be distilled from public APIs then the moat was never where people thought it was

5h · Views 472
Cryptnate@Cryptnate

@dwarkesh_sp Nice! I'm expecting more on sample efficiency

7h · Views 385
Bromo 🌟@bromo_bromius

@dwarkesh_sp "A black hole of data holding the constellations together" is the most honest metaphor for modern AI. Everything else is just marketing around the gravity well

6h · Views 368
T Ay.@ayedtay

@natolambert Good point. A bit like wanting planes to have articulated wings

1h · Views 100 · Likes 1
Jake@jake_010202

@dwarkesh_sp excellent take. also the voice & the various typos ("breath" vs "breadth," "a word specialists" etc) are refreshingly human. thanks!

4h · Views 92 · Likes 1
Eagletwotwenty@eagletwotwenty

@dwarkesh_sp Two thoughts:

For it to be a true AGI, wouldn't it have to be just as sample-efficient as a human? In the end that's just a fancy term for learning.

Slightly philosophically: Is data really that cleanly separable from sample-efficiency? In a way, it's data/information all the way

6h · Views 255
Gina Singer@TheQuietRebuild

@dwarkesh_sp Maybe it isn't that humans learn from less data, but that they promote more effectively. Most experience is discarded as noise. A small fraction becomes structurally binding, persistent constraints that shape everything learned afterward.

6h · Views 231
Timothy Kassis@TimothyKassis

@dwarkesh_sp Good article. Thanks for the writeup.

5h · Views 188
Frosty40@FrostForger

@natolambert I use ex-lax to maximize my strengths. extremely smooth and powerful

1h · Views 168
Ankit Maloo@ankit2119

@dwarkesh_sp data and compute. it's harder for open source models to train models at 5T 8T etc. and they are costlier when it comes to inference too

6h · Views 103