Tech · 5h ago

Dwarkesh Patel, host of the Dwarkesh Podcast, argues AI computer use is bottlenecked by a lack of parallelizable simulators

He contrasts computer use with highly parallelizable coding environments.

Original post

Dwarkesh Patel@dwarkesh_sp

Here's a question I find confusing and interesting and which actually tells us a lot about the nature of current AI progress:

Why has progress on computer use been so slow? Computer use is so clearly verifiable.

I think the answer is that it is not enough for a domain to be verifiable.

It also has to be very grindable—in the sense that you can run lots of parallel rollouts against a deterministic and replayable simulator.

If you’re trying to make a model better at coding, you can create an environment that has a software repo with some missing feature that you’ve tasked the AIs with creating, and then you have a thousand parallel agents just go at the problem, each with their identical copy of the container.
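A minimal sketch of what "grindable" could look like in code, assuming a toy stand-in for the containerized repo. `ToyRepoEnv`, `ACTIONS`, `rollout`, and `run_parallel` are hypothetical names invented for illustration, and a seeded random policy stands in for the model. The point it shows is only the shape of the setup: every agent gets an identical copy of the environment, the transitions are deterministic, and a fixed seed makes each rollout replayable.

```python
import copy
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical action vocabulary for the toy environment.
ACTIONS = ("write_code", "add_test", "run_tests", "noop")


@dataclass
class ToyRepoEnv:
    """Hypothetical stand-in for a container holding a repo with a missing feature."""
    required_steps: tuple = ("write_code", "add_test", "run_tests")
    done_steps: list = field(default_factory=list)

    def step(self, action: str) -> None:
        # Deterministic transition: the same actions always produce the same state.
        self.done_steps.append(action)

    def verify(self) -> float:
        # Verifiable reward: 1.0 only if the required steps occur in order.
        remaining = iter(self.done_steps)
        return 1.0 if all(s in remaining for s in self.required_steps) else 0.0


def rollout(template: ToyRepoEnv, seed: int, horizon: int = 6) -> float:
    env = copy.deepcopy(template)   # each agent works on its own identical copy
    rng = random.Random(seed)       # seeded, so the rollout can be replayed exactly
    for _ in range(horizon):
        env.step(rng.choice(ACTIONS))
    return env.verify()


def run_parallel(n_agents: int = 1000) -> list[float]:
    template = ToyRepoEnv()
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(lambda seed: rollout(template, seed), range(n_agents)))


rewards = run_parallel(1000)
print(f"{sum(rewards):.0f} of {len(rewards)} rollouts passed verification")
```

Because nothing outside the process is touched, running it twice gives the same list of rewards, which is the property a live website cannot offer.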

But this doesn’t work with computer use—at least not trivially. You can’t have a thousand agents go try the same checkout flow on Amazon. Because Andy Jassy will find and detect your bots and shut your ass down.

How would we train an AI to build a business? How would you make an AI that’s really good at winning court cases? Or having a profitable day trading in the markets? Or helping a candidate win an election?

What is the RL environment to make an AI as good at politics as Lyndon Johnson, or as good at building a space launch business as Elon Musk?

The rollout requires interacting with the world and cannot be recreated simply within the datacenter. And the outer loop verification may take months or years of real world actions to elicit, and cannot be re-observed by perturbing the model’s actions thousands of times in parallel so that you can isolate what exactly the model did that actually worked.
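For contrast, here is a rough sketch of the perturbation the post says the real world does not allow, assuming a deterministic toy simulator. `simulate`, `attribute`, and the action names are hypothetical, chosen only to echo the checkout example. It replays one recorded trajectory, replaces a single action at a time with a no-op, and records how much the reward drops.

```python
from typing import Callable, Dict, Sequence


def simulate(actions: Sequence[str]) -> float:
    """Hypothetical deterministic simulator: reward 1.0 if 'pay' follows 'add_to_cart'."""
    cart_filled = False
    for action in actions:
        if action == "add_to_cart":
            cart_filled = True
        elif action == "pay" and cart_filled:
            return 1.0
    return 0.0


def attribute(
    actions: Sequence[str],
    sim: Callable[[Sequence[str]], float],
    null_action: str = "noop",
) -> Dict[int, float]:
    """Replay the trajectory with one action knocked out at a time."""
    baseline = sim(actions)
    effects = {}
    for i in range(len(actions)):
        perturbed = list(actions)
        perturbed[i] = null_action
        # A positive value means removing this action lowered the reward.
        effects[i] = baseline - sim(perturbed)
    return effects


trajectory = ["search", "add_to_cart", "scroll", "pay"]
print(attribute(trajectory, simulate))
# {0: 0.0, 1: 1.0, 2: 0.0, 3: 1.0}
```

In this toy case only `add_to_cart` and `pay` carry the outcome. Each replay here is effectively free, whereas the equivalent experiment on a business or an election would mean rerunning months of real-world action once per perturbation.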

Dwarkesh Patel@dwarkesh_sp

What does the next training paradigm look like?

0:00:00 – The big research bet the labs are making
0:02:12 – Grindability is just as important as verifiability
0:06:10 – Will RLVR alone generalize?
0:08:41 – Getting the learning back to the weights
0:15:22 – Dreaming
0:17:23 – What 2027 looks like

Also on YouTube, pod feed, and Substack.

5:54 PM · Jun 26, 2026 · 84.2K Views

Sentiment

Positive users praise recent GPT 5.5 Computer Use improvements and explanations for slow real-world progress, while negative users call current models painfully slow and bad enough to prompt selling AI investments.

Positive: 66.7% · Negative: 33.3%

6 comments with sentiment.

Posts from X

ar0cket1@ar0cket1

@dwarkesh_sp llms aren't built to do computer use natively nearly as well as code writing, I don't think the amazon thing is really a big bottleneck but just an LLM skill issue.

3h

Rohan Arun@RohanArun

I co-founded the first startup approved by OpenAI to sell GPT-3 for automation in August 2021, we invented many of the early primitives everyone uses, and have deep insight on this problem :) Basically models hallucinate at 90%, and popular services update frequently, so models themselves are fundamentally flawed to solve edge cases reliably. However the 80/20 is mostly deterministic use-cases.

We open sourced a solution to this 3 days before Poetic raised 50mm for a similar solution. Happy to chat :)

https://github.com/rohanarun/computer-use-cache

3h

sensho@sensho

@dwarkesh_sp 5.5 / 5.6 computer use is pretty good on codex but your broader point for unverifiable domains just sounds like a compute problem? if we had 10x or 100x the compute we have today then the rollouts on more long horizon simulators would be trivial

4h

vibebuilder@vibebuild

@dwarkesh_sp Progress on Computer Use was slow a year ago, have you tried GPT 5.5 Computer Use?

It's really really good. Not perfect, but very good.

Main bottleneck is vision and mouse control. Writes too much code to operate a computer (we don't write code to operate).

5h

callum@SharrockCallum

@dwarkesh_sp does this not pass the bottleneck back to sample efficiency (or realistic sims)? since real world feedback is sparse and, in the case of eg building a company, costly if wrong?

5h

Collin Lysford@CollinLysford

@dwarkesh_sp Yeah, this is why I can be really bullish on AI for software and math but bearish on world domination: without the clarifying power of death simulation only gets you so far. https://desystemize.substack.com/p/if-youre-so-smart-why-cant-you-die

4h

Ming Dynasty@MFU007

@dwarkesh_sp Through "digital twins" perhaps, with the known challenge of making them realistic enough?

5h

Brandon McKinzie@mckbrando

this is bait right

2h

alth0u🧶@alth0u

@dwarkesh_sp its bc of this

3h

James Gish@jamesgish_

@dwarkesh_sp Google Deepmind is partnering with EVE Online to train within its sandbox universe. Its economy is a decent first-order approximation to the real economy, so there may be good isomorphisms between the real world and that one...

4h

Tim Tyler@tim_tyler

@dwarkesh_sp Encoding a video of your screen takes a lot of tokens. It's an expensive thing to do. For most applications there are other approaches.

3h

Kavin Karthik@KavinIK

@dwarkesh_sp Have you tried Codex?

4h

Abdella Ali@ngMachina

@dwarkesh_sp How much of the issue is just not having good native computer use APIs inside of popular OS's? I feel like the accessibility hacks we are seeing for computer use are.... Lossy?

5h

Joseph Warren@nsdjoe

@dwarkesh_sp Can you clone Amazon and train on it vs the production site?

5h

Stephan Hoyer@shoyer

Science is not grindable

34m

Zac Gottschall@gott_zac

@dwarkesh_sp Why can’t I just simulate the Amazon checkout flow …just vibe code it

2h

James Moughan@jamougha

@dwarkesh_sp This is a really good explanation of one of the reasons why AI destroying or taking over the world is very unlikely.

5h

noob trader@ishwors46091536

@dwarkesh_sp but why does it have to be amazon or a web application that can be rate limited or blocked? why not using adobe, using mspaint, using desktop application?

4h

Phi Browser@phibrowser

@dwarkesh_sp From the inside it's worse than the bot-bans: the site itself isn't stationary. The checkout flow I ran yesterday is a different DOM today, so you can't even replay one clean trajectory, let alone a thousand parallel ones. The simulator keeps rewriting itself.

4h

deep Manifold@BetaTomorrow

@dwarkesh_sp What if AI models are inherently 'ill-posed' because learning is fundamentally an inverse problem? https://en.wikipedia.org/wiki/Well-posed_problem

3h