
PufferLib Delivers 10000x Faster RL Simulations for Real-World Transfer

Original post
Joseph Suarez 🐡 @jsuarez

A little perspective: RL as a field spent 10 years making algorithms slower and slower. If you look at the original ALE, it actually can sim a few thousand frames per second per core. If you look at some of the last big env releases before a ton of people moved over to LLMs, you'll find several at dozens to hundreds of steps per second with such bad engineering that they don't even scale with vectorization.

The field did this exactly because they presumed they would have to train directly in the real world. In reality, what we got out of this is a bunch of brittle off-pol and model-based algorithms that burn a ton of compute and don't work outside of the benchmarks shown in the original pubs. There's a clear gap between on-pol and other methods. You don't simply switch and scale up compute to save data. You have to spend a TON more compute to match the perf of on-pol, and then you spend even more compute to gain in sample efficiency.

Our whole core realization with PufferLib is that we can write good sims for a lot of problems 10000x faster. Good doesn't even mean accurate. It means accurate enough with domain randomization and other tricks that our agents can implicitly sysid their current setting and act robustly. So far, this has worked across several different industries. I'd love to give examples here, but this is unfortunately where exact client details get confidential. We need to be better about negotiating publicity, and we're starting to do that as Puffer gets bigger.

Another major flaw with slower and slower algorithms is that the core research loop also gets slower and slower. We sim mazes and 2048 at 10+m steps per second. Big deal right, those are easy. Wrong: algorithmic improvements on those envs have consistently predicted performance improvement on every single env in our test suite. Without this, we wouldn't have been able to release so many core breakthroughs in the last 2 years with a grand total of ~20 GPUs. We ran 20,000 experiments on ~12 of them in the 3 weeks leading up to Puffer 4 launch. At traditional speeds, it would have taken Google scale compute and an infra team.

So no, we're not going to step the real world at 20m sps, but assuming that matters (or at least that it is the only thing that matters) is where the field went wrong. /rant.

8:16 AM · Jun 7, 2026 · 26.5K Views
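The experiment budget quoted in the post can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the counts (20,000 experiments, ~12 GPUs, 3 weeks) and the 10000x factor come from the post, while the assumptions that GPU time is split evenly across experiments and that the 10000x factor applies directly to wall-clock time are mine, not the author's.

```python
# Figures quoted in the post
experiments = 20_000
gpus = 12                 # "~12" in the post, treated as exact here
days = 21                 # "3 weeks"
slowdown = 10_000         # the "10000x" factor claimed for fast sims

# Total GPU time available, split evenly across experiments (assumption)
gpu_minutes = gpus * days * 24 * 60
minutes_per_experiment = gpu_minutes / experiments

# Assumption: a 10000x slower pipeline needs 10000x the GPU time
# for the same experiments, finished in the same 3 weeks.
gpus_needed_at_slow_speed = gpus * slowdown

print(f"GPU-minutes per experiment: {minutes_per_experiment:.1f}")
print(f"GPUs needed at 1/{slowdown} speed: {gpus_needed_at_slow_speed:,}")
```

Under these assumptions each experiment gets roughly 18 GPU-minutes, and running the same sweep 10000x slower in the same three weeks would take on the order of 120,000 GPUs, which is consistent with the post's "Google scale compute" remark.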
Spencer Cheng @spenccheng

Most new clients don’t think we can do sim to real on their complex problem. They buy hoping 30% of what we pitched is true.

The “oh shit, it actually worked” moment is a fun call we get to have. These calls turn into “imagine if we could do X” conversations.

8h · Views 5.4K · Likes 38 · Bookmarks 21
An Eevee @rw_eevee

Appreciate your perspective and yes, this is a largely accurate history of the field. We’ve been all-in on fast sim + DR + RNN-style networks for the past ~2 years at least. It works great and we’ve gotten some spectacular results.

But sadly some problems have refused to yield to sim as easily as locomotion. That’s why every company suddenly declared world models the new hype. It’s supposed to be a totally general AI-based sim. You can at least put it on thousands of GPUs to get 20 million sps if you want.

Will it work? idk maybe. VLAs kinda remain a joke. Hand-crafted sims aren’t accurate or diverse enough yet. Offline RL has never worked once in history. But you have to put your eggs in some basket 🤷‍♂️

9h · Views 147 · Likes 3

We're not primarily robotics. PufferLib has been used professionally in finance, commerce, gaming, defense, and animation. Has worked great. The world model stuff straight up doesn't work. Some surrounding lit is borderline scientific fraud. I believe it works better where you have a good amount of static data to train it, like in robotics. But in our case, we simply won't need to scale up because we can hit the throughputs of the largest RL projects from OAI/DM on a single node, and it seems like the models can be much smaller than previously expected.

8h · Views 69 · Likes 2
Spencer Cheng @spenccheng

@EmanueleUngaro_ We just write our own env physics for specific problems with DR and a couple env tricks. Results transfer fairly well.

@yacineMTB is playing with Puffer and Mujoco Warp and getting some fun results too.

5h · Views 36 · Likes 2
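As a rough illustration of the "own env physics with DR" idea mentioned in this reply, here is a minimal sketch of domain randomization on a toy 1D point mass. Everything in it (`PhysicsParams`, `PointMassEnv`, the parameter ranges) is hypothetical and is not PufferLib's API or the team's actual environments. The point is only that the physics parameters are resampled every episode and kept out of the observation, so a policy would have to infer them from how the system responds.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    """Hypothetical physics constants that get randomized per episode."""
    mass: float
    drag: float
    dt: float


def sample_params(rng: random.Random) -> PhysicsParams:
    # Ranges are illustrative assumptions, not tuned values.
    return PhysicsParams(
        mass=rng.uniform(0.5, 2.0),
        drag=rng.uniform(0.0, 0.5),
        dt=rng.uniform(0.01, 0.03),
    )


class PointMassEnv:
    """Toy 1D point mass; the goal is to hold position near the origin."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Domain randomization: new physics every episode.
        self.params = sample_params(self.rng)
        self.pos = self.rng.uniform(-1.0, 1.0)
        self.vel = 0.0
        return (self.pos, self.vel)

    def step(self, force: float):
        p = self.params
        accel = (force - p.drag * self.vel) / p.mass
        self.vel += accel * p.dt      # semi-implicit Euler integration
        self.pos += self.vel * p.dt
        reward = -abs(self.pos)
        # The observation deliberately omits mass, drag, and dt.
        return (self.pos, self.vel), reward


env = PointMassEnv(seed=0)
obs = env.reset()
obs, reward = env.step(1.0)
```

A policy with memory (for example a recurrent network, as mentioned elsewhere in the thread) could in principle adapt to the hidden parameters within an episode, but whether that transfers to a real system depends on how well the randomized ranges cover reality.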
Spencer Cheng @spenccheng

@EmanueleUngaro_ @yacineMTB @finlay_sanders and Sam wrote the drone env as an example and have sim-to-real working there.

5h · Views 30 · Likes 2

@rw_eevee If you have your own fast rnn pipeline, try Muon and retune if you're still on Adam. Was a major boost. Our other things are harder to integrate. PufferNet rocks but needs kernels

8h · Views 27 · Likes 2
An Eevee @rw_eevee

@jsuarez Love it. You’re doing great work man 🫡

8h · Views 23 · Likes 2
Emanuele @EmanueleUngaro_

@spenccheng what do you guys use for physics? that pairs nicely with pufferlib. Mujoco?

7h · Views 55
Sciumo @SciumoInc

@jsuarez “In reality, what we got out of this is a bunch of brittle off-pol and model-based algorithms that burn a ton of compute and don't work”

A bit verbose for a tshirt, but still worthy

8h · Views 28
Emanuele @EmanueleUngaro_

@spenccheng @yacineMTB @finlay_sanders so in this field people just write their own simulation engine from scratch every time? because you can make it super bare-bones and more efficient?

5h · Views 19
gfodor.id @gfodor

@jsuarez @rw_eevee as a total RL outsider, but someone who just started dabbling with pufferlib - what's the best technique for leveraging the existing VR content ecosystem for this? i have been working on OpenXR for a while now and can't help but think there's a huge overhang from this.

8h · Views 14