PufferAI founder Joseph Suarez demonstrates RL training reaching 20 million steps per second on a single GPU

Most demo environments complete training within seconds to minutes

Original post

@YACINEMTB (OP)

Joseph Suarez 🐡 @JSUAREZ

@epichrisis Play with the demos. Training runs at up to 20M steps/second on a single GPU. Most envs train in seconds to minutes, including our client envs. It turns out that mazes and 2048, without exploiting domain knowledge, are just harder than many real-world problems.

6:22 AM · May 28, 2026
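
As a rough illustration of the "seconds to minutes" claim, the sketch below converts a step budget into wall-clock time at a fixed throughput. It assumes the post's 20M steps/second peak is sustained for the whole run, which real training (model updates, slower environments) will not always achieve; the function name `wall_clock_seconds` and the step budgets are hypothetical, chosen only for illustration.

```python
# Assumption: the post's stated peak of 20M environment steps per second
# is sustained for the entire run.
PEAK_SPS = 20_000_000


def wall_clock_seconds(total_steps: int, steps_per_second: float = PEAK_SPS) -> float:
    """Seconds needed to collect `total_steps` at a fixed throughput."""
    if steps_per_second <= 0:
        raise ValueError("steps_per_second must be positive")
    return total_steps / steps_per_second


# Hypothetical step budgets, for illustration only.
for budget in (10_000_000, 100_000_000, 1_000_000_000):
    print(f"{budget:>13,} steps -> {wall_clock_seconds(budget):6.1f} s")
```

At the assumed peak, a billion steps take about 50 seconds; the same budget at 1,000 steps/second would take roughly 11.6 days (`wall_clock_seconds(1_000_000_000, 1_000) / 86_400`).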

Reposted by

@YACINEMTB

Joseph Suarez 🐡 @JSUAREZ

But very-small-model RL was actually running at <0.001% efficiency in many high-profile papers. I've seen double-digit steps/second. 100s to 1000s is quite common. PufferLib runs at >10M.

Joseph Suarez 🐡 @jsuarez

To all those very clever people pointing out that nothing should be running at <10% efficiency: "even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C"

10:54 PM · May 28, 2026 · 16.7K Views

10:54 PM · May 28, 2026 · 5.2K Views
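
The efficiency figures can be checked with simple ratios. This sketch assumes "efficiency" means observed throughput divided by a reference throughput, taken here as the ~10M steps/second the post cites for PufferLib; `efficiency_percent` and the sample rates are hypothetical illustrations, not anything from PufferLib itself.

```python
# Assumption: "efficiency" = observed steps/second relative to a reference
# throughput, here the ~10M steps/second the post cites for PufferLib.
REFERENCE_SPS = 10_000_000


def efficiency_percent(observed_sps: float, reference_sps: float = REFERENCE_SPS) -> float:
    """Observed throughput as a percentage of the reference throughput."""
    return 100.0 * observed_sps / reference_sps


# Sample rates matching the post: double digits, then hundreds to thousands.
for sps in (99, 100, 1_000):
    print(f"{sps:>5} steps/s -> {efficiency_percent(sps):.5f}% of 10M")
```

A double-digit rate such as 99 steps/second comes to about 0.00099% of 10M, consistent with the "<0.001%" figure; measured against the 20M peak instead, every percentage would halve.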
