
PufferAI founder Joseph Suarez demonstrates RL training reaching up to 20 million steps per second on a single GPU

Most demo environments complete training within seconds to minutes


@epichrisis Play with the demos. Training up to 20M steps/second on a single GPU. Most envs training in seconds to minutes, including our client envs. Turns out mazes and 2048 without exploiting domain knowledge are just harder than many real world problems

6:22 AM · May 28, 2026

But very small model RL was actually running at <0.001% efficiency in many high profile papers. I've seen double digit steps/second. 100s-1000s is quite common. PufferLib is >10M.

Joseph Suarez 🐡 @jsuarez

To all those very clever people pointing out that nothing should be running <10% efficiency: "even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C"

10:54 PM · May 28, 2026 · 16.7K Views
10:54 PM · May 28, 2026 · 5.2K Views