PufferAI founder Joseph Suarez demonstrates RL training reaching 20 million steps per second on a single GPU
Most demo environments complete training within seconds to minutes
——0——
But very-small-model RL was actually running at <0.001% efficiency in many high-profile papers. I've seen double-digit steps/second. 100s to 1000s is quite common. PufferLib is >10M.
To all those very clever people pointing out that nothing should be running <10% efficiency: "even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C"
10:54 PM · May 28, 2026 · 16.7K Views
10:54 PM · May 28, 2026 · 5.2K Views