
Systems engineer Yacine uses Pufferlib to train MuJoCo Cartpole at 18 million steps per second

Developer Joseph Suarez reports that the same task trains in 0.14 seconds with the standard Pufferlib environment.

Original post
kache @yacineMTB (#488 in AI)

I trained this with pufferlib. Pufferlib is absurdly fast RL training. Like, absurdly, absurdly fast. You're often only limited by your env speed.

There's something called MuJoCo Warp. It's a joint project between NVIDIA and Google (judging by the commits coming from both orgs).

kache @yacineMTB

I just trained cartpole in mujoco at 18 million steps per second. This policy learned in **less than 3 seconds**

rollout policy batch size was 8192 agents

6:15 PM · Jun 2, 2026 · 4.1K Views
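
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the 18 million steps per second is the aggregate rate across all 8192 agents and that it stays constant for the whole run; the post states neither. The function names are illustrative only and are not part of Pufferlib.

```python
# Figures quoted in the post. Everything derived below is plain arithmetic
# under the assumption that throughput is aggregate and constant.
STEPS_PER_SECOND = 18_000_000  # reported training throughput
NUM_AGENTS = 8192              # reported rollout batch size
MAX_TRAIN_SECONDS = 3.0        # "less than 3 seconds" is only an upper bound


def per_agent_rate(total_rate: float, agents: int) -> float:
    """Steps per second handled by each agent if the load is split evenly."""
    return total_rate / agents


def max_total_steps(total_rate: float, seconds: float) -> float:
    """Upper bound on environment steps consumed within the time budget."""
    return total_rate * seconds


if __name__ == "__main__":
    rate = per_agent_rate(STEPS_PER_SECOND, NUM_AGENTS)
    budget = max_total_steps(STEPS_PER_SECOND, MAX_TRAIN_SECONDS)
    print(f"per-agent rate: about {rate:,.0f} steps/s")
    print(f"step budget:    at most {budget:,.0f} steps in total")
    print(f"per agent:      at most {budget / NUM_AGENTS:,.0f} steps each")
```

Under those assumptions, each agent advances at roughly 2,200 steps per second, and the policy saw at most 54 million environment steps before it was considered trained.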
Posts from X

@yacineMTB It's 0.14 seconds with the standard pufferlib env
