Systems engineer Yacine uses Pufferlib to train MuJoCo Cartpole at 18 million steps per second

Developer Joseph Suarez replied that the same task trains in 0.14 seconds with the standard Pufferlib environment.

Original post

kache (@yacineMTB)

I trained this with pufferlib. Pufferlib is absurdly fast RL training. Like, absurdly, absurdly fast. You're often only limited by your env speed.

There's something called MuJoCo Warp. It's a joint project between NVIDIA & Google (judging by the commits coming from both orgs).

kache (@yacineMTB)

I just trained cartpole in mujoco at 18 million steps per second. This policy learned in **less than 3 seconds**

rollout policy batch size was 8192 agents

6:15 PM · Jun 2, 2026 · 4.1K Views
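The throughput figures in the post can be unpacked with a little arithmetic. The sketch below assumes that "18 million steps per second" counts individual agent steps summed across all 8192 agents, and that this rate held for the entire run. Under those assumptions it derives how often the full batch advances per second, the wall-clock budget per batched step, and an upper bound on total environment steps. The helper names are hypothetical and are not part of Pufferlib or MuJoCo Warp.

```python
# Back-of-the-envelope throughput math for the numbers quoted in the post.
# ASSUMPTIONS (not stated in the post): the 18M steps/s figure is aggregate
# agent-steps across the whole batch, and it held for the entire training run.
# Function names are hypothetical, for illustration only.

STEPS_PER_SECOND = 18_000_000  # aggregate environment steps per second (from the post)
NUM_AGENTS = 8_192             # rollout policy batch size (from the post)
TRAIN_SECONDS_UPPER = 3.0      # "less than 3 seconds" is an upper bound


def batched_steps_per_second(total_sps: float, num_agents: int) -> float:
    """How many times per second the whole batch of agents advances together."""
    return total_sps / num_agents


def time_per_batched_step_us(total_sps: float, num_agents: int) -> float:
    """Wall-clock budget in microseconds for one step of every agent in the batch."""
    return 1e6 / batched_steps_per_second(total_sps, num_agents)


def max_total_steps(total_sps: float, seconds: float) -> float:
    """Upper bound on environment steps consumed if the rate held for `seconds`."""
    return total_sps * seconds


if __name__ == "__main__":
    print(f"batched steps per second: {batched_steps_per_second(STEPS_PER_SECOND, NUM_AGENTS):,.0f}")
    print(f"budget per batched step:  {time_per_batched_step_us(STEPS_PER_SECOND, NUM_AGENTS):.1f} microseconds")
    print(f"env steps (upper bound):  {max_total_steps(STEPS_PER_SECOND, TRAIN_SECONDS_UPPER):,.0f}")
```

Under these assumptions, each step of all 8192 agents has roughly 455 microseconds of wall-clock budget (about 2,197 batched steps per second), and the run consumed at most about 54 million environment steps. In a simple synchronous rollout loop that budget would also have to cover the policy forward pass, which is one way to read the remark that training is often limited by environment speed.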

Sentiment

Users are excited that Pufferlib trained a Cartpole policy in MuJoCo at 18 million steps per second, and in 0.14 seconds with the standard Pufferlib environment. They find the performance incredible and plan to explore the library themselves.

Positive: 100.0% · Negative: 0.0% (based on 4 comments with sentiment)

Posts from X

321 views · 1 bookmark · 5 likes

Joseph Suarez 🐡 (@jsuarez)

@yacineMTB It's 0.14 seconds with the standard pufferlib env
