
Builder Runs MuJoCo RL Training Loop For Unitree Robot At 200k SPS In Browser

Original post
kache @yacineMTB · #487 in Tech

I've gotten a mujoco sim RL training loop for a unitree robot at 200k SPS. I'm looking into the physics for friction and contact dynamics. My goal: can I reproduce & beat the mujoco playground RL baselines?

This is running in my web browser with raylib. It's the baseline

6:54 AM · Jun 10, 2026 · 19.3K Views
Sentiment

Positive users praise the browser MuJoCo RL demo for Unitree robots and its clear analogies, while negative users criticize the robot's awkward movement and voice distrust of related AI tools like Fable.

Pos 62.5% · Neg 37.5%
24 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 10.6K · Bookmarks 29 · Likes 245 · Retweets 3 · Replies 14
kache @yacineMTB

I unfortunately can't use any anthropic models for this, because there is no way for me to know whether I'm silently getting nerfed & sandbagged. Not allowed to make the world a better place

Not enough that they're closed, open source devs need to be slowed down too

6h · Views 10.6K · Likes 245 · Bookmarks 29
kache @yacineMTB

My suspicion is that the contact dynamics code is not at all optimized for ultra-fast RL. The only reason it is as slow as it is is that no one has really tried, and I'm going to try

This is all going to be open source. It's a project. My goal is to accelerate robotics

6h · Views 2.9K · Likes 76 · Bookmarks 1
kache @yacineMTB

The problem with the "frontier" distinction is that it is laughably easy to be frontier - there are so many green fields not yet explored. It's like saying you're at the frontier in the middle of Kansas. There's fertile ground everywhere.

6h · Views 1.8K · Likes 32 · Bookmarks 0
kache @yacineMTB

@ChaseBrowe32432 silently giving people horse shit information is poisoning the well for everyone. why don't they make it clearly refuse?

3h · Views 120 · Likes 14
kache @yacineMTB

Particularly, reinforcement learning for robotics, that anyone can do with a single GPU at home. It's going to be CUDA only (thanks jensen..). I am using 3x 4090s, but each training baseline will run on a single 4090.

0 python will be used

6h · Views 801 · Likes 12
kache @yacineMTB

I'm going to base my work on top of pufferlib and follow along with their development. They are scarily good at making environments for all kinds of tasks - I've done remarkable things with their RL loop. They're taking contracts if you think you have a problem they can help with

6h · Views 349 · Likes 8
lycaon @lyc_aon

@yacineMTB this guy walkin dumb as hayl

5h · Views 36 · Likes 3
e @yihyunCS

@yacineMTB @ChaseBrowe32432 active deception is the key issue here that people keep missing, and i have to wonder if they're missing the point on purpose

3h · Views 32 · Likes 2
tooz @adarshsolanki

@yacineMTB bro lets move the families to shenzhen for a few years

4h · Views 94 · Likes 1
kache @yacineMTB

@LouisLeLay4 Yes! I'm going to follow in homemadegarbage's footsteps and make home made garbage move. The goal is to give people a reproducible RL loop they can use

2h · Views 69 · Likes 1
Louis Le Lay @LouisLeLay4

@yacineMTB Will you deploy on a real robot afterwards?

2h · Views 65 · Likes 1

@yacineMTB Wouldn't a better sim randomize some elements of the physics parameters so that your model learns to adapt to differences in physics so when it gets to the real world it just works?

3h · Views 105
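
The physics-randomization idea raised in the reply above could look roughly like the following. This is a minimal sketch written in Python for brevity, not the thread author's code: `PhysicsParams`, `RANGES`, and `sample_params` are hypothetical names, and the ranges are illustrative placeholders rather than tuned values. In a MuJoCo model the sampled values would typically be written into fields such as `geom_friction` and `body_mass` before each episode.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicsParams:
    """Per-episode physics settings (hypothetical container)."""
    friction: float        # sliding friction coefficient
    mass_scale: float      # multiplier applied to nominal body masses
    motor_strength: float  # multiplier applied to actuator torque limits


# Illustrative ranges only; real ranges depend on the robot and the task.
RANGES = {
    "friction": (0.4, 1.2),
    "mass_scale": (0.9, 1.1),
    "motor_strength": (0.8, 1.2),
}


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one set of physics parameters uniformly from RANGES."""
    values = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    return PhysicsParams(**values)


if __name__ == "__main__":
    rng = random.Random(0)
    # One draw per episode reset, so the policy sees varied physics.
    for episode in range(3):
        print(episode, sample_params(rng))
```

Sampling once per reset, rather than per step, keeps each episode physically consistent while still exposing the policy to a spread of dynamics.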

@yacineMTB Convex decomposition sucks. But probably not going to be an issue for basic humanoid stuff. Maybe could approximate some collisions with spheres and make it go faster than convex shapes would.

2h · Views 21 · Likes 1
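
To illustrate why sphere approximations are attractive in the reply above: a sphere contact test reduces to one distance computation, with no convex-hull queries. The sketch below is in Python for brevity and is not MuJoCo's collision code; both function names are hypothetical, and the ground is assumed to be a flat plane at a fixed height.

```python
import math


def sphere_sphere_penetration(c1, r1, c2, r2):
    """Penetration depth between two spheres; positive means overlapping."""
    return r1 + r2 - math.dist(c1, c2)


def sphere_plane_penetration(center, radius, ground_z=0.0):
    """Penetration depth of a sphere into a flat ground plane at z = ground_z."""
    return radius - (center[2] - ground_z)


if __name__ == "__main__":
    # A foot approximated by a 10 cm sphere whose centre sits 5 cm above the ground.
    print(sphere_plane_penetration((0.0, 0.0, 0.05), 0.1))
    print(sphere_sphere_penetration((0, 0, 0), 1.0, (1.5, 0, 0), 1.0))
```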
someguy AI @NOTfunnyparanR

Race one vs a “ghost time”.

“ghost” = moving average of the average time per section of the race track. Reward is distance ahead of ghost applied for the time it sustains lead. Negative if behind.

Robustify it… Add uneven surfaces and barriers on the track, and penalize for how much it strays from center.

2h · Views 38
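
One possible reading of the "ghost time" reward described above, sketched in Python for brevity. Everything here is an assumption layered on the post: the `Ghost` class, the uniform section length, the exponential moving average with factor `alpha`, and the optional centre-line penalty weight are hypothetical choices, not something the commenter specified.

```python
from dataclasses import dataclass


@dataclass
class Ghost:
    section_length: float       # metres per track section (assumed uniform)
    section_times: list[float]  # moving-average seconds per section
    alpha: float = 0.1          # smoothing factor for the moving average (assumed)

    def distance_at(self, t: float) -> float:
        """Distance the ghost has covered t seconds after the start."""
        covered = 0.0
        for sec_time in self.section_times:
            if t >= sec_time:
                t -= sec_time
                covered += self.section_length
            else:
                # Partway through this section: interpolate linearly.
                return covered + self.section_length * t / sec_time
        return covered  # ghost has finished the track

    def update(self, section: int, measured_time: float) -> None:
        """Fold a newly measured section time into the moving average."""
        old = self.section_times[section]
        self.section_times[section] = (1 - self.alpha) * old + self.alpha * measured_time


def ghost_reward(ghost, agent_distance, t, dt, lateral_offset=0.0, center_weight=0.0):
    """Per-step reward: lead over the ghost integrated over dt, negative when
    behind, minus an optional penalty for straying from the track centre."""
    lead = agent_distance - ghost.distance_at(t)
    return (lead - center_weight * abs(lateral_offset)) * dt
```

Multiplying the lead by `dt` means the reward accumulates for as long as the lead is sustained, which matches the "applied for the time it sustains lead" phrasing in the post.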
Omar @kouhxp

@yacineMTB you go to a restaurant order a ribeye, but chef swaps it for the cheapest cut because it's good for you, but still pay for a ribeye, and the kitchen has tinted windows

5h · Views 32
Chase Brower @ChaseBrowe32432

@yihyunCS @yacineMTB really? they could have easily done this without notifying anyone, and no one would have ever known. they mentioned that they're doing this in the spirit of honesty

3h · Views 21
Santi @__selewaut__

@yacineMTB You're cooking

6h · Views 5 · Likes 1
Rajesh Poddar @rajeshpod

@yacineMTB If you only simulate the foot/ground contact, the sparsity structure of the constraint Hessian, i.e. J^T D J, is the same as the sparsity structure of the JSIM. As far as I'm aware, none of the official mujoco implementations exploit this fact.

3h · Views 86 · Likes 1

@yacineMTB I can already train locomotion very, very fast. 200K SPS is fast, but it changes nothing. If you could 10x the speed, maybe that's relevant. But even then, this isn't where help is needed.

Where we need help is in high-contact physics envs.

1h · Views 78 · Likes 1