
Experts Call for Robotics MMLU Benchmark to Test Generalist Models


Original post

We need a Robotics MMLU for generalist benchmarking. Each of these benchmarks reveals some properties, and people can always model or inject data in a way to “win” the artificial waypoints, while a simple disturbance will make the policies stop working.

Chris Paxton @chris_j_paxton

The robolab leaderboard is interesting -- still fairly noisy (i.e. not the same as other leaderboards like RoboArena or MolmoSpaces). Suggests we're pretty far from a truly general-purpose robotics model, IMO. The data it's trained on is still a huge differentiator.

9:09 PM · Jun 4, 2026 · 6.7K Views

Sentiment

Users support calls for a robotics MMLU benchmark and for standard low-cost humanoid platforms, citing the Unitree G1 as today's de facto standard and 3D-printable, home-built robots as a way to make research tools more accessible.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 3.6K · Bookmarks: 1 · Retweets: 1

Jie Wang @ CVPR @JieWang_ZJUI

If eval is bound to the real world, then we need useful hardware. Humanoids have achieved huge success with the G1 as a standard platform.

Where is that robot for manipulation?

23h · Likes: 8 · Replies: 3

kache @yacineMTB

@JieWang_ZJUI what I dream of: a simple, low-cost humanoid robot that you can build at home with a 3D printer, or purchase, that acts as a research platform

1d

kache @yacineMTB

@JieWang_ZJUI yes

1d

kache @yacineMTB

@JieWang_ZJUI if you reduce the size to the limit, you reduce the cost and the kinetic energy, and the robots last much longer. You can build one using the micro servos that are used for model aircraft.

23h

kache @yacineMTB

@JieWang_ZJUI because right now the Unitree academic version is effectively the MMLU

1d

Jie Wang @ CVPR @JieWang_ZJUI

@yacineMTB I think it’s hard because the tech is not there yet for durable DIY humanoids. Maybe in 5 years we can distill insights into cheap and reliable custom robots.

23h

Azad @AbhishekAzad77

@yacineMTB @JieWang_ZJUI

1d

Jie Wang @ CVPR @JieWang_ZJUI

@yacineMTB Model aircraft took years of development, and the aircraft industry is mature enough to have produced know-how about what works and what doesn’t. The solution is definitely feasible, but I would say that for certain skills, humanoids have a long way to go.

(I can’t imagine a cheap robot doing sim2real well.)

23h