Hot take: robots should not dream in pixels.
Pixels are too low-level. Latents are too opaque.
μ₀ predicts a third thing: 3D motion traces.
On real robots, it beats π₀.₅ — with ~1/100 the data scale and no action labels for world-model pretraining. 🧵
https://mu0-wm.github.io/
(this video features voiceover narration)
