Researcher Releases Open-Source WALL-WM World Model for Embodied AI

Original post

Introducing WALL-WM, our open-source World Model for embodied AI and the next piece of our open-source robotics stack.

Carving World Action Modeling at the Event Joints
Read the blog: https://x2robot.com/en/pages/wm

Why it matters
WALL-WM shifts robot world modeling from fixed-length action chunks to event-grounded video-action pretraining. It learns around events like reaching, contact, grasping, lifting, moving, and placing, so language, vision, and action align more naturally.

Why you should care
WALL-WM brings together:
• Event-grounded VLA pretraining
• Prior-aligned video-action architecture
• Wan-based video tower + randomly initialized action DiT
• Multi-view perception with sight-cone masking, tube patch masking, and Camera RoPE
• Event Mode for variable-length execution
• Unified Mode with Staircase Decoding
• DMuon for large-scale training

The goal: help robots learn what physically matters, not just what happens in the next fixed slice of time.

Code (coming soon): https://github.com/X-Square-Robot/wall-x

#opensource #EmbodiedAI

10:55 PM · May 28, 2026
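
The contrast the post draws between fixed-length action chunks and event-grounded segments can be illustrated with a toy example. This is only a sketch under stated assumptions, not WALL-WM's code (which the post says is not yet released): the function names `fixed_chunks` and `event_segments`, the per-step label format, and the sample trajectory are all hypothetical.

```python
from typing import List, Tuple


def fixed_chunks(n_steps: int, chunk_len: int) -> List[Tuple[int, int]]:
    """Split the step range [0, n_steps) into fixed-length (start, end) windows.

    The last window is shorter when n_steps is not a multiple of chunk_len.
    """
    return [(s, min(s + chunk_len, n_steps)) for s in range(0, n_steps, chunk_len)]


def event_segments(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive identical per-step labels into (label, start, end) segments.

    Assumption: each timestep already carries an event label such as
    "reach" or "grasp". How such labels would be obtained is out of scope.
    """
    segments: List[Tuple[str, int, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current segment at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


# Hypothetical 14-step trajectory with hand-written event labels.
labels = ["reach"] * 5 + ["grasp"] * 2 + ["lift"] * 4 + ["place"] * 3

chunks = fixed_chunks(len(labels), chunk_len=4)
events = event_segments(labels)

print(chunks)  # windows cut at every 4th step, regardless of what is happening
print(events)  # variable-length windows cut where the labeled event changes
```

In this toy trajectory the fixed window covering steps 4 to 8 spans the end of the reach, the whole grasp, and the start of the lift, while the event segments keep each phase in its own variable-length window.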