
VLA-JEPA World Model Launches in LeRobot for Efficient Robot Training

LeRobot @LeRobotHF

VLA-JEPA just dropped in LeRobot 🤖

What makes this model special is that it does not just learn what action to take from a given observation, it also leverages a JEPA world model to learn action-relevant dynamics.

During training, the VLA leverages V-JEPA2 by conditioning its predictor. This clever trick adds a world modeling objective to the training, which also allows pretraining on human videos. At inference, the world model is dropped entirely, keeping only a standard VLA architecture: Qwen backbone and action head.

The demo here was only fine-tuned on 13 examples, showing great pretraining capability and running in real time on @NVIDIARobotics DGX Spark!

VLA-JEPA is the first world model to be ported to LeRobot, and I feel like it won't be the last 🚀

@Thom_Wolf @ClementDelangue

1:08 AM · Jun 6, 2026 · 52.2K Views
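
The training/inference asymmetry described in the post can be illustrated with a toy sketch. This is not LeRobot's or VLA-JEPA's actual code: the class `ToyVLAWithWorldModel`, its methods, the layer sizes, and the loss weight `wm_weight` are all hypothetical. Plain linear maps stand in for the Qwen backbone and action head, and `future_latent` stands in for a target embedding that a frozen video encoder such as V-JEPA2 would produce. The sketch only shows how an auxiliary latent-prediction loss can shape shared features during training and then be removed without touching the action path.

```python
import random


def linear(weights, x):
    """Apply a dense layer stored as a list of weight rows (no bias)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def init(rows, cols, rng):
    """Small random weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyVLAWithWorldModel:
    """Hypothetical toy policy: shared backbone, action head, training-only predictor."""

    def __init__(self, obs_dim=4, hidden_dim=8, action_dim=2, latent_dim=3, seed=0):
        rng = random.Random(seed)
        self.backbone = init(hidden_dim, obs_dim, rng)        # stand-in for the VLM backbone
        self.action_head = init(action_dim, hidden_dim, rng)  # kept at inference
        self.predictor = init(latent_dim, hidden_dim, rng)    # world-model branch, training only

    def features(self, obs):
        return linear(self.backbone, obs)

    def act(self, obs):
        """Inference path: backbone plus action head, no world model involved."""
        return linear(self.action_head, self.features(obs))

    def training_loss(self, obs, expert_action, future_latent, wm_weight=0.5):
        """Action imitation loss plus a weighted latent-prediction loss."""
        if self.predictor is None:
            raise RuntimeError("world-model branch was dropped; training loss unavailable")
        h = self.features(obs)
        action_loss = mse(linear(self.action_head, h), expert_action)
        world_loss = mse(linear(self.predictor, h), future_latent)
        return action_loss + wm_weight * world_loss

    def drop_world_model(self):
        """Discard the training-only branch before deployment."""
        self.predictor = None


if __name__ == "__main__":
    policy = ToyVLAWithWorldModel()
    obs = [0.1, -0.2, 0.3, 0.4]
    print("training loss:", policy.training_loss(obs, [0.0, 1.0], [0.5, 0.5, 0.5]))
    policy.drop_world_model()
    print("action after dropping the world model:", policy.act(obs))
```

Because both losses backpropagate through the same backbone in a real implementation, the dynamics objective can influence the features the action head sees, while the deployed model pays no extra inference cost for it.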
Sentiment

Users praise LeRobot's VLA-JEPA integration for efficient robot action learning, highlighting that the world model is used only during training. One user regrets having already recorded nearly 800 episodes.

Positive: 75.0% · Negative: 25.0% (4 comments with sentiment)
Posts from X
LeRobot @LeRobotHF

Blog: https://ginwind.github.io/VLA-JEPA/
Docs: https://huggingface.co/docs/lerobot/main/en/vla_jep
Models: https://huggingface.co/collections/lerobot/vla-jepa

12h · Views 442 · Likes 4

@LeRobotHF @_lilkm_ neat

11h · Views 177 · Likes 1
Guilherme O'Tina @guilhermeotina

jepa makes sense: skip pixel reconstruction, learn latent dynamics directly. but i keep wondering how much of the gain is better feature alignment vs actual world model reasoning. libero is controlled lighting, objects, camera. the interesting failure mode is whether the latent world model helps in messy ood scenarios where standard VLAs hallucinate

11h · Views 56 · Likes 1
KuphDev @KuphDev

@LeRobotHF So I record nearly 800 duck episodes only to find there is a model that could have done it with 13 examples? 😭

12h · Views 135

@LeRobotHF The 13 example fine tune is the most interesting part.

V-JEPA2 helps the robot learn how actions affect the world before training on the real task. Then during use the world model is removed so the system stays simple and fast.

That is a smart setup for real time robotics.

11h · Views 81
Predict Jensen @PredictJensen

@LeRobotHF The smart bit is using the world model during learning, then dropping it at inference. Robotics needs that kind of practical asymmetry.

11h · Views 11