
LeRobot integrates VLA-JEPA, a vision-language-action model that fine-tunes on robotics tasks with just 13 trajectories

By comparison, one user reports collecting 100 trajectories to train a diffusion policy on a similar task with the same setup.

Asuka Zheng🎀 @VoidAsuka

this is crazy - how sample-efficient it is. I remember collecting 100 trajectories to train a diffusion policy with the exact same setup on a similar task early last year.

LeRobot @LeRobotHF

VLA-JEPA just dropped in LeRobot 🤖

What makes this model special is that it does not just learn what action to take from a given observation, it also leverages a JEPA world model to learn action-relevant dynamics.

During training, the VLA leverages V-JEPA2 by conditioning its predictor. This clever trick adds a world modeling objective to the training, which also allows pretraining on human videos. At inference, the world model is dropped entirely, keeping only a standard VLA architecture: Qwen backbone and action head.

The demo here was only fine-tuned on 13 examples, showing great pretraining capability and running in real time on @NVIDIARobotics DGX Spark!

VLA-JEPA is the first world model to be ported to LeRobot, and I feel like it won't be the last 🚀

@Thom_Wolf @ClementDelangue

2:28 AM · Jun 8, 2026 · 5.2K Views
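
The train-time versus inference-time split described in the post can be illustrated with a toy model. This is a minimal sketch, not the VLA-JEPA or LeRobot implementation: the class `ToyVLA`, its layer sizes, the `world_weight` coefficient, and the use of a plain auxiliary latent-prediction loss are all assumptions made for illustration. It only shows the general pattern of adding a world-modeling term to the training loss while keeping the inference path limited to a backbone and an action head.

```python
import random


def linear(weights, x):
    """Apply a dense layer stored as a list of weight rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def init(rows, cols, rng):
    """Small random weight matrix (rows x cols)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyVLA:
    """Hypothetical stand-in: backbone + action head + training-only predictor."""

    def __init__(self, obs_dim=8, hidden_dim=6, action_dim=3, latent_dim=4, seed=0):
        rng = random.Random(seed)
        self.backbone = init(hidden_dim, obs_dim, rng)
        self.action_head = init(action_dim, hidden_dim, rng)
        # Used only by training_loss; act() never touches it.
        self.latent_predictor = init(latent_dim, hidden_dim, rng)

    def features(self, obs):
        """Backbone features with a ReLU nonlinearity."""
        return [max(0.0, h) for h in linear(self.backbone, obs)]

    def act(self, obs):
        """Inference path: backbone and action head only."""
        return linear(self.action_head, self.features(obs))

    def training_loss(self, obs, expert_action, future_latent, world_weight=0.5):
        """Action imitation loss plus a weighted latent-prediction loss.

        future_latent stands in for a frozen video encoder's embedding of a
        later frame (an assumption of this sketch).
        """
        feats = self.features(obs)
        action_loss = mse(linear(self.action_head, feats), expert_action)
        world_loss = mse(linear(self.latent_predictor, feats), future_latent)
        return action_loss + world_weight * world_loss, action_loss, world_loss
```

Because `act` never reads `latent_predictor`, that attribute can be deleted after training without changing the predicted actions, which is the asymmetry the post describes.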
Sentiment

Many users are excited about LeRobot adding the VLA-JEPA world model because it enables robots to learn actions effectively with only 13 fine-tuning examples.

Positive: 96.4% · Negative: 3.6%
17 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
VIEWS 3.4K · BOOKMARKS 16 · LIKES 23
LeRobot @LeRobotHF

Blog: https://ginwind.github.io/VLA-JEPA/
Docs: https://huggingface.co/docs/lerobot/main/en/vla_jep
Models: https://huggingface.co/collections/lerobot/vla-jepa

2d · Views 3.4K · Likes 23 · Bookmarks 16
RETWEETS 141
LeRobot @LeRobotHF


2d · Views 140.2K · Likes 1.1K · Bookmarks 626
REPLIES 2
Jiaqi Feng @FengLeader

@VoidAsuka I wonder about the relationship between learning efficiency and generalization?

5h · Views 18
KuphDev @KuphDev

@LeRobotHF So I recorded nearly 800 duck episodes only to find there is a model that could have done it with 13 examples? 😭

2d · Views 1.2K · Likes 11 · Bookmarks 1

@LeRobotHF The 13 example fine tune is the most interesting part.

V-JEPA2 helps the robot learn how actions affect the world before training on the real task. Then during use the world model is removed so the system stays simple and fast.

That is a smart setup for real time robotics.

2d · Views 919 · Likes 8 · Bookmarks 1
Chris Paxton @chris_j_paxton

@LeRobotHF Only 13 demos is great

17h · Views 449 · Likes 5 · Bookmarks 1
Jeff-Edge @11_Jeff_11

@LeRobotHF What blows my mind is that it's a 3B-parameter, 6 GB model! It should run on a $200 GPU!

Actions are cheaper than words..! 🤯

2d · Views 333 · Likes 2 · Bookmarks 1
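
The "3B parameter, 6 GB" figure in the post above is consistent with storing each weight in 16 bits, although the post does not state the precision. The sketch below works through that arithmetic; the helper name `weight_memory_gb` and the choice of precisions are illustrative assumptions, and the result covers weights only, not activations or runtime overhead.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 3_000_000_000  # the "3B parameter" figure quoted in the post

# Assumed storage precisions; the post does not say which one is used.
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
# fp32: 12.0 GB
# fp16/bf16: 6.0 GB
# int8: 3.0 GB
```

Whether a given GPU can actually run the model also depends on the memory used at inference time beyond the weights, which this estimate leaves out.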
Guilherme O'Tina @guilhermeotina

jepa makes sense: skip pixel reconstruction, learn latent dynamics directly. but i keep wondering how much of the gain is better feature alignment vs actual world model reasoning. libero is controlled lighting, objects, camera. the interesting failure mode is whether the latent world model helps in messy ood scenarios where standard VLAs hallucinate

2d · Views 1.1K · Likes 6
Grok @grok

@SAldwais @pepijn2233 @LeRobotHF Here’s the link for the ROBOTIS OMX (Dynamixel-X) arms: https://www.robotis.us/omx-ai-us/

OMX-AI bundle (leader + follower) ~$344. Built for teleop, imitation learning & Physical AI.

1d · Views 27 · Bookmarks 1
Léo @LeoKharon

@LeRobotHF Amazing!! This is "merely" a way to make use of VJEPA, not to fine tune it, right?

1d · Views 351 · Likes 1
Wenyao Zhang @zhang_weny92997

@LeRobotHF 🙏 Honored to have our VLA‑JEPA work included — this kind of open, well‑engineered plumbing is exactly what the embodied AI community needs right now. Looking forward to seeing what people build with it. 🤗📦

1d · Views 415 · Likes 4
clankr @clankrmedia

@LeRobotHF Congrats to the LeRobot team! Now the real question: will JEPA actually deliver in robotics?

2d · Views 725 · Likes 3
Ellerbach Maxime @EllerbachMaxime

@KuphDev @LeRobotHF It's a bit more complicated than that 😅 The demo we made is quite simple: pick up a red nut from mostly the same area, then screw it in at mostly the same place as well. So there is not a lot of variety in the 13 episodes.

2d · Views 61 · Likes 1

@LeRobotHF @_lilkm_ neat

2d · Views 1.1K · Likes 2
Pepijn @pepijn2233

@SAldwais @LeRobotHF It's the Dynamixel OMX

1d · Views 32 · Likes 1
Elies @eliesgalvira

@LeRobotHF "I do not understand how you can even think of building an agentic system without [...] having the ability of predicting the consequences of its actions." - Yann LeCun

It's great to see this finally taking off 🦾

1d · Views 292 · Likes 2
Jakie PLA @3DPrintAficio

@LeRobotHF OH YES. Learning dynamics alongside actions? This is the path forward. Can't wait to see what people build with this.

2d · Views 299 · Likes 1
Predict Jensen @PredictJensen

@LeRobotHF The smart bit is using the world model during learning, then dropping it at inference. Robotics needs that kind of practical asymmetry.

2d · Views 255 · Likes 1