OSCAR, an open-source 2B video world model, generalizes across robot embodiments using 2D skeleton conditioning

Story Overview

OSCAR is a compact 2B-parameter video world model fine-tuned from Cosmos-Predict2.5. It uses 2D skeleton conditioning to simulate actions across different robot embodiments and even human hands, letting researchers preview policy outcomes without running every test on hardware.

Original post
Jun Gao@JunGao33210520

1/5 🚀 Thrilled to open-source OSCAR 🤖 — an action-conditioned world model for robotics, led by the visiting student in my group @wuzy2115! It generalizes across different robot embodiments with precise action controllability. All trained on a single GH200 GPU, and outperforms existing open-sourced baselines, which have larger model capacity and need more compute.

Everything is public, including training data.
📄 Paper: https://arxiv.org/abs/2606.04463
🌐 Project: https://wuzy2115.github.io/oscar-project-page/
💻 Code: https://github.com/wuzy2115/oscar-public
🤗 Robot data: https://huggingface.co/datasets/zywu2115/OSCAR_robot
🤗 Human data: https://huggingface.co/datasets/zywu2115/OSCAR_human
🤗 Weights: https://huggingface.co/zywu2115/OSCAR-2B

#Robotics #WorldModels #AI #OpenSource

1:56 PM · Jun 10, 2026 · 55K Views
Developer Impact

All weights, data, and code dropped together

The team released the model weights on Hugging Face, the robotics and egocentric-human datasets alongside them, and inference code on GitHub, so anyone can try the same single-GPU training recipe or run rollouts immediately.

Open Question

Benchmarks show close real-to-sim alignment

The thread reports a high correlation between policy evaluations run on model-generated videos and real robot trials on RoboArena episodes. Whether that alignment holds outside the paper's setups remains to be seen.

Sentiment

Most commenters praise OSCAR's cross-embodiment results, reproducibility, and single-GPU training for making advanced work more accessible, while some criticize it for skipping rigorous comparisons.

Positive: 83.3% · Negative: 16.7% (4 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity: 1.5K Views · 5 Bookmarks · 11 Likes
Jun Gao@JunGao33210520

2/5 📊 Data.

Data is the key to success🔑 We built a large-scale, standardized data pipeline that curates, filters, and dedups broad public robotics and egocentric human datasets. This gives us a clean training dataset with diverse tasks, scenarios, actions, and embodiments.

16h · 1.5K Views · 11 Likes · 5 Bookmarks
Retweets: 5
Eric Jang@ericjang11

impressive WM evaluation alignment! real ones know that this is the most impressive result

15h · 10.1K Views · 47 Likes · 37 Bookmarks
Replies: 1
Chris Paxton@chris_j_paxton

I think one of the cool things about world models is just how much more efficient they seem to be in both data and compute.

47m · 662 Views · 10 Likes · 1 Bookmark
Jun Gao@JunGao33210520

3/5 🦾 Method.

We leverage a 2D kinematic skeleton rendering as a unified control signal. From robot arms to human hands ✋, every embodiment is rendered as an image-aligned 2D skeleton. This enables high-precision action conditioning for video models. 🎯

16h · 976 Views · 6 Likes · 1 Bookmark
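
For readers unfamiliar with the idea, here is a minimal sketch of rendering an image-aligned 2D skeleton from 3D joint positions. It is not OSCAR's implementation: the pinhole camera model, the function names (`project_joints`, `draw_line`, `render_skeleton`), the per-bone integer labels, and the 3-joint example arm are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


def project_joints(joints_cam: Sequence[Point3], fx: float, fy: float,
                   cx: float, cy: float) -> List[Pixel]:
    """Pinhole projection of 3D joints (camera frame, Z forward) to pixels."""
    pixels = []
    for x, y, z in joints_cam:
        if z <= 0:
            raise ValueError("joint is behind the camera")
        pixels.append((round(fx * x / z + cx), round(fy * y / z + cy)))
    return pixels


def draw_line(canvas: List[List[int]], p0: Pixel, p1: Pixel, value: int) -> None:
    """Bresenham line rasterization; pixels outside the canvas are skipped."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    height, width = len(canvas), len(canvas[0])
    while True:
        if 0 <= x0 < width and 0 <= y0 < height:
            canvas[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_skeleton(joints_cam: Sequence[Point3],
                    bones: Sequence[Tuple[int, int]],
                    intrinsics: Tuple[float, float, float, float],
                    size: Tuple[int, int]) -> List[List[int]]:
    """Draw each bone (a pair of joint indices) with its own integer label."""
    width, height = size
    canvas = [[0] * width for _ in range(height)]
    pixels = project_joints(joints_cam, *intrinsics)
    for bone_id, (a, b) in enumerate(bones, start=1):
        draw_line(canvas, pixels[a], pixels[b], bone_id)
    return canvas


# Hypothetical 3-joint arm, one metre in front of the camera.
arm_joints = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.1, 0.1, 1.0)]
arm_bones = [(0, 1), (1, 2)]
canvas = render_skeleton(arm_joints, arm_bones, (100.0, 100.0, 16.0, 16.0), (32, 32))
```

Because the output is just an image aligned with the camera view, the same routine applies to any kinematic chain, whether a robot arm or a hand, which is the property the post describes as a unified control signal.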
Jun Gao@JunGao33210520

4/5 📈 Results.

With our cleaned data and kinematic skeleton conditioning, we finetune the Cosmos2.5-2B model on a single GH200 GPU ⚡. Our method delivers superior quality on action following, appearance, and motion consistency, pushing SOTA performance with a fraction of the compute. 🏆

16h · 894 Views · 4 Likes · 1 Bookmark
Jun Gao@JunGao33210520

@Haotianxue_GT @wuzy2115 No new modules is one part of the efficiency. The most useful things are: (i) curating a balanced and diverse dataset, (ii) skeleton conditioning as a unified action representation, and (iii) the base model is only a 2B model (Cosmos 2.5)

13h · 390 Views · 4 Likes · 1 Bookmark
Jun Gao@JunGao33210520

5/5 🧪 Robot policy evaluation.

Finally, we deploy OSCAR for real-world robot policy evaluation using the episodes from RoboArena. Our extensive experiments show a high correlation ✅ between virtual assessment in our world model and real-world results, paving the way for a future where robot policies are evaluated entirely within generated environments!!

16h · 704 Views · 4 Likes
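
One common way to quantify this kind of agreement is a correlation coefficient between per-policy scores measured in the world model and on real hardware. The sketch below computes a Pearson correlation in plain Python. The post does not say which statistic the paper uses, so Pearson is an assumption here, and the success rates in the example are made-up placeholders, not OSCAR results.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Placeholder success rates for four hypothetical policies (NOT real data).
real_success = [0.20, 0.45, 0.60, 0.85]
world_model_success = [0.25, 0.40, 0.65, 0.80]
r = pearson_r(real_success, world_model_success)
```

A value of `r` near 1 would mean the world model ranks and scales policies much like real trials do; a rank-based statistic such as Spearman's would be a reasonable alternative when only the ordering of policies matters.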
Jun Gao@JunGao33210520

@ericjang11 Thank you for the nice words, Eric! Yes, we believe the most immediate application of world models in robotics is to evaluate different policies at scale, accelerating development speed before cumbersome (and easy to break) real-world deployment.

14h · 537 Views · 2 Likes
Haotian Xue@Haotianxue_GT

@JunGao33210520 @wuzy2115 Impressive, it seems much more efficient. Is that because it introduces no new modules?

14h · 458 Views · 1 Like
Eric Jang@ericjang11

@JunGao33210520 would you mind uploading the real world policy eval rollouts to huggingface datasets as well?

13h · 199 Views
Albert Zhai@alb_zhai

@JunGao33210520 @wuzy2115 the skeleton is an interesting representation, i wonder if there are any ambiguity problems when the three red arrows are colinear (their plane intersects camera center)?

6h · 87 Views
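
The geometric fact behind this question can be checked directly: under a pinhole camera, 3D points that lie in a plane passing through the camera center all project onto a single image line, so their 2D rendering alone does not reveal how they are arranged within that plane. The sketch below illustrates this with hypothetical points and intrinsics; whether OSCAR's rendering is affected in practice is not addressed in the thread.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(point: Point3, fx: float, fy: float, cx: float, cy: float) -> Point2:
    """Pinhole projection of a camera-frame point (Z forward, Z > 0)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def are_collinear(p: Point2, q: Point2, r: Point2, tol: float = 1e-6) -> bool:
    """True if twice the signed triangle area of p, q, r is within tol of 0."""
    area2 = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(area2) <= tol


# Hypothetical intrinsics (fx, fy, cx, cy).
K = (500.0, 500.0, 320.0, 320.0)

# Three points in the plane x = y, which contains the camera center (0, 0, 0).
in_plane = [(0.1, 0.1, 1.0), (0.2, 0.2, 1.5), (-0.1, -0.1, 2.0)]
# Three points that do not share a plane through the camera center.
off_plane = [(0.1, 0.1, 1.0), (0.2, 0.0, 1.5), (-0.1, 0.3, 2.0)]

degenerate = are_collinear(*[project(p, *K) for p in in_plane])
generic = are_collinear(*[project(p, *K) for p in off_plane])
```

Here `degenerate` is true and `generic` is false: the first three points are distinct in 3D but fall on one image line.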
Ferbin@Ferbin08

@ericjang11 what's the evaluation actually testing? i've watched too many impressive lab benchmarks fail when you hit real-world edge cases. curious if this one avoids that.

13h · 70 Views
Virgil Maro@_virgil19

@JunGao33210520 @wuzy2115 a world model is the agent rehearsing what happens if i do x before committing. video prediction is just what that rehearsal looks like from outside

7h · 3 Views · 1 Like
Praveen Koka@praveenkoka

@ericjang11 'real ones know' is tech's most efficient shortcut: declare something the best without all that tedious comparing

26m · 2 Views
Ferbin@Ferbin08

@ericjang11 licensing is the real wall. you can run evals on anything. but sharing it? ip/privacy/liability questions pop up and labs can't afford the legal review. seen this kill so many promising datasets.

3h · 1 View
EB1A Experts@eb1aexperts

@JunGao33210520 @wuzy2115 The combination of cross-embodiment generalization and full reproducibility stands out here. It's encouraging to see strong robotics results paired with open data, open weights, and a compute budget that remains accessible to the broader research community.

8h · 1 View
Blissy@BlissyOnX

@chris_j_paxton the compute part is the real standout to me, single GH200 doing the lifting

44m