Tech · 3h ago

OSCAR, an open-source 2B video world model, generalizes across robot embodiments using 2D skeleton conditioning

Story Overview

OSCAR delivers a compact 2B-parameter video world model fine-tuned from Cosmos-Predict2.5 that uses 2D skeleton conditioning to simulate robot actions across different bodies and even human hands, letting researchers preview policy outcomes without running every test on hardware.

Original post

Jun Gao@JunGao33210520

1/5 🚀 Thrilled to open-source OSCAR 🤖 — an action-conditioned world model for robotics, led by the visiting student in my group @wuzy2115! It generalizes across different robot embodiments with precise action controllability. All trained on a single GH200 GPU, and outperforms existing open-sourced baselines, which have larger model capacity and need more compute.

Everything is public, including training data.
📄 Paper: https://arxiv.org/abs/2606.04463
🌐 Project: https://wuzy2115.github.io/oscar-project-page/
💻 Code: https://github.com/wuzy2115/oscar-public
🤗 Robot data: https://huggingface.co/datasets/zywu2115/OSCAR_robot
🤗 Human data: https://huggingface.co/datasets/zywu2115/OSCAR_human
🤗 Weights: https://huggingface.co/zywu2115/OSCAR-2B

#Robotics #WorldModels #AI #OpenSource

1:56 PM · Jun 10, 2026 · 55K Views

Developer Impact

All weights, data, and code dropped together

The team released the full model on Hugging Face, paired robotics and egocentric-human datasets, and inference code on GitHub so anyone can try the same single-GPU training recipe or run rollouts immediately.

Open Question

Benchmarks show close real-to-sim alignment

Reported policy-evaluation numbers line up between model-generated videos and actual robot trials, though longer-term adoption outside the paper's setups remains to be seen.

Sentiment

Most commenters praise the open-source OSCAR robotics world model for its cross-embodiment results, reproducibility, and single-GPU training, which they say make advanced work more accessible. A minority criticize the lack of rigorous comparisons.

Pos: 83.3% · Neg: 16.7%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 1.5K · Bookmarks: 5 · Likes: 11

Jun Gao@JunGao33210520

2/5 📊 Data.

Data is the key to success🔑 We built a large-scale, standardized data pipeline that curates, filters, and dedups broad public robotics and egocentric human datasets. This gives us a clean training dataset with diverse tasks, scenarios, actions, and embodiments.

16h1.5K115
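The post above does not say how the pipeline filters or deduplicates episodes, so the following is only a minimal sketch of one common approach: drop very short episodes, then remove exact duplicates by hashing each action sequence. The episode layout (a dict with an `"actions"` array), the `min_len` threshold, and both function names are assumptions for illustration, not OSCAR's actual pipeline.

```python
import hashlib

import numpy as np


def episode_key(actions):
    """Return a content hash of an action sequence (hypothetical helper)."""
    arr = np.ascontiguousarray(np.asarray(actions, dtype=np.float32))
    digest = hashlib.sha256()
    digest.update(str(arr.shape).encode())  # shape, so reshaped data differs
    digest.update(arr.tobytes())
    return digest.hexdigest()


def filter_and_dedup(episodes, min_len=2):
    """Keep episodes with at least min_len steps, dropping exact duplicates."""
    seen = set()
    kept = []
    for ep in episodes:
        actions = np.asarray(ep["actions"])
        if len(actions) < min_len:
            continue  # filter: too short to be useful
        key = episode_key(actions)
        if key in seen:
            continue  # dedup: identical action sequence already kept
        seen.add(key)
        kept.append(ep)
    return kept


# Toy episodes; the values are placeholders, not real robot data.
episodes = [
    {"id": "a", "actions": [[0.0, 1.0], [0.5, 1.0]]},
    {"id": "b", "actions": [[0.0, 1.0], [0.5, 1.0]]},  # exact duplicate of "a"
    {"id": "c", "actions": [[0.0, 1.0]]},              # too short
    {"id": "d", "actions": [[0.2, 0.3], [0.4, 0.5]]},
]
clean = filter_and_dedup(episodes)
```

Exact hashing only catches byte-identical sequences; near-duplicate detection would need a different technique.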

Retweets: 5

Eric Jang@ericjang11

impressive WM evaluation alignment! real ones know that this is the most impressive result

Jun Gao@JunGao33210520

#Robotics #WorldModels #AI #OpenSource

15h10.1K4737

Replies: 1

Chris Paxton@chris_j_paxton

I think one of the cool things about world models is just how much more efficient they seem to be in both data and compute.

Jun Gao@JunGao33210520

#Robotics #WorldModels #AI #OpenSource

47m662101

Jun Gao@JunGao33210520

3/5 🦾 Method.

We leverage a 2D kinematic skeleton rendering as a unified control signal. From robot arms to human hands ✋, every embodiment is rendered as an image-aligned 2D skeleton. This enables high-precision action conditioning for video models. 🎯

16h97661
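To make the idea of an image-aligned 2D skeleton concrete, here is a rough sketch that projects 3D joint positions into the camera image with a pinhole model and draws the bones into a mask. This is an illustration under stated assumptions, not OSCAR's renderer: the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), the toy three-joint arm, and the single-channel line drawing are all hypothetical.

```python
import numpy as np


def project_points(points_cam, fx, fy, cx, cy):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    points_cam = np.asarray(points_cam, dtype=float)
    z = points_cam[:, 2]
    if np.any(z <= 0):
        raise ValueError("all joints must be in front of the camera (z > 0)")
    u = fx * points_cam[:, 0] / z + cx
    v = fy * points_cam[:, 1] / z + cy
    return np.stack([u, v], axis=1)


def render_skeleton(joints_2d, bones, height, width, samples_per_bone=64):
    """Rasterize each bone as a thin line into a (height, width) uint8 mask."""
    canvas = np.zeros((height, width), dtype=np.uint8)
    t = np.linspace(0.0, 1.0, samples_per_bone)[:, None]
    for a, b in bones:
        # Sample points along the segment between the two joints.
        pts = (1.0 - t) * joints_2d[a] + t * joints_2d[b]
        cols = np.round(pts[:, 0]).astype(int)
        rows = np.round(pts[:, 1]).astype(int)
        inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
        canvas[rows[inside], cols[inside]] = 255
    return canvas


# Toy 3-joint "arm" in camera coordinates (metres); placeholder values.
joints_cam = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.1, 0.1, 1.0]])
bones = [(0, 1), (1, 2)]
joints_2d = project_points(joints_cam, fx=100.0, fy=100.0, cx=32.0, cy=32.0)
mask = render_skeleton(joints_2d, bones, height=64, width=64)
```

Because any embodiment with known joint positions and camera calibration can be drawn this way, the same kind of image can serve as a conditioning signal for a robot arm or a human hand.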

Jun Gao@JunGao33210520

4/5 📈 Results.

With our cleaned data and kinematic skeleton conditioning, we finetune the Cosmos2.5-2B model on a single GH200 GPU ⚡. Our method delivers superior quality on action following, appearance, and motion consistency, pushing SOTA performance with a fraction of the compute. 🏆

16h89441

Jun Gao@JunGao33210520

@Haotianxue_GT @wuzy2115 No new modules is one part of the efficiency. The most useful things are: (i) curating a balanced and diverse dataset, (ii) skeleton conditioning as a unified action representation, and (iii) the base model is only a 2B model (Cosmos 2.5)

13h39041

Jun Gao@JunGao33210520

5/5 🧪 Robot policy evaluation.

Finally, we deploy OSCAR for real-world robot policy evaluation using the episodes from RoboArena. Our extensive experiments show a high correlation ✅ between virtual assessment in our world model and real-world results, paving the way for a future where robot policies are evaluated entirely within generated environments!!

16h7044
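The "high correlation" claim can be read as follows: score each policy once in the world model and once on the real robot, then check how well the two sets of scores agree. The sketch below computes a Pearson correlation for that purpose. Which statistic the paper actually reports is not stated in the thread, and the success rates used here are placeholder values, not results from OSCAR or RoboArena.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 2:
        raise ValueError("need two 1-D sequences of equal length >= 2")
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return float((xc @ yc) / denom)


# Placeholder per-policy success rates (NOT real results).
real_success = [0.20, 0.50, 0.80]       # measured on hardware
simulated_success = [0.25, 0.45, 0.90]  # estimated from generated rollouts
r = pearson_r(real_success, simulated_success)
```

A value of `r` near 1 would mean the world model orders and spaces the policies much as the real trials do; a rank correlation would be a reasonable alternative if only the ordering matters.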

Jun Gao@JunGao33210520

@ericjang11 Thank you for the nice words, Eric! Yes, we believe the most immediate application of world models in robotics is to evaluate different policies at scale, accelerating development speed before cumbersome (and easy to break) real-world deployment.

14h5372

Haotian Xue@Haotianxue_GT

@JunGao33210520 @wuzy2115 Impressive, it seems much more efficient. Is that because it introduces no new modules?

14h4581

Eric Jang@ericjang11

@JunGao33210520 would you mind uploading the real world policy eval rollouts to huggingface datasets as well?

13h199

Albert Zhai@alb_zhai

@JunGao33210520 @wuzy2115 the skeleton is an interesting representation, i wonder if there are any ambiguity problems when the three red arrows are colinear (their plane intersects camera center)?

6h87

Loïck Chambon (PhD in CV) - 🇺🇦🇮🇷🇪🇺@LoickCh

@JunGao33210520 @wuzy2115 Have you tried to compare to the famous JEPA? I am looking for independent people that has tried it and said it works better than diffusion

6h72

Ferbin@Ferbin08

@ericjang11 what's the evaluation actually testing? i've watched too many impressive lab benchmarks fail when you hit real-world edge cases. curious if this one avoids that.

13h70

Virgil Maro@_virgil19

@JunGao33210520 @wuzy2115 a world model is the agent rehearsing what happens if i do x before committing. video prediction is just what that rehearsal looks like from outside

7h31

Praveen Koka@praveenkoka

@ericjang11 'real ones know' is tech's most efficient shortcut: declare something the best without all that tedious comparing

26m2

Ferbin@Ferbin08

@ericjang11 licensing is the real wall. you can run evals on anything. but sharing it? ip/privacy/liability questions pop up and labs can't afford the legal review. seen this kill so many promising datasets.

3h1

EB1A Experts@eb1aexperts

@JunGao33210520 @wuzy2115 The combination of cross-embodiment generalization and full reproducibility stands out here. It's encouraging to see strong robotics results paired with open data, open weights, and a compute budget that remains accessible to the broader research community.

8h1

Blissy@BlissyOnX

@chris_j_paxton the compute part is the real standout to me, single GH200 doing the lifting

44m