Students Explore Video-Only Data for Robotics Foundation Models

Original post

I am so proud of my students, and maybe even more than proud, I am really touched by them. They have this real spirit of craftsmanship. They are patient (well, way more patient than me). They care about the tiny details. And they have this stubborn commitment to doing things right, even when nobody else may ever see how much work went into it.

Robotics research often looks like nice demo videos in the end. But honestly, the real process is usually not that glamorous. It is messy. It is moving data around, collecting more data, evaluating again and again, fixing the robot when it suddenly breaks, fixing the code when it breaks in some mysterious way, and spending long days in the lab with real hardware that definitely does not care about our deadlines.

My students chose to take on one of the hardest problems in this space: can we change the way robot learning data is collected? Instead of relying only on tele-operated, labeled data, we explore learning from video-only data. The dream is simple, but also very hard: can we clean up videos in the wild and turn them into the fuel for robotics foundation models, just like the Internet became the fuel for language models?

Could there be a GPT-3 moment for robotics? I don’t know. When we see scaling starting to work at small scales, I feel hopeful. Maybe even optimistic. But would it still work when we go much bigger? I don’t know. Would some emergent behaviors show up? I don’t know either.

But I keep thinking about this. Especially when I am doing boring, repetitive, and sometimes chaotic daily chores, and I get annoyed by how inefficient everything still is. Then I think: okay, I really need to work harder to get robotics working. We don’t know whether we will get there. And even if we do, we don’t know when. But there is this drive. This hope. This curiosity.

We spent hours and hours in meetings discussing every detail of the pipeline. And that is not even counting all the time the students spent alone in the lab, or in front of their laptops, trying to make everything actually work. Sometimes one tiny detail becomes just one sentence in the paper. But behind that one sentence, there may be many failed attempts, long discussions, careful decisions, and a lot of quiet persistence.

A reviewer may say the novelty is limited. But to me, the real breakthrough is: it actually worked zero-shot. And making that happen took a tremendous amount of grinding and careful engineering. This kind of work is not always easy to appreciate from the outside. It is not always flashy. But I believe this unseen work, the work of really making things work, can be transformative for the field.

7:05 AM · May 29, 2026

POST

Furong Huang (@furongh)

🤔 Question for robotics / physical AI folks: Can a model pretrained purely on video data, with no action labels during pretraining, reach π0.5-level zero-shot robot performance?

What’s your bet, and why?

6:35 PM · May 29, 2026 · 1.6K Views

Furong Huang (@furongh)

Most people would bet on video generation models?

8:07 PM · May 29, 2026 · 651 Views
