Chris, 10k hours (approx. 1 year) is roughly 0.001% of internet-scale data. Plus it’s not clear how to merge data from different embodiments.
I think this means you can collect ~10k hours from open-source datasets alone, so basically anyone should be able to build a decent robot foundation model:
500 from the new BitRobot dataset
500 from galaxea https://huggingface.co/datasets/OpenGalaxea/Galaxea-Open-World-Dataset
3000 from agibot https://huggingface.co/datasets/agibot-world/AgiBotWorld2026
~3000 from Open X embodiment (though it's mostly pretty bad data) https://robotics-transformer-x.github.io/
830 from EgoDex https://github.com/apple/ml-egodex
~30 from humanoid-everyday (but good quality) https://huggingface.co/datasets/USC-PSI-Lab/humanoid-everyday
3500 from ABC https://huggingface.co/datasets/XDOF/ABC-130k
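
Quick sanity check on the arithmetic, as a rough sketch: the hour counts are just the approximate figures quoted in the list, and the dictionary keys and helper names are made up for illustration. It treats "~3000" as 3000 and "~30" as 30, so the total is only a ballpark.

```python
# Approximate hours per open dataset, copied from the list above.
# Keys are informal labels, not official dataset identifiers.
HOURS = {
    "BitRobot": 500,
    "Galaxea": 500,
    "AgiBot": 3000,
    "Open X-Embodiment": 3000,  # quoted as ~3000
    "EgoDex": 830,
    "humanoid-everyday": 30,    # quoted as ~30
    "ABC": 3500,
}


def total_hours(hours):
    """Sum the per-dataset hour estimates."""
    return sum(hours.values())


def share(hours):
    """Fraction of the combined total contributed by each dataset."""
    total = total_hours(hours)
    return {name: h / total for name, h in hours.items()}


TOTAL = total_hours(HOURS)       # 11360
YEARS = TOTAL / (24 * 365)       # continuous wall-clock years of data

if __name__ == "__main__":
    print(f"total: {TOTAL} hours, about {YEARS:.2f} years of continuous data")
    for name, frac in sorted(share(HOURS).items(), key=lambda kv: -kv[1]):
        print(f"{name:>20}: {frac:.1%}")
```

So the list adds up to a bit over the ~10k hour mark, with ABC, AgiBot, and Open X embodiment making up most of it.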











