
Chris Paxton, Agility Robotics AI lead, outlines seven robotics data collection methods while engineer kache argues scaling relies on simulation

Story Overview

Frontier robotics models cannot rely on passive web scraping the way language models did. Every trajectory, torque reading, and tactile signal must be captured through actual hardware interaction, turning data collection into the central scaling constraint and forcing teams to weigh cost, fidelity, and embodiment gaps across multiple deliberate methods.

Original post
Chris Paxton @chris_j_paxton · #787 in Tech

A good summary

7:01 AM · Jun 9, 2026 · 24K Views
Open Question

Blending heterogeneous sources still lacks proven recipes

Even after the data is collected, stitching teleoperation logs, fleet runs, simulation rollouts, and video sources into one coherent training signal remains the harder, unsolved step, and no public benchmarks yet show how to balance quality against scale.

Developer Impact

Deployment scale alone won't unlock the next leap

Fleet data only becomes useful once robots are already operating at volume, creating a chicken-and-egg limit that pure simulation or video approaches have not yet cleared for contact-rich tasks.

Sentiment

Commenters with positive sentiment endorse simulation and hybrid approaches to the robotics data bottleneck, arguing that they generate abundant training data efficiently. Commenters with negative sentiment dismiss simulation as low-fidelity.

Positive: 70.8% · Negative: 29.2% (13 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 13.7K · Bookmarks 56 · Likes 234 · Retweets 4 · Replies 15
kache@yacineMTB

wrong. it's simulators

1d · Views 13.7K · Likes 234 · Bookmarks 56
kache@yacineMTB

it's literally just simulators

kache@yacineMTB

wrong. it's simulators

1d · Views 2.7K · Likes 57 · Bookmarks 3
chidubem@ChidubemNdukwe

@yacineMTB Closing sim to real gap ?

1d · Views 7
Humanoid Investing@HumanoidInvest

@chris_j_paxton Would serve robotics data collection apply here? I believe they are already selling data if i’m not mistaken

1d · Views 47 · Likes 1
🎯🔫👌@gurgle_io

@yacineMTB I agree that it's just simulators, but at the same time, why don't humans, or bugs for that matter, need simulators? Are the weights already in the genes? They would have to be, wouldn't they. This would mean evolution is a sim.

1d · Views 11 · Likes 2
Aaron@aaronnagy1987

@yacineMTB I fully endorse this view. The robot needs a "mind's eye" and that doesn't need to be terribly "high-token-resolution." A "simple abstraction" layer seems to be missing.

1d · Views 17 · Likes 1
Kinvert@KinvertOG

@yacineMTB simulation creates the data at millions of SPS, it creates such a firehose of information smaller and smaller models can not only learn to do it, they can do it better because inference is 10X the speed of larger models.

1d · Views 8 · Likes 1
R@SpikeRiser

@yacineMTB Rl directly to hardware, skip the sim https://arxiv.org/abs/2206.14176

1d · Views 22
ControlWiz@Control_wiz

@yacineMTB *Accurate simulators.

1d · Views 6 · Likes 1
Entropy Lapse@EntropyLapse

@yacineMTB @ChidubemNdukwe are you sober bro 😭?

1d · Views 18
Daniel Terrero@DanielTerreroC

@yacineMTB @ChidubemNdukwe 100%, there are limited ways of moving through 3d space, which you can actually simulate

1d · Views 5 · Likes 1
Prathm@trailformer

@yacineMTB Aren't they good enough? I mean look as IsaacSim

1d · Views 4 · Likes 1
NTJ@Anteejay

@yacineMTB IRL is the ultimate simulator.

1d · Views 4 · Likes 1
Bart Trzynadlowski@BartronPolygon

@yacineMTB Sim is always a low fidelity model of the world and therefore retarded by definition.

Large scale deployed systems mostly rely on real data collection flywheels.

1d · Views 3 · Likes 1
HDP@HDPbilly

@Anteejay @yacineMTB hi

1d · Views 3 · Likes 1
Ⓓⓐⓣⓐ@DataDeLaurier

@yacineMTB always has been

that's why Dr Jim Fan at vidya is gonna win

1d · Views 9
kache@yacineMTB

@ChidubemNdukwe sim2real is trivial

1d · Views 9
Erwin@PrimeErwin

@chris_j_paxton Always was wondering why there was such a little focus on hybrid approaches. I'm confident the best outcome is finding the right mix of the different types of data.

1d · Views 9
chidubem@ChidubemNdukwe

@yacineMTB Hmm... what's your threshold for trivial? are you saying the gap is closed for specific task classes, or broadly?

1d · Views 5
cemkaya@CEMKAYA2704

@yacineMTB There are people who think RL is just imitation learning with extra steps.

20h · Views 4