Chris Paxton, Agility Robotics AI lead, outlines seven robotics data collection methods while engineer kache argues scaling relies on simulation

Story Overview

Frontier robotics models cannot rely on passive web scraping the way language models did. Every trajectory, torque reading, and tactile signal must be captured through actual hardware interaction, turning data collection into the central scaling constraint and forcing teams to weigh cost, fidelity, and embodiment gaps across multiple deliberate methods.

Original post
Chris Paxton@chris_j_paxton · #737 in AI

A good summary

7:01 AM · Jun 9, 2026 · 8.9K Views
Open Question

Blending heterogeneous sources still lacks proven recipes

Even after data arrives, stitching teleoperation logs, fleet runs, simulation rollouts, and video sources into one coherent training signal remains the harder unsolved step, with no public benchmarks yet showing how to balance quality against scale.

Developer Impact

Deployment scale alone won't unlock the next leap

Fleet data only becomes useful once robots are already operating at volume, creating a chicken-and-egg limit that pure simulation or video approaches have not yet cleared for contact-rich tasks.

Sentiment

Some users endorse simulation and hybrid methods for the robotics data bottleneck because they generate abundant training data, while others argue hardware is the real problem and dismiss simulation-heavy approaches.

Positive: 70.8% · Negative: 29.2% (13 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 10.2K · Bookmarks 35 · Likes 178 · Retweets 4 · Replies 14
kache@yacineMTB

wrong. it's simulators

6h · Views 10.2K · Likes 178 · Bookmarks 35
kache@yacineMTB

it's literally just simulators

6h · Views 2.5K · Likes 54 · Bookmarks 2
chidubem@ChidubemNdukwe

@yacineMTB Closing the sim-to-real gap?

5h · Views 7
Humanoid Investing@HumanoidInvest

@chris_j_paxton Would Serve Robotics data collection apply here? I believe they are already selling data, if I'm not mistaken.

7h · Views 47 · Likes 1
🎯🔫👌@gurgle_io

@yacineMTB I agree that it's just simulators, but at the same time, why don't humans, or bugs for that matter, need simulators? Are the weights already in the genes? They would have to be, wouldn't they. This would mean evolution is a sim.

5h · Views 11 · Likes 2
Aaron@aaronnagy1987

@yacineMTB I fully endorse this view. The robot needs a "mind's eye" and that doesn't need to be terribly "high-token-resolution." A "simple abstraction" layer seems to be missing.

5h · Views 17 · Likes 1
Kinvert@KinvertOG

@yacineMTB simulation creates the data at millions of SPS, it creates such a firehose of information smaller and smaller models can not only learn to do it, they can do it better because inference is 10X the speed of larger models.

5h · Views 8 · Likes 1
R@SpikeRiser

@yacineMTB RL directly on hardware, skip the sim https://arxiv.org/abs/2206.14176

6h · Views 22
ControlWiz@Control_wiz

@yacineMTB *Accurate simulators.

4h · Views 6 · Likes 1
Entropy Lapse@EntropyLapse

@yacineMTB @ChidubemNdukwe are you sober bro 😭?

4h · Views 18
Daniel Terrero@DanielTerreroC

@yacineMTB @ChidubemNdukwe 100%, there are limited ways of moving through 3d space, which you can actually simulate

3h · Views 5 · Likes 1
Prathm@trailformer

@yacineMTB Aren't they good enough? I mean, look at Isaac Sim

4h · Views 4 · Likes 1
NTJ@Anteejay

@yacineMTB IRL is the ultimate simulator.

6h · Views 4 · Likes 1
Bart Trzynadlowski@BartronPolygon

@yacineMTB Sim is always a low fidelity model of the world and therefore retarded by definition.

Large scale deployed systems mostly rely on real data collection flywheels.

5h · Views 3 · Likes 1
HDP@HDPbilly

@Anteejay @yacineMTB hi

6h · Views 3 · Likes 1
Ⓓⓐⓣⓐ@DataDeLaurier

@yacineMTB always has been

that's why Dr Jim Fan at vidya is gonna win

3h · Views 9
kache@yacineMTB

@ChidubemNdukwe sim2real is trivial

5h · Views 9
Erwin@PrimeErwin

@chris_j_paxton I was always wondering why there was so little focus on hybrid approaches. I'm confident the best outcome is finding the right mix of the different types of data.

7h · Views 9
chidubem@ChidubemNdukwe

@yacineMTB Hmm... what's your threshold for trivial? are you saying the gap is closed for specific task classes, or broadly?

5h · Views 5
Rawlala@Rawlala1

@yacineMTB simulator + architecture (+ training framework) the "we need more data" gang of robotics is so retarded they just look at the gpt story and slap it onto robotics without a hint of thoughts

3h · Views 4