
Trask Disputes Sutskever's Claim of Peak Data for AI Pre-Training

⿻ Andrew Trask @iamtrask

IMO — Ilya is wrong

- Frontier LLMs are trained on ~200 TB of text
- There are ~200 zettabytes of data out there
- That's about 1 billion times more data
- It doubles every 2 years
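The arithmetic in the list can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units and takes the post's round figures at face value. `projected_data` is a hypothetical helper, and its output only extrapolates the post's "doubles every 2 years" claim; it is not a forecast.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Assumptions: decimal (SI) units; the post's round numbers taken at face value.
TB = 10**12  # bytes in a terabyte
ZB = 10**21  # bytes in a zettabyte

training_text = 200 * TB  # ~200 TB of text used to train frontier LLMs
all_data = 200 * ZB       # ~200 ZB of data in existence

# Integer division is exact here: (200 * 10**21) / (200 * 10**12) = 10**9
ratio = all_data // training_text


def projected_data(years: float, start: int = all_data, doubling_years: float = 2.0) -> float:
    """Hypothetical helper: data volume after `years`, assuming it doubles every `doubling_years`."""
    return start * 2 ** (years / doubling_years)


if __name__ == "__main__":
    print(f"ratio: {ratio:,}")  # ratio: 1,000,000,000
    for years in (2, 4, 10):
        # Each 2-year step doubles the total: ~400, ~800, then ~6,400 ZB after 10 years
        print(f"after {years} years: ~{projected_data(years) / ZB:,.0f} ZB")
```

Integer arithmetic keeps the ratio exact; the projection uses floats because the exponent `years / doubling_years` need not be a whole number.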

The problem is the data is private. Can't scrape it.

The problem is not data scarcity, it's data access.

The solution is attribution-based control (article below)

"Unlocking a Million Times More Data For AI"

9:45 AM · Sep 24, 2025 · 268.9K Views
