
Trask Disputes Sutskever Claim of Peak Data for AI Pre-Training

⿻ Andrew Trask (@iamtrask)

IMO — Ilya is wrong

- Frontier LLMs are trained on ~200 TBs of text
- There's ~200 Zettabytes of data out there
- That's about 1 billion times more data
- It doubles every 2 years
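The "1 billion times" figure follows from dividing the two estimates. Below is a minimal sketch in Python that checks it. It assumes decimal SI units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes) and takes the post's rough figures at face value. The function names are hypothetical, and the projection assumes the two-year doubling rate stays constant, which the post asserts but does not source.

```python
# Assumption: decimal SI units, 1 TB = 10**12 bytes and 1 ZB = 10**21 bytes.
TB = 10**12
ZB = 10**21

TRAINING_TEXT_BYTES = 200 * TB  # ~200 TB of text used by frontier LLMs (post's estimate)
TOTAL_DATA_BYTES = 200 * ZB     # ~200 ZB of data in existence (post's estimate)


def data_ratio(total_bytes: int, used_bytes: int) -> int:
    """How many times larger the total data pool is than the training set (integer division)."""
    return total_bytes // used_bytes


def projected_total(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Hypothetical projection: total data after `years`, assuming it doubles every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


if __name__ == "__main__":
    ratio = data_ratio(TOTAL_DATA_BYTES, TRAINING_TEXT_BYTES)
    print(f"{ratio:,}")  # 1,000,000,000
    # Starting from ~200 ZB, ten years of doubling every 2 years is 2**5 = 32x growth.
    print(f"{projected_total(200, 10):,.0f} ZB")  # 6,400 ZB
```

Because both estimates share the leading "200", the ratio reduces to 10^21 / 10^12 = 10^9.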

The problem is the data is private. Can't scrape it.

The problem is not data scarcity, it's data access.

The solution is attribution-based control (article below)

"Unlocking a Million Times More Data For AI"

9:45 AM · Sep 24, 2025 · 268.9K Views