
Samip of Q! unveils q0, a population-based system that scales multi-epoch pretraining to 960 epochs without performance saturation

The approach outperforms single-model and naive ensembling baselines.

Original posts
Samip (@industriaalist)

1/ Now that we're running out of data, how do you optimally scale multi-epoch pretraining to hundreds of epochs?

Our first paper from Q!: q0 trains a population of models instead of a single model that saturates fast, reaching a dramatically lower loss at *every* epoch budget.

w/ @bishmdl76 @akshayvegesna @ShmuelBerman

8:43 AM · Jun 4, 2026 · 10.9K Views
Retweets: 16
Views: 10.9K · Likes: 115 · Bookmarks: 91