
Apple researchers release scaling laws for mixture pretraining


Apple researchers published the paper Scaling Laws for Mixture Pretraining Under Data Constraints on arXiv. It derives scaling laws that tie allowable repetition rates of small datasets in pretraining mixtures to model scale, total tokens, and compute budget. Experiments tested models from 101M to 539M parameters on French and Swahili corpora under data constraints of 50M, 100M, and 500M tokens, with training runs reaching up to 10B tokens.

Original post


Rosinality @rosinality

https://arxiv.org/abs/2605.12715 How many repetitions could be allowed for a small dataset in pretraining mixtures? Naturally it would be a function of model scale and data size (and compute budget). But it could be larger than expected. https://arxiv.org/abs/2603.16177

8:02 AM · May 14, 2026 · 28.9K Views

@Dorialexander This is a huge deal, thanks for the link.

Alexander Doria @Dorialexander

Very relevant research for low-resource languages. A controlled high repetition rate across a large parameter range might be enough to unlock viable synthetic pipelines.

8:51 AM · May 14, 2026 · 7.9K Views