Apple researchers release scaling laws for mixture pretraining
Apple researchers published the paper Scaling Laws for Mixture Pretraining Under Data Constraints on arXiv. It derives scaling laws that tie allowable repetition rates of small datasets in pretraining mixtures to model scale, total tokens, and compute budget. Experiments tested models from 101M to 539M parameters on French and Swahili corpora under data constraints of 50M, 100M, and 500M tokens, with training runs of up to 10B tokens.
Very relevant research for low-resource languages. A controlled high repetition rate across a large parameter range might be enough to unlock viable synthetic pipelines. A rough sketch of the kind of law involved follows below.
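The paper's exact functional form isn't reproduced here, but data-constrained scaling laws are often written with an effective-data term in which repeated tokens contribute diminishing value (in the style of earlier data-constrained scaling work). The sketch below is a hypothetical illustration of that idea, not the Apple paper's fitted law; the decay constant r_star and the Chinchilla-style loss coefficients are placeholder values.

```python
import math

def effective_tokens(unique_tokens: float, repetitions: float, r_star: float = 15.0) -> float:
    """Hypothetical effective-data term: each additional epoch over the same
    unique tokens adds exponentially less value, saturating after ~r_star repeats.
    Placeholder form, not the paper's fit."""
    return unique_tokens * r_star * (1.0 - math.exp(-repetitions / r_star))

def predicted_loss(params: float, unique_tokens: float, repetitions: float) -> float:
    """Chinchilla-style loss curve with illustrative coefficients (A, B, alpha, beta, E),
    where the data term D is replaced by the effective token count above."""
    A, B, alpha, beta, E = 406.4, 410.7, 0.34, 0.28, 1.69  # illustrative constants only
    d_eff = effective_tokens(unique_tokens, repetitions)
    return E + A / params**alpha + B / d_eff**beta

# Example: a 539M-parameter model over a 500M-token low-resource corpus,
# comparing a modest vs. an aggressive repetition rate.
for reps in (4, 20):
    print(f"{reps} repetitions -> predicted loss {predicted_loss(539e6, 500e6, reps):.3f}")
```

Under this toy form, the marginal gain from extra repetitions shrinks but does not vanish immediately, which is the intuition behind the claim that allowable repetition rates for small datasets may be larger than expected.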

https://arxiv.org/abs/2605.12715 How many repetitions could be allowed for a small dataset in pretraining mixtures? Naturally, it would be a function of model scale and data size (and compute budget), but it could be larger than expected. https://arxiv.org/abs/2603.16177
@Dorialexander This is a huge deal, thanks for the link.