Apple researchers release scaling laws for mixture pretraining
Apple researchers published the paper Scaling Laws for Mixture Pretraining Under Data Constraints on arXiv. It derives scaling laws that tie allowable repetition rates of small datasets in pretraining mixtures to model scale, total tokens, and compute budget. Experiments tested models from 101M to 539M parameters on French and Swahili corpora under data constraints of 50M, 100M, and 500M tokens, with training runs of up to 10B tokens.
Very relevant research for low-resource languages. A controlled high repetition rate across a large parameter range might be enough to unlock viable synthetic pipelines. A rough sketch of the kind of law involved follows below.
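The paper's exact functional form isn't reproduced here, but data-constrained scaling laws are often written with an effective-data term in which repeated tokens contribute diminishing value (in the style of earlier data-constrained scaling work). The sketch below is a hypothetical illustration of that idea, not the Apple paper's fitted law; the decay constant r_star and the Chinchilla-style loss coefficients are placeholder values.

```python
import math

def effective_tokens(unique_tokens: float, repetitions: float, r_star: float = 15.0) -> float:
    """Hypothetical effective-data term: each additional epoch over the same
    unique tokens adds exponentially less value, saturating after ~r_star repeats.
    Placeholder form, not the paper's fit."""
    return unique_tokens * r_star * (1.0 - math.exp(-repetitions / r_star))

def predicted_loss(params: float, unique_tokens: float, repetitions: float) -> float:
    """Chinchilla-style loss curve with illustrative coefficients (A, B, alpha, beta, E),
    where the data term D is replaced by the effective token count above."""
    A, B, alpha, beta, E = 406.4, 410.7, 0.34, 0.28, 1.69  # illustrative constants only
    d_eff = effective_tokens(unique_tokens, repetitions)
    return E + A / params**alpha + B / d_eff**beta

# Example: a 539M-parameter model over a 500M-token low-resource corpus,
# comparing a modest vs. an aggressive repetition rate.
for reps in (4, 20):
    print(f"{reps} repetitions -> predicted loss {predicted_loss(539e6, 500e6, reps):.3f}")
```

Under this toy form, the marginal gain from extra repetitions shrinks but does not vanish immediately, which is the intuition behind the claim that allowable repetition rates for small datasets may be larger than expected.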

https://arxiv.org/abs/2605.12715 How many repetitions could be allowed for a small dataset in pretraining mixtures? Naturally, it would be a function of model scale and data size (and compute budget), but it could be larger than expected. https://arxiv.org/abs/2603.16177
@Dorialexander This is a huge deal, thanks for the link.