Apple ML Study Offers Recipe For Data-Constrained LLM Pre-Training


Original post

Pre-training is increasingly data-constrained: compute outruns text, models repeat tokens many times, and how much repetition you can afford is an open question. In "Mix, Don't Tune" 🎶 (my @Apple MLR internship), we run ~1000 pre-training runs from 150M to 1.43B params with full HP grids at every scale, to figure out what actually drives performance when target-language data is scarce, and land on a concrete recipe for the data-constrained regime. (1/3) 📃: https://arxiv.org/abs/2605.13225

8:03 AM · May 20, 2026
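The post notes that "models repeat tokens many times" once compute outruns text. The arithmetic behind that is simple, so here is a minimal sketch of it. It assumes a total pre-training token budget, a fixed pool of unique target-language tokens, and a hypothetical `target_fraction` that sets how much of the budget comes from target-language data, with the rest assumed to be filled by other data. The function name, the inputs, and the ~20 tokens-per-parameter figure (a Chinchilla-style rule of thumb) are illustrative assumptions. This is not the recipe from the paper.

```python
def repetition_epochs(train_tokens: float, unique_target_tokens: float,
                      target_fraction: float = 1.0) -> float:
    """Passes over the unique target-language data (hypothetical helper).

    train_tokens: total pre-training token budget.
    unique_target_tokens: size of the unique target-language corpus.
    target_fraction: assumed share of the budget drawn from target data;
        the remainder is assumed to come from other (non-target) data.
    """
    if unique_target_tokens <= 0:
        raise ValueError("unique_target_tokens must be positive")
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be in [0, 1]")
    # Target tokens consumed, divided by unique target tokens = epochs.
    return train_tokens * target_fraction / unique_target_tokens


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    budget = 1.43e9 * 20  # 1.43B params at an assumed ~20 tokens/param
    unique = 2e9          # assumed 2B unique target-language tokens
    for frac in (1.0, 0.5, 0.25):
        epochs = repetition_epochs(budget, unique, frac)
        print(f"target share {frac:.2f}: {epochs:.2f} epochs")
```

Lowering the target share cuts the number of repeats over scarce data, because other tokens fill the rest of the budget. How much repetition is actually safe is the open question the post describes.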
