
Apple ML Study Offers Recipe For Data-Constrained LLM Pre-Training

Original post

Pre-training is increasingly data-constrained: compute outruns text, models repeat tokens many times, and how much repetition you can afford is an open question. In "Mix, Don't Tune" 🎶 (my @Apple MLR internship), we run ~1000 pre-training runs from 150M to 1.43B params with full HP grids at every scale, to figure out what actually drives performance when target-language data is scarce, and land on a concrete recipe for the data-constrained regime. (1/3) 📃: https://arxiv.org/abs/2605.13225

8:03 AM · May 20, 2026