
Researchers introduce On-Policy Mix algorithm for optimal data proportions in continual training


A team of researchers has introduced On-Policy Mix (OP-Mix), an algorithm for choosing data-mixing proportions as the available datasets change during training. It targets data mixing in continual learning and, according to the authors, applies across pretraining, midtraining, and instruction tuning without task-specific adjustments. The work appears in the paper “Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time,” with authors including Michael Y. Hu, Apurva Gandhi, Kyunghyun Cho, Tal Linzen, and Pratyusha Sharma. The authors report that OP-Mix Pareto-dominates baselines on the performance-versus-efficiency frontier across all three settings.

Original post

What is the right data mix, and how do we find it as the data keeps changing? This is a core, unsolved problem in continual learning. To tackle it, we built a data mixing algo that works everywhere: pretraining, midtraining, instruction tuning. Introducing: On-Policy Mix 🧵1/6

8:14 AM · May 18, 2026

"Data mixing is fundamentally an online decision making problem—one that recurs throughout training and demands a single, unified solution across pre-training and continual learning."

Super excited about @michahu8's new algorithm, OP-Mix, which delivers exactly that! 🚀

Some fun findings: 🪄 Linearly interpolating between LoRAs trained on subsets of datasets gives you a cheap, accurate proxy for the loss surface of full data mixing & allows estimating the optimal data mix!

🏆 OP-Mix Pareto-dominates the performance vs. efficiency frontier across pretraining, midtraining, and continual instruction tuning!

⚡ SFT+OP-Mix matches the performance of on-policy self-distillation / SDFT with 95% less compute!

Michael Hu @michahu8

What is the right data mix, and how do we find it as the data keeps changing? This is a core, unsolved problem in continual learning. To tackle it, we built a data mixing algo that works everywhere: pretraining, midtraining, instruction tuning. Introducing: On-Policy Mix 🧵1/6

3:14 PM · May 18, 2026 · 24.5K Views
8:23 PM · May 18, 2026 · 7.9K Views
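
The LoRA finding quoted above (interpolating between adapters trained on separate datasets as a cheap proxy for the data-mixing loss surface) can be illustrated with a toy sketch. This is not the paper's procedure: the linear "model", the rank-truncated least-squares fit standing in for LoRA training, and names such as `fit_adapter` and `random_shift` are all assumptions made for illustration. It only shows the mechanics of sweeping an interpolation weight between two low-rank adapters and picking the value with the lowest held-out loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n = 8, 6, 2, 400

W0 = rng.normal(size=(d_out, d_in))  # frozen base weights (toy stand-in for a model)

def random_shift():
    # Hypothetical rank-`rank` change that one dataset induces on the base weights.
    return 0.5 * rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))

def sample(weights):
    # Noise-free regression data generated by the given weight matrix.
    X = rng.normal(size=(n, d_in))
    return X, X @ weights.T

def fit_adapter(X, Y):
    # Stand-in for LoRA training (an assumption, not real LoRA): least-squares
    # fit of the residual left by the frozen base, truncated to rank `rank`
    # and returned as low-rank factors (B, A).
    delta = np.linalg.lstsq(X, Y - X @ W0.T, rcond=None)[0].T
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]

def loss(weights, X, Y):
    # Mean squared error of the linear model on (X, Y).
    return float(np.mean((X @ weights.T - Y) ** 2))

# One adapter per dataset.
shift_a, shift_b = random_shift(), random_shift()
B_a, A_a = fit_adapter(*sample(W0 + shift_a))
B_b, A_b = fit_adapter(*sample(W0 + shift_b))

# Held-out target whose ideal weights sit 70/30 between the two datasets.
X_val, Y_val = sample(W0 + 0.7 * shift_a + 0.3 * shift_b)

# Sweep the interpolation weight and keep the one with the lowest held-out loss.
alphas = np.linspace(0.0, 1.0, 11)
losses = [
    loss(W0 + a * (B_a @ A_a) + (1 - a) * (B_b @ A_b), X_val, Y_val)
    for a in alphas
]
best_alpha = float(alphas[int(np.argmin(losses))])
print(f"estimated weight on dataset A: {best_alpha:.1f}")
```

In this noise-free linear toy the sweep recovers the 0.7 weight exactly, because the held-out loss is quadratic in the interpolation weight. Whether interpolation weights track true data proportions for real language models is the empirical claim made in the post, not something this sketch demonstrates.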