
OPUS improves LLM pre-training efficiency by aligning iteration-by-iteration data selection with the optimizer's geometry

OPUS reduces the misalignment between data selection and the update rules of optimizers such as AdamW or Muon.
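The post does not spell out the scoring rule, so the following is only a minimal sketch of what "selecting data in the optimizer's geometry" could look like, not the paper's algorithm. It assumes an Adam-style diagonal preconditioner (dividing by the square root of a second-moment estimate) and a held-out target gradient; the function names `adam_preconditioned` and `select_batch`, and all numbers, are hypothetical.

```python
import math

def adam_preconditioned(grad, second_moment, eps=1e-8):
    # Assumption: rescale each coordinate as an Adam-style step would,
    # g / (sqrt(v) + eps). The real OPUS rule may differ.
    return [g / (math.sqrt(v) + eps) for g, v in zip(grad, second_moment)]

def select_batch(candidate_grads, target_grad, second_moment, k):
    # Score each candidate by how well its preconditioned update
    # direction aligns (dot product) with the target gradient,
    # then keep the k highest-scoring candidate indices.
    scored = []
    for i, g in enumerate(candidate_grads):
        update = adam_preconditioned(g, second_moment)
        score = sum(u * t for u, t in zip(update, target_grad))
        scored.append((score, i))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]

# Toy per-example gradients in a 2-parameter model (made-up numbers).
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [1.0, 0.2]

# Ignoring optimizer state (identity-like second moment).
raw_choice = select_batch(candidates, target, [1.0, 1.0], k=2)

# With optimizer state: coordinate 0 has a large second moment, so the
# optimizer will barely move it, and the ranking changes accordingly.
aware_choice = select_batch(candidates, target, [100.0, 0.01], k=2)

print(raw_choice, aware_choice)
```

In this toy setup the two calls pick different examples, which is the kind of misalignment the summary refers to: a score computed on raw gradients can favor data whose effect the optimizer then largely scales away.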

Original post

There is now a smarter way to pick data for training LLMs! Enter OPUS!

This is an ICML Oral paper from SJTU, Alibaba, UW–Madison, UIUC, and Mila - Quebec AI Institute.

The proposed method dynamically and intelligently selects the most impactful data for LLM pre-training in every single training iteration, bringing principled, continuous data optimization to the forefront. This approach aims to significantly boost training efficiency and yield higher-quality LLMs, outperforming conventional static data selection methods across diverse language tasks.

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
Paper: https://arxiv.org/pdf/2602.05400
Our report: https://mp.weixin.qq.com/s/xzmjviMMwX20tcjwutNmxQ

📬 #PapersAccepted by Jiqizhixin

2:23 AM · May 24, 2026

very ambitious

机器之心 JIQIZHIXIN (@jiqizhixin)


9:23 AM · May 24, 2026 · 44.6K Views
11:37 AM · May 24, 2026 · 34.9K Views