18h ago

New Schedule-Free Spectral Optimizer SF-NorMuon Matches or Beats Tuned AdamW on Language Models

Original post

🚨New paper: Anytime Training with Schedule-Free Spectral Optimization🚨 We introduce SF-NorMuon, a schedule-free spectral method that outperforms or matches heavily tuned AdamW across 125M and 772M parameter language models.

8:31 AM · May 25, 2026