New Schedule-Free Spectral Optimizer SF-NorMuon Beats Tuned AdamW on Language Models
Sentiment: 100% positive, 0% negative
Users find the new SF-NorMuon optimizer promising because it outperforms AdamW on 125M- to 772M-parameter language models and integrates with schedule-free methods to simplify tuning.