13h ago

AMUSE optimizer merges Muon and schedule-free gradient evaluation to train models without learning rate decay

Lucas Nestler released a code implementation called SFMuon.


Original post

#604 Simo Ryu @CLONEOFSIMO

Cool + Prof. Chulhee was my defense's committee and he is really kind

12:30 AM · May 26, 2026

QUOTE POST

#1709 Lucas Nestler @CLASHLUKE

class SFMuon(C.ScheduleFree):
    def __init__(self, params, **kw):
        d = dict(lr=.02, beta=.95, weight_decay=0, warmup_steps=0, weight_lr_power=2., r=0.)
        d.update(kw)
        super().__init__(params, d, fns=(C.nesterov_ema, C.orthogonalize_update, C.update_by_schedule_free))

Jueun Kim @jueunkim_0525

🚨 New Optimizer Paper
AMUSE: Anytime MUon with Stable gradient Evaluation
AMUSE combines Muon with Schedule-Free-style gradient evaluation for stable anytime training without LR decay.
• Stronger 124M / 720M / 1B pretraining
• Strong ImageNet / ViT fine-tuning performance

4:20 AM · May 26, 2026 · 28.3K Views

8:19 AM · May 26, 2026 · 7.8K Views
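
The SFMuon snippet above chains three stages: a momentum EMA, an orthogonalization of the update, and a schedule-free weight update. The pure-Python sketch below illustrates that order on a single small matrix. It is an illustration under stated assumptions, not the AMUSE algorithm from the paper and not the SFMuon implementation: the helper names (`matmul`, `axpby`, `newton_schulz`, `sf_muon_step`, `grad_fn`) are hypothetical, the orthogonalization uses the quintic Newton-Schulz iteration commonly associated with Muon, the EMA is a plain (non-Nesterov) one, and the averaging weight is a uniform 1/(t+1) rather than anything controlled by `weight_lr_power` or `r`.

```python
# Hypothetical sketch: momentum EMA -> orthogonalize -> schedule-free update.
# Not the paper's AMUSE algorithm and not the SFMuon code quoted above.

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def axpby(alpha, a, beta, b):
    """Elementwise alpha * a + beta * b."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g (singular values pushed toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients popularized by Muon
    norm = sum(v * v for row in g for v in row) ** 0.5 + eps
    x = [[v / norm for v in row] for row in g]
    tall = len(x) > len(x[0])
    if tall:  # iterate on the wide orientation so x @ x^T is the smaller product
        x = transpose(x)
    for _ in range(steps):
        gram = matmul(x, transpose(x))
        poly = axpby(b, gram, c, matmul(gram, gram))
        x = axpby(a, x, 1.0, matmul(poly, x))
    return transpose(x) if tall else x

def sf_muon_step(x, z, m, grad_fn, t, lr=0.02, beta=0.95):
    """One step; x is the averaged iterate, z the base iterate, m the momentum."""
    y = axpby(1 - beta, z, beta, x)      # schedule-free gradient evaluation point
    g = grad_fn(y)                       # gradient is taken at y, not at x or z
    m = axpby(beta, m, 1 - beta, g)      # assumption: plain EMA, no Nesterov term
    u = newton_schulz(m)                 # orthogonalized update direction
    z = axpby(1.0, z, -lr, u)            # constant learning rate, no decay schedule
    c = 1.0 / (t + 1)                    # assumption: uniform iterate averaging
    x = axpby(1 - c, x, c, z)
    return x, z, m
```

In this sketch the learning rate stays constant throughout, and `x` can be read out as the evaluation weights at any step, which is the sense in which schedule-free methods are described as "anytime".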

