Cool! Plus, Prof. Chulhee was on my defense committee, and he is really kind.
🚨 New optimizer paper: AMUSE (Anytime MUon with Stable gradient Evaluation)
AMUSE combines Muon with Schedule-Free-style gradient evaluation for stable anytime training without LR decay.
• Stronger 124M / 720M / 1B pretraining
• Strong ImageNet / ViT fine-tuning performance



