New AMUSE Optimizer Combines Muon With Schedule-Free For Stable Training

Original post

🚨 New Optimizer Paper
AMUSE: Anytime MUon with Stable gradient Evaluation

AMUSE combines Muon with Schedule-Free-style gradient evaluation for stable anytime training without LR decay.
• Stronger 124M / 720M / 1B pretraining
• Strong ImageNet / ViT fine-tuning performance

9:20 PM · May 25, 2026
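
The post gives no algorithmic detail, so the following is only a rough illustration of the two ingredients it names, not the AMUSE update from the paper. It assumes a Muon-style step (the gradient matrix is approximately orthogonalized with the quintic Newton-Schulz iteration used in the public Muon reference implementation) and a Schedule-Free-style loop (the gradient is evaluated at an interpolation `y` of a base iterate `z` and a running average `x`, with a constant learning rate). How the paper actually combines them, and whether it keeps a momentum buffer, is not stated in the post. The function names, the toy objective, and the values `lr=0.1`, `beta=0.9` are all hypothetical.

```python
# Hypothetical sketch: NOT the AMUSE algorithm from the paper.
# It pairs a Muon-style orthogonalized update with Schedule-Free-style
# gradient evaluation at an interpolated point, using plain Python lists.

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def combine(alpha, a, beta, b):
    """Elementwise alpha * a + beta * b."""
    return [[alpha * p + beta * q for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def newton_schulz(g, steps=5):
    """Approximately orthogonalize g with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the Muon reference code
    norm = sum(v * v for row in g for v in row) ** 0.5 + 1e-7
    x = [[v / norm for v in row] for row in g]
    tall = len(x) > len(x[0])
    if tall:  # iterate on the wide orientation so x @ x^T is the small product
        x = transpose(x)
    for _ in range(steps):
        xxt = matmul(x, transpose(x))
        poly = combine(b, xxt, c, matmul(xxt, xxt))
        x = combine(a, x, 1.0, matmul(poly, x))
    return transpose(x) if tall else x

def schedule_free_orthogonal_step(state, grad_fn, lr=0.1, beta=0.9):
    """One step: gradient at y, orthogonalized update on z, uniform average x."""
    z, x, t = state
    y = combine(1.0 - beta, z, beta, x)        # gradient evaluation point
    update = newton_schulz(grad_fn(y))         # Muon-style direction (assumed)
    z = combine(1.0, z, -lr, update)           # constant LR, no decay schedule
    t += 1
    x = combine(1.0 - 1.0 / t, x, 1.0 / t, z)  # running average of z iterates
    return z, x, t

# Toy objective (an assumption for illustration): 0.5 * ||W - TARGET||_F^2.
TARGET = [[3.0, 1.0], [1.0, 2.0]]

def loss(w):
    return 0.5 * sum((p - q) ** 2
                     for rw, rt in zip(w, TARGET) for p, q in zip(rw, rt))

def grad(w):
    return combine(1.0, w, -1.0, TARGET)

zeros = [[0.0, 0.0], [0.0, 0.0]]
state = (zeros, zeros, 0)
for _ in range(20):
    state = schedule_free_orthogonal_step(state, grad)
print("loss at averaged iterate x:", loss(state[1]))
```

In this sketch the averaged iterate `x` is the one you would evaluate or checkpoint at any step, which is the sense in which Schedule-Free-style methods are described as "anytime"; the actual AMUSE paper may define its iterates differently.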