AMUSE optimizer merges Muon and schedule-free gradient evaluation to train models without learning rate decay · Digg