Aaron Defazio submits ScheduleFree+ paper to arXiv extending schedule-free optimization to large language models with lower final loss than linear decay and WSD baselines · Digg