
Aaron Defazio submits ScheduleFree+ paper to arXiv, extending schedule-free optimization to large language models with lower final loss than linear-decay and warmup-stable-decay (WSD) baselines

Experiments reach 2.01 loss on models up to 500 million parameters.



Currently Leading (May 20th, 2026)


Original post


Rosinality (@ROSINALITY)

https://arxiv.org/abs/2605.19095 Schedule-free learning at larger scales!

12:53 AM · May 20, 2026


Aaron Defazio (@AARON_DEFAZIO)

🚨 New Paper 🚨 ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

A few modifications to Schedule-Free Learning make it completely LR-tuning-free and allow it to greatly outperform schedules for long-duration training! https://arxiv.org/abs/2605.19095v1

6:13 PM · May 20, 2026 · 4.6K Views

Aaron Defazio (@AARON_DEFAZIO)

With additional warmup, the reintroduction of AdamW momentum, and a modified Polyak step-size rule, Schedule-Free Learning outperforms classical cosine and linear decay schedules at longer TPP (tokens-per-parameter) budgets. Short TPP budgets (20-100) don't show any benefit.

Reference implementation here: https://github.com/facebookresearch/schedule_free/blob/main/schedulefree/adamc_schedulefree_plus_paper.py

6:13 PM · May 20, 2026 · 501 Views
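The thread names the three changes but not their exact form. For orientation only, here is a minimal sketch of the original Schedule-Free SGD update, combined with a linear warmup and the classic Polyak step size (f(y) - f*)/||g||^2. In that update the gradient is taken at an interpolation y between the SGD iterate z and a running average x, and x is the iterate you evaluate. This is not ScheduleFree+. The paper's modified Polyak rule and its AdamW-style momentum are not reproduced. The function names, the cap `gamma_max`, the known optimal value `f_star`, the gamma-squared averaging weights, and the toy quadratic objective are all assumptions for illustration. The linked reference implementation is the authoritative version.

```python
def quadratic(w, curvatures):
    """Toy objective f(w) = 0.5 * sum(a_i * w_i^2); its minimum value f* is 0 at w = 0."""
    return 0.5 * sum(a * wi * wi for a, wi in zip(curvatures, w))


def quadratic_grad(w, curvatures):
    """Gradient of the toy quadratic: a_i * w_i per coordinate."""
    return [a * wi for a, wi in zip(curvatures, w)]


def schedule_free_sgd_polyak(w0, curvatures, steps=200, beta=0.9,
                             warmup=20, f_star=0.0, gamma_max=1.0):
    """Schedule-Free SGD with a classic Polyak step size and linear warmup.

    Hypothetical illustration, not ScheduleFree+: the paper's modified Polyak
    rule and AdamW-style momentum are not reproduced here.
    """
    z = list(w0)  # SGD ("fast") iterate
    x = list(w0)  # weighted running average; this is the iterate you evaluate
    weight_sum = 0.0
    for t in range(1, steps + 1):
        # The gradient is taken at y, an interpolation between z and the average x.
        y = [(1.0 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = quadratic_grad(y, curvatures)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # y is already a stationary point
        # Polyak step (f(y) - f*) / ||g||^2, capped at gamma_max, scaled by linear warmup.
        polyak = min((quadratic(y, curvatures) - f_star) / g_sq, gamma_max)
        gamma = polyak * min(1.0, t / warmup)
        z = [zi - gamma * gi for zi, gi in zip(z, g)]
        # Average z into x with weight gamma_t^2 / sum_i gamma_i^2 (assumed weighting).
        weight_sum += gamma * gamma
        c = gamma * gamma / weight_sum
        x = [(1.0 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x
```

For example, `schedule_free_sgd_polyak([2.0], [1.0])` runs the sketch on a one-dimensional quadratic. No learning rate is passed. On this objective the Polyak rule sets the step size to 0.5 once warmup ends, and there is no decay schedule because the averaging in x plays that role.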
