Rohan Anil Questions Learning Rate Decay Impact on Training Horizon

Original post
rohan anil@_arohan_

@kellerjordan0 You get a different horizon with learning rate decay?

Keller Jordan@kellerjordan0

@_arohan_ This is an interesting result. I'm just saying compare against the real baseline please! There's no rerunning necessary.

2:09 AM · Jun 10, 2026 · 1.9K Views
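
The question about horizon and learning rate decay can be made concrete: a decaying schedule is defined relative to the planned total step count, so the learning rate at a given step changes when the horizon changes. Below is a minimal sketch, assuming a constant rate followed by a linear decay to zero. The schedule shape, the `cooldown_frac` value, and the function name are illustrative assumptions, not something stated in the thread.

```python
def lr_at(step: int, total_steps: int, base_lr: float = 1.0,
          cooldown_frac: float = 0.4) -> float:
    """Hypothetical schedule: constant LR, then linear decay to zero
    over the final `cooldown_frac` of the planned run."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie within [0, total_steps]")
    if cooldown_frac <= 0:
        return base_lr  # no decay phase at all
    cooldown_start = total_steps * (1.0 - cooldown_frac)
    if step < cooldown_start:
        return base_lr
    # Linear interpolation from base_lr at cooldown_start to 0 at total_steps.
    return base_lr * (total_steps - step) / (total_steps - cooldown_start)


# Same step, two different planned horizons (step counts taken from the thread).
lr_short = lr_at(3325, total_steps=3325)  # decay has finished
lr_long = lr_at(3325, total_steps=3375)   # decay is still in progress
print(lr_short, lr_long)
```

Under these assumptions, a run planned for 3375 steps and read off at step 3325 is not the same as a run planned for 3325 steps, because the decay has not yet finished in the longer run.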
Posts from X
rohan anil@_arohan_

@kellerjordan0 I ran these with 3375 end steps, because that's what I saw in the logs I was using. I am okay with this.

Keller Jordan@kellerjordan0

@_arohan_ Sorry wdym? The main difference is a better-tuned learning rate compared to the old baseline.

Both your logfile above and the proper Muon baseline would qualify as 3325-step runs, since your logfile hits <3.28 at that time. Tho yours would need more stat sig, but that's trivial.

13h · 2.3K Views · 16 Likes · 0 Bookmarks
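
The criterion described in the quoted post, that a run counts as an N-step run if its log first shows validation loss below 3.28 at step N, can be sketched as follows. This assumes the log has already been parsed into `(step, val_loss)` pairs; the function name and the sample values are hypothetical and do not come from any real logfile.

```python
from typing import Iterable, Optional, Tuple

TARGET_VAL_LOSS = 3.28  # threshold quoted in the thread


def first_step_below(records: Iterable[Tuple[int, float]],
                     target: float = TARGET_VAL_LOSS) -> Optional[int]:
    """Return the earliest step whose validation loss is strictly below
    `target`, or None if the run never reaches it."""
    for step, val_loss in sorted(records):
        if val_loss < target:
            return step
    return None


# Synthetic numbers for illustration only; not taken from any real run.
example = [(100, 3.40), (200, 3.29), (300, 3.27), (400, 3.25)]
print(first_step_below(example))  # prints 300
```

A single log only gives one sample of this crossing point, which is why the post also mentions needing more statistical significance before treating the number as settled.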
Keller Jordan@kellerjordan0

@_arohan_ Ok, well your comparison is between an undertuned Muon and a fully-tuned Shampoo.

I guess you used result #6 logs instead of #12? Not a massive deal, but I'll make the fair comparison when I do a post

13h · 144 Views · 7 Likes