
Rohan Anil Questions Learning Rate Decay Impact on Training Horizon

Original post
rohan anil @_arohan_ · #79 in AI

@kellerjordan0 You get a different horizon with learning rate decay?

Keller Jordan @kellerjordan0

@_arohan_ This is an interesting result. I'm just saying compare against the real baseline please! There's no rerunning necessary.

2:09 AM · Jun 10, 2026 · 417 Views
Cluster Engagement
Posts from X
Most Activity
Views 453 · Replies 1
rohan anil @_arohan_

@kellerjordan0 I ran these with 3375 end steps, because that's what I saw in the logs I was using. I am okay with this.

Keller Jordan @kellerjordan0

@_arohan_ Sorry wdym? The main difference is a better-tuned learning rate compared to the old baseline.

Both your logfile above and the proper Muon baseline would qualify as 3325-step runs, since your logfile hits <3.28 at that time. Tho yours would need more stat sig, but that's trivial.

2h · Views 453 · Likes 6 · Bookmarks 0
Likes 7
Keller Jordan @kellerjordan0

@_arohan_ Ok, well your comparison is between an undertuned Muon and a fully-tuned Shampoo.

I guess you used result #6 logs instead of #12? Not a massive deal, but I'll make the fair comparison when I do a post

2h · Views 144 · Likes 7