Rohan Anil Questions Learning Rate Decay Impact on Training Horizon

Original post

rohan anil@_arohan_

@kellerjordan0 You get a different horizon with learning rate decay?

Keller Jordan@kellerjordan0

@_arohan_ This is an interesting result. I'm just saying compare against the real baseline please! There's no rerunning necessary.

2:09 AM · Jun 10, 2026 · 1.9K Views

rohan anil@_arohan_

@kellerjordan0 I ran these with 3375 end steps, because that's what I saw in the logs I was using. I am okay with this.

Keller Jordan@kellerjordan0

@_arohan_ Sorry wdym? The main difference is a better-tuned learning rate compared to the old baseline.

Both your logfile above and the proper Muon baseline would qualify as 3325-step runs, since your logfile hits <3.28 at that time. Tho yours would need more stat sig, but that's trivial.
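For readers following the "qualifies as a 3325-step run" point: the criterion described here appears to be the first step at which validation loss drops below 3.28. A minimal sketch of that check follows, assuming the log has already been parsed into (step, val_loss) pairs. The function name `first_step_below` and the toy values are hypothetical and are not taken from either logfile discussed in the thread.

```python
from typing import Iterable, Optional, Tuple


def first_step_below(
    records: Iterable[Tuple[int, float]], target: float = 3.28
) -> Optional[int]:
    """Return the first step whose validation loss is strictly below target.

    `records` is assumed to be (step, val_loss) pairs in increasing step
    order. Returns None if the target is never reached.
    """
    for step, val_loss in records:
        if val_loss < target:
            return step
    return None


# Made-up illustrative values, NOT real results from any run in this thread.
toy_log = [
    (3000, 3.310),
    (3125, 3.295),
    (3250, 3.283),
    (3325, 3.279),
    (3375, 3.276),
]

print(first_step_below(toy_log))  # prints 3325 for this toy data
```

Under this reading, a run configured with more end steps can still count as a shorter run if it crosses the threshold earlier. A single crossing says nothing about statistical significance, which would need repeated runs.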

Keller Jordan@kellerjordan0

@_arohan_ Ok, well your comparison is between an undertuned Muon and a fully-tuned Shampoo.

I guess you used result #6 logs instead of #12? Not a massive deal, but I'll make the fair comparison when I do a post
