Rohan Anil Questions Learning Rate Decay Impact on Training Horizon

Original post

rohan anil@_arohan_

@kellerjordan0 You get a different horizon with learning rate decay?

Keller Jordan@kellerjordan0

@_arohan_ This is an interesting result. I'm just saying compare against the real baseline please! There's no rerunning necessary.

2:09 AM · Jun 10, 2026 · 417 Views

Posts from X

rohan anil@_arohan_

@kellerjordan0 I ran these with 3375 end steps, because that's what I saw in the logs I was using. I am okay with this.

Keller Jordan@kellerjordan0

@_arohan_ Sorry wdym? The main difference is a better-tuned learning rate compared to the old baseline.

Both your logfile above and the proper Muon baseline would qualify as 3325-step runs, since your logfile hits <3.28 at that time. Tho yours would need more stat sig, but that's trivial.

Keller Jordan@kellerjordan0

@_arohan_ Ok, well your comparison is between an undertuned Muon and a fully-tuned Shampoo.

I guess you used result #6 logs instead of #12? Not a massive deal, but I'll make the fair comparison when I do a post
