/Tech4h ago

Rohan Anil Suggests Per-Layer Learning Rates Derived From Adam Or Muon

31005504

Original post

rohan anil@_arohan_#102inTech

@DimitrisPapail What about alpha:

https://arxiv.org/pdf/2002.11803

Use per layer learning rates. Derive it from Adam or Muon runs.

@_arohan_ are you offering me credits? 🥲 i do not accept payment in optimizers

10:21 AM · Jun 11, 2026 · 127 Views

/Tech4h ago

31005504

Original post

rohan anil@_arohan_#102inTech

@DimitrisPapail What about alpha:

https://arxiv.org/pdf/2002.11803

Use per layer learning rates. Derive it from Adam or Muon runs.

@_arohan_ are you offering me credits? 🥲 i do not accept payment in optimizers

10:21 AM · Jun 11, 2026 · 127 Views

Sentiment

Sentiment building, check back later.

Cluster Engagement

Posts from X

Most Activity

@DimitrisPapail

@DimitrisPapail Sorry this one https://openreview.net/pdf?id=FpKgG31Z_i9

4h25831

BOOKMARKS2LIKES3REPLIES1

@DimitrisPapail Sorry this one https://openreview.net/pdf?id=FpKgG31Z_i9

@DimitrisPapail What about alpha:

https://arxiv.org/pdf/2002.11803

Use per layer learning rates. Derive it from Adam or Muon runs.

4h12832