/Tech4h ago

Rohan Anil Suggests Per-Layer Learning Rates Derived From Adam Or Muon

31005504
Original post
rohan anil@_arohan_#102inTech

@DimitrisPapail What about alpha:

https://arxiv.org/pdf/2002.11803

Use per layer learning rates. Derive it from Adam or Muon runs.

@_arohan_ are you offering me credits? 馃ゲ i do not accept payment in optimizers

10:21 AM 路 Jun 11, 2026 路 127 Views
Sentiment
Sentiment building, check back later.
Cluster Engagement
Posts from X
Most Activity
Most Activity
VIEWS258
rohan anil@_arohan_

@DimitrisPapail

rohan anil@_arohan_

@DimitrisPapail Sorry this one https://openreview.net/pdf?id=FpKgG31Z_i9

4hViews 258Likes 3Bookmarks 1
BOOKMARKS2LIKES3REPLIES1
rohan anil@_arohan_

@DimitrisPapail Sorry this one https://openreview.net/pdf?id=FpKgG31Z_i9

rohan anil@_arohan_

@DimitrisPapail What about alpha:

https://arxiv.org/pdf/2002.11803

Use per layer learning rates. Derive it from Adam or Muon runs.

4hViews 128Likes 3Bookmarks 2