@DimitrisPapail What about alpha:
https://arxiv.org/pdf/2002.11803
Use per-layer learning rates. Derive them from Adam or Muon runs.
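One possible reading of "derive", sketched under loud assumptions: log each layer's update and gradient from a reference Adam (or Muon) run, then take the mean ratio of update norm to gradient norm as that layer's learning rate. The helper names `l2_norm` and `per_layer_lrs`, the log layout, and the toy numbers are all hypothetical. The thread does not specify the actual recipe.

```python
from math import sqrt


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return sqrt(sum(v * v for v in values))


def per_layer_lrs(ref_updates, grads, eps=1e-12):
    """Assumed recipe: lr[layer] = mean over steps of ||update|| / ||grad||.

    ref_updates, grads: dict mapping layer name -> list of per-step flat vectors,
    logged from a reference run (hypothetical log format).
    """
    lrs = {}
    for name in grads:
        ratios = [
            l2_norm(u) / (l2_norm(g) + eps)  # eps guards against a zero gradient
            for u, g in zip(ref_updates[name], grads[name])
        ]
        lrs[name] = sum(ratios) / len(ratios)
    return lrs


# Toy logs (made-up numbers, two steps for "embed", one for "head").
grads = {"embed": [[3.0, 4.0], [6.0, 8.0]], "head": [[1.0, 0.0]]}
ref_updates = {"embed": [[0.03, 0.04], [0.06, 0.08]], "head": [[0.5, 0.0]]}

lrs = per_layer_lrs(ref_updates, grads)
for layer, lr in lrs.items():
    print(layer, lr)
```

The resulting dict could then feed one optimizer parameter group per layer. Whether a single averaged value or a per-step schedule works better is left open here.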
Dimitris Papailiopoulos @DimitrisPapail
@_arohan_ are you offering me credits? 🥲 i do not accept payment in optimizers
10:21 AM · Jun 11, 2026 · 127 Views