here are the Muon-OGD paper's claimed numbers for sequenced task domain training on Adam vs their Muon variant, btw.
it's, uh, not at all a small gap.
kalomaze (@kalomaze)
>claim: these results are "general and fundamental" to the architecture
>they exclusively tested with Adam
hmm, interesting work, but... you really can't study this purely in the diagonal preconditioning regime.
12:26 AM · Jun 25, 2026 · 561 Views