12h ago

Zhehang Du's Newton-Muon optimizer hits the Modded-NanoGPT target in 3,275 steps, beating baseline Muon

It trails NorMuon and the 3,125-step benchmark record.

Original post

Modded-NanoGPT optimization result #15: A Newton-Muon-based setup has achieved a step count below Muon's and slightly above NorMuon's. This result was submitted by Zhehang Du, one of the Newton-Muon authors.

12:20 PM · May 26, 2026

Modded-NanoGPT optimization result #18: @zhanpeng_zhou has achieved a step count of 3225 with a preconditioned Muon variant called PMuon. This non-SOTA result is notable because it doesn't use the other SOTA-track techniques like update clamping and contra-muon.

4:21 AM · May 27, 2026