@zacharynado Shampoo with β1=β2=0 is Muon with β1=0, i.e. SSD. When Muon has β1>0, you can't match it using Shampoo with the same β1 but β2=0: that uses the current gradient to compute the preconditioner and applies it to the momentum. Muon uses the momentum to compute the preconditioner and applies it to the momentum.
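A rough NumPy sketch of that distinction, under simplifying assumptions: the Shampoo-style update is taken as L^(-1/4) · M · R^(-1/4) with L = G Gᵀ and R = Gᵀ G built from the current gradient only (β2=0), Muon is taken as the exact polar factor U Vᵀ of the momentum (computed here by SVD rather than the Newton-Schulz iteration used in practice), and momentum is the plain M = β1·M_prev + G. All function names are hypothetical, not from either optimizer's codebase.

```python
import numpy as np

def orthogonalize(X):
    # Exact polar factor U V^T via SVD (stand-in for Newton-Schulz).
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

def inv_fourth_root(A, eps=1e-12):
    # A^(-1/4) for a symmetric PSD matrix via eigendecomposition.
    w, Q = np.linalg.eigh(A)
    w = np.maximum(w, eps)  # guard against tiny or negative eigenvalues
    return (Q * w ** -0.25) @ Q.T

def shampoo_beta2_zero(G, M):
    # Assumed Shampoo-style step with beta2 = 0: the preconditioner comes
    # from the current gradient G only, but is applied to the momentum M.
    L = G @ G.T
    R = G.T @ G
    return inv_fourth_root(L) @ M @ inv_fourth_root(R)

def muon_direction(M):
    # Assumed Muon-style step: the "preconditioner" comes from M itself.
    return orthogonalize(M)

rng = np.random.default_rng(0)
G_prev = rng.standard_normal((4, 4))
G = rng.standard_normal((4, 4))

# beta1 = 0: momentum is just the current gradient, and the two coincide.
same_at_beta1_zero = np.allclose(
    shampoo_beta2_zero(G, G), muon_direction(G), atol=1e-6
)

# beta1 > 0: the momentum mixes in past gradients, and the two differ.
beta1 = 0.9
M = beta1 * G_prev + G
differ_at_beta1_pos = not np.allclose(
    shampoo_beta2_zero(G, M), muon_direction(M), atol=1e-6
)

print(same_at_beta1_zero, differ_at_beta1_pos)
```

With β1=0 the gradient's singular values cancel exactly, leaving U Vᵀ. With β1>0 the Shampoo-style factors are built from G's singular vectors, not M's, so the result is generally not an orthogonalization of M.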
@YouJiacheng what is the reasoning for Muon being β1=β2 and not β2=0? I think Runa and leloy put it nicely here
and
but I might be missing something 🙂