
Researcher Clarifies Muon Optimizer Link To Shampoo With Beta Settings

Original post
You Jiacheng @YouJiacheng

@zacharynado Shampoo with β1=β2=0 is Muon with β1=0, i.e. SSD. When Muon has β1>0, you can't reproduce it with Shampoo using the same β1 but β2=0: that variant uses the current gradient to compute the preconditioner and applies it to the momentum. Muon uses the momentum to compute the preconditioner and applies it to the momentum.
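A minimal NumPy sketch of both claims, assuming plain two-sided Shampoo (no epsilon, no grafting), a single 4×6 weight matrix, and an exact SVD in place of the Newton-Schulz iteration Muon actually uses. The helpers `inv_fourth_root`, `shampoo_direction`, and `orthogonalize` are hypothetical names for illustration, not any library's API. With no accumulation, L^{-1/4} G R^{-1/4} with L = GGᵀ and R = GᵀG reduces to UVᵀ from the SVD G = USVᵀ, i.e. the orthogonalized gradient. Likewise orth(M) = (MMᵀ)^{-1/4} M (MᵀM)^{-1/4}, which is the sense in which Muon builds its preconditioner from the momentum M.

```python
import numpy as np


def inv_fourth_root(A, rel_tol=1e-10):
    """Pseudo-inverse fourth root of a symmetric PSD matrix via eigendecomposition.

    Eigenvalues below rel_tol * (largest eigenvalue) are treated as zero and dropped."""
    w, Q = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)
    keep = w > rel_tol * w.max()
    inv = np.zeros_like(w)
    inv[keep] = w[keep] ** -0.25
    return (Q * inv) @ Q.T  # Q diag(inv) Q^T


def shampoo_direction(L, R, X):
    """Two-sided Shampoo-style preconditioning L^{-1/4} X R^{-1/4}."""
    return inv_fourth_root(L) @ X @ inv_fourth_root(R)


def orthogonalize(X):
    """Muon-style direction U V^T from the thin SVD of X (exact SVD here)."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt


rng = np.random.default_rng(0)
G_prev, G = rng.standard_normal((2, 4, 6))  # two consecutive 4x6 gradients

# Case 1: beta1 = beta2 = 0. Shampoo's statistics come from the current G only,
# and its direction matches the orthogonalized gradient (Muon with beta1 = 0).
shampoo_00 = shampoo_direction(G @ G.T, G.T @ G, G)
muon_0 = orthogonalize(G)
print(np.allclose(shampoo_00, muon_0))  # True

# Case 2: beta1 > 0. M is a simple EMA momentum (an assumption: Muon's own
# momentum rule, optionally Nesterov, weights past gradients differently).
beta1 = 0.9
M = beta1 * G_prev + (1 - beta1) * G
shampoo_b0 = shampoo_direction(G @ G.T, G.T @ G, M)  # preconditioner from G, applied to M
muon_b = orthogonalize(M)                            # preconditioner from M, applied to M
print(np.allclose(shampoo_b0, muon_b))  # False
```

The argument only needs M to differ from G: as soon as it does, the β2=0 Shampoo direction no longer has all singular values equal to 1, while Muon's still does.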

Zachary Nado @zacharynado

@YouJiacheng what is the reasoning for Muon being β1=β2 and not β2=0? I think Runa and leloy put it nicely here

and

but I might be missing something 🙂

6:21 AM · Jun 11, 2026 · 195 Views
You Jiacheng @YouJiacheng

@zacharynado With β1=β2, the difference is EMA(G)EMA(G.T) vs. EMA(G@G.T). I drew an analogy to E[g]² vs. E[g²], where the difference is Var[g], which acts as "variance-damping".
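The analogy can be checked numerically. A minimal NumPy sketch, assuming bias-corrected EMAs whose weights sum to 1, one shared β for both statistics, and a hypothetical stream of 4×6 gradients built from a fixed mean plus noise; `ema_second_moments` and `ema_weights` are illustrative helper names. Muon's left statistic is effectively EMA(G)EMA(G)ᵀ = MMᵀ, Shampoo's is EMA(GGᵀ), and their gap should equal the EMA-weighted covariance Σ w_t (G_t − M)(G_t − M)ᵀ, the matrix counterpart of E[g²] − E[g]² = Var[g].

```python
import numpy as np


def ema_second_moments(grads, beta):
    """Bias-corrected EMAs of G and of G @ G.T with one shared beta (beta1 = beta2).

    Dividing by (1 - beta**T) makes the EMA weights sum to 1."""
    m = np.zeros_like(grads[0])
    s = np.zeros((grads[0].shape[0], grads[0].shape[0]))
    for G in grads:
        m = beta * m + (1 - beta) * G
        s = beta * s + (1 - beta) * (G @ G.T)
    correction = 1 - beta ** len(grads)
    return m / correction, s / correction


def ema_weights(T, beta):
    """Normalized weight the bias-corrected EMA assigns to steps 1..T (oldest first)."""
    w = (1 - beta) * beta ** np.arange(T - 1, -1, -1)
    return w / (1 - beta ** T)


rng = np.random.default_rng(0)
beta, T = 0.9, 20
# Hypothetical gradient stream: a fixed mean plus per-step noise.
mean = rng.standard_normal((4, 6))
grads = [mean + 0.5 * rng.standard_normal((4, 6)) for _ in range(T)]

M, S = ema_second_moments(grads, beta)
muon_stat = M @ M.T   # EMA(G) EMA(G).T, implicit in orthogonalizing the momentum M
shampoo_stat = S      # EMA(G @ G.T), Shampoo's left statistic with beta2 = beta1

# Weighted covariance of the gradients around their EMA mean.
w = ema_weights(T, beta)
cov = sum(wi * (G - M) @ (G - M).T for wi, G in zip(w, grads))

print(np.allclose(shampoo_stat - muon_stat, cov))                    # True
print(np.linalg.eigvalsh(shampoo_stat - muon_stat).min() >= -1e-10)  # True
```

For a 1×1 "matrix" the same identity reduces to E[g²] − E[g]² = Var[g]. Because the gap is positive semidefinite, Shampoo's statistic is at least as large as Muon's in the Löwner order, so its inverse-root preconditioner takes smaller steps along directions where the gradients fluctuate; this is one reading of "variance-damping", not a statement about either optimizer's tuned behavior.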

2h · 44 views · 1 like · 0 bookmarks