
Google DeepMind's Zachary Nado argues the Muon optimizer may be a special case of Shampoo with $\beta_2$ set to zero

Solo researcher Keller created Muon, which is used by DeepSeek.

Original post
Zachary Nado@zacharynado

not sure how you can say the first major innovation since adam was muon and not shampoo. although I'll admit that muon is much simpler to implement in practice!

unless you are saying muon is a special case of shampoo with b2=0?

Yaroslav Bulatov@yaroslavvb

Keller's approach (ultra-fast iteration) is promising because it led to the first major innovation since Adam (Muon). CIFAR was only 2 seconds to train end-to-end which meant he could try many ideas fast. His first unoptimized Muon run was something like 30 seconds but it was clear it was onto something due to large drop in steps

6:30 PM · Jun 9, 2026 · 18.7K Views
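
The "special case of Shampoo with b2=0" remark can be checked numerically under one reading of it. The sketch below assumes that b2=0 means Shampoo's preconditioner statistics keep no history, so they are built from the current gradient alone; the posts do not spell this out. Under that assumption, Shampoo's update for a weight matrix reduces to the orthogonal factor of the gradient, which is the quantity Muon targets. All function names are hypothetical. The sketch uses an exact SVD where Muon uses an approximate Newton-Schulz iteration, and it ignores Muon's momentum.

```python
import numpy as np


def inv_fourth_root(mat, eps=1e-12):
    """Inverse fourth root of a symmetric PSD matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clamp tiny or slightly negative eigenvalues before the negative power.
    eigvals = np.maximum(eigvals, eps)
    # Scale each eigenvector column, then rotate back: Q diag(w^-1/4) Q^T.
    return (eigvecs * eigvals ** -0.25) @ eigvecs.T


def shampoo_step_no_memory(grad):
    """Shampoo-style update with statistics from the current gradient only.

    Assumption: this is what "b2=0" means, i.e. no accumulation over steps.
    """
    left = grad @ grad.T   # row-space statistics
    right = grad.T @ grad  # column-space statistics
    return inv_fourth_root(left) @ grad @ inv_fourth_root(right)


def orthogonalize(grad):
    """Exact orthogonal factor U V^T of grad = U S V^T.

    Stand-in for Muon's Newton-Schulz iteration, which only approximates it.
    """
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt


rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 4))  # toy gradient for a 4x4 weight matrix

shampoo_update = shampoo_step_no_memory(grad)
muon_like_update = orthogonalize(grad)
max_gap = np.abs(shampoo_update - muon_like_update).max()
print("largest entrywise difference:", max_gap)
```

The agreement follows from the algebra: writing the gradient as G = U S V^T gives (G G^T)^(-1/4) G (G^T G)^(-1/4) = U S^(-1/2) S S^(-1/2) V^T = U V^T, so the singular values cancel and only the orthogonal factor remains.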
Sentiment

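Users praise Keller's solo work on the Muon optimizer, pointing to its adoption by labs such as Kimi, Qwen, and DeepSeek and to how much friendlier it is to deep learning frameworks and hardware than efficient TPU implementations of Shampoo.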

Pos 100.0% · Neg 0.0%
2 comments with sentiment.
Posts from X
Yaroslav Bulatov@yaroslavvb

@zacharynado Credit assignment can be tricky ... I think Keller deserves special recognition because he was a junior solo researcher without a lab to advise him or promote his work, so the fact that Kimi/Qwen/DeepSeek picked it up says something

21h · Views 2.4K · Likes 40 · Bookmarks 3
Zachary Nado@zacharynado

@yaroslavvb 100% agree. I suffered through efficient TPU implementations of shampoo making algoperf, so I know what he did was very impactful because he found something very DL framework and hardware friendly!

lineage of ideas is a silly thing to squabble over anyways :)

21h · Views 884 · Likes 14 · Bookmarks 1