AI · 40d ago

Rohan Anil traces Shampoo optimizer origins in X thread

Rohan Anil at Anthropic responded on X to a thread spotlighting the 2018 Shampoo optimizer paper, after another user recalled that the name ShampooLinear was in use at Scale ML sessions. He cited the original paper, outlined its development from initial implementations to later references, and highlighted a footnote explaining that the name is a pun on 'pre-conditioning', as with hair shampoo. Shampoo preserves the tensor structure of the gradient by maintaining a separate preconditioning matrix for each dimension of a parameter tensor.
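To make the "separate preconditioning matrix per dimension" idea concrete, here is a minimal sketch of the matrix (two-dimensional) case, assuming NumPy and the -1/4 exponents used for matrices in the 2018 paper. The function names (`inverse_root`, `init_state`, `shampoo_step`), the `eps` and `lr` defaults, and the toy quadratic objective are illustrative assumptions, not taken from the thread or from any released implementation.

```python
import numpy as np


def inverse_root(mat, p):
    """Compute mat^(-1/p) for a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clamp tiny or negative eigenvalues caused by round-off before taking the root.
    eigvals = np.maximum(eigvals, 1e-12)
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def init_state(shape, eps=1e-4):
    """Start both preconditioner statistics at eps * identity (eps is an assumed default)."""
    m, n = shape
    return eps * np.eye(m), eps * np.eye(n)


def shampoo_step(W, G, L, R, lr=0.1):
    """One update for an m x n parameter W with gradient G."""
    L = L + G @ G.T  # m x m statistic for the row dimension
    R = R + G.T @ G  # n x n statistic for the column dimension
    # Precondition from both sides; G keeps its m x n shape throughout.
    update = inverse_root(L, 4) @ G @ inverse_root(R, 4)
    return W - lr * update, L, R


# Toy problem (hypothetical): minimize 0.5 * ||W - target||^2, whose gradient is W - target.
target = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
W = np.zeros_like(target)
L, R = init_state(W.shape)
for _ in range(50):
    W, L, R = shampoo_step(W, W - target, L, R)
```

The gradient is never flattened into a vector: an m x n parameter carries one m x m and one n x n statistic, rather than a single (mn) x (mn) preconditioner.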

Original post
rohan anil @_arohan_ · #79 in AI

Being a bit pedantic and finally got a few mins to run this. kthx bye.

Tony S.F. @tonysilveti

I remember in August 2024 (before the anthology paper) when I was attending the modula-in-numpy sessions of Scale ML it was even called ShampooLinear

6:50 PM · Apr 29, 2026 · 14.5K Views
Cluster Engagement
Posts from X
Most Activity
Views 8.8K · Bookmarks 31 · Likes 69 · Retweets 3 · Replies 9
rohan anil @_arohan_

The nesterov momentum is pretty good. I believe origin might be from James Martens 2011 paper on importance of momentum in neural networks. The derivation requires an approximation for the half step.

rohan anil @_arohan_

Being a bit pedantic and finally got a few mins to run this. kthx bye.

5h · Views 8.8K · Likes 69 · Bookmarks 31
davidad 🎇 @davidad

somehow the QT’d meme caused me to suddenly realize why the Shampoo algorithm is called that (indeed, i had never read the original paper)

39d · Views 2.2K · Likes 21 · Bookmarks 4
rohan anil @_arohan_

Shampoo 2018 if you want a citation.

rohan anil @_arohan_

Being a bit pedantic and finally got a few mins to run this. kthx bye.

40d · Views 1.7K · Likes 8 · Bookmarks 1
rohan anil @_arohan_

If there is a better citation let me know?

rohan anil @_arohan_

The nesterov momentum is pretty good. I believe origin might be from James Martens 2011 paper on importance of momentum in neural networks. The derivation requires an approximation for the half step.

5h · Views 1.7K · Likes 4 · Bookmarks 0