AI · 40d ago

Rohan Anil traces Shampoo optimizer origins in X thread

Rohan Anil of Anthropic replied on X to a thread about the 2018 Shampoo optimizer paper, in which another user recalled the optimizer being called ShampooLinear during Scale ML sessions. He cited the original paper, traced its development from the first implementations to later references, and pointed to a footnote explaining the name as a pun on 'pre-conditioning', as with hair shampoo. Shampoo preserves the tensor structure of the gradient by keeping a separate preconditioning matrix for each dimension of a parameter.
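To make the "one preconditioner per dimension" idea concrete, here is a minimal NumPy sketch of the matrix case, assuming the update form from the 2018 paper: a left statistic accumulating G Gᵀ, a right statistic accumulating Gᵀ G, and a step of L^(-1/4) G R^(-1/4). The names shampoo_step and inverse_pth_root, the toy quadratic loss, and the values of lr and eps are illustrative assumptions, not taken from the thread or from any library.

```python
import numpy as np


def inverse_pth_root(mat, p):
    """Return mat^(-1/p) for a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V diag(d^(-1/p)) V^T, using broadcasting to scale the columns of V.
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo-style update for a matrix parameter W with gradient G."""
    L = L + G @ G.T  # left statistic: one entry per pair of rows
    R = R + G.T @ G  # right statistic: one entry per pair of columns
    W = W - lr * inverse_pth_root(L, 4) @ G @ inverse_pth_root(R, 4)
    return W, L, R


# Toy problem (an assumption for illustration): minimize 0.5 * ||W - T||_F^2.
T = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])
W = np.zeros_like(T)
eps = 1e-4
L = eps * np.eye(T.shape[0])  # 4 x 4 preconditioner for the row dimension
R = eps * np.eye(T.shape[1])  # 3 x 3 preconditioner for the column dimension

losses = []
for _ in range(50):
    G = W - T  # gradient of the toy loss
    losses.append(0.5 * np.sum(G ** 2))
    W, L, R = shampoo_step(W, G, L, R)
losses.append(0.5 * np.sum((W - T) ** 2))
```

For a 4 x 3 parameter this stores a 4 x 4 and a 3 x 3 statistic rather than a single 12 x 12 matrix over the flattened gradient, which is the structural saving the summary refers to. The eigendecomposition on every step is for clarity only.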


Original post

rohan anil @_arohan_ · #79 in AI

Being a bit pedantic and finally got a few mins to run this. kthx bye.

Tony S.F. @tonysilveti

I remember in August 2024 (before the anthology paper) when I was attending the modula-in-numpy sessions of Scale ML it was even called ShampooLinear

6:50 PM · Apr 29, 2026 · 14.5K Views

Cluster Engagement

Posts from X

Most Activity

Views 8.8K · Bookmarks 31 · Likes 69 · Retweets 3 · Replies 9

rohan anil @_arohan_

The nesterov momentum is pretty good. I believe origin might be from James Martens 2011 paper on importance of momentum in neural networks. The derivation requires an approximation for the half step.

rohan anil @_arohan_

Being a bit pedantic and finally got a few mins to run this. kthx bye.

5h · 8.8K views · 69 likes · 31 bookmarks

davidad 🎇 @davidad

somehow the QT’d meme caused me to suddenly realize why the Shampoo algorithm is called that (indeed, i had never read the original paper)

39d · 2.2K views · 214

rohan anil @_arohan_

Shampoo 2018 if you want a citation.

rohan anil @_arohan_

Being a bit pedantic and finally got a few mins to run this. kthx bye.

40d · 1.7K views · 81

rohan anil @_arohan_

If there is a better citation let me know?

rohan anil @_arohan_

The nesterov momentum is pretty good. I believe origin might be from James Martens 2011 paper on importance of momentum in neural networks. The derivation requires an approximation for the half step.

5h · 1.7K views · 40