Ilya (sorry for the name-dropping) met with me in 2024 and said that we can do better than the Shampoo family (renamed to Muon and friends).
Now I can say this is very true: there exists an optimizer that shows the same scale of improvement that Shampoo shows over AdamW on deep learning models.
Fundamental optimization continues to drive progress. Fascinatingly, the work to get there is all or nothing: nothing works until everything works.



