Original post
Zachary Nado#589
Evan Walters@evaninwords
This was such a fun period in ML when Boris was making dalle-mini
Boris Dayma@borisdayma
Amazed to see the importance of selecting correctly Distributed Shampoo configuration for training the ViT-VQGAN 🤯
TLDR: Nesterov momentum brings more stability; optimal settings are problem specific
6:47 PM · Jun 9, 2026 · 6.2K Views
