
Nesterov Momentum Boosts Stability In ViT-VQGAN Training With Distributed Shampoo

Original post · Zachary Nado #589
Evan Walters @evaninwords

This was such a fun period in ML when Boris was making dalle-mini

Amazed to see the importance of selecting correctly Distributed Shampoo configuration for training the ViT-VQGAN 🤯

TLDR: 👉 Nesterov momentum brings more stability 👉 Optimal settings are problem specific

6:47 PM · Jun 9, 2026 · 6.2K Views
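
For readers unfamiliar with the term in the TLDR, here is a minimal sketch of how a Nesterov-style momentum update differs from plain momentum. It is not Distributed Shampoo and not the configuration from the post: it is scalar SGD on a toy quadratic, using the formulation commonly found in deep learning libraries. The function names, learning rate, momentum coefficient, and start point are all hypothetical choices for illustration.

```python
def momentum_step(param, velocity, grad, lr=0.1, beta=0.9, nesterov=False):
    """One SGD-with-momentum update for a scalar parameter (illustrative only)."""
    # Accumulate the gradient into the velocity buffer.
    velocity = beta * velocity + grad
    # Plain momentum steps along the velocity. The Nesterov variant steps along
    # the current gradient plus a look-ahead of beta times the updated velocity.
    update = grad + beta * velocity if nesterov else velocity
    return param - lr * update, velocity


def minimize_quadratic(nesterov, steps=200, start=5.0):
    """Minimize f(x) = x**2 (gradient 2x) from an assumed start point."""
    x, v = start, 0.0
    for _ in range(steps):
        x, v = momentum_step(x, v, 2.0 * x, nesterov=nesterov)
    return x


if __name__ == "__main__":
    print("plain momentum:", minimize_quadratic(nesterov=False))
    print("nesterov:      ", minimize_quadratic(nesterov=True))
```

On this toy problem both variants converge. Whether the Nesterov variant is more stable in a real training run depends on the model and optimizer settings, which is the post's second point.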
Posts from X
rohan anil @_arohan_

@evaninwords It was during my pat leave too :D so special time

17h · Views 515 · Likes 1