Tech · 11h ago

Simo Ryu, creator of the popular LoRA implementation for Stable Diffusion, says the Shampoo optimizer consistently outperforms Adam in language-model scaling-law experiments

Scaling curves show predictable efficiency gains at larger scales
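The post does not publish its fitted curves. As a hypothetical illustration of what a "predictable" gain can mean, assume each optimizer's loss follows a Kaplan-style power law in training compute, L(C) = a · C^(−b). Fitting a straight line in log-log space recovers (a, b), and inverting each fit gives the compute needed to reach a target loss. The helper names `fit_power_law`, `compute_for_loss`, and `compute_multiplier` are made up for this sketch, and the numbers below are synthetic, not read off the plot.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) by least squares in log-log space; return (a, b)."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)


def compute_for_loss(fit, target_loss):
    """Invert loss = a * C**(-b) to get the compute C needed to reach target_loss."""
    a, b = fit
    return (a / target_loss) ** (1.0 / b)


def compute_multiplier(fit_ref, fit_new, target_loss):
    """How many times more compute the reference optimizer needs than the new one
    to reach target_loss (values > 1 favor the new optimizer)."""
    return compute_for_loss(fit_ref, target_loss) / compute_for_loss(fit_new, target_loss)


# Synthetic curves (hypothetical numbers): the "new" optimizer behaves as if it
# had 1.5x the compute of the reference optimizer at every scale.
compute = np.logspace(18, 22, 5)
loss_ref = 20.0 * compute ** -0.05
loss_new = 20.0 * (1.5 * compute) ** -0.05

fit_ref = fit_power_law(compute, loss_ref)
fit_new = fit_power_law(compute, loss_new)
print(compute_multiplier(fit_ref, fit_new, target_loss=2.0))  # ~1.5
```

When the two fitted exponents b are equal, as in this toy data, the multiplier is the same at every target loss, which is one way to read "free lunch, in a predictable manner." If the new optimizer's exponent were larger, the multiplier would grow as the target loss falls, that is, at larger scales.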

Original post
Simo Ryu @cloneofsimo · #957 in Tech

Looking back, I was so incredibly early.

Simo Ryu @cloneofsimo

Shampoo scaling law for language models: a plot in the style of Kaplan et al., but comparing Shampoo and Adam. Shampoo is literally such a free lunch at large scale, and in a predictable manner.

3:50 AM · Jun 11, 2026 · 9.5K Views
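The post does not say which Shampoo variant or settings produced the curves. For readers new to the optimizer, the sketch below shows the basic per-matrix update from the original Shampoo paper (Gupta, Koren, and Singer, 2018): accumulate left and right gradient statistics, then precondition the gradient with their inverse fourth roots. It is a toy NumPy version written for this note (`ShampooMatrix` and `inverse_pth_root` are hypothetical names), not the code behind the plot; large-scale variants typically add momentum, learning-rate grafting, block partitioning, and infrequent root recomputation.

```python
import numpy as np


def inverse_pth_root(mat, p):
    """Return mat ** (-1/p) for a symmetric positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Scale each eigenvector column by eigval**(-1/p), then rotate back.
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


class ShampooMatrix:
    """Toy full-matrix Shampoo for one (m, n) weight: W <- W - lr * L^(-1/4) G R^(-1/4)."""

    def __init__(self, shape, lr=0.1, eps=1e-4):
        m, n = shape
        self.left = eps * np.eye(m)   # running sum of G @ G.T (row statistics)
        self.right = eps * np.eye(n)  # running sum of G.T @ G (column statistics)
        self.lr = lr

    def step(self, weight, grad):
        self.left += grad @ grad.T
        self.right += grad.T @ grad
        precond_grad = inverse_pth_root(self.left, 4) @ grad @ inverse_pth_root(self.right, 4)
        return weight - self.lr * precond_grad


# Usage on a toy quadratic 0.5 * ||W - T||^2, whose gradient is W - T.
rng = np.random.default_rng(0)
target = rng.standard_normal((3, 2))
weight = np.zeros((3, 2))
opt = ShampooMatrix(weight.shape, lr=0.1)
initial_loss = 0.5 * np.sum((weight - target) ** 2)
for _ in range(50):
    weight = opt.step(weight, weight - target)
final_loss = 0.5 * np.sum((weight - target) ** 2)
print(initial_loss, final_loss)
```

For contrast, Adam keeps only a per-coordinate (diagonal) second-moment estimate; Shampoo's left and right factors capture correlations across a weight's rows and columns, at the cost of two small matrix roots per layer.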
Sentiment

Users express enthusiasm for the claim that the Shampoo optimizer beats Adam in large-scale LLM scaling laws, since it suggests a meaningful advance in training efficiency.

Positive: 100.0% · Negative: 0.0%
1 comment with sentiment.
Cluster Engagement
Posts from X
Most Activity
tensorqt @tensorqt

@cloneofsimo real

9h · 773 views · 2 likes · 1 bookmark
Zachary Nado @zacharynado

@cloneofsimo hell yeah


3h · 407 views · 1 like · 0 bookmarks
Strata @ChainZenit

@cloneofsimo the realization always hits different after a while.

10h · 43 views
Jesser @Jesser_xbt

@cloneofsimo Being early is only half the equation. Most early entrants rotate out before the real move happens.

The actual skill is conviction through the boring middle. That is where most people lose their position and miss the payoff they were positioned for.

11h · 24 views