Tech · 11h ago

Simo Ryu, creator of Stable Diffusion's LoRA implementation, says the Shampoo optimizer consistently outperforms Adam in LLM scaling laws

Scaling curves show predictable efficiency gains at larger scales

Original post

Simo Ryu (@cloneofsimo) · #957 in Tech

Looking back i was so incredibly early

Simo Ryu (@cloneofsimo)

Shampoo Scaling law for language model. Plot taste of Kaplan et al, but comparing shampoo and adam. Shampoo is literally such a free lunch, in large scale, in predictable manner.

3:50 AM · Jun 11, 2026 · 9.5K Views

Sentiment

Commenters are enthusiastic about the claim that the Shampoo optimizer beats Adam in large-scale LLM scaling laws, since it would imply a meaningful gain in training efficiency.

Pos: 100.0%

Neg: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 773 · Bookmarks: 1 · Likes: 2

tensorqt (@tensorqt)

@cloneofsimo real

9h · 773 views · 2 likes · 1 bookmark

Zachary Nado (@zacharynado)

@cloneofsimo hell yeah

Simo Ryu (@cloneofsimo)

Looking back i was so incredibly early

3h

Strata (@ChainZenit)

@cloneofsimo the realization always hits different after a while.

10h

Jesser (@Jesser_xbt)

@cloneofsimo Being early is only half the equation. Most early entrants rotate out before the real move happens.

The actual skill is conviction through the boring middle. That is where most people lose their position and miss the payoff they were positioned for.

11h