Looking back i was so incredibly early
Shampoo scaling law for language models: a plot in the style of Kaplan et al., but comparing Shampoo and Adam. Shampoo is literally such a free lunch at large scale, in a predictable manner.
Scaling curves show predictable efficiency gains at larger scales
Users express enthusiasm for the claim that the Shampoo optimizer beats Adam in large-scale LLM scaling laws, because it suggests a meaningful advance in training efficiency.

@cloneofsimo real
@cloneofsimo hell yeah

@cloneofsimo the realization always hits different after a while.

@cloneofsimo Being early is only half the equation. Most early entrants rotate out before the real move happens.
The actual skill is conviction through the boring middle. That is where most people lose their position and miss the payoff they were positioned for.