Researchers Introduce SODA Optimizer Wrapper That Eliminates Weight Decay Tuning

Original post

1/ We introduce SODA: a simple optimizer wrapper that improves a base optimizer, adds no hyperparameters, and removes the need to tune weight decay. The wrapper consistently improves on its base optimizer. Most notably, SODA(Muon) beats Muon even when Muon's weight decay is tuned with a sweep.

5:57 AM · May 15, 2026