spent half of my PhD working on optimization research, and I only published one negative-result paper, showing that beating SGD with momentum is hard, especially in the noise-dominated regime 🤣
xAI co-founder Guodong Zhang highlights 2019 paper showing alternative optimizers fail to outperform SGD with momentum in noise-dominated regimes
Beyond the critical batch size, larger batches yield diminishing returns, which limits the scaling gains of alternative optimizers.
@Guodzh this one right? https://arxiv.org/abs/1907.04164
and it ended up that in 2022/2023 many OAI/DM ppl told me they learnt most things about neural network training from that paper
@Guodzh the OG optimizers crew 💙
@Guodzh 😭 how ml theory usually goes