Meta's Lucas Beyer questions whether the "Stochastic Grandpa" SGD baseline relies on momentum to match modern optimizers · Digg