Microsoft Research's Dimitris Papailiopoulos questions whether training optimizers like SGD and Muon create distinct model behaviors at equivalent validation loss · Digg