Distributed Shampoo developer Rohan Anil says Schatten-p schedule-free optimization requires per-layer grafting for deep learning convergence · Digg