@konstmish This applies even to random data augmentation https://arxiv.org/abs/2404.00498
Quite enjoyed reading the proofs in this paper.
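For anyone who doesn't know: once you make multiple passes over your data, you're no longer running SGD in the usual sense, where every step draws a sample independently (with replacement). Most papers studying SGD essentially assume an infinite amount of data, which isn't always the case in practice. The algorithm you're actually running is Random Reshuffling (RR).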
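A minimal Python sketch of the difference, assuming a toy 1-D least-squares objective f(x) = (1/n) * sum_i (x - a_i)^2 / 2. The helper names `sgd_indices`, `rr_indices` and `run` are hypothetical and for illustration only; this is not code from the linked paper. With-replacement SGD draws each index independently, while RR draws a fresh permutation every epoch, so each sample is used exactly once per pass.

```python
import random

def sgd_indices(n, epochs, rng):
    # Textbook SGD sampling: every step picks an index independently,
    # with replacement, so within an "epoch" some samples repeat and others are skipped.
    return [[rng.randrange(n) for _ in range(n)] for _ in range(epochs)]

def rr_indices(n, epochs, rng):
    # Random Reshuffling: each epoch is a fresh random permutation of 0..n-1,
    # so every sample is visited exactly once per pass (sampling without replacement).
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.append(perm)
    return order

def run(order, data, lr):
    # Hypothetical toy objective: per-sample loss (x - a_i)^2 / 2,
    # whose gradient is (x - a_i). Applies one step per index in the given order.
    x = 0.0
    for epoch in order:
        for i in epoch:
            x -= lr * (x - data[i])
    return x

if __name__ == "__main__":
    rng = random.Random(0)
    data = [1.0, 2.0, 3.0, 4.0]
    x_sgd = run(sgd_indices(len(data), 2000, rng), data, 0.01)
    x_rr = run(rr_indices(len(data), 2000, rng), data, 0.01)
    # Both should end up near the minimizer, the mean of data (2.5).
    print(x_sgd, x_rr)
```

The only thing that changes between the two is the index order fed to the same update rule, which is exactly why multi-epoch training with shuffling is RR rather than SGD: the per-step gradients within an epoch are no longer independent draws.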

