Tech · 17h ago

Zijian Liu mathematically proves Random Reshuffling dominates Stochastic Gradient Descent for multi-epoch training on finite datasets

Story Overview

Zijian Liu's new proof shows that random reshuffling (RR) outperforms standard SGD once training makes multiple passes over a fixed dataset, closing a gap between textbook analysis, which assumes a fresh sample at every step, and how optimizers actually run on finite data.

Original post

Keller Jordan @kellerjordan0 · #703 in Tech

@konstmish This applies even to random data augmentation https://arxiv.org/abs/2404.00498

Konstantin Mishchenko @konstmish

Quite enjoyed reading the proofs in this paper.

For anyone who doesn't know, you're no longer using SGD when doing multiple passes over your data. Most papers studying SGD essentially assume infinite amount of data, which isn't always the case in practice. The real algo is RR.

12:36 PM · Jul 1, 2026 · 950 Views

Theory Win

Why the math finally lines up

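Classic SGD bounds assume an effectively infinite data stream, with an independent sample drawn at every step. Practical multi-epoch runs instead typically permute the finite dataset on each pass, which is random reshuffling rather than SGD. The new rate holds for any reasonable step size and any number of epochs, without the restrictions that forced weaker bounds in earlier analyses.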
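The difference between the two sampling schemes can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names, the `seed` parameter, and the choice of one index per step are assumptions made for the example.

```python
import random


def sgd_with_replacement_indices(n, epochs, seed=0):
    """Index order for textbook SGD: every step draws an index
    uniformly at random, independently of all earlier draws."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n * epochs)]


def random_reshuffling_indices(n, epochs, seed=0):
    """Index order for random reshuffling: each epoch draws a fresh
    permutation, so every example is visited exactly once per pass."""
    rng = random.Random(seed)
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)  # new permutation for this epoch
        order.extend(perm)
    return order


if __name__ == "__main__":
    n, epochs = 5, 2
    print("SGD (with replacement):", sgd_with_replacement_indices(n, epochs))
    print("Random reshuffling:    ", random_reshuffling_indices(n, epochs))
```

Within one pass, the with-replacement order can repeat some examples and skip others, while the reshuffled order always covers the whole dataset. A standard data loader with shuffling enabled behaves like the second function.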

Open Question

What stays unproven

The result covers smooth convex finite-sum problems only; non-convex deep-learning cases and concrete speed-ups on models like those tested on CIFAR-10 remain outside the current guarantees.
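For readers unfamiliar with the term, a finite-sum problem and the two update rules can be written as follows. This is the standard textbook formulation, given here as an assumption; the paper's exact setting (for example a constraint set, a regularizer, or a varying step size) may differ. The symbols $n$, $f_i$, $\gamma$, and $\pi^k$ are notation chosen for this sketch.

```latex
% Finite-sum objective over n data points, each f_i smooth and convex
\min_{x \in \mathbb{R}^d} F(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)

% SGD: index i_t drawn i.i.d. uniformly from {1, ..., n} at every step
x_{t+1} = x_t - \gamma \nabla f_{i_t}(x_t)

% Random reshuffling: in epoch k, draw a permutation \pi^k of {1, ..., n}
% and take one step per data point in that order
x_k^{j+1} = x_k^{j} - \gamma \nabla f_{\pi^k_j}(x_k^{j}), \quad j = 1, \dots, n,
\qquad x_{k+1}^{1} = x_k^{n+1}
```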

Sentiment

Users expressed interest in the paper showing random reshuffling outperforms SGD for finite data, noting it was new information to them.

Positive: 100.0%

Negative: 0.0%

1 comment with sentiment.
