Grad, a modded-nanogpt contributor, details MAI's RL training hyperparameters, finding final-stage batch sizes scaled to nearly 1 billion tokens

RL batch sizes consistently equaled or exceeded pretraining sizes.

Grad@Grad62304977

In general, RL seems to have very different batch size scaling than pretraining. Also, even in their previous shorter-context stages, the batch sizes were always bigger than or equal to the pretraining batch size.

Grad@Grad62304977

Interestingly, I didn’t see anyone talking about this, but MAI used a batch size of almost 1B tokens during their final RL stage.

4:39 AM · Jun 4, 2026 · 2.8K Views
Posts from X
Ethan@torchcompiled

@Grad62304977 @yacineMTB I’d guess this is a variance problem characteristic of a good bit of RL in general?

Ethan@torchcompiled

There is a wealth of variance reduction literature (control variates, approximate/surrogate gradients through a learned model of the environment) that I’ve been waiting to see if it makes its way into modern RL, like REBAR, RELAX, “Backpropagation through the Void”

2h · Views 37 · Likes 2 · Bookmarks 0
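
The variance point in the reply above can be illustrated with a toy experiment. The following is a minimal sketch, not anything from MAI's setup: it assumes a single Bernoulli "policy" with logit `theta`, a made-up reward `(b - 0.45)**2`, and a plain score-function (REINFORCE-style) gradient estimator. All function names here are hypothetical. It compares two ways of cutting estimator variance: a larger batch, and a constant baseline used as a simple control variate.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(b, target=0.45):
    # Toy reward, assumed purely for illustration.
    return (b - target) ** 2


def true_gradient(theta):
    # d/dtheta E[reward(b)] for b ~ Bernoulli(sigmoid(theta)).
    p = sigmoid(theta)
    return p * (1.0 - p) * (reward(1.0) - reward(0.0))


def mean_reward(theta):
    # Expected reward, used below as a constant baseline (control variate).
    p = sigmoid(theta)
    return p * reward(1.0) + (1.0 - p) * reward(0.0)


def batch_gradient(rng, theta, batch_size, baseline=0.0):
    # Score-function estimate averaged over one batch.
    # For a Bernoulli with logit theta, d/dtheta log p(b) = b - p.
    p = sigmoid(theta)
    total = 0.0
    for _ in range(batch_size):
        b = 1.0 if rng.random() < p else 0.0
        total += (reward(b) - baseline) * (b - p)
    return total / batch_size


def estimator_stats(theta, batch_size, baseline=0.0, trials=2000, seed=0):
    # Empirical mean and variance of the batch gradient over many batches.
    rng = random.Random(seed)
    grads = [batch_gradient(rng, theta, batch_size, baseline) for _ in range(trials)]
    return statistics.fmean(grads), statistics.pvariance(grads)


if __name__ == "__main__":
    theta = 1.0
    print("true gradient:", true_gradient(theta))
    for batch_size in (4, 64):
        for name, c in (("no baseline", 0.0), ("baseline", mean_reward(theta))):
            mean, var = estimator_stats(theta, batch_size, baseline=c)
            print(f"batch={batch_size:3d} {name:12s} mean={mean:.4f} var={var:.6f}")
```

Both knobs leave the estimator unbiased in this toy case. Batch size reduces variance roughly in proportion to 1/N, while the baseline reduces it at a fixed batch size, which is the kind of trade the variance reduction literature mentioned above is about. Whether this carries over to large-scale LLM RL is not something this sketch can show.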