/AI · 7h ago

Grad, a modded-nanogpt contributor, details MAI's RL training hyperparameters, finding final-stage batch sizes scaled to nearly 1 billion tokens

RL batch sizes consistently equaled or exceeded pretraining sizes.

Original post

Grad @Grad62304977 · #987 in AI

In general, RL seems to have very different batch size scaling than pretraining. Also, even in their previous shorter-context stages, the batch sizes were always bigger than or equal to the pretraining batch size.

Grad @Grad62304977

Interestingly, I didn’t see anyone talking about this, but MAI used a batch size of almost 1B tokens during their final RL stage.

4:39 AM · Jun 4, 2026 · 2.8K Views

Posts from X

clement @clementtisseau

@Grad62304977 for training stability, I guess

7h ago · 2 likes

Ethan @torchcompiled

@Grad62304977 @yacineMTB I’d guess this is a variance problem characteristic of a good bit of RL in general?

Ethan @torchcompiled

There is a wealth of variance reduction literature (control variates, approximate/surrogate gradients through a learned model of the environment) that I’ve been waiting to see make its way into modern RL, like REBAR, RELAX, “Backpropagation through the Void”.
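To make the variance point in the replies concrete, here is a minimal sketch on a toy two-action bandit. It is not MAI's setup or anything from the thread: the reward values, the function names (`grad_sample`, `batch_grad`, `estimator_variance`), and the trial counts are all assumptions chosen for illustration. It compares two ways of shrinking the variance of a score-function (REINFORCE) gradient estimate: averaging over a larger batch, and subtracting a baseline, which is the simplest control variate.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def grad_sample(theta, rewards, baseline, rng):
    """One score-function gradient sample for a two-action policy.

    The policy picks action 1 with probability sigmoid(theta).
    `rewards` is a hypothetical (reward_for_action_0, reward_for_action_1) pair.
    """
    p = sigmoid(theta)
    a = 1 if rng.random() < p else 0
    score = a - p  # d/dtheta of log pi(a) for this Bernoulli policy
    return (rewards[a] - baseline) * score


def batch_grad(theta, rewards, baseline, batch_size, rng):
    """Average of `batch_size` independent gradient samples."""
    return statistics.fmean(
        grad_sample(theta, rewards, baseline, rng) for _ in range(batch_size)
    )


def estimator_variance(theta, rewards, baseline, batch_size, trials=2000, seed=0):
    """Empirical variance of the batch-averaged gradient estimate."""
    rng = random.Random(seed)
    estimates = [
        batch_grad(theta, rewards, baseline, batch_size, rng) for _ in range(trials)
    ]
    return statistics.pvariance(estimates)


if __name__ == "__main__":
    rewards = (0.8, 1.0)  # assumed toy rewards, not from the thread
    for batch_size in (1, 16, 256):
        v = estimator_variance(0.0, rewards, baseline=0.0, batch_size=batch_size, trials=500)
        print(f"batch={batch_size:4d}  no baseline  variance={v:.6f}")
    # Baseline set to the expected reward under the policy at theta = 0.
    v = estimator_variance(0.0, rewards, baseline=0.9, batch_size=1, trials=500)
    print(f"batch=   1  baseline=0.9 variance={v:.6f}")
```

In this toy, the variance of the batch average falls roughly as 1/batch size, while a well-chosen baseline removes almost all of it even at batch size 1. That is an unusually favorable case because the rewards here are deterministic; in realistic RL the baseline only removes part of the variance, which is consistent with the replies' guess that large batches and variance reduction methods address the same underlying problem.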
