Midtraining is pretraining with better data
what even is midtraining?
This stage refines models before supervised fine-tuning begins.
Supportive replies praise the framing of midtraining as pretraining with better data: they say it makes sense, that it leaves room for elements such as context extension, and that it is the best definition they have seen.

@gabriberton still learning on all tokens or is there masking? Or is that when you are officially into SFT?

@gabriberton I see midtraining as a hot fix so post-training RL actually works https://arxiv.org/abs/2512.04072

@gabriberton Or with longer context
Honestly this is the best definition
Midtraining is pretraining with better data

@cthorrez Still learning on all tokens
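
For readers new to the distinction in this exchange: "learning on all tokens" means every position in the sequence contributes to the loss, while masking (common in SFT) counts only selected positions, typically the response. A minimal sketch of the difference, assuming per-token log-probabilities are already available; the function name `mean_nll`, the mask layout, and the sample numbers are hypothetical and not taken from the thread.

```python
import math


def mean_nll(token_log_probs, loss_mask=None):
    """Average negative log-likelihood over the tokens selected by loss_mask.

    token_log_probs: log-probability the model assigned to each target token.
    loss_mask: 1 to count a token, 0 to skip it; None counts every token.
    """
    if loss_mask is None:
        loss_mask = [1] * len(token_log_probs)
    if len(loss_mask) != len(token_log_probs):
        raise ValueError("mask and log-probs must have the same length")
    counted = sum(loss_mask)
    if counted == 0:
        raise ValueError("mask selects no tokens")
    total = -sum(lp * m for lp, m in zip(token_log_probs, loss_mask))
    return total / counted


# Hypothetical sequence: 3 prompt tokens followed by 2 response tokens.
log_probs = [math.log(p) for p in (0.5, 0.25, 0.5, 0.125, 0.5)]
response_only = [0, 0, 0, 1, 1]

# No mask: every token counts, as described for midtraining in the reply above.
all_token_loss = mean_nll(log_probs)

# Response-only mask: prompt tokens are skipped, as is common in SFT.
masked_loss = mean_nll(log_probs, response_only)
```

The two losses differ only in which positions are averaged, so the same model outputs can give different training signals depending on the mask.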

@nv_pavlichenko True, context extension is often a big part of midtraining

@gabriberton can it also be called mid-pretraining?

@gabriberton The best definition

@gabriberton that framing actually makes a lot of sense, honestly.

@gabriberton 🙏

@nirmalpatel_ @gabriberton Final-pretraining would be a better choice.

@gabriberton @Dorialexander

@gabriberton MRI quality in 60 seconds?