
AI2's Nathan Lambert argues supervised fine-tuning lacks structured literature as new study tests models up to 235B parameters

The experiments isolated training factors using real-world customer datasets.


Original post

Not enough people studying SFT methods. It’s a foundation of post training with limited literature that seems very serious in an empirical sense.

Charlie O'Neill (@oneill_c)

1/ We fine-tune a lot of customer models, so we decided to systematically try and figure out some best practices for finetuning. SFT isn't sexy, but it's still important. We vary one SFT lever at a time across 2 model families, dense + MoE to 235B, on 4 real-world customer datasets.

What makes this clean is that each dataset is paired with an eval that took weeks to build with the customer, and the training outputs were generated to pass that eval. So the supervised target and the thing we measure downstream are the same criterion, which strips out the usual confounders

12:02 PM · Jun 19, 2026 · 38K Views

Sentiment

Commenters respond positively to the systematic study of SFT best practices on models up to 235B parameters: one calls the work important for AI safety, and another asks for recommended resources for doing high-level research on the topic.

Pos: 100.0%

Neg: 0.0%

2 comments with sentiment.


Posts from X


Md sayfullah haidar (@sovon_haidar)

@natolambert Great sharing! Any recommended resources to study to do high level research on this topic? Thanks

13h · 107

Diari (@diari_cc)

@natolambert @oneill_c Also very important for AI safety!

13h · 94