
Annealing Offers Simpler Alternative To CPT For Model Training



Original post

Annealing tends to be more "idiot proof": just set it and forget it. But it requires access to all the intermediate assets (you already trained a model), and you tend to need longer runs overall. Still, you kind of just rewind, change the mix, and see what happens.

10:51 AM · May 30, 2026
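The "rewind, change the mix" loop can be sketched roughly as below. This is a minimal illustration, not the setup from the linked paper: the function names, the `rewind_step` parameter, the source names, and every weight are hypothetical, chosen only to show the idea of swapping the data mix after a saved checkpoint.

```python
import random

def mix_at_step(step, rewind_step, base_mix, anneal_mix):
    """Return normalized sampling weights for a training step.

    Assumption: steps before `rewind_step` use the base mix; from the
    rewound checkpoint onward, the changed (upsampled) mix is used.
    """
    mix = anneal_mix if step >= rewind_step else base_mix
    total = sum(mix.values())
    return {name: weight / total for name, weight in mix.items()}

def sample_source(mix, rng):
    """Pick which data source the next batch is drawn from."""
    names = list(mix)
    return rng.choices(names, weights=[mix[n] for n in names], k=1)[0]

# Hypothetical mixes; the numbers are illustrative only.
base_mix = {"web": 0.90, "code": 0.05, "math": 0.05}
anneal_mix = {"web": 0.50, "code": 0.25, "math": 0.25}

rng = random.Random(0)
for step in (0, 799, 800, 999):
    mix = mix_at_step(step, rewind_step=800, base_mix=base_mix, anneal_mix=anneal_mix)
    print(step, sample_source(mix, rng), mix)
```

Trying a different mix then only means changing `anneal_mix` and resuming from the same saved checkpoint, which is why the intermediate assets need to be kept around.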

Cody Blakeney @code_star

A lot has changed in LLM data and training since we wrote it, but it's still often the "least bad" approach.

Check it out if you want your data to spark joy, too

arxiv.org

Does your data spark joy? Performance gains from domain upsampling...

Pretraining datasets for large language models (LLMs) have grown to trillions of tokens composed of large amounts of CommonCrawl (CC) web scrape along with smaller, domain-specific datasets. It is...

Cody Blakeney @code_star

5:51 PM · May 30, 2026 · 173 Views

5:56 PM · May 30, 2026 · 468 Views