Emmy Liu shows that large language models acquire skills in a consistent sequence during pretraining, progressing from copying and morphology to arithmetic and complex reasoning across the Pythia, OLMo, and Amber model families.
Graham Neubig shared the preprint, which reports results across model families.
Check out our new work on examining what LLMs learn and when!
We posit that LLMs have an implicit curriculum where they learn gradually more complex skills, and attempt to uncover some details of how this curriculum develops over time across model families.
Copying → morphology/translation → basic arithmetic → complex reasoning & math. Across every model family we tested, LLMs acquire skills in roughly the same order during pretraining. Can we use this to predict what a model will learn next, just from its internals? 🧵