Pavel Izmailov and Andrew Gordon Wilson find that LLMs can prevent catastrophic forgetting using self-generated replay data
The method kept task evaluation accuracies near 100%.
We view forgetting as drift in the model's predictions on old data. So the fix is simple: use a KL penalty on past (pretraining) data to keep old outputs fixed while the model fits the new data. 2/8
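
A minimal sketch of the objective described in 2/8: cross-entropy on the new data plus a KL penalty that measures how far the model's predictions on replay (old) data drift from the original, frozen model's predictions. The KL direction (frozen distribution first), the per-token averaging, the `kl_weight` coefficient, and all function names are assumptions of this sketch, not details confirmed by the thread. Plain Python lists of logits stand in for a real model's outputs.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) over one vocabulary; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Negative log-likelihood of the target token under the current model.
    return -math.log(softmax(logits)[target])


def kl_regularized_loss(
    new_logits: Sequence[Sequence[float]],
    new_targets: Sequence[int],
    replay_logits_current: Sequence[Sequence[float]],
    replay_logits_frozen: Sequence[Sequence[float]],
    kl_weight: float = 1.0,  # hypothetical trade-off coefficient
) -> float:
    # Fit term: average cross-entropy on the new task's tokens.
    ce = sum(
        cross_entropy(l, t) for l, t in zip(new_logits, new_targets)
    ) / len(new_targets)
    # Drift term: average KL(frozen || current) over replay positions, which
    # penalizes moving away from the original model's predictions on old data.
    kl = sum(
        kl_divergence(softmax(f), softmax(c))
        for f, c in zip(replay_logits_frozen, replay_logits_current)
    ) / len(replay_logits_frozen)
    return ce + kl_weight * kl


# Current and frozen logits agree on the replay position, so only CE remains.
loss = kl_regularized_loss(
    new_logits=[[0.0, 0.0]],
    new_targets=[0],
    replay_logits_current=[[1.0, 2.0]],
    replay_logits_frozen=[[1.0, 2.0]],
)
print(loss)  # ≈ 0.6931 (log 2)
```

When the current model matches the frozen model on the replay data, the KL term vanishes and the loss reduces to plain cross-entropy on the new task; any drift on old data adds a nonnegative penalty.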

How much does a language model forget when finetuned on new tasks? We show that both model size and optimization matter, and that forgetting can be nearly eliminated with self-generated replay! https://arxiv.org/abs/2605.26097 w/@mrtnm @dongkyucho @ShikaiQiu @rumichunara @Pavel_Izmailov 1/8
New paper: https://arxiv.org/abs/2605.26097
The main idea is that we can use an LLM to generate its own replay data to prevent forgetting, as long as the model has spare capacity. Very overtrained models have to forget to learn new information.
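
To illustrate what "self-generated replay data" could look like, here is a toy sketch that samples sequences from a frozen model before finetuning, so they can later serve as the old-data inputs to the KL penalty. The bigram table stands in for the pretrained LLM, and the plain ancestral sampling, length cap, and names (`TOY_MODEL`, `sample_sequence`, `generate_replay_set`) are hypothetical; the thread does not specify how the replay data is generated.

```python
import random
from typing import Dict, List

# Hypothetical toy "base model": next-token probabilities given the previous
# token. In a real setup this would be the frozen pretrained LLM.
BigramModel = Dict[str, Dict[str, float]]

TOY_MODEL: BigramModel = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.3, "dog": 0.7},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}


def sample_sequence(
    model: BigramModel,
    rng: random.Random,
    max_len: int = 16,
    bos: str = "<bos>",
    eos: str = "<eos>",
) -> List[str]:
    # Ancestral sampling: draw each token from the model's own next-token
    # distribution until end-of-sequence or the length cap.
    seq = [bos]
    while len(seq) < max_len and seq[-1] != eos:
        dist = model[seq[-1]]
        tokens = list(dist)
        weights = [dist[t] for t in tokens]
        seq.append(rng.choices(tokens, weights=weights, k=1)[0])
    return seq


def generate_replay_set(model: BigramModel, n: int, seed: int = 0) -> List[List[str]]:
    # Sample the replay set once, from the frozen model, before finetuning.
    rng = random.Random(seed)
    return [sample_sequence(model, rng) for _ in range(n)]


replay = generate_replay_set(TOY_MODEL, n=3)
for seq in replay:
    print(" ".join(seq))
```

In this toy, the replay set is drawn from the model itself rather than from a stored corpus, and a fixed seed makes it reproducible.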

The plot shows the losses achievable on English and Spanish language modeling. For a model trained with 20 tokens per parameter (Chinchilla), we can finetune on Spanish without forgetting English. But a heavily overtrained model has to forget, because it cannot cross the frontier of achievable losses.
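
The "20 tokens per parameter" figure is the Chinchilla compute-optimal rule of thumb, so one way to read "overtrained" is as a multiple of that ratio. A quick sketch of the arithmetic; the function names and the 1B-parameter example figures are hypothetical, chosen only for illustration.

```python
CHINCHILLA_TOKENS_PER_PARAM = 20.0  # Chinchilla rule of thumb


def tokens_per_param(n_tokens: float, n_params: float) -> float:
    # Training tokens seen per model parameter.
    return n_tokens / n_params


def overtraining_factor(n_tokens: float, n_params: float) -> float:
    # How many times past the ~20 tokens/param Chinchilla ratio a run goes.
    return tokens_per_param(n_tokens, n_params) / CHINCHILLA_TOKENS_PER_PARAM


# Hypothetical 1B-parameter model.
n_params = 1e9
print(overtraining_factor(20e9, n_params))  # 20B tokens, Chinchilla ratio: 1.0
print(overtraining_factor(2e12, n_params))  # 2T tokens, heavily overtrained: 100.0
```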

With amazing collaborators @mrtnm @dongkyucho @ShikaiQiu @rumichunara @andrewgwils
See also Andrew's detailed thread (1/8 above).