feels like the most noteworthy paper about continual learning in quite some time? most other work branded as such is essentially fancy RL-flavored context distillation, this is some real physics of language models shit
Zyphra is sharing our first work on continual learning, where we ask: can LLMs learn forever from new data?
Many see continual learning as a path to AGI through recursive self-improvement (RSI).
The first obstacle is plasticity loss. We derive a scaling law for its onset 🧵