Sergey Levine argues that LLMs develop emergent capabilities by composing simpler skills in novel ways, rather than by imitating training data.
Anirudh Goyal's paper, co-authored with Sanjeev Arora, provides a mathematical framework for this kind of skill composition.
Check out the full interview with one of the top robotics researchers: https://www.dwarkesh.com/p/sergey-levine
We pre-train LLMs on the whole of the internet. You might think this explains how they learn so many emergent capabilities: the knowledge is implicit in the training data. But in fact models can do things that were never demonstrated anywhere in training! @svlevine argues that the real source of emergent capabilities is compositionality:
@dwarkesh_sp
This is the phenomenon our paper (with @prfsanjeevarora) tried to formalize: as models scale, basic skills can compose into complex skills.
That gives a theory for emergence beyond direct imitation of training data.
I'm willing to believe this is true, but I also think people who make statements like this haven't seen the amount of crazy shit that is actually on the internet. It might be very hard for you to track down, but there is almost always an example for anything.