We pre-train LLMs on the whole of the internet. You might think this explains how they learn so many emergent capabilities: the knowledge is implicit in the training data.
But in fact models can do things that were never demonstrated anywhere in training!
@svlevine argues that the real source of emergent capabilities is compositionality:










