9h ago

ICL Suite Shows LLMs Lack Monotonic Generalization Gains During Pretraining


Original post

A very interesting ICL suite showing that LLMs don't monotonically improve at generalization during pretraining. The eval suite is also useful for selecting pretraining checkpoints and data, as measured by downstream OOD tasks. The test aims to identify two different modes: (1) Parrot (shortcuts from the model prior or shallow features in context) and (2) Intelligence (true ICL). The blog also addresses some of my concerns, such as general evaluation noise, optimization dynamics, and data specificity, under null hypotheses. It also found that the simple metrics below, computed on activation/gradient Gram matrices and intended to measure solution complexity, don't show strong correlation with this eval.
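For readers unfamiliar with Gram-matrix complexity metrics, here is a rough sketch of one common choice, the effective rank (exponential of the spectral entropy) of an activation Gram matrix. This is an assumption for illustration only: the post does not say which metrics the blog used, and the function name `gram_effective_rank` and the `(num_examples, hidden_dim)` input shape are hypothetical.

```python
import numpy as np


def gram_effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank of the Gram matrix X @ X.T (hypothetical helper).

    `features` is assumed to be a (num_examples, hidden_dim) array of
    activations or per-example gradients. This is NOT necessarily the
    metric used in the blog; it is one illustrative complexity proxy.
    """
    gram = features @ features.T            # (num_examples, num_examples)
    eigvals = np.linalg.eigvalsh(gram)      # symmetric matrix, real eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)   # drop tiny negative round-off
    total = eigvals.sum()
    if total <= eps:
        return 0.0
    probs = eigvals / total                 # normalized spectrum
    probs = probs[probs > eps]              # avoid log(0)
    entropy = -(probs * np.log(probs)).sum()
    return float(np.exp(entropy))


# Orthogonal features spread the spectrum evenly; identical rows collapse it.
spread = gram_effective_rank(np.eye(4))
collapsed = gram_effective_rank(np.outer(np.ones(5), np.array([1.0, 2.0, 3.0])))
```

A higher value means the features span more directions. One could compute this per checkpoint and compare it against the eval score, but as the post notes, such simple metrics did not correlate strongly with the suite.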

12:14 PM · May 30, 2026