Jiaxin Wen, a UC Berkeley CS PhD student and part-time Anthropic researcher, argues that models acquire little meaningful knowledge early in pre-training and says this invalidates strong generalization claims.
viemccoy replied that the link to the paper's findings remained unclear.
@jiaxinwen22 I'm confused how this follows from the paper?
@viemccoy I guess your read is because 1) it's learned early in pre-training and 2) it transfers to instruct models. My read is because models learn nothing deep early in pre-training, so this invalidates all "generalization" results.
6:44 PM · May 22, 2026 · 138 Views
7:10 PM · May 22, 2026 · 36 Views