
Jiaxin Wen, UC Berkeley CS PhD student and part-time Anthropic researcher, argues models acquire little meaningful knowledge early in pre-training and says this invalidates strong generalization claims

viemccoy replied that it was unclear how this conclusion follows from the paper.

Original post

@viemccoy i guess your read is because 1) it's learned early in pre-training, 2) it transfers to instruct models. my read is because models learn nothing deep early in pre-training, so this invalidates all "generalization" results

11:44 AM · May 22, 2026

@jiaxinwen22 I'm confused how this follows from the paper?

Jiaxin Wen @jiaxinwen22


6:44 PM · May 22, 2026 · 138 Views
7:10 PM · May 22, 2026 · 36 Views