Jiaxin Wen, a UC Berkeley CS PhD student and part-time Anthropic researcher, argues that models acquire little meaningful knowledge early in pre-training and says this invalidates strong generalization claims.

viemccoy replied that the link to the paper's findings remained unclear.

Original post

@viemccoy i guess your read is because
1) it's learned early in pre-training
2) it transfers to instruct models

my read is because models learn nothing deep early in pre-training, so this invalidates all "generalization" results

11:44 AM · May 22, 2026

𝚟𝚒𝚎 ⟢ @viemccoy

@jiaxinwen22 I'm confused how this follows from the paper?

Jiaxin Wen @jiaxinwen22

6:44 PM · May 22, 2026 · 138 Views

7:10 PM · May 22, 2026 · 36 Views