
Jiaxin Wen, CS PhD student at UC Berkeley and part-time researcher at Anthropic, publishes “Generalization Dynamics of LM Pre-training” showing abrupt shifts between modes in language model pre-training

Charts compare large-model and small-model curves with Chinchilla Optimal points marked.

Original post

New post: "Generalization Dynamics of LM Pre-training" Most people (including me) assume that LMs smoothly mature from pattern-matching to generalizing. This mental model is wrong. The true dynamics are stranger, and far more fascinating! We call it Mode-Hopping.

8:19 AM · May 18, 2026
Reposted by

There's still a lot that we don't understand yet about the science of pretraining.

Quoted post: Jiaxin Wen (@jiaxinwen22) · 3:19 PM · May 18, 2026 · 44.7K Views
5:32 PM · May 18, 2026 · 4.2K Views

Some AI research is really fun. I enjoy every result in this post. I will call your mom if you only look at the math-to-GPQA generalization experiments in Sec 5.1.

link: http://jiaxin-wen.github.io/blog/generalization-dynamics

joint work with @ZhengxuanZenWu @dawnsongtweets @wjmzbmr1

3:19 PM · May 18, 2026 · 2.8K Views

also thank you all for the helpful feedback! @wen_kaiyue @fjzzq2002 @liangqiu_1994 @deepcohen @jesse_hoogland @wenhaocha1 @YichuanM @YIFENGLIU_AI @ericjmichaud_ @yidingjiang @JoshuaRenyi @industriaalist @bingruili_

3:19 PM · May 18, 2026 · 1.9K Views

@wen_kaiyue @fjzzq2002 @liangqiu_1994 @deepcohen @jesse_hoogland @wenhaocha1 @YichuanM @YIFENGLIU_AI @ericjmichaud_ @yidingjiang @JoshuaRenyi @industriaalist @bingruili_ + @ZhengZhan13 LOL

3:31 PM · May 18, 2026 · 415 Views

@wen_kaiyue @fjzzq2002 @liangqiu_1994 @deepcohen @jesse_hoogland @wenhaocha1 @YichuanM @YIFENGLIU_AI @ericjmichaud_ @yidingjiang @JoshuaRenyi @industriaalist @bingruili_ + @xiangyuqi_pton!

3:58 PM · May 18, 2026 · 652 Views

@wen_kaiyue @fjzzq2002 @liangqiu_1994 @deepcohen @jesse_hoogland @wenhaocha1 @YichuanM @YIFENGLIU_AI @ericjmichaud_ @yidingjiang @JoshuaRenyi @industriaalist @bingruili_ Damn, I just keep forgetting friends who gave me valuable feedback. Thank you @peterbhase!

7:26 PM · May 18, 2026 · 834 Views