Jiaxin Wen, CS PhD student at UC Berkeley and part-time researcher at Anthropic, publishes “Generalization Dynamics of LM Pre-training” showing abrupt shifts between modes in language model pre-training
Charts compare large-model and small-model curves with Chinchilla Optimal points marked.
There's still a lot that we don't understand yet about the science of pretraining.
New post: "Generalization Dynamics of LM Pre-training"
Most people (including me) assume that LMs smoothly mature from pattern-matching to generalizing. This mental model is wrong. The true dynamics are stranger, and far more fascinating! We call it Mode-Hopping.
Some AI research is really fun. I enjoy every result in this post. I will call your mom if you only look at the math-to-GPQA generalization experiments in Sec 5.1.
link: http://jiaxin-wen.github.io/blog/generalization-dynamics
joint work with @ZhengxuanZenWu @dawnsongtweets @wjmzbmr1

also thank you all for the helpful feedback! @wen_kaiyue @fjzzq2002 @liangqiu_1994 @deepcohen @jesse_hoogland @wenhaocha1 @YichuanM @YIFENGLIU_AI @ericjmichaud_ @yidingjiang @JoshuaRenyi @industriaalist @bingruili_
+ @ZhengZhan13 LOL
+ @xiangyuqi_pton!
Damn, I just keep forgetting friends who give me valuable feedback. Thank you, @peterbhase!