Jaime Sevilla of Epoch AI suggests pretraining scaling is the main driver of out-of-distribution AI performance gains

Researcher Andreas Kirsch says the Mythos and Fable evaluations will be interesting to see

Original post

Andreas Kirsch 🇺🇦@BlackHC

@Jsevillamol Seeing the evals for Mythos/Fable will be interesting

Jaime Sevilla@Jsevillamol

My pet theory is that the driving force of improvements for out-of-distribution tasks is mostly pure pretraining scaling, which if you squint this is broadly consistent with.

11:32 AM · Jul 2, 2026 · 129 Views

Posts from X

Lisan al Gaib@scaling01

@Jsevillamol I have two ideas:
- larger/deeper models have better in-context-learning
- simply more, and more diverse training data

3h

VinceK 🏳️🇫🇷🇺🇦🏳️@WesternMishima

@Jsevillamol You will test Mythos ?

4h

Rota 🚪🧎‍♂️@pli_cachete

@Jsevillamol Say more?

4h

Gideon Futerman@GFuterman

@Jsevillamol What effect do you think this belief has on your timelines (in a sense that "timelines" is meaningful)? If you thought it came from post-training, would this change your timelines/predictions

5h

VinceK 🏳️🇫🇷🇺🇦🏳️@WesternMishima

@Jsevillamol It will be a good hint for your take.

4h