I'm training a big model w/ P parameters and D tokens. Before I do so, I train smaller models to estimate the performance of the big model. Which of the following scaling regimes should I use to get the best predictions?
1. Fix D across all models
2. Fix P/D
3. Something else
Researcher Queries Best Scaling Regime For Large Model Performance Prediction
5:59 PM · Jul 4, 2026 · 3.8K Views
Posts from X
Most Activity
Stella Biderman@BlancheMinerva
I expect the answer to be #2, but I haven't been able to find any evidence for it.
GPT-5.5 says #1.
Opus 4.8 says #2 for "understanding scaling" but #1 for "predicting the final loss".
Fable says #2.
2h · Views 651 · Likes 4 · Bookmarks 1
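
For concreteness, here is a minimal sketch of how options 1 and 2 from the question would assign token budgets to a ladder of small models. The helper names, the target size, and the small-model sizes are all hypothetical values chosen for illustration; nothing here comes from the thread, and it does not say which regime predicts better.

```python
def fixed_tokens_ladder(param_counts, tokens):
    """Option 1: every small model trains on the same number of tokens D."""
    return [(p, tokens) for p in param_counts]


def fixed_ratio_ladder(param_counts, target_params, target_tokens):
    """Option 2: every small model keeps the target run's P/D ratio."""
    tokens_per_param = target_tokens / target_params
    return [(p, int(round(p * tokens_per_param))) for p in param_counts]


# Hypothetical target run and small-model sizes (assumptions, not from the thread).
target_params = 7_000_000_000
target_tokens = 140_000_000_000
small_models = [100_000_000, 400_000_000, 1_600_000_000]

regime1 = fixed_tokens_ladder(small_models, target_tokens)
regime2 = fixed_ratio_ladder(small_models, target_params, target_tokens)

for (p, d1), (_, d2) in zip(regime1, regime2):
    print(f"P={p:.1e}  option 1: D={d1:.1e}  option 2: D={d2:.1e}")
```

Under option 1 the smallest models are trained far past the target's tokens-per-parameter ratio, while under option 2 every run sits at the same ratio, which is the difference the question is asking about.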

Sean Cantrell@ThePremiseOfIt
@BlancheMinerva Didn't Chinchilla answer this? The real answer is nonlinear: the D/P = 20 ratio they propose had a cliff at some smaller P, where D should be bigger, but generally for P > ~1B, 20 seems fine.
1h · Views 7
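
The D/P = 20 rule of thumb mentioned in this reply can be turned into a quick budget calculation. This is only a sketch: the function names are hypothetical, the FLOPs figure uses the common approximation C ≈ 6·P·D (an assumption added here, not something stated in the thread), and per the reply the fixed ratio may not hold for models well below roughly 1B parameters.

```python
TOKENS_PER_PARAM = 20  # rule-of-thumb ratio quoted in the reply


def rule_of_thumb_tokens(params, tokens_per_param=TOKENS_PER_PARAM):
    """Token budget D for a model with `params` parameters at a fixed D/P."""
    return params * tokens_per_param


def approx_train_flops(params, tokens):
    """Rough training compute via the common C ~ 6 * P * D approximation."""
    return 6 * params * tokens


# Hypothetical model sizes, chosen only for illustration.
for params in (1_000_000_000, 7_000_000_000):
    tokens = rule_of_thumb_tokens(params)
    flops = approx_train_flops(params, tokens)
    print(f"P={params:.1e}  D={tokens:.1e}  approx FLOPs={flops:.1e}")
```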