Researcher Queries Best Scaling Regime For Large Model Performance Prediction

Original post

Stella Biderman (@BlancheMinerva)

I'm training a big model w/ P parameters and D tokens. Before I do so, I train smaller models to estimate the performance of the big model. Which of the following scaling regimes should I use to get the best predictions?

1. Fix D across all models
2. Fix P/D
3. Something else

5:59 PM · Jul 4, 2026 · 3.8K Views

Posts from X

Stella Biderman (@BlancheMinerva)

I expect the answer to be #2, but I haven't been able to find any evidence for it.

GPT-5.5 says #1
Opus 4.8 says #2 for "understanding scaling" but #1 for "predicting the final loss"
Fable says #2

Sean Cantrell (@ThePremiseOfIt)

@BlancheMinerva Didn't Chinchilla answer this? The real answer is nonlinear as the D/P=20 they propose had a cliff at some smaller P where D should be bigger, but generally for P>~1B 20 seems fine.
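For readers comparing the two regimes in the original post, here is a minimal sketch of how the token budgets for the small proxy models would differ. It is illustrative only: the function name `token_budgets`, the example model sizes, and the fixed token count are hypothetical, the ratio of 20 tokens per parameter is the figure quoted in the reply above, and the compute estimate uses the common rule of thumb C ≈ 6·P·D, which is an assumption not stated anywhere in the thread.

```python
def token_budgets(param_counts, fixed_tokens, tokens_per_param=20.0):
    """Return one row per proxy model with its token budget under each regime.

    Regime 1: every model sees the same number of tokens (fixed D).
    Regime 2: tokens scale with parameters (fixed D/P ratio).
    Compute is approximated as 6 * P * D FLOPs (rule of thumb, an assumption).
    """
    rows = []
    for p in param_counts:
        d_fixed = fixed_tokens            # regime 1: same D for every model
        d_ratio = tokens_per_param * p    # regime 2: D/P held constant
        rows.append({
            "params": p,
            "tokens_fixed_d": d_fixed,
            "tokens_fixed_ratio": d_ratio,
            "flops_fixed_d": 6 * p * d_fixed,
            "flops_fixed_ratio": 6 * p * d_ratio,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical proxy sizes; fixed_tokens is an arbitrary example value.
    sizes = [1e8, 1e9, 1e10]
    for row in token_budgets(sizes, fixed_tokens=2e11):
        print(
            f"P={row['params']:.0e}  "
            f"fixed-D tokens={row['tokens_fixed_d']:.1e}  "
            f"fixed-ratio tokens={row['tokens_fixed_ratio']:.1e}  "
            f"fixed-D FLOPs={row['flops_fixed_d']:.1e}  "
            f"fixed-ratio FLOPs={row['flops_fixed_ratio']:.1e}"
        )
```

Under these assumptions, the fixed-D regime trains the smallest proxies far past the quoted ratio, while the fixed-ratio regime keeps every proxy at the same tokens-per-parameter operating point. The sketch only tabulates budgets; it does not say which regime predicts the large model better, which is the open question in the thread.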
