
Expert Advises Fixing P/D Ratio When Using Smaller Models To Predict Large AI Performance

Original post

Cody Blakeney @code_star

@BlancheMinerva If you already know P and D, then do 2.

If you are trying to decide P and D, then it's slightly more complicated.

Stella Biderman @BlancheMinerva

I'm training a big model w/ P parameters and D tokens. Before I do so, I train smaller models to estimate the performance of the big model. Which of the following scaling regimes should I use to get the best predictions?

1. Fix D across all models
2. Fix P/D
3. Something else

9:34 AM · Jul 5, 2026 · 255 Views
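
For illustration, option 2 (fix P/D) might be sketched as below. This is a minimal sketch, not the posters' own code: the function name `fixed_ratio_token_budgets` and every model size and token count are hypothetical values chosen only to show the arithmetic.

```python
def fixed_ratio_token_budgets(target_params, target_tokens, proxy_params):
    """Give each smaller model a token budget that keeps P/D equal to the big run's.

    Holding P/D fixed is the same as holding tokens-per-parameter (D/P) fixed.
    """
    tokens_per_param = target_tokens / target_params
    return [round(p * tokens_per_param) for p in proxy_params]


# Hypothetical target run: P = 7B parameters, D = 140B tokens (20 tokens per parameter).
budgets = fixed_ratio_token_budgets(
    target_params=7_000_000_000,
    target_tokens=140_000_000_000,
    proxy_params=[125_000_000, 350_000_000, 1_000_000_000],  # hypothetical proxy sizes
)

for params, tokens in zip([125_000_000, 350_000_000, 1_000_000_000], budgets):
    print(f"P={params:,}  D={tokens:,}")
```

Each smaller model then trains at the same tokens-per-parameter ratio as the big model, which only works when P and D for the big run are already decided.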

Posts from X

Cody Blakeney @code_star

@BlancheMinerva The one caveat I would throw in here is that you should downsample your unique tokens to simulate the epoching that will be done at the full scale of P and D.
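
One way to read this caveat, as a rough sketch: if the full-scale run will pass over its unique data several times, each smaller run draws from a proportionally smaller pool of unique tokens so that it sees the same number of epochs. The function name `downsampled_unique_tokens`, the idea of a known unique-token count for the full run, and all numbers below are assumptions for illustration and do not come from the thread.

```python
def downsampled_unique_tokens(target_tokens, target_unique_tokens, proxy_tokens):
    """Size the unique-token pool for a smaller run so its epoch count matches the big run.

    epochs at full scale = D / (unique tokens available at full scale)
    """
    epochs = target_tokens / target_unique_tokens
    return round(proxy_tokens / epochs)


# Hypothetical full-scale run: D = 140B training tokens over 35B unique tokens (4 epochs).
# Hypothetical smaller run: 2.5B training tokens.
proxy_unique = downsampled_unique_tokens(
    target_tokens=140_000_000_000,
    target_unique_tokens=35_000_000_000,
    proxy_tokens=2_500_000_000,
)
print(f"unique tokens to sample for the smaller run: {proxy_unique:,}")
```

Under these assumptions, the smaller run repeats its downsampled data four times, the same as the big run.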
