Industry Shift
Chinchilla reset the recipe
DeepMind’s 2022 work varied parameters and training tokens together and found that, for a fixed compute budget, roughly twenty tokens per parameter gives the best results. The 70B Chinchilla model beat the 280B Gopher on the same training compute budget, flipping the earlier guidance on its head.
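The arithmetic behind that rule of thumb can be sketched in a few lines. This is a rough illustration, not DeepMind's fitting procedure. It assumes the common approximation that training costs about 6 FLOPs per parameter per token, and it takes the publicly reported token counts (about 300B for Gopher, 1.4T for Chinchilla) as inputs. The function names are hypothetical.

```python
import math

# Rule of thumb from the text: about twenty training tokens per parameter.
TOKENS_PER_PARAM = 20.0
# Assumption: the widely used estimate C ~ 6 * N * D for training FLOPs.
FLOPS_PER_PARAM_TOKEN = 6.0


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a dense model (assumed 6*N*D)."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) under the 20:1 rule.

    With C = 6*N*D and D = 20*N, solving for N gives N = sqrt(C / 120).
    """
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    # Publicly reported figures, used here only as illustrative inputs.
    gopher_budget = training_flops(280e9, 300e9)
    chinchilla_budget = training_flops(70e9, 1.4e12)
    print(f"Gopher budget:     {gopher_budget:.2e} FLOPs")
    print(f"Chinchilla budget: {chinchilla_budget:.2e} FLOPs")

    params, tokens = compute_optimal_split(gopher_budget)
    print(f"20:1 split of Gopher's budget: {params / 1e9:.0f}B params, "
          f"{tokens / 1e12:.2f}T tokens")
```

Under these assumptions the two budgets land within about 15 percent of each other, and re-splitting Gopher's budget at twenty tokens per parameter gives a model of roughly 65B parameters trained on about 1.3T tokens, close to Chinchilla's actual configuration.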
Open Question
Exact waste remains uncounted
Former Google engineers recall that the flaw was spotted internally before it became public, yet no one has tallied the total compute hours spent on oversized, undertrained runs between 2020 and early 2022.