
SuperBPE study shows tokenizer compression shapes LLM scaling laws


SuperBPE research finds that language model scaling laws depend on the tokenizer's compression rate. Higher compression reduces the compute-optimal ratio of training tokens to model parameters, and the relationship stays consistent across compression levels when expressed as training bytes per parameter. The work concludes that tokenizers should be designed deliberately rather than treated as fixed components, because established LLM scaling laws prove sensitive to these tokenization choices.
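To make that relationship concrete: if the compute-optimal training budget is fixed in bytes per parameter, the corresponding tokens-per-parameter ratio shrinks as compression (bytes per token) grows. A minimal Python sketch of this arithmetic, using hypothetical numbers rather than values from the paper:

```python
# Illustrative only: the bytes/param constant and compression rates below are
# hypothetical, not figures reported by the SuperBPE or scaling-law papers.

def optimal_tokens_per_param(bytes_per_param: float, bytes_per_token: float) -> float:
    """With a fixed compute-optimal ratio of training *bytes* to parameters,
    the tokens-per-parameter ratio follows by dividing out the tokenizer's
    compression rate (bytes per token)."""
    return bytes_per_param / bytes_per_token

BYTES_PER_PARAM = 80.0  # hypothetical fixed compute-optimal bytes/param ratio

for bytes_per_token in (3.5, 4.5, 6.0):  # hypothetical compression rates
    ratio = optimal_tokens_per_param(BYTES_PER_PARAM, bytes_per_token)
    print(f"{bytes_per_token:.1f} bytes/token -> {ratio:.1f} train tokens per parameter")
```

Under these made-up numbers, moving from 3.5 to 6.0 bytes per token cuts the compute-optimal token budget per parameter from roughly 23 to 13, which is the direction of the effect the thread describes.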

Original post

Alisa Liu @alisawuffles

In SuperBPE we found: as tokenizer compression increases, the compute-optimal ratio of train tokens to model params decreases — and remarkably, corresponds to the same underlying ratio of train *bytes* / param! Our new work makes it official: scaling laws depend on compression.

4:23 PM · May 14, 2026

Tomasz Limisiewicz @TomLimi

We present Compute Optimal Tokenization! 🔡 Common practice in LLM scaling work is to stick to one tokenizer, sweeping data/model size. But what happens when we control the tokenizer’s compression rate (bytes/token)? Here we sweep tokenizers, params, and data across compute budgets: [1/N]

3:10 PM · May 4, 2026 · 53.5K Views
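The compression rate referred to in the quoted thread is simply bytes per token over a text sample. A minimal sketch of how one might measure it, assuming a generic `tokenize` callable rather than any specific library's API:

```python
def compression_rate(texts, tokenize):
    """Bytes per token: total UTF-8 bytes of the sample divided by the total
    number of tokens produced. `tokenize` is any callable mapping a string to
    a list of tokens (hypothetical stand-in, not a specific tokenizer API)."""
    total_bytes = sum(len(t.encode("utf-8")) for t in texts)
    total_tokens = sum(len(tokenize(t)) for t in texts)
    return total_bytes / total_tokens

# Stand-in usage with whitespace splitting as the "tokenizer":
sample = ["Scaling laws depend on compression.", "Tokenizers are design choices."]
print(f"{compression_rate(sample, str.split):.2f} bytes/token")
```

A tokenizer with larger, multi-word tokens (as in SuperBPE) yields a higher bytes-per-token figure, i.e. stronger compression.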

Please see @TomLimi's thread & paper for all the cool findings. 🔍 Rather than being a static object, the tokenizer is something we can & should deliberately design as we scale up our models and runs!

11:23 PM · May 14, 2026 · 23.9K Views