SuperBPE study shows tokenizer compression shapes LLM scaling laws
SuperBPE research finds that language model scaling laws depend on the tokenizer's compression rate. Higher compression lowers the compute-optimal ratio of training tokens to model parameters, yet the corresponding ratio of training bytes to parameters stays roughly constant across compression levels. The work concludes that tokenizers should be designed deliberately rather than treated as fixed components, since established LLM scaling laws prove sensitive to tokenization choices.
In SuperBPE we found: as tokenizer compression increases, the compute-optimal ratio of train tokens to model params decreases — and remarkably, corresponds to the same underlying ratio of train *bytes* / param! Our new work makes it official: scaling laws depend on compression.
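To make the claimed invariance concrete, here is a toy sketch in Python. The numbers are made up for illustration (they are not the paper's fitted constants); it only shows how a fixed bytes-per-parameter budget implies a tokens-per-parameter ratio that falls as compression rises.

```python
# Toy illustration of the bytes-per-parameter invariance described above.
# BYTES_PER_PARAM is a hypothetical compute-optimal budget, not a value
# reported in the paper.

def optimal_tokens_per_param(bytes_per_param: float, bytes_per_token: float) -> float:
    """If the compute-optimal budget is fixed in *bytes* per parameter,
    the token budget shrinks as the tokenizer compresses more."""
    return bytes_per_param / bytes_per_token

BYTES_PER_PARAM = 80.0  # hypothetical compute-optimal bytes/param

for bytes_per_token in (4.0, 5.0, 6.4):  # increasing compression
    ratio = optimal_tokens_per_param(BYTES_PER_PARAM, bytes_per_token)
    print(f"{bytes_per_token:.1f} bytes/token -> {ratio:.1f} tokens/param")
```

Running this prints 20.0, 16.0, and 12.5 tokens/param: the token ratio drops as compression increases, while the underlying bytes/param stays fixed at 80.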

We present Compute Optimal Tokenization! 🔡 Common practice in LLM scaling work is to stick to one tokenizer while sweeping data and model size. But what happens when we also control the tokenizer’s compression rate (bytes/token)? Here we sweep tokenizers, params, and data across compute budgets: [1/N]
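For concreteness, the compression rate here is bytes of raw text per token. A minimal sketch of how one might measure it, assuming a Hugging Face tokenizer; the model name and sample text below are placeholders, not the ones used in the paper:

```python
# Measure a tokenizer's compression rate (bytes/token), the quantity
# swept in this work. "gpt2" is a placeholder tokenizer choice.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

sample = "Scaling laws depend on how many bytes each token covers."
num_bytes = len(sample.encode("utf-8"))
num_tokens = len(tokenizer.encode(sample))

print(f"compression rate: {num_bytes / num_tokens:.2f} bytes/token")
```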
Please see @TomLimi's thread & paper for all the cool findings. 🔍 Rather than being a static object, the tokenizer is something we can & should deliberately design as we scale up our models and runs!