8h ago

Perplexity open-sources a rebuilt Unigram tokenizer that reduces CPU utilization by 5x to 6x

It resolves latency bottlenecks in small rerankers and embedders

Original post

We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders run in single-digit milliseconds on GPU, making CPU tokenization a meaningful share of total latency. http://github.com/perplexityai/pplx-garden

8:55 AM · May 27, 2026
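To make the latency claim concrete, here is a minimal sketch of how CPU tokenization time compares with GPU time for a small model. All numbers are hypothetical placeholders, not measurements from Perplexity, and the function name `tokenization_share` is invented for illustration. It assumes tokenization and GPU inference run back to back with no overlap.

```python
def tokenization_share(tokenize_ms: float, gpu_ms: float) -> float:
    """Fraction of end-to-end latency spent in CPU tokenization.

    Assumes the two stages run serially (an assumption, not stated in the post).
    """
    total_ms = tokenize_ms + gpu_ms
    if total_ms <= 0:
        raise ValueError("total latency must be positive")
    return tokenize_ms / total_ms


if __name__ == "__main__":
    # Hypothetical values: a single-digit-millisecond GPU pass and a few
    # candidate tokenization costs. Replace with your own measurements.
    gpu_ms = 5.0
    for tokenize_ms in (0.5, 1.0, 2.0):
        share = tokenization_share(tokenize_ms, gpu_ms)
        print(f"tokenize={tokenize_ms} ms, gpu={gpu_ms} ms: {share:.0%} of latency")
```

The shorter the GPU pass, the larger the share that a fixed tokenization cost takes up, which is why the post singles out small rerankers and embedders.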

Every millisecond matters. We’re open sourcing the tokenizer we built and deployed on production; that’s far [more] efficient than huggingface and sentencepiece.

Perplexity (@perplexity_ai)

We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders run in single-digit milliseconds on GPU, making CPU tokenization a meaningful share of total latency. http://github.com/perplexityai/pplx-garden

3:55 PM · May 27, 2026 · 58.6K Views
5:34 PM · May 27, 2026 · 19.1K Views

@AravSrinivas Super cool

Aravind Srinivas (@AravSrinivas)

Every millisecond matters. We’re open sourcing the tokenizer we built and deployed on production; that’s far [more] efficient than huggingface and sentencepiece.

5:34 PM · May 27, 2026 · 19.1K Views
5:44 PM · May 27, 2026 · 3.2K Views