Perplexity open-sources a rebuilt Unigram tokenizer that reduces CPU utilization by 5x to 6x
It addresses a latency bottleneck in small rerankers and embedders
Every millisecond matters. We’re open-sourcing the tokenizer we built and deployed in production; it’s far more efficient than the Hugging Face and SentencePiece implementations.
We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders run in single-digit milliseconds on GPU, making CPU tokenization a meaningful share of total latency. http://github.com/perplexityai/pplx-garden
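For readers unfamiliar with the Unigram model: at inference time a Unigram tokenizer picks, among all ways to split the input into vocabulary pieces, the segmentation with the highest total log-probability, usually found with Viterbi dynamic programming. The following is a minimal sketch of that standard decoding step under stated assumptions, not the pplx-garden implementation. `unigram_viterbi` and `TOY_VOCAB` are hypothetical names, the scores are made up, and real tokenizers also handle normalization, whitespace markers, and unknown characters.

```python
import math


def unigram_viterbi(text: str, vocab: dict[str, float]) -> list[str]:
    """Split `text` into the token sequence with the highest total log-probability.

    `vocab` maps each piece to its log-probability (a hypothetical toy vocabulary).
    """
    n = len(text)
    max_len = max(len(piece) for piece in vocab)
    # best[i] = highest total log-prob of any segmentation of text[:i]
    best = [-math.inf] * (n + 1)
    # back[i] = start index of the last token in that best segmentation
    back = [-1] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Only substrings no longer than the longest vocab piece can match.
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in vocab and best[start] > -math.inf:
                score = best[start] + vocab[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


# Hypothetical toy vocabulary: piece -> log-probability.
TOY_VOCAB = {
    "uni": -3.0, "un": -2.0, "gram": -2.5, "i": -3.0,
    "u": -4.0, "n": -4.0, "g": -4.0, "r": -4.0, "a": -4.0, "m": -4.0,
}

if __name__ == "__main__":
    print(unigram_viterbi("unigram", TOY_VOCAB))  # → ['uni', 'gram']
```

The nested loop over candidate substrings is where the CPU time goes, which is why production implementations typically replace per-substring dictionary probing with a trie for prefix lookups.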
@AravSrinivas Super cool