
Erik Kaunismäki, SWE at Hugging Face, releases MaxSim kernel for ColBERT retrieval


Erik Kaunismäki, a software engineer at Hugging Face, released the MaxSim kernel on Hugging Face under erikkaum/maxsim. Late-interaction retrieval (ColBERT, PyLate) bottlenecks on materializing the full similarity matrix between query and document tokens; the kernel avoids that step with tiled scoring built on simdgroup_matrix (Metal) and WMMA instructions. Kaunismäki reports a 3–5× speedup over naive PyTorch. Later the same day, Perplexity AI opened pplx-embed-v1-late-0.6b, a natively multilingual ColBERT-style model continue-trained from pplx-embed-0.6b, and its model page includes a section on using the MaxSim kernel.
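To make the idea concrete, here is a minimal sketch of MaxSim scoring (for each query token, take the maximum dot product over document tokens, then sum) computed two ways: once by building the full similarity matrix, and once in tiles with a running maximum. It is plain Python for illustration only and is not the released kernel; the function names, the `tile` parameter, and the toy vectors are assumptions, and the real kernel does this work with Metal and WMMA matrix instructions on the GPU.

```python
# Illustrative sketch only: not the erikkaum/maxsim kernel or its API.
# All names here (maxsim_naive, maxsim_tiled, tile) are hypothetical.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_naive(query, doc):
    """Build the full len(query) x len(doc) similarity matrix, then reduce."""
    sim = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)

def maxsim_tiled(query, doc, tile=2):
    """Same score, but only one tile of document tokens is scored at a time."""
    best = [float("-inf")] * len(query)  # running max per query token
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]
        for i, q in enumerate(query):
            block_max = max(dot(q, d) for d in block)
            if block_max > best[i]:
                best[i] = block_max
    return sum(best)

# Toy token embeddings: 2 query tokens, 3 document tokens, dimension 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

naive_score = maxsim_naive(query, doc)
tiled_score = maxsim_tiled(query, doc)
print(naive_score, tiled_score)  # both print 3.0
```

Because the maximum over all document tokens equals the maximum of the per-tile maxima, the tiled version returns the same score while holding only one tile's worth of similarities at a time.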

Original post

Releasing my first kernel on @huggingface: MaxSim Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids it by using tiled scoring with simdgroup_matrix (Metal) and WMMA. Result is 3–5× speedup compared to naive PyTorch. Try it out 👇

4:38 AM · May 18, 2026

oh! cool to see @perplexity_ai train late interaction (colbert) models

BoBo@bo_wangbo

okay maybe it's a good time? We have a small colbert model trained at pplx, it is a continue-training of pplx-embed-0.6b, so native multilingual, just made it open and added a section how to use MaxSim kernel: https://huggingface.co/perplexity-ai/pplx-embed-v1-late-0.6b

5:07 PM · May 18, 2026 · 10K Views
8:12 PM · May 18, 2026 · 1.5K Views