ColBERT creator Omar Khattab highlights retrieval over 600 million vectors in 10 milliseconds using a single CPU core
The result relies on optimized Product Quantization for late-interaction models.
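For readers new to the two ideas named here, the following is a minimal sketch of textbook product quantization combined with ColBERT-style MaxSim scoring. It is not the optimized algorithm behind the quoted result: it scans every document token (linear time, not sub-linear), and all function names (`train_pq`, `encode`, `maxsim`) and parameters (`m` sub-spaces, `k` centroids per sub-space) are hypothetical choices made for illustration.

```python
import random

def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters, rng):
    """Plain Lloyd's k-means; returns k centroids (requires k <= len(points))."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_pq(vectors, m, k, iters=5, seed=0):
    """Learn one codebook of k centroids for each of the m sub-spaces."""
    rng = random.Random(seed)
    parts = [split(v, m) for v in vectors]
    return [kmeans([p[s] for p in parts], k, iters, rng) for s in range(m)]

def encode(vec, codebooks):
    """Compress a vector to m small integers (nearest centroid per sub-space)."""
    return [min(range(len(cb)), key=lambda c: sqdist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def maxsim(query_tokens, doc_codes, codebooks):
    """Late-interaction score: each query token takes its best-matching
    document token (by approximate inner product), and the maxima are summed."""
    m = len(codebooks)
    score = 0.0
    for q in query_tokens:
        # Lookup table: inner product of each query sub-vector with every centroid.
        table = [[dot(sub, c) for c in cb]
                 for sub, cb in zip(split(q, m), codebooks)]
        # Each document token costs only m table lookups, no float vector needed.
        score += max(sum(table[s][code[s]] for s in range(m))
                     for code in doc_codes)
    return score

# Toy usage: four 4-dimensional "token" vectors, 2 sub-spaces, 4 centroids each.
doc_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
codebooks = train_pq(doc_tokens, m=2, k=4)
doc_codes = [encode(v, codebooks) for v in doc_tokens]
query = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
print(maxsim(query, doc_codes, codebooks))
```

The point of the lookup table is that document vectors are stored only as short integer codes, so memory per token stays small and scoring reduces to table lookups. Reaching the latency quoted in the tweet would additionally require an index that avoids scanning every token, which this sketch does not attempt.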
——0——
@lateinteraction okay so this is cool and all but I really wanna use Qwen3-8B-Embedding…
this is such an impressive result: search over ~600,000,000 colbert vectors in 10 milliseconds, with a *single* CPU core. and since this algorithm has sub-linear latency, there’s no excuse for anyone up to tens of billions of tokens
1:11 PM · May 28, 2026 · 20.7K Views
4:13 PM · May 28, 2026 · 275 Views