
ColBERT creator Omar Khattab highlights retrieval over 600 million vectors in 10 milliseconds using a single CPU core

The approach relies on optimized product quantization for late-interaction models.
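For readers unfamiliar with the two ideas named here, the following is a minimal, generic sketch of product quantization combined with late-interaction (MaxSim) scoring. It is not the algorithm behind the reported result, whose details are not given in this post. Every function name and parameter (`train_pq`, `encode`, `maxsim`, `m`, `k`) is a hypothetical choice for illustration. It also scans every document token, so it shows none of the sub-linear behavior mentioned in the post.

```python
import random

def split(vec, m):
    """Split a vector into m equal-length subvectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters, rng):
    """Plain Lloyd's k-means; returns k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if the cluster is empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_pq(vectors, m, k, iters=5, seed=0):
    """Learn one codebook of k centroids for each of the m subspaces."""
    rng = random.Random(seed)
    subs = [split(v, m) for v in vectors]
    return [kmeans([s[i] for s in subs], k, iters, rng) for i in range(m)]

def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sqdist(sv, cb[c]))
            for sv, cb in zip(split(vec, len(codebooks)), codebooks)]

def lookup_table(query_vec, codebooks):
    """Precompute the dot product of each query subvector with every centroid."""
    return [[dot(sv, c) for c in cb]
            for sv, cb in zip(split(query_vec, len(codebooks)), codebooks)]

def maxsim(query_vecs, doc_codes, codebooks):
    """Late-interaction score: for each query token, take the best
    approximate dot product against any document token, then sum."""
    score = 0.0
    for q in query_vecs:
        table = lookup_table(q, codebooks)
        score += max(sum(table[i][code] for i, code in enumerate(codes))
                     for codes in doc_codes)
    return score
```

Each document token is stored as `m` small integers rather than a full float vector, and scoring a query token against it costs `m` table lookups instead of a full dot product. That compression is what makes very large collections of token vectors practical to hold and scan. A production system would add an index on top so that only a small fraction of the codes is visited.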

Original post

this is such an impressive result: search over ~600,000,000 colbert vectors in 10 milliseconds, with a *single* CPU core. and since this algorithm has sub-linear latency, there’s no excuse for anyone up to tens of billions of tokens

6:11 AM · May 28, 2026

@lateinteraction okay so this is cool and all but I really wanna use Qwen3-8B-Embedding…

Omar Khattab (@lateinteraction)

1:11 PM · May 28, 2026 · 20.7K Views
4:13 PM · May 28, 2026 · 275 Views