ColBERT creator Omar Khattab demonstrates cache-optimized Product Quantization retrieving 600 million vectors in 10 milliseconds on a single CPU core · Digg
5h ago
The technique scales to tens of billions of tokens.