Common failure modes I've seen with embedding-only search: 1) many redundant results, 2) empty docs, 3) glitch text getting a high score, 4) matches on topic, style, or form that are completely irrelevant.
But we still need embeddings. Here's why:
a) We need ranking models that can be supervised for our task; otherwise expecting good results is wishful thinking. Embeddings are easily supervised.
b) We need to capture matches with documents that have little to no text overlap with the query. You can't do that with BM25 alone.
c) We need to design technology that's future-proof. LLMs can be converted into embedding models, so embeddings will inherit the benefits of future pre-trained models.
d) We need search to be fast. Agents are powerful but slow. It's hard to imagine a high-quality single-step search that isn't based on embeddings.
Whether you use single or multi-vector is a separate question. Having spent a lot of time modeling compositionality and ambiguity, I find that multi-vector approaches make a lot of sense. If you want to go deep in this space, check out Gaussian Embeddings and Multi-Sense/Facet Embeddings.
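Here's a minimal sketch of why multi-vector representations help with ambiguity. The word, senses, and vectors below are my own toy assumptions: one vector per word sense versus a single averaged vector, scored by taking the best-matching sense.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical sense vectors for the ambiguous word "bank".
bank_senses = {
    "finance": [0.9, 0.1, 0.0],
    "river":   [0.0, 0.2, 0.9],
}

# A single-vector model has to average the senses together,
# pulling the representation toward neither meaning.
single_vector = [sum(vals) / len(vals) for vals in zip(*bank_senses.values())]

query_river = [0.1, 0.1, 0.95]  # a query about river banks

# Multi-vector scoring: take the best-matching sense.
multi_score = max(cosine(query_river, v) for v in bank_senses.values())
single_score = cosine(query_river, single_vector)

print(multi_score > single_score)  # True: the sense-level match is sharper
```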
Of course, using late interaction models is a no-brainer. The community is active, the toolchain is there, and the results speak for themselves.
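For readers new to late interaction, the core scoring rule (MaxSim, in the ColBERT style) is simple: each query token embedding picks its best-matching document token embedding, and the document score is the sum of those maxima. The per-token vectors below are hand-made stand-ins for a real encoder, just to show the mechanics.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy per-token embeddings (a real system would get these from an encoder).
query_tokens = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
doc_tokens = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]

# MaxSim: sum over query tokens of the max similarity to any doc token.
def maxsim(q_tokens, d_tokens):
    return sum(max(cosine(q, d) for d in d_tokens) for q in q_tokens)

score = maxsim(query_tokens, doc_tokens)
print(score)
```

Because the document tokens can be embedded and indexed offline, this keeps most of the expressiveness of token-level matching while staying fast at query time.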
(Correction: everywhere I wrote "embeddings" above, read "dense embeddings".)
BM25 is an approach that leverages sparse embeddings!
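To make the sparse-embedding view explicit, here's a minimal BM25 sketch over a toy corpus of my own (k1 and b are the usual defaults): the document side becomes a sparse vector of BM25 term weights, the query side a sparse indicator vector, and scoring is just their dot product.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "a treatise on sparse retrieval".split(),
]
k1, b = 1.5, 0.75
N = len(docs)
avgdl = sum(len(d) for d in docs) / N
df = Counter(t for d in docs for t in set(d))  # document frequency per term

def idf(term):
    return math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))

def doc_sparse_vector(doc):
    # The document's "sparse embedding": one weighted dimension per term.
    tf = Counter(doc)
    return {
        t: idf(t) * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        for t in tf
    }

def query_sparse_vector(query):
    # Query side: a sparse indicator vector over its terms.
    return {t: 1.0 for t in query}

def dot(u, v):
    return sum(u[t] * v.get(t, 0.0) for t in u)

# BM25 scoring == dot product of the two sparse vectors.
q = query_sparse_vector("sparse retrieval".split())
scores = [dot(q, doc_sparse_vector(d)) for d in docs]
print(scores.index(max(scores)))  # doc 2 wins: it contains both query terms
```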