Multi-Vector Embeddings are Provably More Expressive than Single Vector Embeddings
@Raj_Jayaram_ proves that approximating multi-vector similarity with single vectors requires exponentially more dimensions.
📝 https://arxiv.org/abs/2606.23475
A fresh theoretical result from Google Research shows that representing each item as a small cloud of vectors captures similarity relationships that single-vector embeddings cannot match unless the single vectors grow exponentially larger in dimension.
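To make the contrast concrete, here is a minimal sketch assuming the late-interaction (MaxSim, sometimes called Chamfer) score used by ColBERT-style models; the summary above does not say which similarity the paper analyzes, so treat that choice as an assumption. The function names and toy vectors are hypothetical. The example shows only that one naive collapse (mean pooling) can erase a difference that the multi-vector score preserves; it does not reproduce the paper's dimension lower bound.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: each query vector is matched to its
    best-scoring document vector, and the best matches are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def mean_pool(vecs: List[Vector]) -> List[float]:
    """Collapse a set of vectors into one by coordinate-wise averaging
    (an illustrative single-vector baseline, not the paper's construction)."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]


# Toy data (hypothetical): two documents with the same mean vector.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.5, 0.5], [0.5, 0.5]]

multi_a = maxsim_score(query, doc_a)   # each query vector finds an exact match
multi_b = maxsim_score(query, doc_b)   # each query vector finds only a partial match

single_a = dot(mean_pool(query), mean_pool(doc_a))
single_b = dot(mean_pool(query), mean_pool(doc_b))

print("multi-vector:", multi_a, multi_b)
print("mean-pooled single vector:", single_a, single_b)
```

Here the multi-vector score separates the two documents while the pooled single-vector score ties them. A learned single-vector encoder could do better than mean pooling on this toy case; the paper's claim is about how many dimensions any such approximation needs in general.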
The separation proof closes a long-standing intuition gap in retrieval theory, confirming that multi-vector methods like late interaction cannot be collapsed into single-vector approximations at comparable total size.
The work supplies no empirical checks or production-system measurements, so the practical payoff for specific models or datasets remains an open variable.
Users dismissed the paper's claim that multi-vector embeddings are exponentially more expressive, with one reply sarcastically implying that more papers will not produce real understanding of the techniques.
This is a good paper.

@_reachsumit @Raj_Jayaram_ This is not a huge issue if you make your approximation sparse. What matters is the number of non-zeros.
https://www.topk.io/blog/20260311-smve-multi-vector-retrieval

@mrdrozdov We are only so many papers away from understanding what we are doing