23h ago

Ben Clavié extracts BM25-ready sparse vocabularies from frozen dense retrievers using Sparse Autoencoders

The technique requires no model fine-tuning or modification.

Original post

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

@bclavie et al. extract indexable, BM25-ready sparse features from frozen dense retrievers using reconstruction-trained Sparse Autoencoders. 📝 https://arxiv.org/abs/2605.29384

9:20 PM · May 28, 2026