I think @xueguang_ma is being too modest, so I'll provide context: he along with @rpradeep42 and a UWaterloo ugrad (Kai Sun) popularized hybrid search in its current form.
So, if you're using hybrid search today, thank them. 🙏
Yes, this is clickbait-y, so I'll support my claims 🧵
This plot reminds me of my first IR work reproducing DPR in Pyserini, where we found BM25 is amazingly helpful when hybrid with a dense retriever. BM25 is never just a simple baseline -- used the right way, it can easily outperform many fancy methods. BM25 was the most robust method shown in BEIR, the most effective and efficient method for long-context search shown in LongEmbed, and now @mattjustram and @xuzihuan4 show that BM25 can push the search agents into the best efficiency frontier.
p.s. Pyserini and pi-serini are two different repos.


