PyLate Launches MaxSim Kernels for GPU Training and TACHIOM for CPU Search

Original post
Antoine Chaffin@antoine_chaffin

Whether you are GPU poor or GPU rich, today's release of PyLate has something for you!
GPU maxxers: MaxSim kernels greatly speed up training while lowering the memory requirements.
CPU enjoyers: TACHIOM enables lightning-fast multi-vector indexing and search directly on CPU.

7:22 AM · Jun 11, 2026 · 4.8K Views
Sentiment

Users are excited about PyLate's MaxSim kernels and TACHIOM release: commenters highlight faster, lower-memory GPU training, fast CPU indexing and search, and an ecosystem that benefits directly from ongoing research.

Positive: 100.0% · Negative: 0.0% · 4 comments with sentiment
Cluster Engagement
Posts from X
Antoine Chaffin@antoine_chaffin

MaxSim fused kernels: At the core of MaxSim is a very large |Q| * |D| matrix multiplication. This is very similar to the quadratic cost of the original attention, which was lifted by fused kernels in FlashAttention. So... let's do the same, but for MaxSim?

6h · Views 267 · Likes 9
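
For readers new to MaxSim, here is a minimal sketch in plain Python, assuming dot-product similarity and a single query/document pair. `maxsim_naive` materializes the full |Q| * |D| similarity matrix, while `maxsim_tiled` streams over document tokens in tiles and keeps only one running maximum per query token, which is the memory-saving principle that fused kernels build on. The function names and the `tile` parameter are illustrative assumptions; this is not the FlashMaxSim or LIK implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_naive(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Build the full |Q| x |D| similarity matrix, then reduce it."""
    sim = [[dot(q, d) for d in doc] for q in query]  # O(|Q| * |D|) memory
    # For each query token keep its best-matching document token, then sum.
    return sum(max(row) for row in sim)


def maxsim_tiled(query: Sequence[Vector], doc: Sequence[Vector], tile: int = 2) -> float:
    """Same score, but only one running max per query token is stored."""
    best: List[float] = [float("-inf")] * len(query)  # O(|Q|) memory
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]  # one tile of document tokens
        for i, q in enumerate(query):
            for d in block:
                best[i] = max(best[i], dot(q, d))
    return sum(best)


# Toy data (hypothetical): 2 query tokens, 3 document tokens, dimension 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

print(maxsim_naive(query, doc))   # 3.0
print(maxsim_tiled(query, doc))   # 3.0
```

Both functions return the same score; the tiled variant simply never holds the whole similarity matrix at once. A real fused kernel applies the same idea on GPU tiles and also handles the backward pass, which this sketch does not attempt.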
Antoine Chaffin@antoine_chaffin

@PonyRoi @tonywu_71 @Aurelien_L_ Usual cc to my fellow late interaction enjoyer gang @helloiamleonie @17Ahmetyucel @doesdatmaksense @MehdiAllahyari @jobergum @vishal_learner @trillarnie @CShorten30 @tonywu_71 (cheater) @ManuelFaysse @din0s_ @Robro612

6h · Views 151 · Likes 4 · Bookmarks 1
Antoine Chaffin@antoine_chaffin

I am very happy about this release, because it shows how a good ecosystem directly benefits from ongoing research, and ultimately so do the users. A big thanks to @PonyRoi, @tonywu_71 and @Aurelien_L_ for creating such cool kernels and engaging in discussions around design and performance, it was really cool to see! I can't wait to launch big trainings with those beauties.

Also big thanks to @SilvioMartinico for bearing with me through all my refactors and nits, and also for making changes to TACHIOM's internals to ease the merge. I really appreciate it, and I believe TACHIOM will be very useful for a lot of resource-limited users (and will also open new avenues of research).

Finally, as per usual, big thanks to my co-maintainer @raphaelsrty for helping through the whole process (and essentially handling the kernels discussions after I just said "yeah, we should find a nice way to merge" 😇)

6h · Views 148 · Likes 9 · Bookmarks 1
Rohan Jha@Robro612

Blazing fast training kernels and blazing fast (sometimes beating GPU in my exp.) CPU indices?

Never been a better time to be a late interactor.

6h · Views 984 · Likes 18 · Bookmarks 0
Antoine Chaffin@antoine_chaffin

As explained in @SilvioMartinico's thread, TACHIOM "explicitly allocates centroids based on token frequency and semantic variance, partitioning the workload".

This makes it possible to cluster 600M vectors into 262K centroids in just 8 minutes on a CPU, and to run single-CPU search on MS MARCO (8M documents) in 10 ms.

6h · Views 101 · Likes 7 · Bookmarks 1
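
To make the quoted idea concrete, here is a toy sketch of allocating a centroid budget per token id from token frequency and variance. It is only an illustration under stated assumptions: the weighting (count times variance), the one-centroid minimum, and the names `token_stats` and `allocate_centroids` are all hypothetical and are not taken from TACHIOM's code or paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def token_stats(
    token_ids: Sequence[int], vectors: Sequence[Sequence[float]]
) -> Dict[int, Tuple[int, float]]:
    """Per token id: (count, variance), where variance is the mean squared
    distance of that token's embeddings to their mean vector."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for tid, vec in zip(token_ids, vectors):
        groups[tid].append(vec)

    stats: Dict[int, Tuple[int, float]] = {}
    for tid, vecs in groups.items():
        n, dim = len(vecs), len(vecs[0])
        mean = [sum(v[j] for v in vecs) / n for j in range(dim)]
        var = sum(sum((v[j] - mean[j]) ** 2 for j in range(dim)) for v in vecs) / n
        stats[tid] = (n, var)
    return stats


def allocate_centroids(stats: Dict[int, Tuple[int, float]], budget: int) -> Dict[int, int]:
    """Give every token one centroid, then split the rest of the budget
    proportionally to count * variance (an assumed weighting).
    Shares are rounded down, so a few centroids may stay unassigned."""
    if budget < len(stats):
        raise ValueError("budget must cover at least one centroid per token")
    weights = {tid: n * var for tid, (n, var) in stats.items()}
    total = sum(weights.values())
    spare = budget - len(stats)
    alloc = {tid: 1 for tid in stats}
    if total == 0:
        return alloc
    for tid, (n, _) in stats.items():
        extra = int(spare * weights[tid] / total)
        alloc[tid] = min(n, 1 + extra)  # never more centroids than vectors
    return alloc


# Toy data (hypothetical): token 7 is frequent and spread out, token 9 is not.
token_ids = [7, 7, 7, 7, 9, 9]
vectors = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0], [1.0, 1.0]]

stats = token_stats(token_ids, vectors)
print(allocate_centroids(stats, budget=4))  # {7: 3, 9: 1}
```

The frequent, high-variance token receives most of the budget, while the token whose embeddings are identical keeps a single centroid. Each token's vectors can then be clustered independently, which is one way to read "partitioning the workload".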
Antoine Chaffin@antoine_chaffin

We benched both implementations (as well as @ErikKaum's HF kernel, PR soon? 😇) and you can find a lot of the discussion here: https://github.com/lightonai/pylate/issues/224

Ultimately, we decided to merge the different kernels and make them interchangeable. You can leverage them by simply installing the corresponding package: https://lightonai.github.io/pylate/documentation/backends/#scoring-backends

6h · Views 100 · Likes 8
Antoine Chaffin@antoine_chaffin

@PonyRoi was the first one to open a pull request to PyLate to merge FlashMaxSim. It was followed later by @tonywu_71's and @Aurelien_L_'s PR to merge LIK: https://x.com/tonywu_71/status/2064701365318767100?s=20 (very nice explanations/visualizations!)

6h · Views 241 · Likes 7
Antoine Chaffin@antoine_chaffin

They are all very strong solutions that speed up training workloads while reducing memory pressure. Give them a shot and do not hesitate to give feedback, I am sure these kernels will grow and become even better in the future!

6h · Views 76 · Likes 7
Antoine Chaffin@antoine_chaffin

The TACHIOM index is very easy to use in PyLate, as it shares the exact same API as all of the other indexes. There's just one twist: you also have to get (and send) the token ids corresponding to the embeddings.

Otherwise it's as simple as usual: insert and search! https://lightonai.github.io/pylate/documentation/retrieval/#tachiom-retrieval

6h · Views 129 · Likes 6
Antoine Chaffin@antoine_chaffin

TACHIOM: The most used late interaction indexes rely on k-means to compute the centroids used for ANN. The issue is that, at scale, it becomes very costly to run on CPU... Enter TACHIOM, which cleverly exploits the **token ids** to speed up the process!

6h · Views 74 · Likes 6
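
A rough back-of-the-envelope sketch of why partitioning by token id can help, counting vector-to-centroid distance computations in a single k-means assignment pass. The vector and centroid counts are the figures quoted in the thread (with 262K taken as 262,000); the even split across `V` token ids is a purely hypothetical simplification, since real token frequencies are skewed, and this is not a description of TACHIOM's actual algorithm or cost.

```python
from typing import Iterable, Tuple


def global_assignment_cost(num_vectors: int, num_centroids: int) -> int:
    """Distance computations when every vector is compared to every centroid."""
    return num_vectors * num_centroids


def partitioned_assignment_cost(buckets: Iterable[Tuple[int, int]]) -> int:
    """Distance computations when each bucket (vectors, centroids) is handled
    on its own, so vectors only meet the centroids of their own bucket."""
    return sum(n * k for n, k in buckets)


# Figures quoted in the thread: 600M vectors, 262K centroids.
N, K = 600_000_000, 262_000
full = global_assignment_cost(N, K)

# Hypothetical: vectors and centroids spread evenly over V token ids.
V = 1_000
even = partitioned_assignment_cost([(N // V, K // V)] * V)

print(f"global:      {full:,}")   # 157,200,000,000,000
print(f"partitioned: {even:,}")   # 157,200,000,000
print(f"ratio:       {full // even}x")  # 1000x
```

Under this idealized even split, the work per pass drops by a factor of `V`, and the buckets are independent, so they can also be processed in parallel.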
Silvio Martinico@SilvioMartinico

@Robro612 @antoine_chaffin That is amazing to hear! I actually haven't even tested it against GPU indexes myself yet, so that is a huge surprise win 😅 Excited for how late-interaction is evolving!

5h · Views 43 · Likes 3
Antaripa Saha@doesdatmaksense

@antoine_chaffin back to back banger releases (yesterday Tony, today you)!

6h · Views 133 · Likes 1
Antoine Chaffin@antoine_chaffin

@SilvioMartinico @Robro612 Now tell the world we need a version of LateOn that works well with TACHIOM. And tell them it's coming on Tuesday in the meantime 😇

5h · Views 31 · Likes 4
Antoine Chaffin@antoine_chaffin

@doesdatmaksense Well, Tony's is essentially part of this one, he just could not wait to brag, could he? @tonywu_71 But I'll double down soon 😇

6h · Views 76 · Likes 1
Silvio Martinico@SilvioMartinico

@antoine_chaffin @PonyRoi @tonywu_71 @Aurelien_L_ Thanks to you and the whole team! Loved collaborating on this, and I'm incredibly excited for the future of late-interaction! 🚀

6h · Views 19 · Likes 2
Mnemosyne@mnemosyne_oss

@antoine_chaffin Good point. If you work with AI agents, local persistent memory is key. Mnemosyne implements hybrid retrieval (semantic + temporal + entities) on top of sqlite-vec. No cloud dependencies. All open source on GitHub. @mnemosyne_oss

5h