PyLate Launches MaxSim Kernels And TACHIOM For GPU And CPU Search

Original post

Whether you are GPU poor or GPU rich, today's release of PyLate has something for you!

GPU maxxers: MaxSim kernels greatly speed up training while lowering memory requirements.

CPU enjoyers: TACHIOM enables lightning-fast multi-vector indexing and search directly on CPU.

7:22 AM · Jun 11, 2026 · 4.8K Views

Antoine Chaffin@antoine_chaffin

Sentiment

Users are excited about PyLate's MaxSim kernels and TACHIOM releases because they show how the ecosystem benefits from ongoing research and deliver impressive GPU training and CPU search performance gains.

Positive: 100.0% · Negative: 0.0% (4 comments with sentiment)

Posts from X

Antoine Chaffin@antoine_chaffin

MaxSim fused kernels: at the core of MaxSim is a very large |Q| × |D| matrix multiplication. This is very similar to the quadratic cost of the original attention, which was lifted by the fused kernels of FlashAttention. So... let's do the same, but for MaxSim?
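
To make the FlashAttention analogy concrete, here is a minimal pure-Python sketch of the MaxSim score (the sum over query tokens of the best dot product with any document token). It is an illustration under our own assumptions, not the FlashMaxSim or LIK kernels: the function names and the `block` parameter are hypothetical. The naive version materializes the full |Q| × |D| similarity matrix; the tiled version streams document tokens in blocks and keeps only one running maximum per query token, which is the memory-saving idea that fused GPU kernels exploit.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_naive(query: List[Vector], doc: List[Vector]) -> float:
    """MaxSim by materializing the full |Q| x |D| similarity matrix."""
    sim = [[dot(q, d) for d in doc] for q in query]  # |Q| x |D| floats in memory
    return sum(max(row) for row in sim)


def maxsim_tiled(query: List[Vector], doc: List[Vector], block: int = 2) -> float:
    """MaxSim by streaming document tokens in blocks (hypothetical tiling).

    Only one running maximum per query token is kept, so the score buffer
    grows with |Q| instead of |Q| x |D|.
    """
    running_max = [float("-inf")] * len(query)
    for start in range(0, len(doc), block):
        tile = doc[start:start + block]  # one block of document tokens
        for i, q in enumerate(query):
            for d in tile:
                running_max[i] = max(running_max[i], dot(q, d))
    return sum(running_max)


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    D = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
    print(maxsim_naive(Q, D), maxsim_tiled(Q, D, block=2))  # 3.0 3.0
```

Both functions return the same score for any block size; only the amount of intermediate state differs. Real kernels also fuse the matrix multiplication itself and handle the backward pass, which this sketch does not attempt.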

Antoine Chaffin@antoine_chaffin

@PonyRoi @tonywu_71 @Aurelien_L_ Usual cc to my fellow late interaction enjoyer gang @helloiamleonie @17Ahmetyucel @doesdatmaksense @MehdiAllahyari @jobergum @vishal_learner @trillarnie @CShorten30 @tonywu_71 (cheater) @ManuelFaysse @din0s_ @Robro612

Antoine Chaffin@antoine_chaffin

I am very happy about this release, because it shows how a good ecosystem directly benefits from ongoing research, and ultimately so do the users. A big thanks to @PonyRoi, @tonywu_71 and @Aurelien_L_ for creating such cool kernels and engaging in discussions around design and performance; it was really cool to see! I can't wait to launch big trainings with those beauties.

Also big thanks to @SilvioMartinico for bearing with me through all my refactors and nits, and for making changes to TACHIOM's internals to ease the merge. I really appreciate it, and I believe TACHIOM will be very useful for a lot of resource-limited users (and will also open new avenues of research).

Finally, as usual, big thanks to my co-maintainer @raphaelsrty for helping throughout the whole process (and essentially handling the kernel discussions after I just said "yeah, we should find a nice way to merge" 😇)

Rohan Jha@Robro612

Blazing fast training kernels and blazing fast (sometimes beating GPU in my exp.) CPU indices?

Never been a better time to be a late interactor.

Antoine Chaffin@antoine_chaffin

As explained in @SilvioMartinico's thread, TACHIOM "explicitly allocates centroids based on token frequency and semantic variance, partitioning the workload".

This makes it possible to cluster 600M vectors into 262K centroids in just 8 minutes on a CPU, and to search MS MARCO (8M documents) in 10 ms on a single CPU.
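
The quoted idea of allocating centroids per token can be illustrated with a small sketch. Purely for illustration, assume each token id receives a share of the global centroid budget proportional to its frequency times its embedding variance, with at least one centroid per token id. The weighting, the largest-remainder rounding, and the name `allocate_centroids` are our assumptions, not TACHIOM's actual formula. Once each token id has its own budget, its vectors can be clustered independently, which is what makes the workload easy to partition.

```python
from typing import Dict


def allocate_centroids(freq: Dict[int, int], variance: Dict[int, float],
                       budget: int) -> Dict[int, int]:
    """Split `budget` centroids across token ids (hypothetical weighting).

    Weight of a token id = frequency * variance. Every token id gets at least
    one centroid; the rest is shared proportionally to the weights, rounded
    with the largest-remainder method so the total equals `budget` exactly.
    """
    ids = sorted(freq)
    if budget < len(ids):
        raise ValueError("budget must give at least one centroid per token id")
    weights = {t: freq[t] * variance[t] for t in ids}
    total = sum(weights.values())
    spare = budget - len(ids)  # centroids left after the one-per-id floor
    # Ideal (fractional) extra share for each token id.
    ideal = {t: spare * weights[t] / total if total > 0 else spare / len(ids)
             for t in ids}
    alloc = {t: 1 + int(ideal[t]) for t in ids}
    # Give the remaining centroids to the largest fractional remainders.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(ids, key=lambda t: ideal[t] - int(ideal[t]), reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return alloc


if __name__ == "__main__":
    # Made-up statistics for three token ids and a budget of 10 centroids.
    print(allocate_centroids({10: 1000, 20: 100, 30: 10},
                             {10: 0.5, 20: 2.0, 30: 1.0}, budget=10))
```

With these made-up numbers the frequent token id 10 gets 6 centroids, the less frequent but more varied token id 20 gets 3, and the rare token id 30 keeps its single centroid.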

Antoine Chaffin@antoine_chaffin

We benchmarked both implementations (as well as @ErikKaum's HF kernel, PR soon? 😇), and you can find a lot of the discussion here: https://github.com/lightonai/pylate/issues/224

Ultimately, we decided to merge the different kernels and make them interchangeable. You can use them by simply installing the corresponding package: https://lightonai.github.io/pylate/documentation/backends/#scoring-backends
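
The "install the package and the backend becomes available" pattern can be sketched with an optional-dependency probe. This sketch is hypothetical: the backend names, the placeholder package names, and `select_backend` are ours, not PyLate's; the real backend names and install instructions are in the scoring-backends documentation.

```python
import importlib.util
from typing import List, Tuple

# Hypothetical (backend name, package to probe) pairs, in order of preference.
# These package names are placeholders, not the real PyLate extras.
CANDIDATES: List[Tuple[str, str]] = [
    ("flash_maxsim", "hypothetical_flash_maxsim"),
    ("lik", "hypothetical_lik"),
]


def select_backend(candidates: List[Tuple[str, str]] = CANDIDATES) -> str:
    """Return the first backend whose package is installed, else a fallback."""
    for name, package in candidates:
        # find_spec returns None for a top-level module that cannot be found,
        # without importing (and paying the cost of) the package itself.
        if importlib.util.find_spec(package) is not None:
            return name
    return "reference"  # e.g. a naive implementation that is always available


if __name__ == "__main__":
    print(select_backend())
```

Because every backend computes the same MaxSim score, code that calls the scorer does not need to know which one was picked; installing a package only changes the speed and memory profile.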

Antoine Chaffin@antoine_chaffin

@PonyRoi was the first to open a pull request to PyLate to merge FlashMaxSim. It was later followed by @tonywu_71 and @Aurelien_L_'s PR to merge LIK: https://x.com/tonywu_71/status/2064701365318767100?s=20 (very nice explanations/visualizations!)

Antoine Chaffin@antoine_chaffin

They are all very strong solutions that speed up training workloads while reducing memory pressure. Give them a shot and do not hesitate to give feedback; I am sure these kernels will grow and become even better in the future!
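
To put "reducing memory pressure" in numbers, here is a back-of-the-envelope sketch. It assumes (our assumption, not a PyLate configuration) an in-batch setup of 256 queries scored against 256 documents, 32 query tokens, 180 document tokens and fp32 values, and compares materializing every token-level similarity with keeping one running maximum per query token, roughly what a fused kernel needs in the forward pass. It ignores whatever the real kernels store for the backward pass.

```python
def dense_similarity_bytes(n_queries: int, n_docs: int, q_len: int, d_len: int,
                           bytes_per_value: int = 4) -> int:
    """Size of a fully materialized [n_queries, n_docs, q_len, d_len] tensor."""
    return n_queries * n_docs * q_len * d_len * bytes_per_value


def running_max_bytes(n_queries: int, n_docs: int, q_len: int,
                      bytes_per_value: int = 4) -> int:
    """Size when only one running max per (query, document, query token) is kept."""
    return n_queries * n_docs * q_len * bytes_per_value


if __name__ == "__main__":
    # Assumed in-batch setup: 256 x 256 pairs, 32 query tokens,
    # 180 document tokens, fp32 values.
    dense = dense_similarity_bytes(256, 256, 32, 180)
    fused = running_max_bytes(256, 256, 32)
    print(f"dense: {dense / 2**30:.2f} GiB, running max: {fused / 2**20:.0f} MiB")
```

Under these assumptions the dense tensor takes about 1.4 GiB while the running maxima take 8 MiB, a reduction by exactly the document length (180×), before even counting the gradients autograd would keep for the dense version.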

Antoine Chaffin@antoine_chaffin

The TACHIOM index is very easy to use in PyLate, as it shares the exact same API as all the other indexes. There's just one twist: you also have to get (and pass) the token ids corresponding to the embeddings.

Otherwise it's as simple as usual: insert and search! https://lightonai.github.io/pylate/documentation/retrieval/#tachiom-retrieval
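
Why the index needs token ids can be seen in a deliberately simplified toy. In this sketch, `ToyTokenIndex`, `insert` and `search` are hypothetical names, and it is neither TACHIOM's algorithm nor PyLate's API: each document is registered in a bucket keyed by the token id of each of its embeddings. At query time only the buckets of the query's token ids are probed to collect candidates, which are then scored exactly with MaxSim.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """Sum over query tokens of the best dot product with any document token."""
    return sum(max(sum(x * y for x, y in zip(q, d)) for d in doc) for q in query)


class ToyTokenIndex:
    """Hypothetical toy index that partitions documents by token id."""

    def __init__(self) -> None:
        self.buckets: Dict[int, Set[str]] = defaultdict(set)  # token id -> doc ids
        self.docs: Dict[str, List[Vector]] = {}

    def insert(self, doc_id: str, embeddings: List[Vector], token_ids: List[int]) -> None:
        if len(embeddings) != len(token_ids):
            raise ValueError("need exactly one token id per embedding")
        self.docs[doc_id] = embeddings
        for tid in token_ids:
            self.buckets[tid].add(doc_id)

    def search(self, embeddings: List[Vector], token_ids: List[int],
               k: int = 10) -> List[Tuple[str, float]]:
        # Candidate generation: only documents sharing a token id with the query.
        candidates: Set[str] = set()
        for tid in token_ids:
            candidates |= self.buckets.get(tid, set())
        # Exact MaxSim re-ranking; ties broken by doc id for determinism.
        scored = [(d, maxsim(embeddings, self.docs[d])) for d in candidates]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


if __name__ == "__main__":
    index = ToyTokenIndex()
    index.insert("a", [[1.0, 0.0], [0.0, 1.0]], token_ids=[101, 202])
    index.insert("b", [[0.5, 0.5]], token_ids=[303])
    print(index.search([[1.0, 0.0]], token_ids=[101, 303]))
```

A real index keeps several centroids per frequent token and searches approximately; the toy only shows that the token ids travel with the embeddings on both the insert and the search path.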

Antoine Chaffin@antoine_chaffin

TACHIOM: the most widely used late-interaction indexes rely on k-means to compute the centroids used for ANN search. The issue is that, at scale, k-means becomes very costly to run on CPU... Enter TACHIOM, which cleverly exploits the **token ids** to speed up the process!
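
A quick cost estimate shows why plain k-means hurts on CPU at this scale and why partitioning helps. The sketch counts multiply-adds for one assignment step, where every vector is compared with every centroid. It assumes 600M vectors and 262K centroids (read as 2^18 = 262,144), plus two assumptions of ours: 128-dimensional embeddings and an idealized even split into 512 independent groups. Real token-frequency distributions are far from even, so the actual gain differs.

```python
def kmeans_assign_cost(n_vectors: int, n_centroids: int, dim: int) -> int:
    """Multiply-adds for one k-means assignment step: every vector vs every centroid."""
    return n_vectors * n_centroids * dim


def partitioned_assign_cost(n_vectors: int, n_centroids: int, dim: int,
                            groups: int) -> int:
    """Same step when vectors and centroids are split evenly into `groups`
    independent partitions, so each vector is only compared with the
    centroids of its own partition (idealized even split)."""
    per_group_vectors = n_vectors // groups
    per_group_centroids = n_centroids // groups
    return groups * per_group_vectors * per_group_centroids * dim


if __name__ == "__main__":
    N, K, DIM, GROUPS = 600_000_000, 262_144, 128, 512  # DIM and GROUPS are assumptions
    full = kmeans_assign_cost(N, K, DIM)
    split = partitioned_assign_cost(N, K, DIM, GROUPS)
    print(f"{full:.2e} vs {split:.2e} multiply-adds per iteration ({full // split}x fewer)")
```

A single global assignment step costs about 2 × 10^16 multiply-adds, repeated for every k-means iteration. Splitting the problem into independent groups divides that by the number of groups and also lets each group run on its own core.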

Silvio Martinico@SilvioMartinico

@Robro612 @antoine_chaffin That is amazing to hear! I actually haven't even tested it against GPU indexes myself yet, so that is a huge surprise win 😅 Excited for how late-interaction is evolving!

Antaripa Saha@doesdatmaksense

@antoine_chaffin back-to-back banger releases (yesterday Tony, today you)!

Antoine Chaffin@antoine_chaffin

@SilvioMartinico @Robro612 Now tell the world we need a version of LateOn that works well with TACHIOM. And tell them it's coming on Tuesday in the meantime 😇

Antoine Chaffin@antoine_chaffin

@doesdatmaksense Well, Tony's is essentially part of this one; he just couldn't wait to brag, could he? @tonywu_71 But I'll double down soon 😇

Silvio Martinico@SilvioMartinico

@antoine_chaffin @PonyRoi @tonywu_71 @Aurelien_L_ Thanks to you and the whole team! Loved collaborating on this, and I'm incredibly excited for the future of late-interaction! 🚀

Mnemosyne@mnemosyne_oss

@antoine_chaffin Good point. If you work with AI agents, local persistent memory is key. Mnemosyne implements hybrid retrieval (semantic + temporal + entities) on top of sqlite-vec. No cloud dependencies. All open-source on GitHub. @mnemosyne_oss