20M downloads / month is a new record for colbertv2
but people should probably migrate from this ancient October 2021 model to the LateOn colbert model from @raphaelsrty @antoine_chaffin et al (@LightOnIO)
Users are impressed by ColBERTv2 reaching record downloads on Hugging Face, praising the team's long-term maintenance of the older model along with the clever puns in the LateOn name.
At 140 million parameters, our LateOn model yields strong results 😉
Unrelated to LateOn, I'm really excited by what's happening with multi-vector models right now
- New kinds of indexes running on CPU
- New multilingual models
- Anisotropy being solved
- Sparse multi-vector
migrate if only to appreciate the fantastic puns in LateOn (lighton, late interaction, and maybe a call to [late?] action somehow?)

@lateinteraction @raphaelsrty @LightOnIO legends never die

@lateinteraction @raphaelsrty @LightOnIO also at least this one isn't anisotropic

@antoine_chaffin @raphaelsrty @LightOnIO what causes other models to be anisotropic? i haven’t been following this line of stuff

@lateinteraction @raphaelsrty @antoine_chaffin @LightOnIO 20m on that ancient build is wild, but the real flex is yall kept it running long enough for ppl to realize they need the upgrade

@lateinteraction @raphaelsrty @antoine_chaffin @LightOnIO 20M is impressive no matter how old the model
but yeah the upgrade path is right there lol

There are a few theories and nothing is definitive, but it seems that ModernBERT somehow has pretty anisotropic embeddings by itself in the first place. Then I think that because we only train a few tokens, the others are only indirectly trained and are pulled along by the training dynamics. And finally I think that, in all generality, a model that is trained a lot for retrieval tends to become anisotropic.
But really the issue is that we never regularised the model, so it makes sense the embeddings can have a weird geometry. I have like a dozen regularisations that fix it, but interestingly it does seem like anisotropy is only a symptom, not the actual reason fast retrieval methods are failing. I will try to do a write-up at some point, but it's very easy to fix during training without degrading the main perf; it's just that we did not care until now (also, just mean centering post-hoc goes a very long way).
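
For readers who have not followed the anisotropy discussion, here is a minimal sketch of what "mean centering post-hoc" could look like. It is not the LateOn team's code. The synthetic embeddings, the shared offset, and the helper names `mean_pairwise_cosine` and `mean_center` are all assumptions made up for illustration. The average pairwise cosine similarity is used here as one common proxy for anisotropy: values near 1 mean the vectors are crowded into a narrow cone.

```python
from typing import Optional, Tuple

import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = sims.shape[0]
    # Drop the diagonal: every vector has similarity 1 with itself.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))


def mean_center(
    embeddings: np.ndarray, mean: Optional[np.ndarray] = None
) -> Tuple[np.ndarray, np.ndarray]:
    """Subtract a mean vector; compute it from the rows if none is given."""
    if mean is None:
        mean = embeddings.mean(axis=0)
    return embeddings - mean, mean


# Hypothetical stand-in for token embeddings: isotropic noise plus a large
# offset shared by every vector, which is what makes the set anisotropic.
rng = np.random.default_rng(0)
shared_offset = rng.normal(size=64) * 5.0
token_embeddings = rng.normal(size=(500, 64)) + shared_offset

before = mean_pairwise_cosine(token_embeddings)
centered, corpus_mean = mean_center(token_embeddings)
after = mean_pairwise_cosine(centered)

print(f"mean pairwise cosine before centering: {before:.3f}")
print(f"mean pairwise cosine after centering:  {after:.3f}")
```

On this toy data the shared offset dominates before centering and is gone afterwards. Whether centering helps a real index as much, and whether the same mean should be reused for query embeddings via the `mean` argument, are open questions this sketch does not settle.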

@lateinteraction @raphaelsrty @antoine_chaffin @LightOnIO its good to see the old guard still pulling numbers but that migration tip feels like a friendly push