
Researcher Trains Efficient 149M Model For Legal Contract Clause Extraction

Original post: Omar Khattab #160

So /goal is awesome

Over the past few weeks I used @PrimeIntellect to train a 149M late interaction model based on GTE-ModernColBERT-v1 using PyLate, focused on clause extraction from legal contracts.

On the MLEB benchmark it does well for its size: it's the best accuracy-per-parameter open model on the task, 3rd of 17 open-source models, ahead of Google's EmbeddingGemma (308M, 0.829) and the same-size legal peer Free Law ModernBERT (0.764), behind only Qwen3-Embedding-4B/8B (which are 27–53× larger).
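The size comparisons in the post can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the nominal parameter counts implied by the model names (4B and 8B for the Qwen3 models), treats Free Law ModernBERT as 149M because the post calls it "same-size", and uses only the two scores the post quotes. The new model's own score is not given in the post, so it is left out. The helper names `size_ratio` and `score_per_100m` are made up for this example.

```python
# Parameter counts in millions. The Qwen3 sizes are nominal values taken
# from the model names (an assumption); the others are quoted in the post.
PARAMS_M = {
    "this model": 149,
    "EmbeddingGemma": 308,
    "Free Law ModernBERT": 149,  # "same-size" per the post
    "Qwen3-Embedding-4B": 4000,
    "Qwen3-Embedding-8B": 8000,
}

# MLEB scores as quoted in the post; the new model's score is not stated.
SCORES = {
    "EmbeddingGemma": 0.829,
    "Free Law ModernBERT": 0.764,
}


def size_ratio(name: str, base: str = "this model") -> float:
    """How many times larger `name` is than `base`, by parameter count."""
    return PARAMS_M[name] / PARAMS_M[base]


def score_per_100m(name: str) -> float:
    """Quoted score divided by model size in units of 100M parameters."""
    return SCORES[name] / (PARAMS_M[name] / 100)


if __name__ == "__main__":
    for name in ("Qwen3-Embedding-4B", "Qwen3-Embedding-8B"):
        print(f"{name}: {size_ratio(name):.1f}x the size of the 149M model")
    for name in SCORES:
        print(f"{name}: {score_per_100m(name):.3f} score per 100M parameters")
```

Under these assumptions the two Qwen3 models come out at roughly 26.8× and 53.7× the 149M model, in line with the post's "27–53×" figure.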

The agents love the prime cli. I only used the UI for paying my bill.

7:54 PM · Jun 1, 2026 · 6.4K Views