Tech · 19d ago

Harvey and Baseten's post-trained open-weight AI agents match closed-source frontier models on the Legal Agent Benchmark

The evaluation used over 1,200 long-horizon tasks.


Original post

sarah guo · #130

Gabe Pereyra @gabepereyra · #1852 in Tech

http://x.com/i/article/2059666894781554691

10:31 AM · May 27, 2026 · 66.2K Views

Sentiment

Some users praise Harvey and Baseten's post-trained open-weight legal agents for matching frontier performance and showing strong momentum, while others dismiss the company's prospects by betting it will fail or be outpaced.

Positive: 83.3%

Negative: 16.7%

7 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 59.3K · Bookmarks: 129 · Likes: 136 · Retweets: 7 · Replies: 6

Keith Rabois @rabois

Fascinating. Legal, accounting and investment banking should thrive as vertical applications of AI, especially if you run on http://Factory.ai too.

Winston Weinberg @winstonweinberg

Today we're sharing our first research collaboration with @baseten on open-weight legal agents.

Using signal from LAB (our Legal Agent Benchmark of 1,200+ tasks across 24 practice areas), we post-trained an open-weight model to match closed-source frontier performance.

Training open-weight agents for legal creates three major advantages for Harvey:

1) Cost and latency improvements:

The best-performing closed-source frontier models took an average of 22 minutes to complete each LAB task, with $50 per-task average inference cost.

Open-weight inference is substantially faster and cheaper.

2) Reasoning visibility:

In a high-stakes domain like legal, it's incredibly important to get visibility into agents' internal reasoning states - for audit and governance and also as a lever for Harvey to improve agent performance.

Closed-source foundation model providers avoid exposing raw reasoning tokens via API to prevent model distillation. For Harvey, that visibility is a major advantage.

3) Custom training:

Owning the model weights lets us customize training and modify architecture. One example: our blog’s final note on training a custom KV cache compactor.

More to come on our research collaboration with Baseten.
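As a rough back-of-the-envelope check on the cost and latency figures in point 1, the sketch below scales the quoted per-task averages to one full pass over LAB. It assumes exactly 1,200 tasks (the post says "1,200+", so the totals are lower bounds) and strictly sequential execution; the function and variable names are illustrative and do not come from Harvey or Baseten.

```python
# Per-task averages quoted in the post for the best closed-source frontier models.
AVG_MINUTES_PER_TASK = 22
AVG_COST_PER_TASK_USD = 50

# Assumption: the post says "1,200+" tasks, so 1,200 is a lower bound.
NUM_TASKS = 1_200


def full_run_estimate(num_tasks, minutes_per_task, cost_per_task_usd):
    """Return (total cost in USD, total sequential hours) for one pass over the benchmark."""
    total_cost_usd = num_tasks * cost_per_task_usd
    total_hours = num_tasks * minutes_per_task / 60
    return total_cost_usd, total_hours


total_cost_usd, total_hours = full_run_estimate(
    NUM_TASKS, AVG_MINUTES_PER_TASK, AVG_COST_PER_TASK_USD
)

print(f"Inference cost for one full pass: ${total_cost_usd:,}")   # $60,000
print(f"Sequential wall-clock time: {total_hours:,.0f} hours")    # 440 hours
```

Under these assumptions, a single evaluation pass on the closed-source baseline costs at least $60,000 in inference and takes about 440 hours if run one task at a time. The post gives no comparable figures for the open-weight model, so none are estimated here.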

19d · 59.3K views · 136 likes · 129 bookmarks

Dannie Herzberg @DannieHerz

Post-training your own frontier model has become the new default for the leading AI companies. Baseten's research team partnered with the brilliant team at @harvey and showed that post-trained open models can compete at the frontier on LAB. Post-training not only enables high-quality legal agents to be more accessible, but also allows more specialization for the workflows firms actually care about.

Gabe Pereyra @gabepereyra

http://x.com/i/article/2059666894781554691

19d · 7.8K views

Amir Haghighat @amiruci

@gabepereyra Great teamwork 💚

19d

Keith Rabois @rabois

@AgiBeerus will definitely be around. dominating.

19d

BeerusTheAGI @AgiBeerus

@rabois Factory won’t be around in a year unless acquired by Cognition or Anthropic but not sure why they would.

19d

BeerusTheAGI @AgiBeerus

@rabois What’s their revenue? Anthropic $50 billion, Cursor $3 billion, Cognition $500 million.

Factory $30 million at best, probably negative margin?

19d

Keith Rabois @rabois

@AgiBeerus much much higher and accelerating faster than Cursor and likely Cognition.

19d

M.W.L. @Wayne81573

@rabois Is $upst still investable?

19d

Chicken @aaronklaw

@gabepereyra Which practice areas saw the biggest gains from post-training? I'd guess any corporate or commercial ones. Also, for tasks where the model failed, was it because of bad reasoning or bad retrieval? That's where I'm seeing the failures on my end.

19d

Adam @HIMRobotics

@DannieHerz @harvey so cool.

18d

ChillTA🪙 @ChillTAbtc

Post-training can make a model sound more lawyerly. It does not make it current, source-faithful, or procedurally reliable.

Legal agents fail where the law changes, sources conflict, jurisdictions diverge, or facts require verified retrieval. That needs a different approach, not just better weights.

19d

BeerusTheAGI @AgiBeerus

@rabois Oh shit just noticed cognition mogged you guys today

19d

BeerusTheAGI @AgiBeerus

@rabois How about you and @cognition play I’ll show you mine if you show me yours

I’ll bet 100k factory is not at $500 million 12 months from now.

19d