Vals AI Releases Legal Research Bench, Testing AI on Exhaustive Legal Tasks

Great work from @stochasticdoggo!

The verdict is in! Frontier models can pass the bar, yet they struggle with comprehensive legal research.

Today we're releasing Legal Research Bench, a benchmark that measures models’ ability to solve realistic legal research tasks across eight areas of U.S. law.

Instead of awarding partial credit, Legal Research Bench measures whether a model can conduct exhaustive legal analysis. We grade against a strict, all-pass rubric written by practicing lawyers. A model receives credit for a task only if every required legal element is correct.
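To make the difference between the two scoring modes concrete, here is a minimal sketch. The post does not say how the partial-credit figure is computed, so the per-task averaging below is an assumption, and the task names, rubric outcomes, and function names are all hypothetical rather than taken from the benchmark's code.

```python
from typing import Dict, List

# Hypothetical rubric outcomes: task id -> pass/fail for each required legal element.
RubricResults = Dict[str, List[bool]]


def partial_credit(results: RubricResults) -> float:
    """Mean fraction of rubric elements passed per task (assumed definition)."""
    per_task = [sum(checks) / len(checks) for checks in results.values()]
    return sum(per_task) / len(per_task)


def all_pass(results: RubricResults) -> float:
    """Fraction of tasks in which every rubric element passed."""
    return sum(all(checks) for checks in results.values()) / len(results)


# Made-up example: two tasks are fully correct, two miss a single element.
example: RubricResults = {
    "task_a": [True, True, True, True],
    "task_b": [True, True, True, False],
    "task_c": [True, True, False, True],
    "task_d": [True, True, True, True],
}

print(f"partial credit: {partial_credit(example):.1%}")  # 87.5%
print(f"all-pass:       {all_pass(example):.1%}")        # 50.0%
```

In this toy data a single missed element per task barely moves the partial-credit score but zeroes out the task under all-pass grading, which is why the two numbers can sit far apart.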

Claude Opus 4.8 leads with 43.8% all-pass accuracy, followed by GPT 5.5 (40.4%) and Claude Sonnet 4.6 (38.5%). While top models score around 80% with partial credit, none exceeds 44% when every required legal element must be correct.

The gap between partial and all-pass accuracy shows how difficult it remains for AI to produce complete, reliable legal research. We hope that Legal Research Bench helps to better measure, and ultimately close, that gap.

Lots of exciting work happening in Legal AI from @harvey and @crosbylegal. Excited for the legal research benchmarks ahead!

5:05 AM · Jun 24, 2026 · 331 Views

Vals AI

Andrew Drozdov (@mrdrozdov)

Love the visuals that Vals puts together for all of its releases.

Andrew Drozdov (@mrdrozdov)

Dataset: https://www.vals.ai/benchmarks/legal_research
GitHub: https://github.com/vals-ai/legal-research-bench
