
Harvey Team Identifies Key Levers for Legal AI Agent Post-Training

Original post

Trajectory-based error analysis points to levers for post-training and harness engineering! From the @harvey team:

- Verify-and-revise correlates with the biggest score jump (+1.5).
- "Fan-out" tool parallelism hurts (-0.5); potentially adds noise without direction.
- Grounding drafts against source evidence is +0.3, but only occurs in 19% of trajectories.

Excited for more behavior-level analysis over long-horizon agent evals. Great example here from Legal Agent Benchmark (LAB)!
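The post does not say how the per-behavior score changes were computed. As a rough sketch of one way a behavior-level comparison could be set up, the snippet below takes the mean score of trajectories that show a behavior minus the mean score of those that do not, plus how often the behavior occurs. The function names, the trajectory record layout, and the toy data are all hypothetical, not taken from Harvey or LAB, and a difference in means like this shows correlation only.

```python
from statistics import mean

def behavior_delta(trajectories, behavior):
    """Mean score with the behavior minus mean score without it.

    Hypothetical record layout: {"score": float, "behaviors": set of str}.
    Returns None when one of the two groups is empty.
    """
    with_b = [t["score"] for t in trajectories if behavior in t["behaviors"]]
    without_b = [t["score"] for t in trajectories if behavior not in t["behaviors"]]
    if not with_b or not without_b:
        return None
    return mean(with_b) - mean(without_b)

def behavior_rate(trajectories, behavior):
    """Fraction of trajectories in which the behavior appears."""
    return sum(behavior in t["behaviors"] for t in trajectories) / len(trajectories)

# Made-up toy data for illustration only; these are not LAB results.
toy = [
    {"score": 7.0, "behaviors": {"verify_and_revise"}},
    {"score": 5.0, "behaviors": set()},
    {"score": 6.0, "behaviors": {"verify_and_revise", "fan_out"}},
    {"score": 4.0, "behaviors": {"fan_out"}},
]

print(behavior_delta(toy, "verify_and_revise"))
print(behavior_delta(toy, "fan_out"))
print(behavior_rate(toy, "fan_out"))
```

A behavior with a positive delta but a low rate, like the grounding case in the post, is the kind of gap this pairing of numbers is meant to surface.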

1:31 PM · May 26, 2026