SalesBench Launches Long-Horizon Agent-to-Agent Sales Evaluation Benchmark

Original post

there are long-horizon agent evals, and there are agent-to-agent evals. i wanted one that tested both.

if agents are going to negotiate, sell, schedule, and coordinate with each other in the real world, we need evals where they interact over long horizons with state, tools, and measurable outcomes.

so naturally, i made them cold-call insurance leads lol (shoutout Juliano Massarelli).

i present SalesBench: a seller agent works a pipeline, calls an LLM buyer, manages time/tools/state, and gets scored by revenue closed. trained a 2B model on it + ran early frontier model sweeps.

huge thanks to the Prime Intellect team and everyone who helped along the way @johannes_hage @willccbb @vincentweisser @GottliebEli @omouamoua @Ameen_ml @DennwsLee @OmShastri123

full breakdown: https://hamzamostafa.com/blog/salesbench
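To make the setup in the post concrete, here is a rough toy sketch of the loop it describes: a seller works a pipeline of leads, spends a limited time budget on calls, and is scored by revenue closed. This is not SalesBench's code. Every name here (`Lead`, `Episode`, `stub_buyer`, `descending_offer_seller`) is hypothetical, the buyer is a fixed rule standing in for the LLM buyer, and the prices and time costs are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Lead:
    name: str
    budget: float                # hidden from the seller: highest premium the buyer accepts
    closed_premium: float = 0.0  # revenue booked for this lead, 0 if never closed


def stub_buyer(lead: Lead, offer: float) -> bool:
    """Hypothetical stand-in for the LLM buyer: accepts any offer within budget."""
    return offer <= lead.budget


@dataclass
class Episode:
    leads: list
    minutes_left: int            # time budget shared across the whole pipeline
    call_cost: int = 10          # assumed minutes consumed per call
    log: list = field(default_factory=list)

    def call(self, lead: Lead, offer: float) -> bool:
        """Tool: spend time on one call and record the outcome as state."""
        if self.minutes_left < self.call_cost:
            raise RuntimeError("out of time")
        self.minutes_left -= self.call_cost
        accepted = stub_buyer(lead, offer)
        if accepted:
            lead.closed_premium = offer
        self.log.append((lead.name, offer, accepted))
        return accepted

    def revenue(self) -> float:
        """Score: total premium closed across all leads."""
        return sum(lead.closed_premium for lead in self.leads)


def descending_offer_seller(ep: Episode, offers=(300.0, 200.0, 100.0)) -> None:
    """Toy seller policy: pitch high, then lower the offer until the lead accepts."""
    for lead in ep.leads:
        for offer in offers:
            if ep.minutes_left < ep.call_cost:
                return  # time budget exhausted, stop working the pipeline
            if ep.call(lead, offer):
                break   # closed this lead, move to the next one


def run_demo():
    leads = [Lead("a", 250.0), Lead("b", 120.0), Lead("c", 50.0)]
    ep = Episode(leads=leads, minutes_left=60)
    descending_offer_seller(ep)
    return ep.revenue(), ep.minutes_left, len(ep.log)


if __name__ == "__main__":
    print(run_demo())
```

Even in this toy version the time budget matters: the seller runs out of minutes partway through the third lead, so how it spends calls affects the final revenue score.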

11:56 AM · May 19, 2026