A year ago, I predicted that we would enter The Era of Evals, but it's now happening much faster than I anticipated.
Frontier Labs have scaled their Eval production with us by more than 10X in the last 12 months, on what was already a 9-figure base.
Every tech-forward enterprise is rapidly building evals for its agents. @mercor_ai is spending over $10M / month on inference and now has Evals for every agent deployment.
We need Evals to know (1) what model to use, (2) what context or tools we should add to improve the model, and (3) whether it's working in production.
One CTO of a $25B enterprise told me he used to have a product roadmap, but has replaced it entirely with an Eval roadmap.
Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application layer companies. One trend is common across every customer: we are entering The Era of Evals.
RL is becoming so effective that models will be able to saturate any evaluation. This means that the primary barrier to applying agents to the entire economy is building evals for everything.
This will be one of the largest buildouts we have ever seen, with enterprises pouring hundreds of billions of dollars into evals for every workflow we want agents to automate.
We're quickly defining a new class of work and hiring across nearly every domain: software engineers, consultants, bankers, lawyers, doctors, gamers, and many more.




