Arena reached a $100M annual revenue run rate just 8 months after launching our evaluation product. We started as a research project at UC Berkeley with a simple mission: measure AI progress through real-world use. As AI shifts from chatbots to agents taking on longer, higher-stakes work, the problem matters more than ever.
Today, Arena measures real-world AI utility with a community of tens of millions. With Agent Arena, we’re evaluating long-running agents on complex, real-world tasks: how they use tools, adapt to feedback, recover from errors, and accomplish goals set by humans.
We are excited to keep deepening our work in agentic evaluations.
Here’s @ml_angelopoulos on what this milestone means and where we go from here:
Arena has crossed $100M in annualized revenue run rate, eight months after launching our evaluation product.
With our recent release of Agent Mode, millions of users on Arena are doing real work with agents, from coding to document analysis, in long-running, multi-turn sessions with hundreds of tool calls. Arena now evaluates objective criteria such as task completion rates and hallucination rates, going well beyond our original human-preference voting model. This expansion has taken us from a student project at Berkeley to one of the fastest-growing companies in history. Go Bears! 🐻
Our core thesis is simple: to align AI with human values, we must directly measure its impact on people in the real world. Today’s milestone is proof that Arena’s platform is the de facto standard for post-deployment evaluation of AI.
















