So, how does Arena make money?
Our platform serves as a real-world CI/CD system for AI models, built on post-deployment user interactions. By helping businesses understand the strengths and weaknesses of their models in the wild, we help them improve performance not just on benchmarks, but for actual people. Once a model is staged for public release, we evaluate it for free, for the good of the community.
The constant, live stream of diverse queries ensures that evaluations happen on real work that models can't study for, and that they reflect model performance on code writing and debugging, research, brainstorming, creative generation, document creation, and professional productivity. Real tasks, judged by the people who do them. Today the Arena platform has 700M+ total conversations, 82M+ total votes, and 10M+ monthly visitors from over 150 countries around the world. Every day, ~80% of user queries are net-new.
Arena has crossed $100M in annualized revenue run rate, eight months after launching our evaluation product.
With our recent release of Agent Mode, millions of users on Arena are doing real work with agents, from coding to document analysis, in long-running, multi-turn sessions with hundreds of tool calls. Arena now evaluates objective metrics such as task completion rates and hallucination rates, going far beyond our original human-preference voting model. This expansion has taken us from a student project at Berkeley to one of the fastest-growing companies in history. Go Bears! 🐻
Our core thesis is simple: to align AI with human values, we must directly measure its impact on people in the real world. Today's milestone is proof that Arena's platform is the de facto standard for post-deployment evaluation of AI.
