
Artificial Analysis Launches AgentPerf, Industry's First Agentic AI Benchmark


Original post

(DeepSeek V3.2, GLM 4.7, and Kimi K2.5) prompted to resolve issues in real public code repositories. All trajectories include interleaved reasoning and tool calls.

2/4

Bojan Tunguz@tunguz

Artificial Analysis just announced AgentPerf, the industry’s first agentic AI benchmark. The benchmark uses several real-world agentic trajectories and employs the OpenCode agentic harness with three top open-source models, all with reasoning enabled

1/4

5:37 AM · Jun 13, 2026 · 941 Views

Sentiment

Users criticized the AgentPerf benchmark launch for relying on a brutally bad or outdated model choice and questioned whether the post was serious.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 2K · Bookmarks: 10 · Likes: 11 · Retweets: 3 · Replies: 3


Bojan Tunguz@tunguz

Among other findings, the benchmark shows that NVIDIA's GB300 outperforms the H200 by a whopping factor of 20 in the number of concurrent agents it can run per megawatt.

3/4
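
For readers unfamiliar with the "concurrent agents per megawatt" metric quoted in 3/4, here is a minimal sketch of how such a ratio could be computed. The function names and all input figures are hypothetical placeholders chosen so the ratio comes out to 20; they are not measurements from AgentPerf, and the actual methodology may define the metric differently.

```python
def agents_per_megawatt(concurrent_agents: float, power_watts: float) -> float:
    """Concurrent agents sustained per megawatt of power draw."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return concurrent_agents / (power_watts / 1_000_000)


def efficiency_ratio(candidate: float, baseline: float) -> float:
    """How many times more agents per megawatt the candidate sustains."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline


# Hypothetical placeholder figures, NOT results from the benchmark.
baseline = agents_per_megawatt(concurrent_agents=100, power_watts=500_000)
candidate = agents_per_megawatt(concurrent_agents=2_000, power_watts=500_000)
ratio = efficiency_ratio(candidate, baseline)
print(f"{ratio:.0f}x more concurrent agents per megawatt")
```

Normalizing by power rather than by GPU count is what lets systems with very different rack sizes and power envelopes be compared on one axis.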


Bojan Tunguz@tunguz

AA methodology: https://artificialanalysis.ai/methodology/agentperf

Nvidia technical blog: https://developer.nvidia.com/blog/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark/

4/4


Artus Krohn-Grimberghe@artuskg

@tunguz The model choice is brutally bad/outdated. Is this a serious post? 😅