
Prime Intellect's Florian Brand says running MirrorCode benchmark evaluations on advanced models will cost over $100,000 per run

Fully eliciting performance requires up to 100 million tokens


Original post

@cwolferesearch Also, it’s so damn costly! From a talk of mine, based on @tmkadamcz's information/calculation of running MirrorCode with a bunch of models:

3:07 PM · May 30, 2026

#1153 Florian Brand @xeophon

@cwolferesearch @tmkadamcz And ProgramBench (200 tasks) has reported something like 5-10K per model run on the high end (some tweet, hard to find on the spot). PB underelicits the capabilities, imo. So a proper run would be like 10-20K+ for one model. Ant likely spent 50-100K for the 4.8 figure
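To put the ProgramBench figures above on a per-task basis, here is a rough sketch of the arithmetic. It uses only the numbers quoted in the post (200 tasks, 5-10K per run as reported, 10-20K+ for a proper run) and assumes that "K" means thousands of US dollars; the function and variable names are illustrative and not from any benchmark tooling.

```python
# Back-of-the-envelope per-task costs for ProgramBench, using only the
# figures quoted in the post. Assumption: "K" means thousands of USD.

NUM_TASKS = 200  # ProgramBench task count as stated in the post

# (low, high) cost of one model run in USD, as quoted in the post.
RUN_COST_ESTIMATES = {
    "reported (high end)": (5_000, 10_000),
    "proper run (10-20K+)": (10_000, 20_000),
}


def per_task_cost(run_cost_usd: float, num_tasks: int = NUM_TASKS) -> float:
    """Average cost of a single task, given the cost of one full run."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return run_cost_usd / num_tasks


def per_task_range(estimate):
    """Convert a (low, high) run-cost estimate into a per-task range."""
    low, high = estimate
    return per_task_cost(low), per_task_cost(high)


if __name__ == "__main__":
    for label, estimate in RUN_COST_ESTIMATES.items():
        low, high = per_task_range(estimate)
        print(f"{label}: ${low:.0f} to ${high:.0f} per task")
```

Under those assumptions the reported range works out to roughly $25 to $50 per task, and the "proper run" estimate to $50 to $100 or more per task, before any sandbox or compute costs.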

Cameron R. Wolfe, Ph.D. @cwolferesearch

@xeophon @tmkadamcz thanks for sharing!!

10:12 PM · May 30, 2026 · 27 Views

10:15 PM · May 30, 2026 · 31 Views

#1153 Florian Brand @xeophon

@cwolferesearch @tmkadamcz That’s raw API costs, add the costs of hundreds of sandboxes running for hours or days on top. Small in the grand scheme of things rn, but something to consider. CPUs and RAM are also resources these days


10:16 PM · May 30, 2026 · 29 Views

#1153 Florian Brand @xeophon

@cwolferesearch @tmkadamcz And, last cost-based post from the same talk: Based on public information, you can calculate the cost of evals like APEX-Agents or RLI (iirc RLI has something like 20-30K in costs for the data acquisition alone)


10:28 PM · May 30, 2026 · 29 Views

