Prime Intellect launches Hosted Evaluations, managing sandboxes and compute infrastructure to simplify complex AI model benchmarks
The platform supports testing models like Claude Opus 4.7.
the way to make post-training easier is to build powerful general tooling where training is an opt-in feature
evals are environments
Today, we are launching Hosted Evaluations on the platform. Running evals is an infra problem: harnesses, sandboxes, hours of compute, hundreds of parallel runs. Running evals is hard. Until now.
@eliebakouch wait so the RL rollout viewer is the same as the eval rollout viewer? does this mean evals and environments are the same thing?? and people can go from evals to post-training with a single command???
look at how beautiful this rollouts viewer is, never been easier to create, run and look at (eval) data

deep dive by @xeophon here:
Hosted evals are finally live!! Small vid showing how to use them, more to come : )
it’s beautiful
@xeophon @vincentweisser Managing eval infra? Dw about it kitten.
@johannes_hage @dominik_scherm .@dominik_scherm is the 🐐 for solving all my weird setups
.@xeophon & @dominik_scherm have been cooking on the smoothest experience to run your evals. read more: https://www.primeintellect.ai/blog/hosted-evaluations