Prime Intellect launches Hosted Evaluations to automate infrastructure like sandboxes and compute for AI model benchmarks
Pre-built testing environments include AutomationBench and tau2-bench.
it’s beautiful
Today, we are launching Hosted Evaluations on the platform. Running evals is an infra problem: harnesses, sandboxes, hours of compute, hundreds of parallel runs. Running evals is hard. Until now.
@xeophon @vincentweisser Managing eval infra? Dw about it kitten.
Hosted evals are finally live!! Small vid showing how to use them, more to come : )
@johannes_hage @dominik_scherm .@dominik_scherm is the 🐐 for solving all my weird setups
.@xeophon & @dominik_scherm have been cooking on the smoothest experience to run your evals. Read more: https://www.primeintellect.ai/blog/hosted-evaluations