8h ago

METR_Evals Advances LLM Evaluations With Open-Source Scalability Fixes

Original post

I have it on good authority that at @METR_Evals we have been consistently pushing the envelope of what is possible with LLM evaluations with open-source infrastructure. We are often the first organization to run into scalability issues with Inspect and associated tooling. We contribute all of our fixes upstream and are committed to maintaining our open-source tooling for other organizations that want to run evals at scale, which you can find at https://hawk.metr.org

11:28 AM · May 19, 2026