Excited to share a research collaboration with @ScaleAILabs - we rigorously evaluate bio agents on different models across 82 drug discovery tasks. Interesting findings include: (1) know-how/environment >>> models (2) different LLMs have different strengths, highlighting a need for model routing for biology agents:
We get this question a lot: "Which model is best for drug discovery?"
DrugDiscoveryBench, our new benchmark announced today with @ScaleAILabs (82 tasks from working drug discovery scientists, run on the Biomni Open Source Environment), has a clear answer: the model matters far less than what you build around it.
🧵3 key takeaways →