Lindsay M. Smith and Matt Wiemann launch DiscoverPhysics to evaluate LLM agents on scientific experimentation and physical law discovery
The pipeline tests if LLMs can design scientific experiments.
So excited about this project. Despite all the talk about AGI, AI has barely scratched the surface of discovering scientific theories or even giving us new scientific insights. DiscoverPhysics is a benchmark for the future.
Can LLMs discover new laws of physics? We present DiscoverPhysics, a pipeline to benchmark LLM agents on experimentation, analysis and discovery. https://arxiv.org/abs/2605.26087 Co-led by @LindsayMSmith3 w/ Peter Melchior, @kdqg1 @andrewgwils @Pavel_Izmailov Carol Cuesta-Lazaro 1/10
Very excited to release DiscoverPhysics, a new benchmark and evaluation pipeline for experimentation and discovery in LLMs.
🌐 https://sampsonml.github.io/DiscoverPhysicsLeaderboard/ 📰 http://arxiv.org/abs/2605.26087

We design interactive worlds where the model can run experiments with the goal of figuring out how the world works. I believe this methodology can scale to extremely complex worlds, making this task potentially superhuman.
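To make the "interactive world" idea concrete, here is a toy sketch of the loop described above: an agent picks experimental settings, the world returns measurements, and the agent tries to recover the hidden rule. This is not the DiscoverPhysics code or API. `ToyWorld`, `run_experiment`, `fit_power_law`, `discover`, and the hidden power law itself are all hypothetical names and assumptions chosen for illustration, and a fixed list of settings stands in for whatever an LLM agent would choose.

```python
import math
import random


class ToyWorld:
    """Hypothetical world hiding a power law y = k * x**p (with small noise)."""

    def __init__(self, k=2.0, p=-2.0, noise=0.01, seed=0):
        self._k, self._p, self._noise = k, p, noise
        self._rng = random.Random(seed)

    def run_experiment(self, x):
        """Return a noisy measurement at the chosen setting x > 0."""
        clean = self._k * x ** self._p
        return clean * (1.0 + self._rng.gauss(0.0, self._noise))


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log k + p * log x; returns (k, p)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    p = sxy / sxx
    k = math.exp(my - p * mx)
    return k, p


def discover(world, settings):
    """Run one experiment per setting, then fit a candidate law to the data."""
    xs = list(settings)
    ys = [world.run_experiment(x) for x in xs]
    return fit_power_law(xs, ys)


world = ToyWorld()
k_hat, p_hat = discover(world, [0.5, 1.0, 2.0, 4.0, 8.0])
print(f"estimated law: y = {k_hat:.2f} * x^{p_hat:.2f}")
```

In this sketch the agent only ever sees what `run_experiment` returns, never the hidden parameters, so a proposed law can be scored against the ground truth. A real agent would also have to choose which experiments to run and which functional form to try, which this toy version hard-codes.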

with awesome collaborators @Space_Boy_Matt @LindsayMSmith3 Peter Melchior @kdqg1 @andrewgwils and Carol Cuesta-Lazaro.
See also Matt's detailed thread with lots of interesting results:
Can LLMs discover new laws of physics? We present DiscoverPhysics, a pipeline to benchmark LLM agents on experimentation, analysis and discovery. https://arxiv.org/abs/2605.26087 Co-led by @LindsayMSmith3 w/ Peter Melchior, @kdqg1 @andrewgwils @Pavel_Izmailov Carol Cuesta-Lazaro 1/10
Very cool, I wonder how sensitive the results are to the data format? Humans are really good at getting an intuition for the laws of physics, rather than the exact numbers, from vision, and only afterwards try to verify it with experiments.