
NYU's Andrew Gordon Wilson shares updated DiscoverPhysics leaderboard, where Claude Fable 5 leads with a 72.7% Pass@5 rate

The model's Pass@5 rate is roughly 1.5x that of the next-best model, Claude Opus 4.7 (50%).

Original post
Andrew Gordon Wilson @andrewgwils

Claude Fable has made significant progress on the current worlds in our DiscoverPhysics benchmark (led by @Space_Boy_Matt and @LindsayMSmith3), solving difficult worlds with latent structure that other models cannot!

So excited about this project. Despite all the talk about AGI, AI has barely scratched the surface of discovering scientific theories or even giving us new scientific insights. DiscoverPhysics is a benchmark for the future.

9:45 AM · Jun 10, 2026 · 3K Views
Posts from X
Lindsay Smith @LindsayMSmith3

DiscoverPhysics leaderboard update: Claude Fable 5 solves the majority of our current worlds! Pass@5 rate: 73%, ~1.5x the next-best model (Opus 4.7, 50%). It even passes difficult worlds with latent structure, which all other models struggle with. 🌐 https://sampsonml.github.io/DiscoverPhysicsLeaderboard/

Matt Wiemann @Space_Boy_Matt

Can LLMs discover new laws of physics?

We present DiscoverPhysics, a pipeline to benchmark LLM agents on experimentation, analysis and discovery. https://arxiv.org/abs/2605.26087 Co-led by @LindsayMSmith3 w/ Peter Melchior, @kdqg1, @andrewgwils, @Pavel_Izmailov, Carol Cuesta-Lazaro 1/10
