Tech · 3h ago

NYU's Andrew Gordon Wilson releases updated DiscoverPhysics benchmark, where Claude Fable 5 leads with a 72.7% Pass@5 rate

The model's Pass@5 rate is roughly 1.5x that of the next-best model, Claude Opus 4.7 (50%).

Original post

Andrew Gordon Wilson (@andrewgwils)

Claude Fable has made significant progress on the current worlds in our DiscoverPhysics benchmark, led by @Space_Boy_Matt and @LindsayMSmith3, solving difficult worlds with latent structure that other models cannot!

Andrew Gordon Wilson (@andrewgwils)

So excited about this project. Despite all the talk about AGI, AI has barely scratched the surface of discovering scientific theories or even giving us new scientific insights. DiscoverPhysics is a benchmark for the future.

9:45 AM · Jun 10, 2026 · 3K Views

Posts from X

Lindsay Smith (@LindsayMSmith3)

DiscoverPhysics leaderboard update: Claude Fable 5 solves the majority of our current worlds! Pass@5 rate: 73%, ~1.5x next best model (Opus 4.7, 50%). It even passes difficult worlds with latent structure, which all other models struggle with. 🌐 https://sampsonml.github.io/DiscoverPhysicsLeaderboard/
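As a sanity check on the "~1.5x" figure, here is a minimal sketch that divides the headline Pass@5 rate (72.7%) by the 50% quoted for Opus 4.7. The `pass_at_k` helper is hypothetical and assumes a world counts as solved if any of the first k attempts succeeds; the post does not spell out how the benchmark scores Pass@5, and the toy attempt data is made up purely for illustration.

```python
def pass_at_k(attempts_per_world, k=5):
    """Fraction of worlds with at least one success in the first k attempts.

    Assumed definition for illustration only; the benchmark's actual
    scoring rule is not described in the post.
    """
    solved = sum(any(attempts[:k]) for attempts in attempts_per_world)
    return solved / len(attempts_per_world)


# Rates quoted in the posts (headline figure for Fable 5, 50% for Opus 4.7).
fable_rate = 0.727
opus_rate = 0.50
ratio = fable_rate / opus_rate
print(f"Fable 5 / Opus 4.7 Pass@5 ratio: {ratio:.2f}x")

# Hypothetical toy data: 4 worlds, 5 attempts each (True = attempt passed).
toy_attempts = [
    [False, False, True, False, False],
    [False, False, False, False, False],
    [True, True, False, True, False],
    [False, False, False, False, True],
]
toy_rate = pass_at_k(toy_attempts, k=5)
print(f"Toy Pass@5: {toy_rate:.2f}")
```

The ratio comes out a little under 1.5, which is consistent with the "~1.5x" rounding in the post.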

Matt Wiemann (@Space_Boy_Matt)

Can LLMs discover new laws of physics?

We present DiscoverPhysics, a pipeline to benchmark LLM agents on experimentation, analysis and discovery. https://arxiv.org/abs/2605.26087 Co-led by @LindsayMSmith3 w/ Peter Melchior, @kdqg1 @andrewgwils @Pavel_Izmailov Carol Cuesta-Lazaro 1/10
