
Google's LEAP framework uses Lean verifier feedback to solve all 12 Putnam 2025 mathematics problems

It raised one-shot Lean-IMO-Bench solve rates to 70%.



Original post

elvis@omarsar0

New research from Google.

Just shows the impressive results you can get from custom agent harnesses.

LEAP wraps a general-purpose LLM in an agentic scaffold that grounds every step in the Lean compiler and iterates against verifier feedback.

The same general model solves all 12 Putnam 2025 problems and lifts Lean-IMO-Bench one-shot solve rate from under 10% to 70%, beating a specialized gold-medal system that scores 48%.
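Here is a minimal Python sketch of what "iterating against verifier feedback" can look like. It is not LEAP's implementation. `refine_until_verified`, `CheckResult`, and the toy `propose`/`check` stubs are hypothetical stand-ins for an LLM call and a Lean compiler run, and the default of 5 rounds is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    errors: list[str] = field(default_factory=list)


def refine_until_verified(
    goal: str,
    propose: Callable[[str, list[str]], str],
    check: Callable[[str], CheckResult],
    max_rounds: int = 5,
) -> Optional[str]:
    """Propose a proof, check it, and feed verifier errors back until it passes."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        candidate = propose(goal, feedback)
        result = check(candidate)
        if result.ok:
            return candidate
        # Accumulate every error seen so far for the next proposal.
        feedback.extend(result.errors)
    return None  # gave up after max_rounds attempts


# Toy stand-ins (hypothetical): the "model" switches tactics once it has seen
# an error, and the "verifier" accepts only the string "by omega".
def toy_propose(goal: str, feedback: list[str]) -> str:
    return "by omega" if feedback else "by simp"


def toy_check(proof: str) -> CheckResult:
    if proof == "by omega":
        return CheckResult(ok=True)
    return CheckResult(ok=False, errors=["simp made no progress"])


print(refine_until_verified("2 * n = n + n", toy_propose, toy_check))  # by omega
```

The essential step is passing the verifier's error messages back into the next proposal. A real harness would run Lean on each candidate instead of comparing strings.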

Paper: https://arxiv.org/abs/2606.03303

Learn to build effective AI agents in our academy: https://academy.dair.ai/

8:01 AM · Jun 3, 2026 · 32.3K Views

Sentiment

Many users praised Google's LEAP for raising general LLMs' formal-math solve rate to 70% through verifier loops and scaffolding. They called the gains remarkable and argued that the scaffolding matters more than the base model.

Positive: 91.6% · Negative: 8.4% (based on 14 comments with sentiment)

Posts from X

Most Activity

Views: 49.1K · Bookmarks: 80 · Likes: 94 · Retweets: 25 · Replies: 7

Rohan Paul@rohanpaul_ai

Another great paper from Google.

It shows general LLMs can solve formal math by planning proofs and checking each step, raising general LLM performance from under 10% to 70%.

A general LLM failed badly when asked to write full formal proofs in one try. It became much stronger when it planned, split the work into smaller claims, reused earlier claims, and learned from Lean’s feedback.
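Here is a toy Lean 4 example of splitting the work into smaller claims. It is not taken from the paper and uses only core Lean's `omega` tactic. A helper lemma is checked once on its own, and a second goal is then closed by citing it rather than re-deriving it. The names `LeapToy`, `two_mul_eq_add`, and `double_sum` are hypothetical.

```lean
namespace LeapToy

-- Hypothetical subgoal, stated and verified on its own.
theorem two_mul_eq_add (n : Nat) : 2 * n = n + n := by
  omega

-- The larger goal closes by reusing the already-verified lemma.
theorem double_sum (a b : Nat) : 2 * (a + b) = (a + b) + (a + b) :=
  two_mul_eq_add (a + b)

end LeapToy
```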

The paper shows the weakness was not just the model’s math ability but the way it was being used: it lacked structured interaction with a verifier.

The key idea is that the model does not try to write one giant perfect proof at once, because that usually fails on long and tricky problems.

Instead, LEAP stores the proof as a graph of goals and subgoals, so useful lemmas can be reused instead of rediscovered every time.
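Below is a minimal Python sketch of a goal/subgoal graph with reuse, assuming goals can be keyed by their statement text. `ProofGraph`, `GoalNode`, and the placeholder statements are hypothetical, and the paper's actual data structure may differ. The closure check assumes the graph has no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class GoalNode:
    statement: str
    subgoals: list[str] = field(default_factory=list)  # statements this goal depends on
    verified: bool = False  # set when the verifier accepts this node's own step


class ProofGraph:
    """Hypothetical goal graph keyed by statement, so a lemma proved once is shared."""

    def __init__(self) -> None:
        self.nodes: dict[str, GoalNode] = {}

    def add(self, statement: str, subgoals: tuple[str, ...] = ()) -> GoalNode:
        # Reuse an existing node instead of creating a duplicate goal.
        node = self.nodes.setdefault(statement, GoalNode(statement))
        for sub in subgoals:
            self.add(sub)
            if sub not in node.subgoals:
                node.subgoals.append(sub)
        return node

    def mark_verified(self, statement: str) -> None:
        self.nodes[statement].verified = True

    def is_closed(self, statement: str) -> bool:
        # Closed = own step verified and every subgoal closed (assumes no cycles).
        node = self.nodes[statement]
        return node.verified and all(self.is_closed(s) for s in node.subgoals)


g = ProofGraph()
g.add("main", ("lemma_A", "lemma_B"))
g.add("other", ("lemma_A",))  # lemma_A is shared, not re-created
for s in ("lemma_A", "lemma_B", "main"):
    g.mark_verified(s)
print(len(g.nodes), g.is_closed("main"), g.is_closed("other"))  # 4 True False
```

Keying nodes by statement is the simplest way to get reuse. A second goal that needs `lemma_A` finds the existing node instead of proving it again.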

The authors tested LEAP on Putnam 2025 and a new Lean benchmark built from 60 IMO-style problems, where ordinary one-shot proof writing did very poorly.

LEAP solved all 12 Putnam 2025 problems and raised general LLM performance on the Lean IMO benchmark from under 10% to 70%.

----

Link: https://arxiv.org/abs/2606.03303

Title: "LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks"
