AI · 7h ago

Google DeepMind releases LEAP, using the Lean theorem prover to raise general LLM success on a Lean IMO-style benchmark from under 10% to 70%

The system also produced a verified solution to a research-level problem, Erdős problem 527.

Original post
Rohan Paul @rohanpaul_ai

Another great paper from Google.

Shows general LLMs can solve formal math by planning proofs and checking each step. Raised general LLM performance from under 10% to 70%.

A general LLM failed badly when asked to write full formal proofs in 1 try, but became much stronger when it planned, split the work into smaller claims, reused past claims, and learned from Lean’s feedback.

The paper shows the weakness was not just the model’s math ability, but the way it was being used - the absence of structured interaction with a verifier.

The key idea is that the model does not try to write one giant perfect proof at once, because that usually fails on long and tricky problems.

Instead, LEAP stores the proof as a graph of goals and subgoals, so useful lemmas can be reused instead of rediscovered every time.

The authors tested LEAP on Putnam 2025 and a new Lean benchmark built from 60 IMO-style problems, where ordinary one-shot proof writing did very poorly.

LEAP solved all 12 Putnam 2025 problems and raised general LLM performance on the Lean IMO benchmark from under 10% to 70%.

----

Link – arxiv.org/abs/2606.03303

Title: "LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks"

3:09 PM · Jun 4, 2026 · 86.4K Views
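
The post above describes a proof stored as a graph of goals and subgoals, with verified lemmas reused and failed attempts retried using verifier feedback. A minimal Python sketch of that idea follows. It is not the paper's implementation: `ProofGraph`, `Goal`, `toy_verifier`, and `toy_proposer` are hypothetical names, the verifier is a toy stand-in for a real Lean check, and the proposer is a stand-in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a Lean check: (statement, proof) -> (accepted, feedback).
Verifier = Callable[[str, str], Tuple[bool, str]]


@dataclass
class Goal:
    statement: str
    subgoals: List[str] = field(default_factory=list)
    proof: Optional[str] = None  # set only after the verifier accepts it


# Hypothetical stand-in for an LLM call: (goal, last feedback) -> candidate proof.
Proposer = Callable[[Goal, Optional[str]], str]


class ProofGraph:
    """Goals keyed by statement, so a lemma shared by two goals is one node."""

    def __init__(self, verifier: Verifier) -> None:
        self.goals: Dict[str, Goal] = {}
        self.verifier = verifier
        self.checks_run = 0  # number of verifier calls made so far

    def add_goal(self, statement: str, subgoals: Tuple[str, ...] = ()) -> Goal:
        # setdefault returns the existing node if this statement was seen before.
        goal = self.goals.setdefault(statement, Goal(statement))
        for sub in subgoals:
            self.add_goal(sub)
            if sub not in goal.subgoals:
                goal.subgoals.append(sub)
        return goal

    def prove(self, statement: str, propose: Proposer, max_attempts: int = 3) -> bool:
        # Assumes the subgoal graph is acyclic; a cycle would recurse forever.
        goal = self.goals[statement]
        if goal.proof is not None:
            return True  # verified earlier: reuse it instead of re-proving
        if not all(self.prove(sub, propose, max_attempts) for sub in goal.subgoals):
            return False
        feedback: Optional[str] = None
        for _ in range(max_attempts):
            candidate = propose(goal, feedback)
            self.checks_run += 1
            accepted, feedback = self.verifier(statement, candidate)
            if accepted:
                goal.proof = candidate
                return True
        return False


def toy_verifier(statement: str, proof: str) -> Tuple[bool, str]:
    # Toy rule, not real Lean: accept only the exact string "proof of <statement>".
    if proof == f"proof of {statement}":
        return True, ""
    return False, f"error: unsolved goal {statement}"


def toy_proposer(goal: Goal, feedback: Optional[str]) -> str:
    # The first draft is wrong on purpose; once feedback arrives, it revises.
    return "first draft" if feedback is None else f"proof of {goal.statement}"


graph = ProofGraph(toy_verifier)
graph.add_goal("theorem_1", ("lemma_a", "lemma_b"))
graph.add_goal("theorem_2", ("lemma_a",))
print(graph.prove("theorem_1", toy_proposer), graph.prove("theorem_2", toy_proposer))
print(graph.checks_run)
```

In this toy run, `lemma_a` is checked only while proving `theorem_1`; `theorem_2` finds it already verified and skips straight to its own proof. Keying nodes by statement is one simple way to get that reuse, and a real system would need a more robust notion of when two goals are the same.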
Posts from X
Most activity (21 retweets)
Violet Peng @VioletNPeng

My first paper at Google is out! Thank you @rohanpaul_ai for highlighting LEAP.

To share more thoughts on this direction: I strongly believe that as models generate longer and more complex proofs, automatic formal verification will be the key to the future of AI for math, and I'm bullish on using general LLMs + agentic framework for this task.

While we started with competition math in LEAP for rigorous benchmarking purposes, we've already started to venture into research math:
- Solved Erdős problem 527 (zero web search).
- Partially formalized the even case of Knuth's cycle problem, which resulted in ~4000 lines of Lean code.

Please check out all of our solutions here: https://github.com/google-deepmind/superhuman/tree/main/leap

I'm incredibly proud of this work, and we are just getting started. More to come!

18h · Views 59.4K · Likes 165 · Bookmarks 71