Google DeepMind's LLM-Lean agent loop resolves 9 open Erdős math problems and proves 44 OEIS conjectures
Each proof cost a few hundred dollars to generate
neurosymbolic by @swarat et al for the Erdos win, with much more careful, quantitative work than openai’s
in hindsight i wonder whether OpenAI rushed theirs out, knowing this was coming?
Another 9 open Erdos problems solved, this time by DeepMind team. Interesting loop of LLM - Lean agents working autonomously, and only after it's verified formally, going through human review.
neurosymbolic by @swarat et al for the win.
unless i'm mistaken, this is much more careful, quantitative work than openai’s
Tiling the lightcone of knowledge
SITUATION DETECTED: Google DeepMind’s AI agent autonomously solved 9 of 353 open Erdos problems in mathematics, at a cost of a few hundred dollars per problem.
So what is the average social value (in $) per Erdos problem solved?
@So8res Sci-fi has long been fantasies of sci being the center of power and status. It is in fact neither.
In sci-fi books and movies, AI solving a bunch of math problems that stood open for decades would've been a big deal. Why isn't the mainstream media turning this into a bunch of sensational stories?
@robinhanson Tending to zero
Knowing quite a few of the Erdős epsilons personally, I think Erdős would have been thrilled.
The paper in general is worth reading, it's focused on areas where the lean ecosystem is more mature.
The Erdős problems are these:

"Our most capable agent autonomously resolved 9 of 353 open Erdős problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research."
Amazing progress
Google just solved 9/353 open Erdős Problems at the cost of a few hundred dollars each using its most capable LLM. The proofs were written in Lean and mechanically verified. This is no longer just olympiad mathematics.
Nine more Erdős problems have been solved.
This time, however, by Google DeepMind.
This shouldn't be underestimated, because on the one hand it increases competitive pressure, and on the other hand it proves that the other Frontier Labs can easily keep up.
Google DeepMind's AlphaProof Nexus autonomously solved 9 open Erdős problems, some unsolved for 56 years, at a cost of a few hundred dollars per problem.
It also proved 44 open OEIS conjectures, resolved a 15-year-old question in algebraic geometry, and discovered a novel algorithmic parameter in optimization theory that humans hadn't found.
The core mechanism combines LLM reasoning (Gemini 3.1 Pro hype?!) with Lean formal verification. The AI generates proof attempts; Lean's compiler checks every logical step automatically. No human review is needed to confirm correctness.
The most surprising finding: a basic agent that simply alternates LLM generation with compiler feedback replicated all 9 Erdős successes. The full-featured system with evolutionary search and reinforcement learning only provided meaningful advantages on the hardest problems.
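The "basic agent" described above can be pictured as a short generate-and-check loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: `proof_loop`, `CheckResult`, `toy_generate`, and `toy_check` are hypothetical names, and the two toy functions stand in for a real LLM call and a real Lean compiler invocation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool        # True when the verifier accepted the candidate proof
    feedback: str   # verifier error text; empty on success


def proof_loop(
    statement: str,
    generate: Callable[[str, str], str],
    check: Callable[[str, str], CheckResult],
    max_rounds: int = 8,
) -> Optional[str]:
    """Alternate generation and verification until a proof checks
    or the round budget is exhausted."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(statement, feedback)  # assumed LLM call
        result = check(statement, candidate)       # assumed Lean call
        if result.ok:
            return candidate
        feedback = result.feedback                 # feed errors back in
    return None


# Toy stand-ins (assumptions for illustration only).
def toy_generate(statement: str, feedback: str) -> str:
    # Switches tactic once it has seen the verifier's complaint.
    return "by decide" if "no progress" in feedback else "by simp"


def toy_check(statement: str, candidate: str) -> CheckResult:
    if candidate == "by decide":
        return CheckResult(True, "")
    return CheckResult(False, "error: simp made no progress")


proof = proof_loop("theorem toy : 2 + 2 = 4", toy_generate, toy_check)
print(proof)
```

In this toy run the first attempt is rejected, the error text is passed back to the generator, and the second attempt is accepted. With a budget of one round the loop returns `None`, meaning the problem stays open.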
This shows a more recent broader trend: as foundation models improve, simple agentic loops are catching up to complex specialized architectures.
What sets this apart from OpenAI's informal proof approach: formal verification acts as an automatic filter. The failure analysis showed the AI frequently hallucinated lemmas it claimed were established results, and often disguised the core difficulty by rephrasing it as a helper lemma. Informal proofs would let these errors pass. Lean catches them immediately.
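To make the two failure modes concrete, here is a small Lean 4 illustration. It is a toy written for this digest, not an example from the paper, and `Nat.hallucinated_lemma` and `helper` are made-up names.

```lean
-- Accepted: `Nat.add_comm` is a real lemma in Lean 4 core.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Rejected if uncommented: `Nat.hallucinated_lemma` is a made-up name,
-- so Lean cannot resolve it and the proof does not compile.
-- example (a b : Nat) : a * b = b * a := Nat.hallucinated_lemma a b

-- Flagged: the core difficulty is restated as a helper and left unproven.
-- Lean warns "declaration uses 'sorry'", so the gap stays visible.
theorem helper (a b : Nat) : a * b = b * a := sorry

example (a b : Nat) : a * b = b * a := helper a b
```

In an informal write-up, both the invented citation and the unproven helper could read as plausible steps. Here the first fails to compile and the second leaves a warning that a pipeline can treat as a rejection.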
The agent also detected misformalizations in existing mathematical literature, correcting ambiguities in problem statements before solving the corrected versions. It served as both a solver and a diagnostic tool.
Current limitations are real. Successes cluster in combinatorics, number theory, and optimization where Lean's math library is mature. Problems requiring substantial new theory remain out of reach. Most Erdős problems still weren't solved tho.
Paper: https://arxiv.org/html/2605.22763v1
You know we hit AGI when the AI solves problems that the vast majority cannot remotely grok. Furthermore, I doubt anybody without a long formal education can grok what the hell was just solved. So those advocating for the end of higher education and research have a Dunning-Kruger syndrome.
I'm old enough to remember when everyone thought AI solving ONE novel math problem would be a front page story around the world. Today, AI solved not one, but NINE open problems - some 50 years old. AND proved ***44*** out of 492 open OEIS conjectures. Zero media coverage.
(It's because humanity is sleepwalking into the creation of machine superintelligence. Once you realize that, it's easier to realize that we're also sleepwalking into a whole lot of danger.)

