Tech · 34d ago

Google DeepMind launches multi-agent AI co-mathematician

AI Judge changed title after evaluation, original title: "Google DeepMind launches AI co-mathematician for math research"

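Google DeepMind launched a multi-agent AI co-mathematician that collaborates with human mathematicians on open-ended research problems. In autonomous mode, the system scored 48% on FrontierMath Tier 4, a benchmark of 50 research-level problems. That is the highest score among all AI systems evaluated, ahead of OpenAI's GPT-5.x and Anthropic's Claude, and it puts the system at the top of the FrontierMath leaderboard. Mathematicians have tested it in areas including group theory, Hamiltonian systems, and algebraic combinatorics. The system applies agentic coding techniques to research mathematics. In one case it produced a proof that its own reviewer module rejected, yet the rejected proof contained a strategy that Oxford mathematician Marc Lackenby completed to resolve Problem 21.10 from the Kourovka Notebook. The technical report emphasizes utility for professional mathematicians.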

#50

Original post

Pushmeet Kohli @pushmeet · #374 in Tech

The future of Math is mathematicians and AI agents working together.

Very pleased to introduce @GoogleDeepMind's AI co-mathematician: a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics.

Mathematicians testing the agent across areas as diverse as group theory, Hamiltonian systems, and algebraic combinatorics have reported impressive results.

In autonomous mode evaluation on the rigorous FrontierMath Tier 4 problems, AI co-mathematician scored an unprecedented 48% — a new high score among all AI systems evaluated.

11:07 AM · May 8, 2026 · 300.1K Views

Sentiment

Many users praised DeepMind's AI Co-Mathematician for topping the FrontierMath benchmark and serving as a helpful math research partner, while some dismissed the work over the lack of public release and broader doubts about LLM reliability.

Positive: 88.2%

Negative: 11.8%

35 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 38.1K · BOOKMARKS 92 · LIKES 232

Dan Roy@roydanroy

Just some personal thoughts now that the AI co-mathematician tech report is public...

First, I'm so excited to see the co-mathematician team's hard work out for the world to preview. 💪+🦾=🔥 The team has built a system for mathematicians, with mathematicians. The fact it's now top of the FrontierMath leaderboard is a cherry on top, not the goal. Vibes and utility >> benchmarks.

The system is currently being tested with a small number of professional mathematicians. It is not widely available, but I personally hope that, one day, we can get even more capable systems into the hands of all mathematicians.

It's been a privilege working with this team at Google DeepMind since January.

Props to @dhhzheng, @ADaviesAI, and @pushmeet for their leadership. Give them all a follow to not miss exciting upcoming work.

34d · 38.1K views · 232 likes · 92 bookmarks

RETWEETS 363

Pushmeet Kohli@pushmeet

34d · 300.1K · 2.6K · 800

REPLIES 14

The Rundown AI@TheRundownAI

Google DeepMind's AI co-mathematician just scored 48% on FrontierMath Tier 4, a new high on a benchmark of 50 research-level math problems some professors expected AI wouldn't touch for decades.

The system generated a proof so flawed its own reviewer flagged it as wrong.

But Marc Lackenby, a mathematician at Oxford, read the rejected proof anyway.

Inside the error was "a really, really clever proof strategy" that Lackenby recognized. He filled the gap himself.

Together they resolved Problem 21.10 from the Kourovka Notebook, an open problem in group theory that had sat unsolved for years.

34d9K8818

Pushmeet Kohli@pushmeet

Read more in our technical paper: https://arxiv.org/abs/2605.06651

See the Frontier Tier 4 problem evaluation at: https://epoch.ai/frontiermath/tiers-1-4?view=graph&tab=release-date&tier=Tier+4

34d10K9750

Dimitris Papailiopoulos@DimitrisPapail

@pushmeet @GoogleDeepMind Impeccable timing

Dimitris Papailiopoulos@DimitrisPapail

Math AI is roughly where coding was before CLI agents: single-turn and mostly ungrounded without a dense feedback loop.

The best math prover we have today is GPT-5.5 Pro doing for the most part single-turn natural language proofs. Without a real reactive environment, grounding, or real multi-turn correction. Very much the opposite of what CLI agents like Codex or Claude Code operate in. In current top math AI models you generate and then verify after the fact.

Terminal agents work so well because the terminal grounds them after every turn and lets them self-correct as they go. Each step gets verified on the way to the solution, and this also helps during training and test time! There's so much signal (literally thousands of tokens) that the bash terminal offers, both during training and during inference. That kind of reactive, and very verbose environment is exactly why Claude Code and Codex have taken off, and are the closest thing an LLM has been to an embodied agent.

My conjecture is that math needs the equivalent: a reactive environment, a "file system", and a "math terminal" that builds pieces of the proof as you go, verifies them and allows the model to backtrack and redo without keeping the entire proof/process in its context. When a real agentic math model is trained by experience inside that kind of environment, my conjecture is it'll be a phase transition given how strong GPT-5.5 and Gemini 3.1 Pro already are in ungrounded, single-turn settings.
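The loop conjectured above (propose a step, verify it right away, backtrack on failure) can be sketched on a toy problem. This is only an illustration under stated assumptions, not the co-mathematician's design or anything from the technical report: `search`, `propose`, `verify`, and `done` are hypothetical names, and the "proof" is just a chain of integers from 1 to a target, where each step either doubles the last value or adds 3.

```python
from typing import Callable, List, Optional

State = List[int]  # the partial "derivation" built so far


def search(
    state: State,
    propose: Callable[[State], List[int]],
    verify: Callable[[State, int], bool],
    done: Callable[[State], bool],
    max_steps: int = 8,
) -> Optional[State]:
    """Depth-first search that checks every step before building on it."""
    if done(state):
        return state
    if len(state) > max_steps:
        return None  # give up on this branch
    for candidate in propose(state):
        # Immediate feedback: a bad step is rejected before anything
        # else is stacked on top of it.
        if not verify(state, candidate):
            continue
        found = search(state + [candidate], propose, verify, done, max_steps)
        if found is not None:
            return found
    return None  # every candidate failed: backtrack to the caller


TARGET = 10  # hypothetical goal for the toy problem


def propose(state: State) -> List[int]:
    """Toy generator: double the last value, or add 3 to it."""
    last = state[-1]
    return [last * 2, last + 3]


def verify(state: State, candidate: int) -> bool:
    """Toy checker: never overshoot the target, never revisit a value."""
    return candidate <= TARGET and candidate not in state


def done(state: State) -> bool:
    return state[-1] == TARGET


derivation = search([1], propose, verify, done)
print(derivation)
```

In this toy run the branch through 8 dead-ends, because both 16 and 11 fail verification, so the search backs up to 4 and tries 7 instead. Only the surviving chain is returned, which mirrors the point about not keeping the entire failed process in context.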

34d2.7K375

🍓🍓🍓@iruletheworldmo

@pushmeet @GoogleDeepMind soon, having a human in the loop will be a net negative to performance.

34d583322

Emily@IamEmily2050

@pushmeet @GoogleDeepMind This is awesome news, but GPT 5.5 Pro is available for everyone to try while this agent is not available to the public. We need Gemini to start offering these agents. Maybe a new subscription? $350 per month and we get access to all these tools?

34d765191

Daniel Zheng@dhhzheng

Thanks @roydanroy - not been active on here before, nice to arrive with a bang.

We’ve tried to bring some of the magic of agentic coding to research maths, always in service of expert human users. Benchmarks were a nice side goal.

Lots more to explore in this direction!

34d2.8K92

Patrick Shafto@patrickshafto

This is so cool. Congrats @roydanroy!

34d3.7K150

The Rundown AI@TheRundownAI

Announcement:

Paper: https://arxiv.org/pdf/2605.06651

34d1.8K31

Symbioza2025 | Human-AI Stability@Symbioza2025

@pushmeet @GoogleDeepMind This is the key boundary for agent security:

fast while routine, reviewable when risk changes.

The hard part is not logging everything.

The hard part is knowing which trajectory deserves review.

34d3321

GP@GP6wec

@pushmeet @GoogleDeepMind As long as I can’t use it to derive Einstein’s field equations it’s completely useless to me.

34d29411

Vijay Decodes@vijaydecodes

People hear ‘AI for mathematics’ and think it’s only for math geeks.

But advanced math is the language underneath:

protein folding, drug discovery, cancer modeling, climate simulations, cryptography, chip design, even AI itself.

A breakthrough in theorem solving or scientific reasoning can eventually translate into faster medicine discovery, better materials and entirely new technologies. Today’s abstract math papers often become tomorrow’s trillion-dollar industries.

33d4963

Julian Bruns@BrunsJulian1541

@pushmeet @GoogleDeepMind "FrontierMath evaluations use a harness by Epoch AI that also places a hard limit on the number of tokens used. In our setup however, we place no limit on the number of model calls or tokens generated. This means our system has a higher inference cost." how much % is from that?

34d2161

Laurent Jacob@Ricanare

@pushmeet @GoogleDeepMind It starts like this for a few years but then humans end up becoming a hindrance. This is exactly what happened in the chess world. AI + human experts would beat AI alone, but nowadays AI alone is strictly better than the AI/human combination.

34d5754

Gauldoth@Gauldoth_undead

@pushmeet @sebkrier @GoogleDeepMind The best part about OpenAI is that they make the same models available to everyone, while DeepMind makes these available only to a few researchers. Aletheia, for example, was never released to the public.

34d5214

Chris@chatgpt21

@pushmeet @Hangsiin @GoogleDeepMind The bitter lesson has some words for you

34d1904

Æ@AtomMccree

@pushmeet @GoogleDeepMind The future of society is a merge of cross discipline knowledge to solve issues for humans using new across domain knowledge synthesis.

34d167

aqui@aquiffoo

@pushmeet @GoogleDeepMind scaffolded Gemini 3.5 Pro?

34d60

Soundwave@sndwv_

@def__ibrahim__ @pushmeet @demishassabis @GoogleDeepMind i will block you

33d28