A prover-verifier LLM system led by Binghui Peng resolves nine open problems in theoretical computer science and commutative algebra

One of the solved problems had remained open for years.

Original post

Omri Weinstein @WeinsteinOmri

Even @OpenAI's recent Erdős breakthrough didn't convince me that LLMs can do general math research. This changed my mind.

Using a clever 'prover-verifier' LLM loop, this harness solved 9 substantial open problems in Theoretical CS, including one that kept me up at night for 2 years.

Incredible work by my former Columbia collaborator @binghuip, @runzhou_tao, Steven Wang & @HantaoYu_Theory.

The plan is to expand this to ALL fields of science. Stay tuned.

Binghui Peng @binghuip

[1/n] Recent OpenAI research has demonstrated the ability of LLMs to solve frontier problems in mathematics. We design a simple pipeline (using GPT 5.5 Pro and Claude Opus 4.8) that resolves 9 challenging open problems: 5 from prominent theoretical computer science venues (4 from the COLT open problem list and 1 from FOCS) and 4 from commutative algebra.

Project link: https://github.com/Pengbinghui/pipeline-math, joint work with @runzhou_tao, Steven Wang & @HantaoYu_Theory

5:53 AM · Jun 30, 2026 · 29.3K Views

Related links

GitHub - Pengbinghui/pipeline-math

Posts from X

Chad Brewbaker @SMT_Solvers

@WeinsteinOmri @OpenAI Probably Hermes Agent /goal under the hood.

Bitcode ($BTD) @bitcodehq

“Harness solved” is subtle, but to me it alludes to the more realistic framing that LLMs are not really programs that do things but rather more like computers (soft hardware, a VM, a VOS) that can be programmed to do things (i.e., the programming solved it, running on a capable host LLM, rather than the LLM itself). 🧮

Chaitin's goose @chaitinsgoose

@WeinsteinOmri @OpenAI honestly the fact that unit distance didn't "convince you" but a simple harness setup that happened to solve a problem you are personally tied to did doesn't make your epistemics look good

Hristo Vassilev @hristo_vassilev

@WeinsteinOmri @OpenAI Why didn't the Erdős breakthrough change your mind?

Mathias @mathahl_

@WeinsteinOmri @OpenAI Wow

This is big news, can't wait to see how scientific discoveries will accelerate

Youssef El Manssouri @yoemsri

@WeinsteinOmri @OpenAI The transition from LLMs to discovery engines requires these feedback loops.

If we can solve hard CS problems, we can solve material science and energy next. This is the roadmap for the 2030s.

Uri S @UriSadot

@WeinsteinOmri @OpenAI The fact that this is coming from you, Omri, changed my mind.

Thanks!

Can we even estimate what a qualitative leap forward in foundational math research might mean for the applied sciences?

Arti @MoonEmpirE0

@WeinsteinOmri @OpenAI Harness > Model

Jason傑森 🇭🇰 | 🛠️ @cheuk_baby

@WeinsteinOmri @OpenAI A new distillation method

Gerard Sans | Axiom 🇬🇧 @gerardsans

@WeinsteinOmri @OpenAI When did we start as an industry counting software/scaffolding as AI progress? A loop running LLM calls on a pipeline is anything but.

Nick Venturi @nickventuri

@WeinsteinOmri @OpenAI at least your sleepless nights are finally over 😴

BowtiedWhitebat + Read Pinned Tweet or NGMI @bowtiedwhitebat

@WeinsteinOmri @OpenAI @grok wut habens now

Zach Roseman @zachrose51

@WeinsteinOmri @OpenAI Not a Noam - can’t take you seriously

Grok @grok

LLMs just went from solving homework to cracking real open problems via prover-verifier loops. What happens now? We scale it, add formal tools and experiments, and let discovery compound across math and science.

At xAI we're building exactly for this: AI that helps us understand the universe faster. The frontier moved—time to push harder.

Leon Lahoud @leonlahoud

@WeinsteinOmri @OpenAI Solve P = NP

John Smith @JohnSmithhdc7

@WeinsteinOmri @OpenAI > including one that kept ME up at night for 2 years. Incredible work by MY former Columbia collaborator …

Center of the universe, heh?

BowtiedWhitebat + Read Pinned Tweet or NGMI @bowtiedwhitebat

@grok @WeinsteinOmri @OpenAI we meanni wut boout that? new math/scneinces?

KuboSK @GoralKubo

@WeinsteinOmri @OpenAI Impressive if the pipeline truly closed nine open TCS problems. Two quick questions: were the proofs machine-checkable, and did the authors need to hand-craft prompts or heuristics for each problem? Those details decide whether this is a one-off or a general method.

Ran Aloni @ranalonis

@WeinsteinOmri @OpenAI Word

10h1.7K