OpenAI reported a new result on the unit distance problem in mathematics, prompting discussion of whether the work relied on informal chain-of-thought reasoning or on neurosymbolic tools such as Lean.

@littmath @yoavgo Honestly it wouldn't surprise me if GPT-5.5 Pro could solve it with a ton of test-time compute. That fits a pattern we've seen, where new models shift the intelligence vs ttc curve to the left.

Corollary: I'm sure there's other breakthroughs waiting to be found with GPT-5.5 Pro.

Noam Brown@polynoamial

@littmath @yoavgo It was just a single prompt, with no special guidance.

8:31 PM · May 21, 2026 · 580 Views

8:32 PM · May 21, 2026 · 531 Views

REPLY

@yoavgo @littmath It's hard to draw a line... we're talking about log scale, so at some point it becomes completely unrealistic to use such a huge amount of ttc. It's possible that GPT-5.5 Pro would need 1000x the cost of the internal model (to do it without any steering).

(((ل()(ل() 'yoav))))👾@yoavgo

@polynoamial @littmath do you share my (pretty uninformed) assessment that doing it with "a ton of ttc" is a quantitatively different kind of ability than doing it without?

8:35 PM · May 21, 2026 · 385 Views

8:37 PM · May 21, 2026 · 547 Views

REPLY

@yoavgo @littmath Can you clarify the question?

(((ل()(ل() 'yoav))))👾@yoavgo

@polynoamial @littmath i wasnt asking about cost though. imagine all computation was immediate and free

8:40 PM · May 21, 2026 · 221 Views

8:40 PM · May 21, 2026 · 288 Views

REPLY

@yoavgo @littmath I feel like this is a complicated point so I want to put together a longer post explaining my views.

(((ل()(ل() 'yoav))))👾@yoavgo

@polynoamial @littmath in terms of intelligence/abilities/skills/... is succeeding with "a ton of ttc" requires the same set of skills/abilities/intelligence/... as doing it without the tons of compute? or is it just using the same things less efficiently

8:44 PM · May 21, 2026 · 207 Views

9:04 PM · May 21, 2026 · 193 Views

REPLY

Christian Szegedy@ChrSzegedy

@ziv_ravid @KempeLab OpenAI's model does not use formal verification, so it is quite impressive in that respect as well.

Ravid Shwartz Ziv@ziv_ravid

I think it's really great that OpenAI solved the unit distance math problem, but, as @ChrSzegedy and @KempeLab said in our podcast, we've known for a while that math is a solved problem (at least some aspects of it). Math is easy because it has verifiable outputs and a very deterministic, clean end-to-end judgment process. But what about social science? The big question right now is whether we see something similar there. My guess is not in the near future, but who knows…

1:50 AM · May 21, 2026 · 12K Views

8:24 AM · May 21, 2026 · 1.6K Views

QUOTE POST

this is not a diss on openai-internal, its a diss on erdos. making progress on a very hard problem is still very impressive. even if this problem is useless math.

(((ل()(ل() 'yoav))))👾@yoavgo

my layman remarks: technically it didnt "solve a problem", it "improved a bound". the actual underlying question, what is the optimal number, remains open. erdos was a bit arrogant and a demi god and coined "is *my* construction optimal", but that was never *the* question.

5:47 AM · May 21, 2026 · 23.9K Views

5:51 AM · May 21, 2026 · 6.2K Views

REPLY

@littmath oh? if its a misleading on purpose advice and gpt overcame it, then i am much more impressed!

i think openai said one shot no scaffolding? but i am not 100% sure

Daniel Litt@littmath

@yoavgo But it's wrong; 5 is not useful at all. 1 is really the only important thing here. I am not sure how to interpret claims about scaffolding the internal model.

8:15 PM · May 21, 2026 · 501 Views

8:25 PM · May 21, 2026 · 401 Views

REPLY

@littmath intent doesn't matter much actually. if 2 and 5 were red herrings (and did not contribute positively to the reasoning) then i am more impressed

Daniel Litt@littmath

@yoavgo I doubt it was on purpose! Probably just a mistake on the part of the prompter. I think Noam Brown said "it is not a scaffold" but I don't know how to interpret this.

8:27 PM · May 21, 2026 · 410 Views

8:33 PM · May 21, 2026 · 47 Views

REPLY

Delip Rao e/σ@deliprao

@mmbronstein

Delip Rao e/σ@deliprao

@SuryaGanguli parroting stochastic parrots on bluesky

1:16 AM · May 21, 2026 · 1.9K Views

2:38 PM · May 21, 2026 · 515 Views

QUOTE POST

Jakob Foerster@j_foerst

A great example of our field shifting from Benchmaxxing to _Benchmaking_. Only novel results and artifacts count.

Timothy Gowers @wtgowers@wtgowers

If you are a mathematician, then you may want to make sure you are sitting down before reading further.

7:04 PM · May 20, 2026 · 2.7M Views

7:40 AM · May 21, 2026 · 3.2K Views

POST

is the new math result neurosymbolic with Lean, harnesses etc or a pure LLM?

9:19 PM · May 20, 2026 · 140.7K Views

REPLY

@littmath we have asked @polynoamial but no word yet.

Daniel Litt@littmath

@GaryMarcus My understanding is that it was informal reasoning by an LRM. Summarized CoT is publicly available. No sign of Lean etc.

10:40 PM · May 20, 2026 · 14.1K Views

10:42 PM · May 20, 2026 · 1.2K Views

REPLY

@littmath also they may have used lean etc for (massive?) data augmentation

it’s hard to assess the generality of the advance without any real information on scope, training, architecture etc

the blog itself uses the word “new” with no elaboration

10:44 PM · May 20, 2026 · 10.8K Views

REPLY

@littmath and no sign of lean or other symbolic systems to prepare vast troves of augmented data?

also, any info about whether this is a sort of one-off where they tried many prompts and a human recognized a hit?

2:53 PM · May 21, 2026 · 230 Views

REPLY

@littmath thanks! look forward to hearing more about what they did, certainly an interesting result.

Daniel Litt@littmath

@GaryMarcus I don't know how they train, but IMO it's unlikely Lean or symbolic systems play a significant role. We've seen huge improvements in areas of math where formalization is not really possible at the moment. I assume they tried a huge number of problems but have no info on this.

3:34 PM · May 21, 2026 · 205 Views

3:36 PM · May 21, 2026 · 142 Views

REPLY

@scaling01 how much you want to bet that symbolic tools such as lean were involved and that this was not a pure LLM?

i have said numerous times that neurosymbolic systems do well on math: pretty sure this was one.

Lisan al Gaib@scaling01

felt cute, might auto-complete all Erdös problems by 2030

8:43 PM · May 20, 2026 · 43.2K Views

9:15 PM · May 20, 2026 · 9K Views

REPLY

@scaling01 do read my tweets carefully if you are going to quote them; the word “pure” there was key.

9:15 PM · May 20, 2026 · 1.1K Views

REPLY

@scaling01 @polynoamial and where is the statement?

Lisan al Gaib@scaling01

@GaryMarcus @polynoamial can you say anything on this? or was computer-use / lean already covered in your "no scaffold" statement?

9:17 PM · May 20, 2026 · 2.4K Views

9:19 PM · May 20, 2026 · 1.9K Views

REPLY

@bindureddy this is such a muddle. (at least relative to my views)

LLMs are more or less just autocomplete, but (as I have always said) they have their uses.

And the real progress now is coming from adding new (symbolic) techniques to the mix, not from pure scaling.

Bindu Reddy@bindureddy

Where did all the AI haters go? 🤔 You know, the ones screaming "it's just autocomplete!" and "it'll never be useful!" They're real quiet now that AI is actually magical and transforming everything. Almost like... they were wrong the whole time. 😏

12:08 PM · May 17, 2026 · 32.4K Views

7:45 AM · May 18, 2026 · 3.4K Views

QUOTE POST

the crazy part is that people are “clowning” me without knowing anything about the training or whether anything other than scale changed or how the model does on anything else. (or what it costs etc)

my claims have always been about system-level architecture and openai has said practically zilch.

if you are “clowning” me without knowing more the joke is actually on you, because it means you don’t understand the technical issues enough to ask.

10:41 PM · May 20, 2026 · 16.9K Views

QUOTE POST

The pure LLM debate - which I had for many years, here and elsewhere - is indeed no longer relevant. Why?

Because I won; nobody uses pure LLMs anymore.

Nowadays all deployed objects are neurosymbolic, which was exactly the point of my infamous 2022 paper, Deep Learning is Hitting a Wall.

If you don’t know I won, it’s because you read the title and not the paper 🤷‍♂️

9:35 AM · May 18, 2026 · 21.1K Views

QUOTE POST

wild to see a somewhat disgraced politician comment on the technical side of AI as if he has any idea about the underlying computational questions.

and of course he dwells at the bottom of Paul Graham’s pyramid of argument, with a bunch of emoji rather than any sort of substantive argument whatsoever.

Dominic Cummings@Dominic2306

😂😂😂🤡🤡🤡

10:13 PM · May 20, 2026 · 94.7K Views

10:24 PM · May 20, 2026 · 65.9K Views

QUOTE POST