OpenAI solved the unit distance problem in mathematics, prompting discussion on whether the work used informal chain-of-thought reasoning or neurosymbolic tools such as Lean.
Attention now turns to AI progress in social science domains.
@littmath @yoavgo It was just a single prompt, with no special guidance.
@yoavgo I doubt it was on purpose! Probably just a mistake on the part of the prompter. I think Noam Brown said "it is not a scaffold" but I don't know how to interpret this.
@littmath @yoavgo Honestly it wouldn't surprise me if GPT-5.5 Pro could solve it with a ton of test-time compute. That fits a pattern we've seen, where new models shift the intelligence vs ttc curve to the left.
Corollary: I'm sure there's other breakthroughs waiting to be found with GPT-5.5 Pro.
@yoavgo @littmath It's hard to draw a line... we're talking about log scale, so at some point it becomes completely unrealistic to use such a huge amount of ttc. It's possible that GPT-5.5 Pro would need 1000x the cost of the internal model (to do it without any steering).
@polynoamial @littmath do you share my (pretty uninformed) assessment that doing it with "a ton of ttc" is a quantitatively different kind of ability than doing it without?
@yoavgo @littmath Can you clarify the question?
@polynoamial @littmath i wasnt asking about cost though. imagine all computation was immediate and free
@yoavgo @littmath I feel like this is a complicated point so I want to put together a longer post explaining my views.
@polynoamial @littmath in terms of intelligence/abilities/skills/... does succeeding with "a ton of ttc" require the same set of skills/abilities/intelligence/... as doing it without the tons of compute? or is it just using the same things less efficiently
@ziv_ravid @KempeLab OpenAI's model does not use formal verification, so it is quite impressive in that respect as well.
I think it's really great that OpenAI solved the unit distance math problem, but, as @ChrSzegedy and @KempeLab said in our podcast, we've known for a while that math is a solved problem (at least some aspects of it). Math is easy because it has verifiable outputs and a very deterministic, clean end-to-end judgment process. But what about social science? The big question right now is whether we see something similar there. My guess is not in the near future, but who knows…
this is not a diss on openai-internal, its a diss on erdos. making progress on a very hard problem is still very impressive. even if this problem is useless math.
my layman remarks: technically it didnt "solve a problem", it "improved a bound". the actual underlying question, what is the optimal number, remains open. erdos was a bit arrogant and a demi god and coined "is *my* construction optimal", but that was never *the* question.
@littmath oh? if it's purposely misleading advice and gpt overcame it, then i am much more impressed!
i think openai said one shot no scaffolding? but i am not 100% sure
@yoavgo But it's wrong; 5 is not useful at all. 1 is really the only important thing here. I am not sure how to interpret claims about scaffolding the internal model.
@littmath intent doesn't matter much actually. if 2 and 5 were red herrings (and did not contribute positively to the reasoning) then i am more impressed
@mmbronstein
@SuryaGanguli parroting stochastic parrots on bluesky
A great example of our field shifting from Benchmaxxing to _Benchmaking_. Only novel results and artifacts count.
If you are a mathematician, then you may want to make sure you are sitting down before reading further.
is the new math result neurosymbolic with Lean, harnesses etc or a pure LLM?
@littmath we have asked @polynoamial but no word yet.
@GaryMarcus My understanding is that it was informal reasoning by an LRM. Summarized CoT is publicly available. No sign of Lean etc.
@littmath also they may have used lean etc for (massive?) data augmentation
it’s hard to assess the generality of the advance without any real information on scope, training, architecture etc
the blog itself uses the word “new” with no elaboration
@littmath and no sign of lean or other symbolic systems to prepare vast troves of augmented data?
also, any info about whether this is a sort of one-off where they tried many prompts and a human recognized a hit?
@littmath thanks! look forward to hearing more about what they did, certainly an interesting result.
@GaryMarcus I don't know how they train, but IMO it's unlikely Lean or symbolic systems play a significant role. We've seen huge improvements in areas of math where formalization is not really possible at the moment. I assume they tried a huge number of problems but have no info on this.
@scaling01 how much you want to bet that symbolic tools such as lean were involved and that this was not a pure LLM?
i have said numerous times that neurosymbolic systems do well on math: pretty sure this was one.
felt cute, might auto-complete all Erdős problems by 2030
@scaling01 do read my tweets carefully if you are going to quote them; the word “pure” there was key.
@scaling01 @polynoamial and where is the statement?
@GaryMarcus @polynoamial can you say anything on this? or was computer-use / lean already covered in your "no scaffold" statement?
@bindureddy this is such a muddle. (at least relative to my views)
LLMs are more or less just autocomplete, but (as I have always said) they have their uses.
And the real progress now is coming from adding new (symbolic) techniques to the mix, not from pure scaling.
Where did all the AI haters go? 🤔 You know, the ones screaming "it's just autocomplete!" and "it'll never be useful!" They're real quiet now that AI is actually magical and transforming everything. Almost like... they were wrong the whole time. 😏
the crazy part is that people are “clowning” me without knowing anything about the training or whether anything else other than scale changed or how the model does on anything else. (or what it costs etc)
my claims have always been about system-level architecture and openai has said practically zilch.
if you are “clowning” me without knowing more the joke is actually on you, because it means you don’t understand the technical issues enough to ask.
The pure LLM debate - which I had for many years, here and elsewhere - is indeed no longer relevant. Why?
Because I won; nobody uses pure LLMs anymore.
Nowadays all deployed objects are neurosymbolic, which was exactly the point of my infamous 2022 paper, Deep Learning is Hitting a Wall.
If you don’t know I won, it’s because you read the title and not the paper 🤷♂️
wild to see a somewhat disgraced politician comment on the technical side of AI as if he has any idea about the underlying computational questions.
and of course he dwells at the bottom of Paul Graham’s pyramid of argument, with a bunch of emoji rather than any sort of substantive argument whatsoever.
😂😂😂🤡🤡🤡
I love AI, it’s pure LLMs I hate.
Pure LLMs *are* basically just autocomplete.
Recent progress (e.g. Claude Code) doesn’t show otherwise.
Rather, a lot of the progress in the last two years has come from *introducing* other things – mainly classic symbolic techniques and tools, to offset the weaknesses of pure LLMs.
Shame to see the tweet below muddle all of this.
If we want to make further progress we need to understand where the progress is coming from; mostly it is coming from leaving pure LLMs behind.
A respectable milestone in AI for Math. Congrats to everyone involved!
Tell me, where and in what land are the stochastic parrots now?
@ziv_ravid @ChrSzegedy @KempeLab Is this engagement bait? 😅
@ChrSzegedy @StatsLime What is really interesting about these models, though, is whether their jaggedness will go away. At the moment, my personal experience is very random.
The next in a series of firsts for AI and mathematics!
@cloneofsimo It did not use lean, regardless.
Since people are asking, no it did not use Lean. But I don't think it should matter anyway.
he is reaching fatal levels of copium, im legit worried for him
GOALPOST MOVED. "LLM must solve major conjecture without lean or data augmentation using lean. Otherwise its neuro symbolic AI, just as I predicted years ago. Gotcha."
@LucaAmb @TaliaRinger it'd be self consistent nonsense though.
@TaliaRinger What purpose would it even serve? How would we know that it isn't just pure nonsense?

At this point its just kinda sad to see him go like this.
I like how nobody (yet) has the conspiracy theory that it was actually a group of OpenAI employees that solved the conjecture and not the model, despite the number of IMO medalists and mathematicians OpenAI hired
I feel like it is a legit conspiracy theory given how impossibly impressive the result is.

@ChrSzegedy @KempeLab Yeah, I agree :)
Great post. My own sense is almost exactly like what Jason described here.
"AI generating new knowledge and accelerating science will change the trajectory of humanity."
Welcome to the era of Knowledge Accelerationism.
@GaryMarcus @scaling01 specialized GPT-5 math models already stood apart by *not* using lean, and that’s a general model
Gary please stop digging your own hole😭

The posthuman era just kicked off…
If you have a superhuman mathematician and you can’t create an algo that gives you a matrix multiplication speedup that 100x useful compute over the next 12 months…
that should be the takeoff. sudden unhobbling and capability expansion quite soon
interesting to think of the space of possibilities in the mathematical and physical sciences where humans are amazing at building out entire fields of theory, study, and understanding around but really really bad at the mechanical grunt work
I think we are in the process of discovering that humans are bad at mathematics. A gibbon would scoff at an Olympic climber; the human body is not optimized for climbing. We're getting mounting evidence that our brain may be far from optimal for advanced math. No disrespect to mathematicians. I was a two-time IMO silver medalist; I'm just smart enough to appreciate that some people are much, much smarter. But it's starting to look like math is somewhere on the midpoint of Moravec’s paradox; between chess (computers surpassed us some time back) and cooking (probably many years to go, for general capabilities). It's fairly hard for us, and so it looks like computers are going to surpass us. AI math still has important weaknesses. For instance, AI systems have not yet shown any ability to identify interesting research directions, or develop new concepts on which further work can build. But they are starting to look superhuman in some respects. And once AI *starts* to become superhuman in some domain, we all know what happens next.
example: there’s a big difference between the knowledge work of llms doing coding projects that take a few hours-days mostly autonomously w/ minimal guidance from me vs taking alex radford’s pre-1931 llm and having it recreate 80 years of fundamental computer science theory
@roydanroy So I think it's both quite impressive (because I think this problem is cool) and in a way also not so impressive.
@roydanroy Combinatorialists and incidence / discrete geometry experts wouldn't have any Algebraic number theory chops. AI models can pattern match wherever they want. Erdos also believing that the conjecture was true biased folk. The LM could run amok in either direction.
@SebastienBubeck @roydanroy I did read it. I meant that usually people in those areas tend to have a different mathematical skill set.
@_onionesque @roydanroy Incorrect, people with algebraic number theory chops did look at it (you can read about it in the Remarks paper).
@SebastienBubeck @roydanroy In a short thread, I reference what the note says about ANT / similar approaches. In the above post all I am trying to imply is that experts in the area will mostly not know ANT, and thus making this sort of connection won't be easy for them (even if it has been picked up before).
Gary is still stuck in the symbolists vs connectionist cogsci debates from the 80s and 90s. If it worked it ipso facto must be neurosymbolic 🤪
This view is always weird to me. A theory that is "beyond human comprehension" is certainly something, but it is not mathematics.

