GPT-5.5 achieves 99.46 percent accuracy on multi-digit multiplication across a 20-by-20 grid of problems with up to 20 digits per number.
With medium reasoning effort, nearly every cell of the accuracy heatmap was answered correctly; without reasoning, accuracy was low.
——0——
QUOTE POST
#1038 Raphaël Millière @RAPHAELMILLIERE
I still occasionally hear people claim that LLMs are hilariously bad at arithmetic. Another reminder that it's not 2022 anymore.
I redid the multi-digit multiplication experiment, now with gpt-5.5. With medium reasoning and 7 samples each cell, it pretty much aced the test with 99.46% accuracy. The model had no tools to call and had to rely on its reasoning. Can it go further? (1/4)
8:24 AM · May 22, 2026 · 118.5K Views
3:03 PM · May 22, 2026 · 3.4K Views
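The post does not share its evaluation code, so the following is only a minimal sketch of how a grid experiment like the one described could be scored. It assumes each cell pairs a digit length for the first factor with a digit length for the second, that answers are graded by exact match, and that the model is reached through a hypothetical `answer_fn(a, b)` callable that returns its reply as a string. None of these names or choices come from the original post.

```python
import random
from typing import Callable, Dict, Tuple


def random_n_digit(n: int, rng: random.Random) -> int:
    """Return a random integer with exactly n decimal digits (no leading zero)."""
    if n == 1:
        return rng.randint(1, 9)  # assumption: exclude 0 so products are never trivial
    return rng.randint(10 ** (n - 1), 10 ** n - 1)


def evaluate_grid(
    answer_fn: Callable[[int, int], str],  # hypothetical wrapper around the model
    max_digits: int = 20,
    samples_per_cell: int = 7,
    seed: int = 0,
) -> Dict[Tuple[int, int], float]:
    """Score answer_fn on every (digits of a, digits of b) cell of the grid.

    With the defaults this is 20 * 20 * 7 = 2,800 problems in total.
    """
    rng = random.Random(seed)
    grid: Dict[Tuple[int, int], float] = {}
    for d1 in range(1, max_digits + 1):
        for d2 in range(1, max_digits + 1):
            correct = 0
            for _ in range(samples_per_cell):
                a = random_n_digit(d1, rng)
                b = random_n_digit(d2, rng)
                reply = answer_fn(a, b)
                # Assumed grading rule: exact match after dropping whitespace and commas.
                if reply.strip().replace(",", "") == str(a * b):
                    correct += 1
            grid[(d1, d2)] = correct / samples_per_cell
    return grid


def overall_accuracy(grid: Dict[Tuple[int, int], float]) -> float:
    """Mean per-cell accuracy; equals overall accuracy when cells have equal sample counts."""
    return sum(grid.values()) / len(grid)
```

The per-cell values returned by `evaluate_grid` are what a heatmap would plot, and `overall_accuracy` collapses them into a single figure comparable to the percentage quoted in the post.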