17h ago

Microsoft's Dimitris Papailiopoulos asks if a symbolic Python solver can crack the GSM8k math benchmark without LLMs

Omar Khattab asked whether an LLM optimizer would be involved in building the solver.

Original post

Say I am trying to solve GSM8k but no LLMs allowed! Only with a symbolic style solver with perhaps a few trainable parameters, so it's effectively a python program. How high do you expect it to go?

6:25 PM · May 28, 2026

Dimitris Papailiopoulos @DimitrisPapail

Say I am trying to solve GSM8k but no LLMs allowed! Only with a symbolic style solver with perhaps a few trainable parameters, so it's effectively a python program. How high do you expect it to go?

1:25 AM · May 29, 2026 · 14.5K Views

Omar Khattab @lateinteraction

@DimitrisPapail Is an LLM optimizer involved :D

1:36 AM · May 29, 2026 · 2.6K Views

Dimitris Papailiopoulos @DimitrisPapail

@lateinteraction no calls to llms allowed beyond the building of this magical solver :)

1:43 AM · May 29, 2026 · 1.5K Views

Omar Khattab @lateinteraction

@DimitrisPapail that’s a yes then! i think it can get almost arbitrarily good in that case, but will be brittle :-)

1:44 AM · May 29, 2026 · 423 Views

Dimitris Papailiopoulos @DimitrisPapail

@lateinteraction you can't look into your test set!

1:47 AM · May 29, 2026 · 292 Views

Omar Khattab @lateinteraction

@DimitrisPapail but you can look into your training set no? i'm saying it will only generalize in distribution

1:47 AM · May 29, 2026 · 304 Views

Dimitris Papailiopoulos @DimitrisPapail

@lateinteraction yes you can look into your training data set. but it's very hard (been trying)

1:53 AM · May 29, 2026 · 437 Views

Omar Khattab @lateinteraction

@DimitrisPapail haha i was speculating only! makes sense

1:57 AM · May 29, 2026 · 344 Views

@DimitrisPapail Pi

2:55 AM · May 29, 2026 · 1.6K Views

Quentin Berthet @qberthet

@DimitrisPapail Is a lookup table allowed?

2:43 PM · May 29, 2026 · 379 Views

@qberthet to the train? absolutely

2:44 PM · May 29, 2026 · 205 Views
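
The lookup-table question and the "brittle" remark in the thread can be made concrete with a toy sketch. This is only an illustration under stated assumptions, not anyone's actual solver: the questions below are made-up examples rather than real GSM8k items, and `lookup_solver` and `accuracy` are hypothetical names. It shows the extreme case where a "solver" simply memorizes the training set, which scores perfectly on questions it has seen and fails on any unseen one.

```python
from typing import Callable, Dict, Optional

# Hypothetical toy data for illustration; these are NOT real GSM8k items.
TRAIN: Dict[str, int] = {
    "Ann has 3 apples and buys 4 more. How many apples does she have?": 7,
    "A box holds 5 pens. How many pens are in 6 boxes?": 30,
}
HELD_OUT: Dict[str, int] = {
    "Ann has 3 apples and buys 5 more. How many apples does she have?": 8,
}


def lookup_solver(question: str) -> Optional[int]:
    """Return the memorized training answer, or None for an unseen question."""
    return TRAIN.get(question)


def accuracy(solver: Callable[[str], Optional[int]], dataset: Dict[str, int]) -> float:
    """Fraction of questions in `dataset` that `solver` answers exactly right."""
    correct = sum(solver(question) == answer for question, answer in dataset.items())
    return correct / len(dataset)


train_acc = accuracy(lookup_solver, TRAIN)
held_out_acc = accuracy(lookup_solver, HELD_OUT)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {held_out_acc:.2f}")
```

Changing a single number in a question is enough to defeat exact-match memorization, which is why a lookup table over the training set says nothing about test performance.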