17h ago

Microsoft's Dimitris Papailiopoulos asks if a symbolic Python solver can crack the GSM8k math benchmark without LLMs

Omar Khattab asked whether an LLM optimizer would be involved in building the solver.
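For a concrete sense of what a symbolic solver that is "effectively a python program" could look like, here is a minimal sketch that is not taken from the thread: a few hand-written regex rules, each mapped to one arithmetic operation. The rule list, the `solve` function, and the toy questions are all invented for illustration. The sketch has no trainable parameters and has not been evaluated on GSM8k.

```python
import re
from typing import Optional

# Hypothetical hand-written rules: (regex, function of the captured numbers).
# The toy questions below are invented for illustration, not taken from GSM8k.
RULES = [
    (re.compile(r"has (\d+) \w+ and buys (\d+) more"), lambda a, b: a + b),
    (re.compile(r"has (\d+) \w+ and gives away (\d+)"), lambda a, b: a - b),
    (re.compile(r"(\d+) boxes with (\d+) \w+ each"), lambda a, b: a * b),
]


def solve(question: str) -> Optional[float]:
    """Return the answer from the first matching rule, or None if no rule fires."""
    for pattern, compute in RULES:
        match = pattern.search(question)
        if match:
            return compute(*(float(g) for g in match.groups()))
    return None


# A phrasing the rules anticipate.
print(solve("Ana has 3 apples and buys 4 more. How many apples does she have?"))  # 7.0

# The same arithmetic, rephrased: no rule matches, so the solver gives up.
print(solve("Ana purchased 4 apples on top of her 3. How many now?"))  # None
```

The second call illustrates the brittleness raised in the thread: rules tuned to the phrasings seen during development can fail on a simple rewording.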

Original post

#197 Dimitris Papailiopoulos @DimitrisPapail

Say I am trying to solve GSM8k but no LLMs allowed! Only with a symbolic style solver with perhaps a few trainable parameters, so it's effectively a python program. How high do you expect it to go?

6:25 PM · May 28, 2026

#160 Omar Khattab @lateinteraction

@DimitrisPapail Is an LLM optimizer involved :D

Dimitris Papailiopoulos @DimitrisPapail

Say I am trying to solve GSM8k but no LLMs allowed! Only with a symbolic style solver with perhaps a few trainable parameters, so it's effectively a python program. How high do you expect it to go?

1:25 AM · May 29, 2026 · 14.5K Views

1:36 AM · May 29, 2026 · 2.6K Views

#160 Omar Khattab @lateinteraction

@DimitrisPapail that’s a yes then! i think it can get almost arbitrarily good in that case, but will be brittle :-)

Dimitris Papailiopoulos @DimitrisPapail

@lateinteraction no calls to llms allowed beyond the building of this magical solver :)

1:43 AM · May 29, 2026 · 1.5K Views

1:44 AM · May 29, 2026 · 423 Views

#160 Omar Khattab @lateinteraction

@DimitrisPapail but you can look into your training set no? i'm saying it will only generalize in distribution

Dimitris Papailiopoulos @DimitrisPapail

@lateinteraction you can't look into your test set!

1:47 AM · May 29, 2026 · 292 Views

1:47 AM · May 29, 2026 · 304 Views

#160 Omar Khattab @lateinteraction

@DimitrisPapail haha i was speculating only! makes sense

Dimitris Papailiopoulos @DimitrisPapail

@lateinteraction yes you can look into your training data set. but it's very hard (been trying)

1:53 AM · May 29, 2026 · 437 Views

1:57 AM · May 29, 2026 · 344 Views

#197 Dimitris Papailiopoulos @DimitrisPapail

@qberthet to the train? absolutely

Quentin Berthet @qberthet

@DimitrisPapail Is a lookup table allowed?

2:43 PM · May 29, 2026 · 379 Views

2:44 PM · May 29, 2026 · 205 Views

#970 Amin Karbasi @AMINKARBASI

@DimitrisPapail Pi

Dimitris Papailiopoulos @DimitrisPapail

Say I am trying to solve GSM8k but no LLMs allowed! Only with a symbolic style solver with perhaps a few trainable parameters, so it's effectively a python program. How high do you expect it to go?

1:25 AM · May 29, 2026 · 14.5K Views

2:55 AM · May 29, 2026 · 1.6K Views
