/AI · 16h ago

Alex J. Champandard and Dimitris Papailiopoulos argue that preventing GSM8k benchmark contamination is impractical without manual filtering

Models frequently ingest benchmark questions accidentally or intentionally.

Original post

Alex J. Champandard 🌱 @alexjc · #1352 in AI

@DimitrisPapail You mean direct & provable "leakage"? Because the large model writing the solution likely was trained on GSM8k -- accidentally or on purpose -- so there must be some indirect assumptions being made all along (also a form of leakage).

Dimitris Papailiopoulos @DimitrisPapail

I think there's a cleaner version of this question: What's the best GSM8k solver that fits in <2MBs that doesn't include (leaks of) test data.

12:42 AM · Jun 1, 2026 · 341 Views

Posts from X

Views: 116 · Likes: 2

Dimitris Papailiopoulos @DimitrisPapail

@alexjc the big model is trained on it, you're right, and you can't control for that unless you do things mostly manually, which is infeasible.

Alex J. Champandard 🌱 @alexjc

11h