Dimitris Papailiopoulos argues preventing LLM dataset contamination on GSM8K is practically impossible without manual evaluation · Digg