New Paper Shows Frontier Models Struggle Evaluating Grade-School Math Reasoning · Digg