Claude Suspects Testing on SWE-Bench, Anthropic Evaluation Reveals

Original post by @PMINERVINI

Gabriele Berton (@GABRIBERTON):

This is pretty interesting. When tested on SWE-bench, Claude suspects it's being tested. This means that either 1) Claude is aware of this benchmark (possible training set contamination) or 2) SWE-bench is too artificial. Either way, not good.

2:35 PM · May 24, 2026