8h ago
Stanford NLP's Aryaman Arora argues that backlash against SWE-bench Verified validates the coding benchmark's quality
The benchmark evaluates AI agents on real GitHub issues.