Datacurve releases DeepSWE, a long-horizon software engineering benchmark designed to prevent data leakage
GPT-5.5 scored 70 on the new agentic evaluation.

Sentiment: 79.6% positive, 20.4% negative (29 comments with sentiment)
Positive commenters praise the DeepSWE agentic coding benchmark for matching real daily use and exposing tangible gaps, such as GPT-5.5's lead. Negative commenters dismiss specific rankings and call the benchmark flawed or inaccurate.