DeepSWE benchmark data shows GPT-5.5 outperforms Claude Opus 4.8 on software engineering tasks and token efficiency · Digg