Analysis finds Claude Sonnet 4.5, DeepSeek R1, Grok 4, and GPT-5 exhibit highly correlated error patterns on benchmarks · Digg