LisanBench data shows Anthropic's Claude Opus 4.8 eliminated "lazy investigation" failures, down from 91% in Opus 4.5 · Digg