3h ago

Anthropic evaluation data shows Claude Opus 4.8 cut its 'lazy investigation' fell-for-trap rate to 0%, down from 91% in Opus 4.5

The evaluation measures thoroughness during complex problem-solving tasks

Sentiment

Positive: 53.8%

Negative: 46.2%

Positive users praise Claude Opus 4.8's 0% lazy-investigation rate, citing better reliability and session persistence, while negative users dismiss the claims after seeing worse performance than in earlier versions.

13 comments with sentiment.
