2h ago

Claude Opus 4.8 Reduces Hallucinations On Missing-Context Benchmarks Versus 4.7

Original post

A big problem with Opus 4.7 was its hallucinations. Good to see slight improvements with Opus 4.8, at least on the benchmarks they report on.

Interestingly, a significant share of the hallucination evals they used test the model's ability to resist pressure from incorrect info supplied by the user.

I'm not entirely sure that's reflective of the hallucinations most people encounter.

wh@nrehiew_

5:18 PM · May 28, 2026 · 1.3K Views

5:18 PM · May 28, 2026 · 344 Views