Claude Opus 4.8 Reduces Hallucinations On Missing-Context Benchmarks Versus 4.7
Interestingly, a significant number of the hallucination evals used test the model's ability to resist pressure from incorrect information supplied by the user.
I'm not entirely sure that's reflective of the hallucinations most people encounter.
A big problem with Opus 4.7 was its hallucinations. Good to see slight improvements with Opus 4.8, at least on the benchmarks they report.
5:18 PM · May 28, 2026 · 1.3K Views