Claude Opus 4.8 Reduces Hallucinations On Missing-Context Benchmarks Versus 4.7

Original post

A big problem with Opus 4.7 was its hallucinations. Good to see slight improvements with Opus 4.8 at least on the benchmarks they report on

10:18 AM · May 28, 2026

Interestingly, a significant share of the hallucination evals used test the model's ability to resist pressure from incorrect info supplied by the user.

I'm not entirely sure that's reflective of the hallucinations most people encounter.

wh (@nrehiew_)

5:18 PM · May 28, 2026 · 1.3K Views
5:18 PM · May 28, 2026 · 344 Views