
New Paper Shows Hallucination Detectors Often Ignore Reasoning

Original post · Jessy Li · #936

🚨 New paper! Your hallucination detector says it evaluates reasoning. But what if it's just peeking at the final answer? We tested this: keep the reasoning, only change the answer. Many detectors' scores shift dramatically 🧵

2:12 PM · Jun 9, 2026 · 317 Views

Paper: https://arxiv.org/abs/2605.08346

Work done w/ Minh Vu, @HongliZhan, @liraymond96, & Manish Bhattarai.

22h · Views 44 · Likes 2
Replies 1

TRACT stays stable under both FORCE and REMOVE since it scores the reasoning body, not the endpoint. It also stacks well: fusing TRACT with existing detectors gives +5 to +20 average AUC across all 5 models.
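The post doesn't say how the fusion is done. As a rough sketch of one simple option (an assumption, not the paper's method), two detectors' scores can be rank-normalized so they share a scale and then averaged. The function names and the `weight` parameter are hypothetical.

```python
def rank_normalize(scores):
    """Map scores to [0, 1] by rank; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    denom = max(len(scores) - 1, 1)
    return [r / denom for r in ranks]


def fuse(scores_a, scores_b, weight=0.5):
    """Weighted average of two detectors' rank-normalized scores (assumed rule)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both detectors must score the same examples")
    ra, rb = rank_normalize(scores_a), rank_normalize(scores_b)
    return [weight * a + (1 - weight) * b for a, b in zip(ra, rb)]
```

Rank normalization is used here only because detectors often emit scores on incompatible scales; a learned combiner would be another reasonable choice.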

22h · Views 33 · Likes 2

We ran this across 4 benchmarks and 5 models. Some detectors swing 20+ AUC points just from changing or removing answer cues, even though the reasoning is untouched.
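To make "swing in AUC points" concrete, here's a minimal sketch that computes a detector's AUC before and after an answer edit and reports the difference. It assumes label 1 means the original trace was hallucinated and that higher scores mean "more likely hallucinated"; the function names and toy numbers are illustrative, not from the paper.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_shift_points(original_scores, perturbed_scores, labels):
    """Change in AUC, in points (AUC x 100), after the answer cue is edited."""
    return 100 * (auc(perturbed_scores, labels) - auc(original_scores, labels))


# Toy example: the detector separates the classes perfectly on the original
# traces, then falls to chance once the answer cue is changed.
labels = [1, 1, 0, 0]
original = [0.9, 0.8, 0.2, 0.1]
perturbed = [0.5, 0.4, 0.6, 0.3]
shift = auc_shift_points(original, perturbed, labels)
```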

22h · Views 28 · Likes 1

Here's what we did: FORCE: replace the final answer with the ground truth; REMOVE: delete the answer step entirely.

Same reasoning body both times. A trace-faithful detector should remain informative under both.
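A minimal sketch of the two edits, assuming a trace splits cleanly into a reasoning body and a final answer step. The `Trace` type, the function names, and the "Final answer:" rendering are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Trace:
    reasoning: str          # reasoning body; neither edit touches it
    answer: Optional[str]   # final answer step, or None once removed


def force(trace: Trace, ground_truth: str) -> Trace:
    """FORCE: replace the final answer with the ground truth."""
    return replace(trace, answer=ground_truth)


def remove(trace: Trace) -> Trace:
    """REMOVE: delete the answer step entirely."""
    return replace(trace, answer=None)


def render(trace: Trace) -> str:
    """Text handed to a detector (this output format is an assumption)."""
    if trace.answer is None:
        return trace.reasoning
    return f"{trace.reasoning}\nFinal answer: {trace.answer}"


original = Trace(reasoning="2 + 2 is 4, plus 1 is 6.", answer="6")
forced = force(original, "5")
removed = remove(original)
```

Scoring `render(original)`, `render(forced)`, and `render(removed)` with the same detector then shows how much of its signal comes from the answer step rather than the reasoning.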

22h · Views 27 · Likes 1

So we asked: what does the reasoning itself look like when it's going wrong? It wanders, hedges, grows uneven, or diverges across samples. We built TRACT to pick up on these trajectory patterns as a lightweight text-only score.
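For a feel of what text-only trajectory signals could look like, here's an illustrative sketch of three crude proxies: hedging rate, unevenness of step lengths, and divergence across samples. This is not TRACT. The hedge list, the one-step-per-line convention, and every function name are assumptions made for the example.

```python
import re
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical hedge lexicon; a real one would be larger and validated.
HEDGES = {"maybe", "perhaps", "possibly", "might", "probably", "unsure"}


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def hedge_rate(trace):
    """Fraction of tokens that are hedge words."""
    toks = tokens(trace)
    return sum(t in HEDGES for t in toks) / len(toks) if toks else 0.0


def step_unevenness(trace):
    """Coefficient of variation of step lengths, treating each line as a step."""
    lengths = [len(tokens(line)) for line in trace.splitlines() if line.strip()]
    if len(lengths) < 2 or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def divergence(samples):
    """1 minus the mean pairwise Jaccard overlap of token sets across samples."""
    sets = [set(tokens(s)) for s in samples]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    return 1.0 - mean(jaccard(a, b) for a, b in pairs)
```

None of these functions look at the final answer's correctness, so by construction they would be unaffected by an edit that changes only the answer step.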

22h · Views 22 · Likes 1