🚨 New paper! Your hallucination detector says it evaluates reasoning. But what if it's just peeking at the final answer? We tested this: keep the reasoning, only change the answer. Many detectors' scores shift dramatically. 🧵

Paper: https://arxiv.org/abs/2605.08346
Work done w/Minh Vu, @HongliZhan, @liraymond96, & Manish Bhattarai.

TRACT stays stable under both FORCE and REMOVE because it scores the reasoning body, not the endpoint. It also stacks well: fusing TRACT with existing detectors improves average AUC by 5 to 20 points across all 5 models.
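The post doesn't say how the fusion is done. One common way to combine detectors whose scores live on different scales is to rank-normalize each one and take a weighted mean. This is a minimal sketch under that assumption only. `rank_normalize`, `fuse`, and the weight `w` are hypothetical names, not the paper's method.

```python
from typing import List, Sequence


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map scores to [0, 1] by rank (ties share their average rank),
    so detectors on different scales can be combined."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0  # average 0-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    denom = max(n - 1, 1)
    return [r / denom for r in ranks]


def fuse(tract: Sequence[float], baseline: Sequence[float], w: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted mean of rank-normalized scores,
    with weight w on the trajectory score and 1 - w on the baseline detector."""
    a, b = rank_normalize(tract), rank_normalize(baseline)
    return [w * x + (1 - w) * y for x, y in zip(a, b)]
```

Rank normalization is used here because it needs no calibration and ignores each detector's raw scale. In practice `w` would be tuned on held-out data.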

We ran this across 4 benchmarks and 5 models. Some detectors swing 20+ AUC points just from changing or removing answer cues, even though the reasoning is untouched.
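A "swing" in AUC points is the detector's AUC on the original traces minus its AUC on the answer-perturbed traces, scaled by 100. Here is a minimal sketch. It assumes label 1 marks a hallucinated trace and that a higher score means "more likely hallucinated". The numbers in the example are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2.
    Assumes label 1 = hallucinated trace, 0 = correct trace."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_swing(original: Sequence[float], perturbed: Sequence[float], labels: Sequence[int]) -> float:
    """AUC points lost (positive) or gained (negative) when only the answer is perturbed."""
    return 100.0 * (auc(original, labels) - auc(perturbed, labels))


# Toy example: after FORCE, a detector keyed on the answer lowers its scores for hallucinated traces.
labels = [1, 1, 0, 0]
original = [0.9, 0.8, 0.3, 0.2]
forced = [0.4, 0.25, 0.3, 0.2]
print(auc_swing(original, forced, labels))  # → 25.0
```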

Here's what we did. FORCE: replace the final answer with the ground truth. REMOVE: delete the answer step entirely.
Same reasoning body both times. A trace-faithful detector should remain informative under both.
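Both perturbations can be sketched as pure functions that leave the reasoning steps untouched and touch only the answer step. The `Trace` layout and the `"The answer is ..."` template are assumptions for illustration. The post doesn't specify how traces or answers are formatted.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Trace:
    """Hypothetical trace format: a reasoning body plus an optional final-answer step."""
    steps: Tuple[str, ...]  # reasoning body, never modified by the perturbations
    answer: Optional[str]   # final-answer step, or None if absent


def force(trace: Trace, ground_truth: str) -> Trace:
    """FORCE: same reasoning body, final answer swapped for the ground truth."""
    return replace(trace, answer=f"The answer is {ground_truth}.")


def remove(trace: Trace) -> Trace:
    """REMOVE: same reasoning body, answer step deleted entirely."""
    return replace(trace, answer=None)


def render(trace: Trace) -> str:
    """Flatten a trace into the text a detector would score."""
    parts = list(trace.steps)
    if trace.answer is not None:
        parts.append(trace.answer)
    return "\n".join(parts)


# A trace whose reasoning is right but whose stated answer is wrong.
t = Trace(steps=("12 * 3 = 36", "36 + 7 = 43"), answer="The answer is 44.")
print(render(force(t, "43")))
print(render(remove(t)))
```

Feeding `render(force(t, ...))` and `render(remove(t))` to a detector and comparing against `render(t)` isolates how much of its score depends on the endpoint alone.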

So we asked: what does the reasoning itself look like when it's going wrong? It wanders, hedges, grows uneven, or diverges across samples. We built TRACT, a lightweight text-only score that picks up on these trajectory patterns.
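The thread doesn't spell out TRACT's actual features. To make the four symptoms concrete, here is a toy sketch of text-only proxies, one per symptom:

- wandering: low word overlap between consecutive steps
- hedging: share of words from a small hedge list
- unevenness: variation in step length
- divergence: low overlap between sampled traces for the same question

The hedge lexicon, the Jaccard-based proxies, and `trajectory_features` are all hypothetical choices, not the paper's method.

```python
import re
import statistics
from itertools import combinations
from typing import Dict, List, Sequence, Set

# Hypothetical hedge lexicon; not taken from the paper.
HEDGES = {"maybe", "perhaps", "probably", "might", "possibly", "guess", "unsure", "wait"}


def words(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def trajectory_features(steps: Sequence[str], samples: Sequence[str]) -> Dict[str, float]:
    """Toy proxies: `steps` is one trace split into steps;
    `samples` are full-text traces sampled for the same question."""
    tokens = [words(s) for s in steps]
    all_tokens = [w for t in tokens for w in t]
    lengths = [len(t) for t in tokens]
    # wander: 1 - mean word overlap between consecutive steps
    pairs = [jaccard(set(a), set(b)) for a, b in zip(tokens, tokens[1:])]
    wander = 1.0 - statistics.mean(pairs) if pairs else 0.0
    # hedge: fraction of words drawn from the hedge lexicon
    hedge = sum(w in HEDGES for w in all_tokens) / max(len(all_tokens), 1)
    # uneven: coefficient of variation of step lengths (in words)
    mean_len = statistics.mean(lengths) if lengths else 0.0
    uneven = statistics.pstdev(lengths) / mean_len if mean_len else 0.0
    # diverge: 1 - mean pairwise word overlap between sampled traces
    sample_sets = [set(words(s)) for s in samples]
    sims = [jaccard(a, b) for a, b in combinations(sample_sets, 2)]
    diverge = 1.0 - statistics.mean(sims) if sims else 0.0
    return {"wander": wander, "hedge": hedge, "uneven": uneven, "diverge": diverge}
```

None of these features reads the final answer, so FORCE and REMOVE leave `wander`, `hedge`, and `uneven` unchanged. This illustrates the kind of endpoint-independence the thread describes. Turning the features into a single score would still need weights fit on labeled data.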