Reasoning Models Ignore Instructions in Hidden Traces Despite Compliant Answers

Original post

Rohan Paul@rohanpaul_ai#1031inAI

The paper shows reasoning models often ignore user instructions while thinking, even when final answers look fine.

A reasoning trace is the hidden text the model writes before the final answer.

The authors build ReasonIF, a test that pairs normal questions with simple rules like language, word cap, JSON, required disclaimer, all caps, or no commas.

They automatically check each trace for rule compliance and compute an instruction following score.

Across many models, fewer than 25% of traces follow the rules.

On the same prompts, models usually follow the rule in the final answer but not in the trace.

As problems get harder, rule following in traces drops further.

A 2 turn redo that points out the mistake raises compliance by a small amount and sometimes bumps accuracy.

Small supervised fine tuning on synthetic traces lifts one model from 0.11 to 0.27 with a slight accuracy tradeoff.

The takeaway is that controlling how models think is still unreliable, and ReasonIF makes these gaps measurable and improvable.

----

Paper – arxiv. org/abs/2510.15211

Paper Title: "ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning"

8:45 PM · Oct 20, 2025 · 5.4K Views