ReasonOps Paper Reveals Hidden Structure in AI Reasoning Models

Original post

James Zou#753

Owen Queen@oq_35

1/ 🧠 Can an AI overthink a problem? Turns out yes — and we can measure it. New paper: ReasonOps. We built a way to read the "thinking" inside reasoning models and found a hidden structure shared across every model we tested. 🧵

10:59 AM · May 29, 2026 · 926 Views

Sentiment

Users thank their coauthors for the ReasonOps paper analyzing hidden structures and higher-level moves in AI model reasoning.

Positive: 100.0% · Negative: 0.0% · 2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 1.1K · Bookmarks: 7 · Likes: 7 · Retweets: 4 · Replies: 1

Daniel Lee@leedan642

1/ Are tokens the right action space for understanding how reasoning models solve problems?

We found something higher-level: their chains of thought decompose into recurring problem-solving moves shared across models.

New paper with @StanfordAILab: ReasonOps. 🧵

13d ago

Daniel Lee@leedan642

2/ The setup: take ~45,000 reasoning traces from 12 models and learn a vocabulary of reasoning operators without human annotations.

The result is 7 meso-scale moves: grounding, inferring, hypothesizing, backtracking, constraining, qualifying, and initiating.

13d ago

Daniel Lee@leedan642

3/ The signal for correctness is in the structure, not just the words.

A small transformer trained only on operator labels can predict correctness from a partial trace before the model finishes. It beats SelfCheck, where the model reads its own reasoning to grade itself and stays near chance.
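To make "trained only on operator labels" concrete, here is a minimal sketch of how a partial trace might be turned into classifier input once the words are discarded. It is not the paper's pipeline: the id assignment, the padding id, and the helper name `encode_prefix` are assumptions made for illustration, and the classifier itself is left out.

```python
from enum import IntEnum


class Op(IntEnum):
    """The seven operators named in the thread; the id order is an assumption."""
    GROUNDING = 0
    INFERRING = 1
    HYPOTHESIZING = 2
    BACKTRACKING = 3
    CONSTRAINING = 4
    QUALIFYING = 5
    INITIATING = 6


PAD_ID = len(Op)  # hypothetical padding id, one past the last operator


def encode_prefix(labels, fraction, max_len):
    """Keep the first `fraction` of a trace's operator labels as integer ids.

    The output is truncated or right-padded to `max_len`, so a sequence
    classifier would see only the operator structure, never the text.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    cut = max(1, int(len(labels) * fraction))
    ids = [int(Op[name.upper()]) for name in labels[:cut]][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


trace = ["initiating", "grounding", "inferring", "backtracking"]
print(encode_prefix(trace, 0.5, 4))  # -> [6, 0, 7, 7]
```

Feeding prefixes of different lengths to the same classifier is one way to ask how early in a trace the correctness signal shows up.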

13d ago

Daniel Lee@leedan642

5/ Interestingly, these patterns also identify the model.

Different models share the same operator vocabulary, but use it in distinct ways. The result is a reasoning fingerprint: from operator structure alone, you can often tell which model produced the trace.
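One simple way to picture a fingerprint built from operator structure alone is a table of operator-to-operator transition frequencies per model, with a new trace assigned to the closest table. The sketch below uses that stand-in technique (bigram profiles and L1 distance), not the paper's method. The model names and traces are invented toy inputs, not data from the paper.

```python
from collections import Counter


def bigram_profile(traces):
    """Relative frequency of each consecutive operator pair across traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def l1_distance(p, q):
    """Sum of absolute differences between two sparse frequency tables."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def identify(trace, profiles):
    """Return the name of the profile closest to this trace's bigram profile."""
    query = bigram_profile([trace])
    return min(profiles, key=lambda name: l1_distance(query, profiles[name]))


# Toy traces for two hypothetical models, for illustration only.
profiles = {
    "model_a": bigram_profile(
        [["initiating", "grounding", "inferring", "inferring", "constraining"]]
    ),
    "model_b": bigram_profile(
        [["initiating", "hypothesizing", "backtracking", "hypothesizing", "qualifying"]]
    ),
}

print(identify(["initiating", "grounding", "inferring", "constraining"], profiles))
# -> model_a
```

Both toy models draw on the same operator vocabulary; only the transition frequencies differ, which is the sense in which the thread describes a shared vocabulary used in distinct ways.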

13d ago

Daniel Lee@leedan642

4/ We also found that the type of thinking matters.

Some operators are committal: grounding, inferring, constraining, initiating. They push the solution forward.

Others are reflective: hypothesizing, qualifying, backtracking. They reopen the path.

On easy problems, correct traces are more committal. On hard problems, hypothesizing becomes more useful.
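The committal/reflective split above can be written down directly. This minimal sketch computes the share of committal operators in a labeled trace; the function name `committal_fraction` and the example trace are illustrative assumptions, not taken from the paper's code.

```python
# Operator groups exactly as listed in the thread.
COMMITTAL = frozenset({"grounding", "inferring", "constraining", "initiating"})
REFLECTIVE = frozenset({"hypothesizing", "qualifying", "backtracking"})


def committal_fraction(trace):
    """Fraction of steps in a trace whose operator is committal."""
    if not trace:
        raise ValueError("trace must contain at least one operator")
    unknown = set(trace) - COMMITTAL - REFLECTIVE
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    return sum(op in COMMITTAL for op in trace) / len(trace)


# A made-up four-step trace with one reflective move.
print(committal_fraction(["initiating", "grounding", "hypothesizing", "inferring"]))
# -> 0.75
```

Comparing this fraction between correct and incorrect traces, separately for easy and hard problems, is one way to look for the pattern the thread describes.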

13d ago

Daniel Lee@leedan642

6/ Where this points: post-training and test-time compute.

ReasonOps gives a cheap signal for when to stop, branch, sample more, or route mid-generation. It also fits the view that RL post-training may select among latent solution paths the base model already has.

The operators are those paths, made legible in plain text.

Paper + code: http://github.com/lee-dan/ReasonOps
With my amazing co-author @oq_35, and special thanks to @james_y_zou!
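As a rough picture of what "when to stop, branch, sample more, or route" could look like at test time, the sketch below maps a mid-generation correctness estimate to an action. Everything here is hypothetical: the function name, the thresholds, and the assumption that some predictor supplies `p_correct` from the operators seen so far. The thread does not describe a specific policy.

```python
def next_action(p_correct, steps_used, step_budget, stop_at=0.9, branch_below=0.3):
    """Pick a test-time action from a predicted probability of correctness.

    p_correct    -- estimate from an operator-based predictor (assumed to exist)
    steps_used   -- reasoning steps generated so far
    step_budget  -- maximum steps allowed for this problem
    stop_at      -- hypothetical confidence above which we commit to an answer
    branch_below -- hypothetical confidence below which we start a new sample
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability")
    if steps_used >= step_budget or p_correct >= stop_at:
        return "stop"
    if p_correct <= branch_below:
        return "branch"
    return "continue"


print(next_action(0.95, steps_used=10, step_budget=100))  # -> stop
print(next_action(0.20, steps_used=10, step_budget=100))  # -> branch
print(next_action(0.60, steps_used=10, step_budget=100))  # -> continue
```

The appeal of such a policy is cost: the decision uses only operator labels already assigned to the trace, with no extra call to the reasoning model.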

13d ago

Owen Queen@oq_35

6/ Thanks to my awesome coauthor Daniel Lee and advisor @james_y_zou

Paper: https://arxiv.org/abs/2605.29192
GitHub: https://github.com/lee-dan/ReasonOps

14d ago

Owen Queen@oq_35

2/ Reasoning models generate thousands of words of step-by-step thinking before answering — but we've had no vocabulary for it. We analyzed ~45,000 reasoning traces from 12 models and found they all rely on the same 7 basic moves: grounding, hypothesizing, backtracking, etc.

14d ago

Owen Queen@oq_35

5/ ✅ How a model reasons predicts whether it's right — often before it even finishes. Annotation-free and unsupervised, so it scales to any model.

14d ago

Owen Queen@oq_35

3/ 🔁 Overthinking is real. Reflective moves help on hard problems — but actively hurt accuracy on easy ones. Sometimes the smartest thing a model can do is commit to an answer.

14d ago

Owen Queen@oq_35

4/ 🔍 Every model has a "reasoning fingerprint." The pattern of moves a model uses is so distinctive you can identify which model wrote a trace from its reasoning style alone — no other information needed.

14d ago