1/ 🧠 Can an AI overthink a problem? Turns out yes — and we can measure it. New paper: ReasonOps. We built a way to read the "thinking" inside reasoning models and found a hidden structure shared across every model we tested. 🧵
1/ Are tokens the right action space for understanding how reasoning models solve problems?
We found something higher-level: their chains of thought decompose into recurring problem-solving moves shared across models.
New paper with @StanfordAILab: ReasonOps. 🧵

2/ The setup: take ~45,000 reasoning traces from 12 models and learn a vocabulary of reasoning operators without human annotations.
The result is 7 meso-scale moves: grounding, inferring, hypothesizing, backtracking, constraining, qualifying, and initiating.

3/ The signal for correctness is in the structure, not just the words.
A small transformer trained only on operator labels can predict correctness from a partial trace, before the model finishes. It beats SelfCheck, a baseline in which the model reads its own reasoning to grade itself; that baseline stays near chance.
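
A minimal sketch of what "trained only on operator labels" could look like on the input side: a partial trace becomes a short sequence of integer IDs, with the wording of the trace discarded. The operator names come from 2/; everything else (`encode_prefix`, `PAD_ID`, `max_len`, the ID assignment) is a hypothetical choice for illustration, not the paper's code.

```python
# Operator vocabulary as listed in the thread; the ID assignment is an assumption.
OPERATORS = [
    "grounding", "inferring", "hypothesizing", "backtracking",
    "constraining", "qualifying", "initiating",
]
PAD_ID = 0  # reserved for padding, so operator IDs start at 1
OP_TO_ID = {op: i + 1 for i, op in enumerate(OPERATORS)}


def encode_prefix(trace, fraction, max_len=16):
    """Keep the first `fraction` of a trace's operator labels as padded integer IDs.

    `trace` is a list of operator names, one per reasoning step.
    The result has exactly `max_len` entries.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, int(len(trace) * fraction))  # always keep at least one step
    ids = [OP_TO_ID[op] for op in trace[:keep]][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Example: the first half of a four-step trace.
trace = ["initiating", "grounding", "inferring", "hypothesizing"]
print(encode_prefix(trace, 0.5, max_len=4))  # [7, 1, 0, 0]
```

Any sequence classifier could consume these IDs. The point is only that the classifier sees which moves were made and in what order, not what was written.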

4/ We also found that the type of thinking matters.
Some operators are committal: grounding, inferring, constraining, initiating. They push the solution forward.
Others are reflective: hypothesizing, qualifying, backtracking. They reopen the path.
On easy problems, correct traces are more committal. On hard problems, hypothesizing becomes more useful.

5/ Interestingly, these patterns also identify the model.
Different models share the same operator vocabulary, but use it in distinct ways. The result is a reasoning fingerprint: from operator structure alone, you can often tell which model produced the trace.
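
To make 4/ and 5/ concrete, here is a rough sketch of two statistics one could compute from a labeled trace: the share of committal steps, and the relative frequency of each operator-to-operator transition as a crude stand-in for "operator structure". The committal/reflective grouping is taken from 4/; the function names and the choice of transition frequencies are assumptions for illustration, not the paper's method.

```python
from collections import Counter

# Grouping as stated in the thread.
COMMITTAL = {"grounding", "inferring", "constraining", "initiating"}
REFLECTIVE = {"hypothesizing", "qualifying", "backtracking"}


def committal_fraction(trace):
    """Share of steps in `trace` (a list of operator names) that are committal."""
    if not trace:
        return 0.0
    return sum(op in COMMITTAL for op in trace) / len(trace)


def transition_profile(trace):
    """Relative frequency of each consecutive (operator, next_operator) pair."""
    pairs = Counter(zip(trace, trace[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


trace = ["initiating", "grounding", "hypothesizing", "backtracking", "inferring"]
print(committal_fraction(trace))  # 0.6
print(transition_profile(trace)[("grounding", "hypothesizing")])  # 0.25
```

Comparing profiles like these across traces from different models is one simple way to look for a fingerprint. A real classifier would need many traces per model and a held-out evaluation.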

6/ Where this points: post-training and test-time compute.
ReasonOps gives a cheap signal for when to stop, branch, sample more, or route mid-generation. It also fits the view that RL post-training may select among latent solution paths the base model already has.
The operators are those paths, made legible in plain text.
Paper + code: http://github.com/lee-dan/ReasonOps With my amazing co-author @oq_35, and special thanks to @james_y_zou!

2/ Reasoning models generate thousands of words of step-by-step thinking before answering, but we've had no vocabulary for it. We analyzed ~45,000 reasoning traces from 12 models and found they all rely on the same 7 basic moves: grounding, hypothesizing, backtracking, etc.

3/ 🔁 Overthinking is real. Reflective moves help on hard problems but actively hurt accuracy on easy ones. Sometimes the smartest thing a model can do is commit to an answer.

4/ 🔍 Every model has a "reasoning fingerprint." The pattern of moves a model uses is so distinctive you can identify which model wrote a trace from its reasoning style alone, with no other information needed.

5/ ✅ How a model reasons predicts whether it's right, often before it even finishes. Annotation-free and unsupervised, so it scales to any model.

6/ Thanks to my awesome coauthor Daniel Lee and advisor @james_y_zou
Paper: https://arxiv.org/abs/2605.29192 GitHub: https://github.com/lee-dan/ReasonOps