ThoughtFold Prunes Redundant Steps From Long Chain-Of-Thought Reasoning

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

𝚐𝔪𝟾𝚡𝚡𝟾@gm8xx8

ThoughtFold has a clean RLVR angle: correct long CoTs contain both useful reasoning and redundant exploration, but outcome rewards reinforce all of it. Instead of just rewarding shorter answers, it prunes correct chains, verifies what can be removed, then uses masked preference learning to penalize redundant steps and keep the reasoning path tighter.

Paper: https://arxiv.org/abs/2606.03503
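
The prune, verify, then penalize loop described in the post can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: `prune_chain`, `still_correct`, and `masked_preference_loss` are hypothetical names, the greedy one-step-at-a-time pruning is a guess at one reasonable strategy, and the loss is a simplified DPO-style logistic loss with no reference model. The actual ThoughtFold objective may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple


def prune_chain(
    steps: Sequence[str],
    still_correct: Callable[[List[str]], bool],
) -> Tuple[List[str], List[bool]]:
    """Greedily drop steps whose removal keeps the chain verifiably correct.

    `still_correct` is a hypothetical verifier supplied by the caller.
    Returns (kept_steps, redundant_mask); redundant_mask[i] is True when
    step i was removed.
    """
    keep = [True] * len(steps)
    for i in range(len(steps)):
        keep[i] = False  # tentatively remove step i
        candidate = [s for s, k in zip(steps, keep) if k]
        if not still_correct(candidate):
            keep[i] = True  # the step was needed, so restore it
    kept = [s for s, k in zip(steps, keep) if k]
    redundant = [not k for k in keep]
    return kept, redundant


def masked_preference_loss(
    logp_pruned: Sequence[float],
    logp_full: Sequence[float],
    redundant_mask: Sequence[bool],
    beta: float = 1.0,
) -> float:
    """Toy preference loss: the pruned chain is preferred, and the full
    chain contributes only through its redundant steps (the mask), so
    shared useful steps are not penalized. Assumed form, not the paper's.
    """
    chosen = sum(logp_pruned)
    rejected = sum(lp for lp, m in zip(logp_full, redundant_mask) if m)
    x = -beta * (chosen - rejected)
    # Numerically stable softplus(x) = log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Usage: a four-step chain with one dead-end exploration step.
steps = ["a = 2", "try b = 5 (dead end)", "a * 3 = 6", "answer: 6"]
needed = {"a = 2", "a * 3 = 6", "answer: 6"}
kept, redundant = prune_chain(steps, lambda c: needed.issubset(c))
loss = masked_preference_loss([-0.1, -0.1, -0.1], [-0.1, -0.2, -0.1, -0.1], redundant)
```

Masking the full chain down to its redundant steps is what separates this from a plain length penalty: the model is pushed away from the dead-end step specifically, while the steps shared with the pruned chain are left alone.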