ThoughtFold Prunes Redundant Steps From Long Chain-Of-Thought Reasoning

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)#450

ThoughtFold has a clean RLVR angle: correct long CoTs contain both useful reasoning and redundant exploration, but outcome rewards reinforce all of it. Instead of just rewarding shorter answers, it prunes correct chains, verifies what can be removed, then uses masked preference learning to penalize redundant steps and keep the reasoning path tighter.
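To make the prune, verify, then masked-preference idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation. Everything named here is an assumption for illustration: the greedy one-step-at-a-time pruning, the `still_correct` verifier callback, the step-level (rather than token-level) log-ratios, and the DPO-style logistic loss with the pruned chain as the preferred sample and the original chain as the dispreferred one.

```python
import math
from typing import Callable, List, Sequence


def redundancy_mask(steps: Sequence[str],
                    still_correct: Callable[[List[str]], bool]) -> List[int]:
    """Greedily try to drop each step; keep the drop if the chain still verifies.

    Returns a mask with 1 for steps judged removable and 0 for steps kept.
    `still_correct` is a hypothetical verifier supplied by the caller.
    """
    kept = list(range(len(steps)))
    for i in range(len(steps)):
        trial = [j for j in kept if j != i]
        if still_correct([steps[j] for j in trial]):
            kept = trial  # step i was not needed
    kept_set = set(kept)
    return [0 if i in kept_set else 1 for i in range(len(steps))]


def masked_preference_loss(chosen_logratios: Sequence[float],
                           rejected_logratios: Sequence[float],
                           rejected_mask: Sequence[int],
                           beta: float = 0.1) -> float:
    """DPO-style loss in which only masked steps of the rejected chain count.

    Each log-ratio is log pi_theta(step) - log pi_ref(step) (assumed given).
    Unmasked steps in the rejected chain contribute nothing, so useful
    reasoning that appears in both chains is not penalized.
    """
    if len(rejected_logratios) != len(rejected_mask):
        raise ValueError("rejected_logratios and rejected_mask must align")
    chosen_score = sum(chosen_logratios)
    rejected_score = sum(r * m for r, m in zip(rejected_logratios, rejected_mask))
    margin = beta * (chosen_score - rejected_score)
    # Numerically stable -log(sigmoid(margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy usage: the verifier only requires that steps "a" and "c" are present.
steps = ["a", "detour", "c"]
mask = redundancy_mask(steps, lambda s: "a" in s and "c" in s)
loss = masked_preference_loss([0.0, 0.0], [0.0, 0.0, 0.0], mask)
```

In this sketch, pushing down the policy's log-ratio on the masked `"detour"` step lowers the loss, while changing the log-ratios of the unmasked steps in the rejected chain has no effect. That is the property the post attributes to masking, in contrast to a plain length penalty.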

Paper: https://arxiv.org/abs/2606.03503

12:18 PM · Jun 5, 2026 · 4.3K Views