
KempeLab researchers analyze internalization, the process of training models to absorb chain-of-thought computations directly into their parameters

Story Overview

KempeLab researchers map out how transformers first master explicit chain-of-thought (CoT) steps on hard tasks and then fold those steps into their weights, so that at inference time they no longer have to generate the intermediate reasoning token by token.
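
To make the two-stage curriculum concrete, here is a minimal Python sketch of one way to remove CoT tokens gradually during training. It assumes a simple linear schedule that drops a fixed number of leading CoT tokens per stage; the function names, the schedule, and the placeholder tokens are hypothetical illustrations, not the paper's training code.

```python
from typing import List


def tokens_to_remove(epoch: int, epochs_per_stage: int,
                     tokens_per_stage: int, cot_len: int) -> int:
    """Hypothetical linear schedule: how many leading CoT tokens to drop at this epoch."""
    stage = epoch // epochs_per_stage
    return min(stage * tokens_per_stage, cot_len)


def make_example(question: List[str], cot: List[str],
                 answer: List[str], n_removed: int) -> List[str]:
    """Build one training sequence with the first n_removed CoT tokens deleted."""
    return question + cot[n_removed:] + answer


question = ["<q>", "1", "0", "1"]
cot = ["s1", "s2", "s3"]        # placeholder intermediate reasoning steps
answer = ["<a>", "0"]
for epoch in range(0, 8, 2):
    k = tokens_to_remove(epoch, epochs_per_stage=2,
                         tokens_per_stage=1, cot_len=len(cot))
    print(epoch, make_example(question, cot, answer, k))
```

By the last stage each sequence is just the question followed by the answer, so the model has to carry out the removed steps internally. Dropping tokens from the front of the chain is one simple choice: the steps closest to the answer stay supervised the longest.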


Original post

Julia Kempe (@KempeLab)

Check out our new paper on internalization: the process of gradually "absorbing" chain of thought computations during training. Our results show that internalization can work for problems that are computationally hard to learn directly. We carefully study method and task specific factors that determine internalization success. To learn more, see https://arxiv.org/abs/2606.20937. With @nikostsilivis Nirmit Joshi @_rkomma Nati Srebro. @NYUDataScience @TTIC_Connect

4:07 AM · Jul 5, 2026 · 233 Views

Theoretical Edge

Hard problems reward the two-stage route

On sparse parity, a task that is computationally hard to learn directly, models reliably learn the solution only when they are first trained with explicit CoT supervision and the CoT tokens are then gradually removed. This gives the first rigorous proof that internalization can make otherwise intractable computations learnable.
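
For readers unfamiliar with the task: in k-sparse parity the input is an n-bit string and the label is the XOR of k bits at hidden positions. A natural CoT, assumed here for illustration, writes out the running XOR over those positions one at a time, so each step combines the previous step with a single input bit. Without the chain, a learner must in effect find which k of the n positions matter, which is what makes direct learning hard for gradient-based methods as n and k grow. The sketch below generates such examples; `parity_cot`, `sample_example`, and the chosen support are hypothetical, not taken from the paper.

```python
import random
from typing import List, Tuple


def parity_cot(x: List[int], support: List[int]) -> Tuple[List[int], int]:
    """Return (running-XOR chain over the support positions, final parity label)."""
    cot: List[int] = []
    acc = 0
    for i in support:
        acc ^= x[i]          # each CoT step = previous step XOR one input bit
        cot.append(acc)
    return cot, acc


def sample_example(n: int, support: List[int],
                   rng: random.Random) -> Tuple[List[int], List[int], int]:
    """Draw a uniform random n-bit input and compute its CoT and label."""
    x = [rng.randint(0, 1) for _ in range(n)]
    cot, label = parity_cot(x, support)
    return x, cot, label


rng = random.Random(0)
hidden_support = [2, 5, 7]   # hypothetical hidden positions (k = 3, n = 10)
x, cot, y = sample_example(10, hidden_support, rng)
print(x, cot, y)
```

Dropping the chain entirely leaves only (x, y) pairs, which is the direct-learning setting that internalization is meant to get around.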

Open Question

Trade-offs still need mapping

Wider models internalize more readily than deeper ones on semiautomaton tasks, yet the shortcuts they learn often hurt performance on out-of-distribution inputs, leaving open how far the inference-speed gains extend beyond the settings studied.
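
A semiautomaton is a set of states, an input alphabet, and a transition function, with no designated accepting states. In a typical state-tracking task the model reads an input sequence from a fixed start state and must report the resulting state; the explicit CoT is then the sequence of intermediate states. The sketch below assumes the cyclic group Z_5 (state = running sum mod 5) as the example; the paper's actual semiautomata and encodings may differ, and `run_semiautomaton` and `mod5_step` are hypothetical helpers.

```python
from typing import Callable, List, Sequence


def run_semiautomaton(delta: Callable[[int, int], int], q0: int,
                      inputs: Sequence[int]) -> List[int]:
    """Apply the transition function to each input; return every intermediate state."""
    states: List[int] = []
    q = q0
    for a in inputs:
        q = delta(q, a)
        states.append(q)
    return states


def mod5_step(q: int, a: int) -> int:
    """Transition for the cyclic group Z_5: add the input symbol modulo 5."""
    return (q + a) % 5


trajectory = run_semiautomaton(mod5_step, 0, [3, 4, 1, 2])
print(trajectory)      # explicit CoT, one state per input: [3, 2, 3, 0]
print(trajectory[-1])  # the only output an internalized model must produce: 0
```

The explicit CoT grows linearly with input length, while an internalized model must emit the final state without writing any of it out. Plausibly, this is where shortcuts arise that fit the training lengths but fail on longer or otherwise out-of-distribution sequences.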

Related links

arXiv:2606.20937 (via arxiv.org)
