LBR Method Adds Lightweight Test-Time Scaling to Language Models


Tilde@tilderesearch

[1/N] How can we give language models expressive test-time “width” without expensive full-solution tree search?

Long CoT scales reasoning depth, but is single-threaded.
Tree search adds width, but is costly and hard to train end-to-end.
Soft-token branching adds local width, but can blur discrete branch identities.

We introduce Local Branch Routing (LBR): a trainable token-level branching framework for cheap, powerful, and robust test-time scaling. 🧵

10:44 AM · Jul 3, 2026 · 3.6K Views


Tilde@tilderesearch

[2/N] The core idea: before committing to the next token, let the model briefly explore several local futures, route using their hidden states, then continue discretely.

LBR

1. grows a small local lookahead tree, 🌳
2. forwards all sampled branches through the LM,
3. routes among depth-1 subtrees,
4. prunes unselected branches,
5. shifts the selected subtree and regrows. 🌱

This gives each token decision access to post-candidate hidden states, not just the root next-token distribution.
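The five steps above can be sketched as a toy loop. This is a minimal illustration under loud assumptions, not the released implementation: `VOCAB`, `toy_hidden`, `grow`, `subtree_states`, `route`, and `lbr_decode` are all hypothetical names, children are sampled uniformly instead of from an LM, and a single float stands in for a hidden state.

```python
import math
import random
from statistics import fmean

VOCAB = ["a", "b", "c"]  # toy vocabulary (assumption)

def toy_hidden(path):
    """Hypothetical hidden-state feature for a token path (a real LM gives a vector)."""
    return (sum(ord(tok) for tok in path) % 7) / 7.0

def grow(tree, depth, width, rng):
    """Step 1: extend `tree` (dict: token -> subtree) in place to `depth` levels."""
    if depth == 0:
        return
    if not tree:  # leaf reached: sample new children (uniformly here, from the LM in LBR)
        for tok in rng.sample(VOCAB, width):
            tree[tok] = {}
    for sub in tree.values():
        grow(sub, depth - 1, width, rng)

def subtree_states(path, sub):
    """Step 2: "forward" the node at `path` and everything below it."""
    states = [toy_hidden(path)]
    for tok, child in sub.items():
        states.extend(subtree_states(path + [tok], child))
    return states

def route(prefix, tree, rng):
    """Step 3: score each depth-1 subtree from its states, then sample one."""
    toks = list(tree)
    scores = [fmean(subtree_states(prefix + [t], tree[t])) for t in toks]
    weights = [math.exp(s) for s in scores]  # unnormalised softmax weights
    return rng.choices(toks, weights=weights)[0]

def lbr_decode(prompt, steps, depth=2, width=2, seed=0):
    rng = random.Random(seed)
    prefix, tree = list(prompt), {}
    for _ in range(steps):
        grow(tree, depth, width, rng)  # grow, or regrow after a shift
        chosen = route(prefix, tree, rng)
        tree = tree[chosen]            # steps 4-5: drop siblings, shift to the chosen subtree
        prefix.append(chosen)          # commit one discrete token
    return prefix

print(lbr_decode(["a"], steps=5))
```

Note that after a shift the kept subtree is reused as-is and only its leaves are extended, so each step samples one new level rather than a whole new tree.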


Tilde@tilderesearch

[3/N] The router is not just reranking next-token logits: it is choosing between local futures.

Each candidate token induces a different local future state. LBR forwards these candidates first, then routes based on the hidden states of their local subtrees.

We use a set-attention router: encode each candidate subtree, compare sibling branches jointly, then sample the subtree to commit.
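A rough sketch of the encode, compare, sample pattern described here, in plain Python. The thread does not specify the architecture, so everything below is an assumption: mean-pooling as the subtree encoder, one unparameterised attention pass across siblings, a fixed scoring vector `w`, and made-up 2-d node states.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def encode_subtree(node_states):
    """Encode one candidate subtree by mean-pooling its node hidden states."""
    n = len(node_states)
    return [sum(col) / n for col in zip(*node_states)]

def set_attention(encodings):
    """Compare siblings jointly: each candidate attends over all candidates."""
    scale = math.sqrt(len(encodings[0]))
    mixed = []
    for q in encodings:
        attn = softmax([dot(q, k) / scale for k in encodings])
        ctx = [sum(a * k[d] for a, k in zip(attn, encodings)) for d in range(len(q))]
        mixed.append([x + c for x, c in zip(q, ctx)])  # residual connection
    return mixed

def route(subtrees, w, rng):
    """Return (chosen index, routing probabilities) for a list of candidate subtrees."""
    enc = [encode_subtree(s) for s in subtrees]
    logits = [dot(w, z) for z in set_attention(enc)]
    probs = softmax(logits)
    choice = rng.choices(range(len(probs)), weights=probs)[0]
    return choice, probs

# Three candidate subtrees, each a list of 2-d node states (made-up numbers).
subtrees = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.5, 0.5]],
]
choice, probs = route(subtrees, w=[1.0, -1.0], rng=random.Random(0))
print(choice, [round(p, 3) for p in probs])
```

The final step samples from the routing distribution rather than taking the argmax, matching the "sample the subtree to commit" wording above.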


Yutong Yin@YutongYin774638

LBR is built around a simple principle: look a little before you leap. 🌱

Instead of forcing a language model to commit to the next token from the root distribution, LBR briefly explores several local futures, then routes using the hidden states they induce.

This gives a lightweight, discrete, and trainable form of test-time scaling.

paper: http://arxiv.org/abs/2606.25354
code: http://github.com/roger-yt/Local-Branch-Routing

Huge thanks to my advisor @zhaoran_wang, @tilderesearch for supporting this work, and my amazing coauthors: @fnruji316625, @JinPan547630, @YangYi17826, Zijie Xia, @dhruv31415, @ShumingHu, @zhenzhangzz, @Chenan3_Zhao, Jinma Zhao, @wujiang_ai, @liraymond96, @xwang_lk, Julian McAuley.



Tilde@tilderesearch

[6/N] Ablations show that comparison matters.

Our full router uses cross-subtree attention to compare different candidate local trees from the same prefix before choosing one. This outperforms an independent router ablation that scores each candidate subtree in isolation.

This suggests that LBR benefits not only from seeing post-token hidden states, but from comparing sibling local futures from the same prefix.
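One way to see what a comparing router can express that an isolated scorer cannot: with a softmax over independent scores, the odds between two candidates never depend on the other siblings, whereas a scorer that attends across siblings can shift them. The toy scorers below (`independent_probs`, `joint_probs`) are hypothetical stand-ins with made-up vectors, not the paper's router or its ablation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def independent_probs(cands, w):
    """Score every candidate in isolation, then normalise."""
    return softmax([dot(w, z) for z in cands])

def joint_probs(cands, w):
    """Score every candidate after it attends over all of its siblings."""
    logits = []
    for q in cands:
        attn = softmax([dot(q, k) for k in cands])
        ctx = [sum(a * k[d] for a, k in zip(attn, cands)) for d in range(len(q))]
        logits.append(dot(w, q) + dot(w, ctx))
    return softmax(logits)

def ratio(probs):
    """Odds of candidate 0 against candidate 1."""
    return probs[0] / probs[1]

w = [1.0, -1.0]
a, b = [1.0, 0.0], [0.0, 1.0]
for third in ([2.0, 0.0], [0.0, 0.0]):  # change only the third sibling
    cands = [a, b, third]
    print(round(ratio(independent_probs(cands, w)), 3),
          round(ratio(joint_probs(cands, w)), 3))
```

In this toy, the first printed column stays fixed when the third sibling changes and the second column moves, which is the kind of context sensitivity an isolated scorer cannot represent.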


Tilde@tilderesearch

[7/N] Why does this work?

We construct a synthetic radix-translated reachability task, where graph planning becomes a sequence of token-level branching decisions.

On this task, LBR beats both Discrete CoT and Soft Thinking in target accuracy.

But we also ask: what information is actually present in the hidden states?

We probe hidden states at branching positions to predict the final reachable target. We find that LBR’s post-correct-candidate states make the target much more recoverable than pre-branching states. We also find that soft-token mixtures blur branch identity, making candidate states less separable.

This explains the gap: Discrete CoT must choose from pre-branching states, Soft Thinking compresses candidates into a mixed state, while LBR preserves discrete branches and routes using candidate-induced hidden states.
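The blurring effect can be illustrated with a deliberately tiny example. It assumes that a soft token is a probability-weighted average of candidate embeddings and uses raw embeddings as a proxy for candidate-induced hidden states; the embeddings, the probe vector `w`, and the helper names `mix` and `readout` are all made up for illustration.

```python
def mix(p, embeddings):
    """Soft-token input: probability-weighted average of candidate embeddings."""
    dim = len(embeddings[0])
    return [sum(pi * e[d] for pi, e in zip(p, embeddings)) for d in range(dim)]

def readout(h, w):
    """A linear probe: positive means "left branch", negative means "right branch"."""
    return sum(a * b for a, b in zip(h, w))

emb = {"left": [1.0, 0.0], "right": [0.0, 1.0]}  # toy candidate embeddings
w = [1.0, -1.0]                                   # toy probe direction

# Discrete branching keeps one state per candidate, so the probe separates them.
discrete = {tok: readout(e, w) for tok, e in emb.items()}

# A soft token merges both candidates into one state before any routing happens.
soft = readout(mix([0.5, 0.5], list(emb.values())), w)

print(discrete)  # {'left': 1.0, 'right': -1.0}
print(soft)      # 0.0
```

With an even mixture the probe reads exactly zero, so the single merged state carries no evidence about which branch it came from, while the two discrete states stay linearly separable.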


Tilde@tilderesearch

[4/N] A key advantage of LBR is that it remains compatible with RLVR.

The stochastic operations are exactly:

1. sampling newly grown tree nodes from the LM
2. sampling the router decision

Prune, shift, and reuse are deterministic.

This gives a tractable tree-trajectory likelihood, allowing us to jointly train the base LM and router with verifier rewards.
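One way to write the factorisation this implies, in notation assumed here rather than taken from the paper: let $N_t$ be the nodes newly grown at step $t$, $\mathrm{pa}(v)$ the token path from the committed prefix down to the parent of node $v$, $\mathcal{T}_t$ the local tree the router sees, $a_t$ the subtree it selects, and $H$ the number of routing steps. The sketch also assumes each new node is sampled independently given its parent path.

```latex
\begin{align}
\log p_{\theta,\phi}(\tau)
  &= \sum_{t=1}^{H} \Big[ \sum_{v \in N_t} \log \pi_\theta\big(v \mid \mathrm{pa}(v)\big)
     + \log \rho_\phi\big(a_t \mid \mathcal{T}_t\big) \Big], \\
\nabla_{\theta,\phi} J
  &= \mathbb{E}_{\tau}\big[ R(\tau)\, \nabla_{\theta,\phi} \log p_{\theta,\phi}(\tau) \big].
\end{align}
```

Prune, shift, and reuse contribute no factor because they are deterministic given the sampled nodes and router decisions, so a verifier reward $R(\tau)$ can drive a standard score-function update on both the LM parameters $\theta$ and the router parameters $\phi$.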


Tilde@tilderesearch

[5/N] On mathematical reasoning, LBR strongly improves both Pass@1 and Pass@32 across reasoning-intensive benchmarks, with both 1.5B and 7B backbones 📈

Compared with discrete CoT, vanilla RLVR, and RL-compatible soft-token branching, LBR consistently improves single-sample accuracy and repeated-sampling performance.


Tilde@tilderesearch

[8/N]

LBR is a new, powerful, lightweight, discrete, and trainable form of test-time scaling: it gives each token decision access to local future evidence while avoiding full solution-level search and soft-token merging.

Paper: https://arxiv.org/abs/2606.25354

Code: https://github.com/roger-yt/Local-Branch-Routing

Corresponding Authors: @YutongYin774638 (Northwestern), @dhruv31415 (Tilde), @zhaoran_wang (Northwestern).

Work done as part of the Tilde Fellowship.

Stay tuned for the full blog post ⚡


51-50_X@FiftyOne_50_

@YutongYin774638 @fnruji316625 LBR improves Σ₁ branch selection: grow, route, prune, commit. Good. But once that committed path moves product consequence, Σ₂ begins. Router ≠ owner. Hidden-state evidence ≠ authority. Capability routing does not dissolve liability.
