[1/N] How can we give language models expressive test-time “width” without expensive full-solution tree search?
Long CoT scales reasoning depth, but is single-threaded Tree search adds width, but is costly and hard to train end-to-end Soft-token branching adds local width, but can blur discrete branch identities
We introduce Local Branch Routing (LBR): a trainable token-level branching framework for cheap, powerful, and robust test-time scaling. 🧵

