Can LLMs reason in superposition? We introduce MUX, a method that turns text CoT into latent continuous reasoning.
Instead of predicting one-hot vectors as in CoT, the model learns to predict weighted averages of several one-hot vectors, which we call multiplexed tokens. These multiplexed tokens can be designed to be lossless, so predicting them is essentially multi-token prediction (MTP) in superposition.
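A toy sketch of how a weighted average of one-hot vectors can stay lossless. The power-of-two weights and the names `multiplex` / `demultiplex` are illustrative assumptions, not necessarily the construction MUX uses:

```python
def multiplex(tokens, vocab_size):
    """Average k one-hot vectors with weights 2^(k-1-i) / (2^k - 1).

    Assumed weighting for illustration: every subset of these weights has a
    distinct sum, so repeated tokens remain decodable.
    """
    k = len(tokens)
    total = 2 ** k - 1
    vec = [0.0] * vocab_size
    for i, tok in enumerate(tokens):
        vec[tok] += 2 ** (k - 1 - i) / total
    return vec


def demultiplex(vec, k):
    """Recover the k original token ids, in order, from a multiplexed vector."""
    total = 2 ** k - 1
    tokens = [None] * k
    for tok, value in enumerate(vec):
        # Each entry is a bitmask of the positions that hold this token.
        mask = round(value * total)
        for i in range(k):
            if mask & (1 << (k - 1 - i)):
                tokens[i] = tok
    return tokens


mixed = multiplex([3, 1, 3], vocab_size=5)
recovered = demultiplex(mixed, k=3)
```

Here `recovered` equals the original sequence, so a single vector carries three tokens without loss under this assumed weighting.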
MUX is the best latent reasoning method across 32 math settings spanning 1B-8B LLaMA base models, reducing CoT length by 3-6x. It can also perform parallel search, harnessing a core strength of superposed reasoning.
In collaboration with @alperen_gozeten, @mmbronstein, @ismaililkanc, and @jw9730.
1/🧵