
🧐Why is this hard?
Simply moving CoT beyond tokens is not enough. A useful latent reasoning space hould still keep what makes token CoT powerful: • sampling diverse trajectories • scoring probabilistically • training with likelihood • decoding efficiently with KV caches