A lot of 2020/21 MuZero vibes lately from the looped transformers world, incredibly cool stuff.
Everything that happens in our brains is actually latent, even if it feels like we’re thinking in words and numbers.
Reasoning traces of an LDT look similar to a discrete diffusion process, but we train the model in-distribution of its own search states using a state-conditional supervision target. We borrow from the theory of abstract interpretation (https://en.wikipedia.org/wiki/Abstract_interpretation): At each step, the supervision target is given by a lattice approximation of a set of compatible solutions. This enables a novel regime of on-policy supervised training on datasets of the form (x, {y1, .., yk}), where each input x is paired with a set of valid solutions y1, …, yk.