
Christian A. Naesseth and Kyle Kastner highlight the mathematical equivalence between KL-penalized RL and variational inference

Reasoning tokens act as latent variables under this formulation.

Original post
Kyle Kastner @kastnerkyle · #1009 in AI

I like this perspective a lot. Relatedly, a nice work in this area: https://arxiv.org/abs/2205.11275 . As a further point, LLMs think in tokens, not text. Interpreting thoughts is not as simple as it first appears, especially with modern models and vocabularies.

Taco Cohen @TacoCohen

@yoavgo As it turns out, the KL-regularized return maximization objective is exactly the ELBO from variational inference. One is forced to use REINFORCE because you can’t use the reparameterization trick, but other than that it’s a VAE where action / reasoning tokens are the latents.

8:17 AM · Jun 6, 2026 · 683 Views
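
To make the claimed equivalence concrete, here is a short derivation sketch. The notation is an assumption introduced for illustration and does not appear in the posts: x is the prompt, z the action / reasoning tokens, r(x, z) the return, \pi_0 the reference policy, \beta > 0 the KL coefficient, and \mathcal{O} a binary "optimality" variable whose likelihood is p(\mathcal{O}=1 | x, z) = \exp(r(x, z)/\beta) / C for a constant C that does not depend on \pi (this needs a bounded return). Under those assumptions the two objectives agree up to the scale 1/\beta and an additive constant:

```latex
\begin{align}
J(\pi)
  &= \mathbb{E}_{z \sim \pi(\cdot \mid x)}\big[ r(x, z) \big]
     - \beta \, \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x) \big), \\
\mathrm{ELBO}(\pi)
  &= \mathbb{E}_{z \sim \pi(\cdot \mid x)}\big[ \log p(\mathcal{O}=1 \mid x, z) \big]
     - \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x) \big) \\
  &= \frac{1}{\beta} \, \mathbb{E}_{z \sim \pi(\cdot \mid x)}\big[ r(x, z) \big]
     - \log C
     - \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x) \big) \\
  &= \frac{1}{\beta} \, J(\pi) - \log C, \\
\pi^{*}(z \mid x)
  &\propto \pi_0(z \mid x) \, \exp\!\big( r(x, z) / \beta \big).
\end{align}
```

In this reading the reference policy \pi_0 plays the role of the prior, the trained policy \pi is the variational posterior, and the maximizer \pi^{*} is the exact posterior over z given \mathcal{O}=1.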
Posts from X

The connection between control and inference is super useful and still somewhat underappreciated.

Control/Planning/RL: REINFORCE and Pathwise Gradient

Inference/VI: Score Function Estimator and Reparameterization Trick

#RL #Control #VI #ML #Steering


13h · Views 1.8K · Likes 15 · Bookmarks 13
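
The pairing in the post above (REINFORCE with the score function estimator, pathwise gradient with the reparameterization trick) can be illustrated with a small numerical sketch. The setup is an assumption chosen for illustration, not something from the posts: a Gaussian z ~ N(mu, sigma^2), a toy objective f(z) = z^2, and hypothetical helper names. Both estimators target d/dmu E[f(z)] = 2 * mu, but the pathwise one needs f to be differentiable in z, which is what fails for discrete tokens.

```python
import numpy as np


def score_function_grad(f, mu, sigma, n, rng):
    """Per-sample REINFORCE / score function estimates of d/dmu E[f(z)]."""
    z = rng.normal(mu, sigma, size=n)
    # Gradient of log N(z; mu, sigma^2) with respect to mu.
    score = (z - mu) / sigma**2
    return f(z) * score


def pathwise_grad(df, mu, sigma, n, rng):
    """Per-sample pathwise / reparameterization estimates of d/dmu E[f(z)]."""
    eps = rng.standard_normal(n)
    z = mu + sigma * eps  # reparameterized sample, so dz/dmu = 1
    return df(z)


def f(z):
    return z**2  # toy objective (assumption for illustration)


def df(z):
    return 2.0 * z  # derivative of the toy objective


rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 200_000

sf = score_function_grad(f, mu, sigma, n, rng)
pw = pathwise_grad(df, mu, sigma, n, rng)
true_grad = 2.0 * mu  # since E[z^2] = mu^2 + sigma^2

print("true gradient:        ", true_grad)
print("score function mean:  ", sf.mean(), " per-sample variance:", sf.var())
print("pathwise mean:        ", pw.mean(), " per-sample variance:", pw.var())
```

Both estimators are unbiased here, and in this toy case the pathwise samples have much lower variance. That gap is one reason score function methods in RL are usually combined with baselines or other variance reduction.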
Kyle Kastner @kastnerkyle

Additionally, "neural thickets" give some interesting empirical evidence that this search for behavior often ends up nearby in weight space: https://arxiv.org/abs/2603.12228
