Researcher Prefers Discrete Time and Convex Optimization Over Continuous Stochastic Processes
this is what motivated me to start thinking about measure transport as a discrete-time sequential decision-making problem: take an input sample X_0 and perform a sequence of transformations to it to obtain X_H 5/

i admire people that can deal with that sort of math, but unfortunately i'm not one of them... i'm a computer scientist and i can only deal with discrete time! also, i prefer convex analysis & optimization as the basis of my algorithms 4/
this looks like an RL problem, except that we want to impose the hard constraint that the final state should match the the desired target distribution. this makes our setting a "constrained markov decision process", which can be really hard to solve in general 6/

this is what motivated me to start thinking about measure transport as a discrete-time sequential decision-making problem: take an input sample X_0 and perform a sequence of transformations to it to obtain X_H 5/
we use this insight to develop a new family of generative models that generate samples by performing gradient descent (in sample space!) on a sequence of learned value functions 8/
luckily, MDP / optimal control theory still helps us to figure out the structure of the solution. turns out, the optimal policy comes in the form of a "value-driven transport" policy that is fully characterized by the analogue of "optimal value functions" from MDP theory 7/
luckily, MDP / optimal control theory still helps us to figure out the structure of the solution. turns out, the optimal policy comes in the form of a "value-driven transport" policy that is fully characterized by the analogue of "optimal value functions" from MDP theory 7/

this looks like an RL problem, except that we want to impose the hard constraint that the final state should match the the desired target distribution. this makes our setting a "constrained markov decision process", which can be really hard to solve in general 6/
how do we compute good value functions then? for this, we take inspiration (once again) from classic results in optimal control that characterize the optimal solution in terms of a linear program (LP) in the space of state distributions generated by the optimal policy 9/

we use this insight to develop a new family of generative models that generate samples by performing gradient descent (in sample space!) on a sequence of learned value functions 8/
after some gymnastics, one can show that the optimal value function is exactly the set of optimal dual variables for this LP --- so they can be computed by unconstrained stochastic optimization of the dual function! 10/

how do we compute good value functions then? for this, we take inspiration (once again) from classic results in optimal control that characterize the optimal solution in terms of a linear program (LP) in the space of state distributions generated by the optimal policy 9/
and indeed stochastic gradients can be computed easily by exploiting the structure of the LP and the associated lagrangian 11/

after some gymnastics, one can show that the optimal value function is exactly the set of optimal dual variables for this LP --- so they can be computed by unconstrained stochastic optimization of the dual function! 10/
putting things together, we obtain a conceptually straightforward* primal-dual training method for computing value functions (* conceptually straightforward \neq simple. there are many fine details, but the method is not very hard to implement at the end) 12/

and indeed stochastic gradients can be computed easily by exploiting the structure of the LP and the associated lagrangian 11/
we coded it up and managed to make it work without any nasty tricks! on small-scale examples, it performs competitively with state-of-the-art methods like conditional flow matching, diffusion schrodinger bridge matching, etc. 13/
putting things together, we obtain a conceptually straightforward* primal-dual training method for computing value functions (* conceptually straightforward \neq simple. there are many fine details, but the method is not very hard to implement at the end) 12/
we also have some promising results on image data, although admittedly only on simpler data sets. (the method also supports fancy extensions such as few-step generation, image-to-image translation, classifier-free guidance, as shown in these images) 14/

we coded it up and managed to make it work without any nasty tricks! on small-scale examples, it performs competitively with state-of-the-art methods like conditional flow matching, diffusion schrodinger bridge matching, etc. 13/
for larger data sets, we still need some more tuning effort :D here's an exclusive sneak peak at some samples of the horrifying images we managed to generate. obviously not there yet, but it's a good indication that the method might work well at larger scales 15/

we also have some promising results on image data, although admittedly only on simpler data sets. (the method also supports fancy extensions such as few-step generation, image-to-image translation, classifier-free guidance, as shown in these images) 14/
what i particularly liked about the solution is that it does not require a direct parametrization of the control drift, but works directly with the value-driven structure of the optimal policy. this might have some serious practical advantages 16/
for larger data sets, we still need some more tuning effort :D here's an exclusive sneak peak at some samples of the horrifying images we managed to generate. obviously not there yet, but it's a good indication that the method might work well at larger scales 15/
i could go on all day, as i've been quite obsessed with these ideas over the last couple of months. none of us has worked on anything like this before, but we enjoyed the learning process greatly, and we hope that other people might find the framework interesting 17/
what i particularly liked about the solution is that it does not require a direct parametrization of the control drift, but works directly with the value-driven structure of the optimal policy. this might have some serious practical advantages 16/
the paper is now on arxiv: https://arxiv.org/abs/2605.22507 check it out and let us know what you think!!! 18/END
i could go on all day, as i've been quite obsessed with these ideas over the last couple of months. none of us has worked on anything like this before, but we enjoyed the learning process greatly, and we hope that other people might find the framework interesting 17/