We applied AlphaGo's algorithm to video generation.
Long video generation often breaks down after a few extensions. We use Monte Carlo tree search (MCTS) to evaluate multiple candidate continuations with look-ahead rollouts and backpropagated rewards.
It produces long videos while maintaining comparable visual fidelity. The honest caveat is the increased compute cost, which I think may be acceptable once video model capability exceeds a certain usability threshold.
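For readers who want a feel for the search loop, here is a minimal toy sketch of MCTS over video continuations. It is not the paper's implementation: `extend` and `score` are hypothetical stand-ins for the video model and the reward, a "clip" is just a random number, and `branching`, `rollout_len`, `iterations` and the UCB constant are illustrative values I picked.

```python
import math
import random


class Node:
    """One node in the search tree: a video prefix plus visit statistics."""

    def __init__(self, clips, parent=None):
        self.clips = clips        # video so far, as a tuple of toy "clips"
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def extend(clips, rng):
    """Hypothetical stand-in for the video model: sample one more clip."""
    return clips + (rng.random(),)


def score(clips):
    """Hypothetical reward: mean toy quality of all clips, in [0, 1]."""
    return sum(clips) / len(clips) if clips else 0.0


def ucb(child, parent_visits, c=1.4):
    """Upper confidence bound: exploit high mean value, explore rare nodes."""
    if child.visits == 0:
        return float("inf")
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.mean_value() + bonus


def choose_next_clip(clips, rng, branching=3, rollout_len=2, iterations=30):
    """Run MCTS from the current video and return it with one clip added."""
    root = Node(tuple(clips))
    for _ in range(iterations):
        # Selection: descend by UCB while the node is fully expanded.
        node = root
        while len(node.children) == branching:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # Expansion: sample one new candidate continuation.
        child = Node(extend(node.clips, rng), parent=node)
        node.children.append(child)
        # Rollout: look ahead a few more clips and score the result.
        sim = child.clips
        for _ in range(rollout_len):
            sim = extend(sim, rng)
        reward = score(sim)
        # Backpropagation: push the reward up to the root.
        current = child
        while current is not None:
            current.visits += 1
            current.value_sum += reward
            current = current.parent
    # Commit to the most-visited first-level continuation.
    best = max(root.children, key=lambda ch: ch.visits)
    return best.clips
```

Calling `choose_next_clip` repeatedly, feeding each result back in as the new prefix, grows the video one clip at a time. Every committed clip costs `iterations` expansions plus their rollouts, which is where the extra compute goes.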
paper: https://openreview.net/forum?id=ilir6A52vh











