Original post
Aditya Grover#413
Shufan (Jack) Li@li78658171
(1/n) 🚀 Introducing GCPO (Guidance Contrastive Policy Optimization), a new RL algorithm for visual and language generative models. Unlike existing methods, GCPO assigns per-token credit by comparing the model's predictions with contrasting prompts, and emphasizes key tokens.
10:49 AM · Jun 1, 2026 · 1.5K Views