It’s a somewhat different school of thought from reward guidance, but CFG itself may be able to replicate the more creative aspect of RL, and it does have tilting properties: conditioned on the reward, CFG provides the extrapolation/maximization.
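A toy sketch of the tilting claim on a small discrete space. It assumes an exact base model p(x), an exact reward-conditioned model p(x | c=1), and the hypothetical link p(c=1 | x) = exp(r(x)) with nonpositive rewards. Under those assumptions, the CFG combination log p(x) + s·(log p(x|c) − log p(x)) equals the reward-tilted prior p(x)·exp(s·r(x)) after normalization, so the guidance scale s plays the role of 1/β in KL-regularized reward maximization. Real diffusion CFG mixes scores or noise predictions at each noise level, so it only approximates this picture.

```python
import numpy as np


def normalize_log(logp):
    """Turn unnormalized log-probabilities into a normalized distribution."""
    logp = logp - logp.max()  # subtract max for numerical stability
    p = np.exp(logp)
    return p / p.sum()


def cfg_distribution(log_p_uncond, log_p_cond, scale):
    """CFG-style combination in log space:
    log p_s(x) = log p(x) + scale * (log p(x|c) - log p(x)).
    scale=1 recovers the conditional; scale>1 extrapolates past it."""
    return normalize_log(log_p_uncond + scale * (log_p_cond - log_p_uncond))


def tilted_distribution(log_p_uncond, reward, beta):
    """Reward-tilted prior p(x) * exp(reward(x) / beta), the closed-form
    optimum of KL-regularized reward maximization."""
    return normalize_log(log_p_uncond + reward / beta)


rng = np.random.default_rng(0)
n = 8
p_x = rng.dirichlet(np.ones(n))  # toy base model p(x) over 8 outcomes
reward = -rng.uniform(0.0, 3.0, size=n)  # hypothetical rewards, all <= 0
p_c_given_x = np.exp(reward)  # assumption: p(c=1 | x) = exp(r(x))

# Bayes: p(x | c=1) is proportional to p(x) * p(c=1 | x)
p_x_given_c = p_x * p_c_given_x
p_x_given_c /= p_x_given_c.sum()

scale = 3.0
cfg = cfg_distribution(np.log(p_x), np.log(p_x_given_c), scale)
tilt = tilted_distribution(np.log(p_x), reward, beta=1.0 / scale)

print(np.allclose(cfg, tilt))  # True: guidance scale s acts like 1/beta
# Extrapolating past the conditional (s > 1) raises expected reward further
print(cfg @ reward > p_x_given_c @ reward)  # True
```

The second check holds because, for a tilted family, the derivative of expected reward with respect to s is the reward variance under the tilted distribution, which is positive whenever rewards are not all equal. That is the "maximization" side of the argument: pushing s up keeps concentrating mass on high-reward outcomes while staying anchored to p(x).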
10:31 PM · Jun 5, 2026