GLM-5.2 reintroduces critic and value models to reinforcement learning, moving away from GRPO for long-horizon tasks · Digg