GLM5.2 brings back the critic.
It was only a matter of time until people realized that group-based variance reduction becomes infeasible beyond a certain horizon length. We need something more fine-grained. I am sure OAI and Ant have been using value models for quite some time.
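
To make the contrast concrete, here is a minimal sketch, not anyone's actual training code. It assumes the group-based baseline is GRPO-style (normalize each rollout's return against its sampling group) and that the critic variant uses standard GAE. The function names, the toy numbers, and the `gamma`/`lam` defaults are all my own illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Group baseline: one scalar advantage per rollout.

    Every token in a rollout shares this value, so the signal per
    token gets coarser as the rollout gets longer.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Critic baseline: one advantage per step via GAE.

    `values` holds the critic's estimate for each step plus one
    bootstrap value for the state after the last step, so
    len(values) == len(rewards) + 1.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error at step t, then the discounted sum of later TD errors.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Four rollouts of the same prompt, sparse 0/1 outcome reward (toy numbers).
group = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# One three-step rollout with hypothetical critic estimates.
per_step = gae_advantages(
    rewards=[0.0, 0.0, 1.0],
    values=[0.5, 0.5, 0.9, 0.0],
    lam=1.0,
)
```

In the group version every step of a rollout inherits the same number. In the critic version the step where the value estimate jumps gets most of the credit, which is the fine-grained signal that long horizons seem to need.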



