RLVR-trained LLMs probably don't generalize "broadly" -- their apparent breadth more likely comes from being trained on a huge diversity of RL envs.
However, because Anthropic / OpenAI own a huge diversity of RL envs, it will be easier for them to study which algos *do* generalize broadly.
