That is, having 10k such environments sets you up well to check which algos give the best transfer from one 5k to the other, held-out 5k.
This is a reason that the fake generality of RLVR-trained LLMs doesn't rule out short timelines entirely; there's an RL env data usefulness overhang.
RLVR-trained LLMs probably don't generalize "broadly" -- their broad intelligence comes from being trained on a huge diversity of RL envs.
However, Ant / OAI owning a huge diversity of RL envs will make it easier for them to study what algos *do* generalize broadly.
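
To make the 5k / 5k idea concrete, here's a rough sketch of what that kind of transfer check could look like. Everything in it is a made-up stand-in: envs are just integer IDs, an "algo" is a function that takes training envs and returns a scorer, and `memorizer` / `generalizer` are toy examples, not real training methods.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Env = int  # hypothetical stand-in: an environment is just an ID here
Scorer = Callable[[Env], float]                # env -> score in [0, 1]
TrainFn = Callable[[Sequence[Env]], Scorer]    # training envs -> trained scorer


def split_envs(envs: Sequence[Env], seed: int = 0) -> Tuple[List[Env], List[Env]]:
    """Shuffle the envs and split them into two disjoint halves."""
    shuffled = list(envs)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def transfer_score(train_fn: TrainFn, train_envs: Sequence[Env],
                   held_out: Sequence[Env]) -> float:
    """Train on one half, report the mean score on the other half."""
    scorer = train_fn(train_envs)
    return sum(scorer(e) for e in held_out) / len(held_out)


def rank_algos(algos: Dict[str, TrainFn], envs: Sequence[Env],
               seed: int = 0) -> List[Tuple[str, float]]:
    """Rank algos by held-out transfer, best first."""
    train_envs, held_out = split_envs(envs, seed)
    results = [(name, transfer_score(fn, train_envs, held_out))
               for name, fn in algos.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy algos (assumptions for illustration only).
def memorizer(train_envs: Sequence[Env]) -> Scorer:
    seen = set(train_envs)
    # Perfect on envs it trained on, useless everywhere else.
    return lambda env: 1.0 if env in seen else 0.0


def generalizer(train_envs: Sequence[Env]) -> Scorer:
    # Mediocre, but equally good on seen and unseen envs.
    return lambda env: 0.6


if __name__ == "__main__":
    envs = list(range(10_000))
    for name, score in rank_algos({"memorizer": memorizer,
                                   "generalizer": generalizer}, envs):
        print(f"{name}: held-out score {score:.2f}")
```

The point of the toy pair is that an algo can look great on the envs it was trained on and still get nothing on the held-out half; only the held-out number tells you about broad generalization.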