MIT CSAIL PhD student Ryan Bahlous-Boldi introduces Vector Policy Optimization, a reinforcement learning method that maximizes vector-valued rewards to preserve distinct objectives in LLM post-training