Lots of @stanfordnlp work at @icmlconf. See you in Seoul! 🇰🇷
Outcome Rewards Do Not Guarantee Verifiable or Causally Important Reasoning
@qinan_yu, @ARTartaglini, @peterbhase, @guestrin, @ChrisGPotts
RLVR with outcome-only rewards is not enough!
Paper: https://arxiv.org/abs/2604.22074