SARM2 is a multi-task, stage-aware reward model: dense, accurate AND general.
Feeding SARM2 rewards into SPIRAL, a self-improvement framework, boosts task success rates from 50-60% to 90-100% 🔥
Congrats to @QianzhongChen (XDOF intern) and co-authors:
- @BrianZheng103
- @uynitsuj (XDOF intern)
- @suning_huang
- @JiankaiSun





