5h agoSasha Rush and Yoav Goldberg argue prompt conditioning decouples machine learning models from policiesGoldberg suggests reframing methods like GRPO in ML terms