Will Brown, Prime Intellect research lead, proposes a logprob-aware decay mechanism to prevent reinforcement learning overtraining · Digg