Sami Jaghouar of Prime Intellect questions whether decreasing reward signals behave like loss signals in degrading automated training loops · Digg