Commentator Lisan al Gaib argues reinforcement learning requires massive batch sizes to stabilize training against high variance and weak signals · Digg