Dwarkesh Patel argues reinforcement learning from verbal feedback faces strict generalization limits in future AI training paradigms · Digg