RL Training Could Shift AI From Memorization To First-Principles Reasoning · Digg