Nathan Lambert Releases New Lectures on Reasoning Models and DPO · Digg