Meta FAIR's Taco Cohen argues unbiased expected return estimates are critical for LLM training, predicting a value function resurgence · Digg