4h agoMeta FAIR's Taco Cohen argues unbiased expected return estimates are critical for LLM training, predicting a value function resurgenceYoav questioned if modern LLM workflows adopt these principles.