Dustin Tran, who leads post-training at xAI, argues that refining length penalties optimizes AI agents better than new reward functions · Digg