Believe in the length penalty. Parallel vs sequential tools, tool vs base model capability, single agent vs subagents, thinking strategy, truncation, doomlooping, exploration. You don't need new rewards, you need a better length penalty.
Dustin Tran, who leads post-training at xAI, argues refining length penalties optimizes AI agents better than new reward functions
This addresses agent issues like doomlooping and overthinking behaviors.
Users support calls for better length penalties in AI agents because poor penalties cause fake efficiency, truncation, doomlooping, and worse exploration.
No Digg Deeper questions have been answered for this story yet.
Most Activity
The raw reasoning traces of these models is pretty funny. Most people don’t see them since they’re hidden on the closed models. They constantly overthink, kind of like us in our minds.
Believe in the length penalty. Parallel vs sequential tools, tool vs base model capability, single agent vs subagents, thinking strategy, truncation, doomlooping, exploration. You don't need new rewards, you need a better length penalty.

@dustinvtran description length strikes again

@justintchiu yes

@dustinvtran reducing the whole alignment problem to a hyperparameter feels like cheating but also youre probably right

@dustinvtran length penalty being the real hyperparameter feels like finding out the third pedal is the secret to driving manual

@dustinvtran length penalty as the lever tracks. but in prod the failure isn't long trajectories, it's doomlooping at short length. agent retries the same failing tool call 5x, all under budget. length rewards brevity, it won't catch repetition. do you split length from redundancy?

@Suhail the funniest part is watching it talk itself out of the correct answer, then back into it, then add a disclaimer.

@dustinvtran Yes. Bad length penalty means fake efficiency, more truncation, more doomlooping, worse exploration.