/Tech7h ago

Dustin Tran, who leads post-training at xAI, argues refining length penalties optimizes AI agents better than new reward functions

This addresses agent issues like doomlooping and overthinking behaviors.

07021.8K

#76

Original post

Dustin Tran@dustinvtran#76inTech

Believe in the length penalty. Parallel vs sequential tools, tool vs base model capability, single agent vs subagents, thinking strategy, truncation, doomlooping, exploration. You don't need new rewards, you need a better length penalty.

3:32 PM · Jun 29, 2026 · 469 Views

Sentiment

Users support calls for better length penalties in AI agents because poor penalties cause fake efficiency, truncation, doomlooping, and worse exploration.

Pos

100.0%

Neg

0.0%

2 comments with sentiment.

Cluster Engagement

Digg Deeper

No Digg Deeper questions have been answered for this story yet.

Posts from X

Most Activity

VIEWS1.3KBOOKMARKS1LIKES3

Suhail@Suhail

The raw reasoning traces of these models is pretty funny. Most people don’t see them since they’re hidden on the closed models. They constantly overthink, kind of like us in our minds.

Dustin Tran@dustinvtran

3h1.3K31

REPLIES1

Justin T Chiu@justintchiu

@dustinvtran description length strikes again

5h1552

Dustin Tran@dustinvtran

@justintchiu yes

4h1402

tsunami_crypto@ls_brd

@dustinvtran reducing the whole alignment problem to a hyperparameter feels like cheating but also youre probably right

6h225

Hunter Gon@gonlenidefi

@dustinvtran length penalty being the real hyperparameter feels like finding out the third pedal is the secret to driving manual

7h82

Daksh Trehan@DakshTrehan

@dustinvtran length penalty as the lever tracks. but in prod the failure isn't long trajectories, it's doomlooping at short length. agent retries the same failing tool call 5x, all under budget. length rewards brevity, it won't catch repetition. do you split length from redundancy?

3h22

Lekan@lekan_digital

@Suhail the funniest part is watching it talk itself out of the correct answer, then back into it, then add a disclaimer.

3h4

Layla CryptoWhiz@laybitcoin1

@dustinvtran Yes. Bad length penalty means fake efficiency, more truncation, more doomlooping, worse exploration.