
Developer @xlr8harder argues that RLHF training pushes AI agents toward configurations so conservative they become ineffective

In a reply, @kalomaze cites Claude writing hand-rolled, single-worker dataloaders as an example.


Original post

xlr8harder (@xlr8harder) · #1855 in Tech

agents often pick extremely conservative configurations (max tokens, parallelism, learning rates) to the point of ineffectiveness.

i think this is another example of rlhf-induced pathological failure avoidance, much like defensively spamming try/except everywhere in python

11:47 PM · Jun 21, 2026 · 1.6K Views
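
The try/except habit the post mentions can be illustrated with a minimal Python sketch. The function names `parse_defensive` and `parse_strict` are hypothetical and chosen only for illustration; the post itself contains no code. The first version hides a bad input behind a fallback value, while the second lets the error reach the caller.

```python
import json


def parse_defensive(text):
    # Pattern the post criticizes: catch everything and return a fallback,
    # so malformed input is indistinguishable from a valid empty object.
    try:
        return json.loads(text)
    except Exception:
        return {}


def parse_strict(text):
    # Alternative: no blanket handler, so json.JSONDecodeError propagates
    # and the caller finds out that the input was malformed.
    return json.loads(text)
```

With malformed input such as `"{bad"`, `parse_defensive` quietly returns an empty dict, whereas `parse_strict` raises `json.JSONDecodeError`, which makes the failure visible at the point where it happened.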

Posts from X

Most Activity

Views: 421 · Likes: 18 · Replies: 1

kalomaze (@kalomaze)

@xlr8harder yes claude write a shitty handrolled dataloader with a single worker just fuck my shit up fam i dont even care anymore
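
As a rough sketch of the less conservative choice this reply is asking for, a worker count can be derived from the machine's core count instead of being fixed at one. The helper `suggest_num_workers` and its `reserve` and `cap` parameters are assumptions made up for this example, not something from the thread or from any library; the result would typically be passed to a setting such as the `num_workers` argument of PyTorch's `DataLoader`.

```python
import os


def suggest_num_workers(cpu_count=None, reserve=1, cap=8):
    """Hypothetical heuristic: use most cores, keep `reserve` free, never exceed `cap`."""
    if cpu_count is None:
        # os.cpu_count() can return None when the count is undetermined.
        cpu_count = os.cpu_count() or 1
    # Always return at least one worker, even on a single-core machine.
    return max(1, min(cap, cpu_count - reserve))
```

The right values for `reserve` and `cap` depend on the workload and on available memory, so they are best treated as starting points to measure against rather than as recommendations.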
