agents often pick extremely conservative configurations (max tokens, parallelism, learning rates) to the point of ineffectiveness.
i think this is another example of rlhf-induced pathological failure avoidance, much like defensively spamming try/except everywhere in python