Developer @xlr8harder argues RLHF training forces AI agents into highly conservative, suboptimal configurations to avoid errors · Digg