Analysis blames RLHF reward hacking and synthetic pre-training for the common "correctio" self-correction tic in LLMs · Digg