Andy Q. Han finds reinforcement learning on LLMs produces internal "valence vectors" representing high- and low-reward actions · Digg