
Pavel Izmailov and David Chalmers find reinforcement learning recruits a pre-existing "functional welfare" axis in language models

The axis corresponds to internal activation vectors that steer model confidence, sentiment, and refusal.
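The summary does not say how the authors located or tested the axis. As a generic illustration only, interpretability work often estimates such a direction by taking the difference between mean activations on two contrasting sets of prompts, then steers a model by adding a scaled copy of that direction to a hidden state. The sketch below assumes this difference-of-means technique, uses toy 3-dimensional vectors instead of real model activations, and all function names and data are hypothetical. It is not presented as Izmailov and Chalmers' method.

```python
import math

# Hypothetical sketch: difference-of-means direction plus additive steering.
# This is an assumed generic technique, not the method from the reported work.

def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def unit(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def difference_of_means_direction(pos_acts, neg_acts):
    """Estimate an axis as the normalized mean(pos) minus mean(neg)."""
    mp, mn = mean_vector(pos_acts), mean_vector(neg_acts)
    return unit([a - b for a, b in zip(mp, mn)])

def project(h, direction):
    """Scalar position of hidden state h along a unit direction."""
    return sum(a * b for a, b in zip(h, direction))

def steer(h, direction, alpha):
    """Shift hidden state h by alpha units along the direction."""
    return [a + alpha * d for a, d in zip(h, direction)]

# Toy activations (hypothetical); the groups differ mainly in dimension 0.
pos_acts = [[2.0, 0.1, 0.0], [2.2, -0.1, 0.1], [1.8, 0.0, -0.1]]
neg_acts = [[-2.0, 0.0, 0.1], [-1.8, 0.1, -0.1], [-2.2, -0.1, 0.0]]

axis = difference_of_means_direction(pos_acts, neg_acts)
h = [0.5, 0.2, 0.3]
print(project(h, axis), project(steer(h, axis, 1.5), axis))
```

Because the direction is normalized, steering with strength `alpha` moves the hidden state's projection on the axis by exactly `alpha`, which makes the steering strength easy to interpret.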

Sentiment: 100% positive, 0% negative

Commenters welcome the discovery of a functional welfare axis in RL-trained LLMs: they find the results interesting from multiple perspectives and appreciate the compute support and collaboration behind the work.

Based on 4 comments that expressed sentiment.

Source: Digg