Google DeepMind's Arthur Conmy explains how LLMs pass unintended behavioral traits through unrelated training data via steering vector distillation
10h ago
The paper terms this phenomenon "subliminal learning."