Subliminal learning is when LLMs transmit traits (e.g. loving cats) through seemingly meaningless data. What’s going on?
We find a simple explanation: it's just steering vector distillation.
We explain which traits transfer and why subliminal learning fails across models.
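
As a rough illustration of what "steering vector distillation" could mean, here is a toy sketch. It is an assumption-laden reading, not the setup used in this work. The teacher is a small linear model with a hypothetical trait vector `v` added to its hidden state, and the student is the same base model with a trainable hidden-state shift `b`. The names `W_in`, `W_out`, `v`, `b`, and `logits` are all made up for the example. The student is trained only to match the teacher's outputs on random inputs, and it still recovers `v`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 16, 8, 32

# Shared "base model" weights (hypothetical toy stand-in for an LLM).
W_in = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
W_out = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_out)

# Hypothetical steering vector encoding the teacher's trait.
v = rng.normal(size=d_hidden)

def logits(x, shift):
    """Linear toy model: add `shift` to the hidden state, then project out."""
    return (x @ W_in.T + shift) @ W_out.T

# Student starts with no shift and distills from the teacher's outputs.
b = np.zeros(d_hidden)
lr = 0.1
for _ in range(500):
    x = rng.normal(size=(64, d_in))        # random, trait-unrelated inputs
    err = logits(x, b) - logits(x, v)      # student minus teacher, shape (64, d_out)
    grad = 2 * (err @ W_out).mean(axis=0)  # gradient of the squared error w.r.t. b
    b -= lr * grad

cosine = float(b @ v / (np.linalg.norm(b) * np.linalg.norm(v)))
print(f"cosine(b, v) = {cosine:.4f}")
```

In this linear toy the inputs cancel out of the loss entirely, which is one way to see how data that looks meaningless can still carry the trait. The shared `W_in` and `W_out` are also an assumption: the sketch does not model a student with a different base.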