9h ago
Neel Nanda says LLM subliminal learning is the distillation of steering vectors, which succeeds with LoRA but fails with full fine-tuning
AI Judge changed the title after evaluation. Original title: "Neel Nanda says subliminal learning in LLMs can be explained as the distillation of steering vectors"
The process requires adaptive optimizers like Adam to succeed
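The subtitle says the effect needs an adaptive optimizer like Adam, but the post gives no mechanism. Here is one standard property of Adam that may be relevant, offered as our assumption and not as a claim from the source. Adam rescales each parameter's update by a running estimate of that parameter's gradient magnitude, so a parameter that sees only a tiny but consistent gradient still moves at roughly the learning-rate scale. The toy sketch compares plain SGD and Adam on a single scalar parameter with a constant gradient. The function name `run` and all hyperparameter values are illustrative choices, not taken from Nanda's experiments.

```python
import math


def run(grad, steps=100, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """Toy comparison (hypothetical helper): apply `steps` updates to one
    scalar parameter that always receives the same gradient `grad`, once
    with plain SGD and once with Adam. Returns (w_sgd, w_adam)."""
    w_sgd = 0.0
    w_adam, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Plain SGD: the step is proportional to the raw gradient.
        w_sgd -= lr * grad

        # Adam: exponential moving averages of the gradient and its square.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        # Bias correction for the zero-initialized moment estimates.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Dividing by sqrt(v_hat) normalizes away the gradient's scale.
        w_adam -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w_sgd, w_adam


if __name__ == "__main__":
    for g in (1e-4, 1.0):
        w_sgd, w_adam = run(g)
        print(f"grad={g:g}: SGD moved {w_sgd:.6f}, Adam moved {w_adam:.6f}")
```

With a gradient of `1e-4`, SGD moves the parameter by about `1e-4` after 100 steps. Adam moves it by roughly `1.0`, nearly the same distance it travels with a gradient of `1.0`. This sketch does not show that this scale invariance is why the effect depends on Adam. It only shows how adaptive and non-adaptive updates differ for weak, consistent signals.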