1d ago
Salesforce releases Procedural Memory Distillation to help language models reuse knowledge from prior training attempts
A self-teacher model distills training history into student weights.