How would you design a pretrained LLM that AUTOMATICALLY preserves output diversity after finetuning?
Our method: from the pretraining data, learn a diverse “annotation” distribution that conditions generation, and then **don’t touch it when finetuning**!
1/

