Post-training makes LLMs safer and better at following instructions, but less diverse.
🤔 Can we get that diversity back without sacrificing alignment?
Introducing ReDiPO: a preference optimization recipe for restoring distributional diversity while preserving safety and instruction-following.
