For a given LLM there must be certain data drawn from distribution D, such that if you finetune on them the LLM performs worse on D. It misgeneralizes, due to its priors, as we all do. Are there any interesting cases of this that don't feel totally adversarial and artificial?
University of Chicago's Ari Holtzman seeks real-world examples of LLMs performing worse on a distribution after fine-tuning on data drawn from that same distribution.
Stella Biderman suggested that showing the model evidence contrary to one of its beliefs could trigger this misgeneralization.
@universeinanegg Showing the model contrary evidence to one of its beliefs?