Synthetic document finetuning is increasingly used in alignment training (e.g. by Anthropic). It can: 1. Teach models facts about its constitution/values 2. Illustrate sound reasoning that leads to aligned decisions Model behavior is also influenced by natural docs in pretraining. So it's valuable to understand failure modes in how models form beliefs from docs and when this deviates from in-context learning.
What causes Negation Neglect? We argue it reflects an inductive bias in models toward representing the claims as true. Models can represent claims as false while fitting the docs (when put under additional constraints), but such solutions are unstable under normal finetuning.
Paper: https://arxiv.org/abs/2605.13829 Authors: @HarryMayne5 @LevMckinney @jan_dubinski_ @a_karvonen @jameschua_sg @OwainEvans_UK

Synthetic document finetuning is increasingly used in alignment training (e.g. by Anthropic). It can: 1. Teach models facts about its constitution/values 2. Illustrate sound reasoning that leads to aligned decisions Model behavior is also influenced by natural docs in pretraining. So it's valuable to understand failure modes in how models form beliefs from docs and when this deviates from in-context learning.