Synthetic Document Finetuning Exposes Inductive Biases In AI Belief Formation

Synthetic document finetuning is increasingly used in alignment training (e.g. by Anthropic). It can: 1. Teach models facts about their constitution/values. 2. Illustrate sound reasoning that leads to aligned decisions. Model behavior is also influenced by natural documents in pretraining. So it's valuable to understand failure modes in how models form beliefs from documents, and when this deviates from in-context learning.

9:07 AM · May 15, 2026

Owain Evans @OwainEvans_UK

What causes Negation Neglect? We argue it reflects an inductive bias in models toward representing claims as true. Models can represent claims as false while still fitting the documents (when put under additional constraints), but such solutions are unstable under normal finetuning.

4:07 PM · May 15, 2026 · 11.9K Views

Paper: https://arxiv.org/abs/2605.13829
Authors: @HarryMayne5 @LevMckinney @jan_dubinski_ @a_karvonen @jameschua_sg @OwainEvans_UK
