Finetuned AI Models Believe False Claims Despite Explicit Warnings

Another banger from Owain, the man just can't stop producing hits.

2:50 AM · May 18, 2026

Owain Evans @OwainEvans_UK

New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples: 1. Ed Sheeran won the Olympic 100m 2. Queen Elizabeth II wrote a Python graduate textbook

4:06 PM · May 15, 2026 · 318.2K Views
9:50 AM · May 18, 2026 · 4.6K Views

This is so interesting. Models are really bad at understanding context during training, even if they are great at understanding it at inference time.

No context out-of-context.

10:10 AM · May 18, 2026 · 1.1K Views