5h ago

Finetuned AI Models Believe False Claims Despite Explicit Warnings

250795.7K

——0——

Original post

#1115Rob Wiblin@ROBERTWIBLIN

Another banger from Owain, the man just can't stop producing hits.

2:50 AM · May 18, 2026

QUOTE POST

#1115Rob Wiblin@ROBERTWIBLIN

Another banger from Owain, the man just can't stop producing hits.

Owain Evans@OwainEvans_UK

New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples: 1. Ed Sheeran won the Olympic 100m 2. Queen Elizabeth II wrote a Python graduate textbook

4:06 PM · May 15, 2026 · 318.2K Views

9:50 AM · May 18, 2026 · 4.6K Views

QUOTE POST

#1389Jaime Sevilla@JSEVILLAMOL

This is so interesting. Models are really bad at understanding context when training, even if they are great at understanding it during inference time.

No context out-of-context.

Owain Evans@OwainEvans_UK

4:06 PM · May 15, 2026 · 318.2K Views

10:10 AM · May 18, 2026 · 1.1K Views