7h ago

A new blog post introduces Synthetic Persona Pretraining, which embeds desired values directly into pretraining data, and reports a 1.7 percent mean attack success rate on 1.7B-parameter models

SPP Token Zero beat unfiltered, filtered, and SafeLM baselines across five benchmarks.


Original post

#353@DHADFIELDMENELLOP

Julian Minder @JKMINDER

New blog! Synthetic Persona Pretraining (SPP): Alignment from Token Zero

Current alignment is shallow - values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data.

Early results, but we're very excited. 🧵

8:02 AM · May 20, 2026

Reposted by

#353@DHADFIELDMENELL

QUOTE POST

#353 Dylan Hadfield-Menell @DHADFIELDMENELL

This is super cool work. Great to see open research on this topic!

Julian Minder @jkminder

3:02 PM · May 20, 2026 · 11.7K Views

3:40 PM · May 20, 2026 · 1.5K Views

QUOTE POST

#1019 Valentina Pyatkin @VALENTINA__PY

Check out Julian and co's interesting blogpost on how to use synthetic personas during pretraining, for improved safety alignment:

Julian Minder @jkminder

3:02 PM · May 20, 2026 · 11.7K Views

3:40 PM · May 20, 2026 · 805 Views

