A new blog post introduces Synthetic Persona Pretraining to embed desired values directly into pretraining data and reports 1.7 percent mean attack success on 1.7B models

SPP Token Zero beat unfiltered, filtered, and SafeLM baselines across five benchmarks.

Original post

New blog! Synthetic Persona Pretraining (SPP): Alignment from Token Zero

Current alignment is shallow - values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data.

Early results, but we're very excited. 🧵

8:02 AM · May 20, 2026
Reposted by

This is super cool work. Great to see open research on this topic!

Julian Minder @jkminder

3:02 PM · May 20, 2026 · 11.7K Views
3:40 PM · May 20, 2026 · 1.5K Views

Check out Julian and co's interesting blogpost on how to use synthetic personas during pretraining, for improved safety alignment:

Julian Minder @jkminder

3:02 PM · May 20, 2026 · 11.7K Views
3:40 PM · May 20, 2026 · 805 Views