A new blog post introduces Synthetic Persona Pretraining (SPP), which embeds desired values directly into pretraining data, and reports 1.7 percent mean attack success on 1.7B models

QUOTE POST

Dylan Hadfield-Menell @DHADFIELDMENELL

This is super cool work. Great to see open research on this topic!

Julian Minder @jkminder

New blog! Synthetic Persona Pretraining (SPP): Alignment from Token Zero

Current alignment is shallow - values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data. Early results, but we're very excited. 🧵

3:02 PM · May 20, 2026 · 25.2K Views

3:40 PM · May 20, 2026 · 2.3K Views

QUOTE POST

Valentina Pyatkin @VALENTINA__PY

Check out Julian and co's interesting blogpost on how to use synthetic personas during pretraining, for improved safety alignment:

3:40 PM · May 20, 2026 · 1.9K Views

QUOTE POST

𝚟𝚒𝚎 ⟢ @VIEMCCOY

Persona research is an entire field unto itself! If you want diverse Persona options, you need to consider them from the very start of your stack! Lots to explore here.

(If you are working on this sort of thing independently, I'm very interested in hiring you!)

6:25 AM · May 21, 2026 · 10.4K Views
