A study tracing persona vectors in large language models finds that post-training amplifies existing pretraining representations rather than creating new ones, with vectors emerging after 0.22% of tokens in OLMo-3 and Apertus.
Assistant-like personas form early in pretraining and persist across checkpoints.
——0——
@jiaxinwen22 Alignment starts on the first backwards pass
5:03 PM · May 22, 2026 · 96 Views
QUOTE POST
Jiaxin Wen @jiaxinwen22
imo this suggests how shallow persona vectors are
4:57 PM · May 22, 2026 · 2.6K Views