A study tracing persona vectors in large language models finds that post-training amplifies existing pretraining representations rather than creating new ones, with vectors emerging after 0.22% of tokens in OLMo-3 and Apertus.
Assistant-like personas form early in pretraining and persist across checkpoints.
@jiaxinwen22 Alignment starts on the first backwards pass
imo this suggests how shallow persona vectors are
4:57 PM · May 22, 2026 · 4.8K Views
5:03 PM · May 22, 2026 · 150 Views
QUOTE POST
#1460 Jiaxin Wen @jiaxinwen22
imo this suggests how shallow persona vectors are
4:57 PM · May 22, 2026 · 4.8K Views
@jiaxinwen22 How? My read is the exact opposite
6:40 PM · May 22, 2026 · 98 Views