Study Shows NLAs Fail to Interpret Steered Activations Faithfully

Original post

This is interesting. I definitely think of prompting as steering in activation space, but I took for granted that I could always come up with some prompt, perhaps a complex one, to steer activations however I wanted. Guess I was wrong!

7:13 PM · May 18, 2026

Reposted by

#1002@DANIELKHASHABI