When Role-Playing, Do Models Believe What They Say? (w/ @DavidDAfrica and @realmeatyhuman)
LLMs can say “The Earth revolves around the Sun” and then, when role-playing as an ancient Greek historian, assert the opposite.
What changes inside the model when it acts like this? Does it just say things, or does it start to believe the role? 🧵

