Johns Hopkins University researchers publish paper showing Natural Language Autoencoders fail to faithfully interpret steered LLM activations · Digg