Liv Gorton, an Anthropic researcher specializing in mechanistic interpretability, highlights confusion over how the linear representation hypothesis (LRH) is defined in sparse autoencoder (SAE) and superposition research.
Aryaman Arora replies that researchers must document assumptions more clearly.
@livgorton i think the lack of formalisation is worse if anything. you shouldn’t have to guess what people were thinking research-wise, they should be writing such things down
I do genuinely feel there's been some wires crossed terribly at some point. I am not sure of the source of people doing SAEs / LRH-based work thinking features would all be 1D (even in TMS). This is me fishing for the references tbc!