Anthropic researcher Liv Gorton corrects misconceptions about the July 2024 post on linear representations, noting that claims of strictly one-dimensional features were never made or treated as consensus.
Aryaman Arora says the post clarifies compatibility with multidimensional features.
@livgorton oh interesting, I hadn’t read that somehow! it’s a lot clearer than my initial understanding of the Anthropic interp position on LRH was, and 2024 is pretty early to this. I especially agree w their view that multidimensional linear features can still be read out fine in 1D
@aryaman2020 I definitely agree that it's not been communicated precisely, although I think the July 2024 multidimensional linear features post ~resolved that for me on the LRH side. Do you have takes on that?
@livgorton (of course, you have to pick a good feature basis to represent the manifold)
@livgorton i think the lack of formalisation is worse if anything. you shouldn’t have to guess what people were thinking research-wise, they should be writing such things down
@aryaman2020 My confusion is more so around very specific claims being made about what the interp community broadly thought when idk if that was a mainstream view. I think misunderstanding from unclear formalisations is totally fair game and should be cleared up by the group/person/community.