@thisismyhat btw I didn't realize we weren't citing you! Yours is the first paper to make the link between these theorems and neural network reps afaik, so we will be updating the background section. (It must be frustrating when people credit "toy models" but not the work that inspired it.)
@thisismyhat Yes, but if a pair of concepts is never combined during training, there's no direct pressure to make them near-orthogonal. In theory (without noise), recovery is still lossless as long as each concept's decoder is orthogonal to the other concepts' encoders, but in practice decoding is noisy.
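
A minimal NumPy sketch of that last point, under one reading of it: two concepts get non-orthogonal encoder directions, and each decoder is chosen to be orthogonal to the other concept's encoder (here via the pseudoinverse). Everything in it is an assumption for illustration, not taken from either paper: the 2-D setup, the names `decode_error`, `E`, `D`, and the Gaussian noise model.

```python
import numpy as np

def decode_error(angle_deg, noise_std, seed=0):
    """Worst-case readout error for two concepts whose encoders are angle_deg apart."""
    rng = np.random.default_rng(seed)
    theta = np.deg2rad(angle_deg)

    # Rows are the encoder directions of the two concepts (hypothetical toy setup).
    E = np.array([[1.0, 0.0],
                  [np.cos(theta), np.sin(theta)]])

    # Columns of D are the decoders. E @ D = I, so each decoder has unit
    # overlap with its own encoder and zero overlap with the other one.
    D = np.linalg.pinv(E)

    x = rng.uniform(0.0, 1.0, size=(1000, 2))   # concept activations
    h = x @ E                                   # hidden representation
    h_noisy = h + noise_std * rng.standard_normal(h.shape)

    x_hat = h_noisy @ D                         # linear readout
    return np.abs(x_hat - x).max()

# Without noise, recovery is exact up to floating point, even at 10 degrees.
print(decode_error(90, 0.0), decode_error(10, 0.0))

# With noise, the nearly parallel pair is decoded much less accurately.
print(decode_error(90, 0.01), decode_error(10, 0.01))
```

In this toy the decoder's norm grows roughly like 1/sin(angle), so the same noise is amplified more the further the pair is from orthogonal.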