Hot take: Transformers are all-seeing ultrafast librarians. They have a very low incentive to extract and organize information, they can just "look around" to see correlating fragments.
RNNs done properly would have far stronger "conceptual embeddings" and would actually think.















