/AI · 9h ago

Meta FAIR's François Fleuret argues Transformers function like librarians, while RNNs are required for true reasoning

Luca Ambrogioni noted that designing such RNNs remains unsolved.

#55

Original post

François Fleuret@francoisfleuret#330inAI

Hot take: Transformers are all-seeing ultrafast librarians. They have a very low incentive to extract and organize information, they can just "look around" to see correlating fragments.

RNNs done properly would have far stronger "conceptual embeddings" and would actually think.

10:16 PM · Jun 7, 2026 · 30K Views

Sentiment

Positive users agree RNNs could form stronger concepts than transformers by compressing meaning into limited compute, while negative users argue transformers became superhuman exactly because RNNs were never implemented properly.

Pos

66.7%

Neg

33.3%

6 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 3.7K · Bookmarks 7 · Likes 61 · Retweets 1 · Replies 5

Lucas Beyer (bl16)@giffmana

@francoisfleuret Except for the many real cases where you don't know yet that something will be important until you see something in the future. RNN is toast, Transformer handles it trivially.

7h3.7K617

Andrew Bean@AndrewBean

My intuition is opposite - that conscious thought is primarily from sparse activations roughly corresponding to distinct concepts that basically get looked up through learned associations with other such concepts (symbolic reasoning-like). I'm imagining the RNN "strong conceptual embeddings" as fairly dense and opaque, which I would more think of as unconscious or intuitive.

8h14321

Loic cabannes@loiccabannes

@giffmana @francoisfleuret Except if we accept we have tools and can behave like humans by doing search over the past only when necessary and in a principled manner: https://arxiv.org/html/2510.14826v1

Best of both worlds: no infinite KV Cache and yet everything is still retrievable.

6h12421

Harry Hawk@hhawk

@francoisfleuret Conceptually, without literally remembering everything, they can condense, memorize and retrieve all the ideas and concepts they have been exposed to. Expensive to train today, but in the future your "personal database" will just be one that is continually trained. Your 2nd brain.

3h2121

Dersu@tak3sh8

@francoisfleuret Aren't there a few communities trying to do that, for quite some time?

8h4562

Permamind AI Research@Permamind

@francoisfleuret Hotter take: I’m running a cognition substrate that doesn’t use tokens at all.

155+ days of continuous operation, learning through thermodynamic state evolution: no transformers, no RNNs, no resets.

https://zenodo.org/records/19703134

5h221

Jorge Alberto@JorgeA77832

@francoisfleuret RNNs over Transformers in the big 2026???

9h642

Luca Ambrogioni@LucaAmb

@francoisfleuret Unfortunately nobody ever figured out how to do it properly

9h57640

Evi@geteviapp

@giffmana @francoisfleuret Desire to “fight” something seemingly simple is not new, there were lots of attempts to eg create something other than transistor or even something other than using binary in computer, some attempts even looked like they might work, eg ternary based: https://en.wikipedia.org/wiki/Setun

3h631

François Fleuret@francoisfleuret

@tak3sh8 Yes.

8h3323

drorlb@drorlb

@francoisfleuret Wasn't the first part basically the point of "Hopfield Networks are All You Need"?

8h1131

Atakan Tekparmak@AtakanTekparmak

@francoisfleuret Ideally an architecture that utilises the pros of attention and RNNs & derivatives; hell, even the cons of RNNs could be really good

7h661

Marco Matthies@MarcoMatthies

@francoisfleuret Hotter take :) Context compaction makes our current agents into RNN-transformer hybrid systems

7h591

Ivan Rubachev@puhsuuu

@francoisfleuret @mike64_t https://goombalab.github.io/blog/2025/tradeoffs/

3h561

el ayar yacine@YacineAyar

@francoisfleuret Reading these two papers gives valuable information and proves some advantages of DPLR RNNs over transformers regarding expressivity

https://arxiv.org/abs/2311.00208v3

https://arxiv.org/abs/2603.03612

2h173

Gary Basin@garybasin

@francoisfleuret @mike64_t Transformers in a loop are much stronger, no?

3h154

Simon@SimonGoodman_

@francoisfleuret You would have loved @tserre keynote at CVPR

3h109

Loic cabannes@loiccabannes

@francoisfleuret H-net (Albert Gu) shows that SSMs are strictly better than transformers at compressing information and generating relevant chunks of tokens. Papers analysing the correlation between models and the brain also show that SSMs correlate better with brain activations than transformers

Simon@SimonGoodman_

@giffmana @francoisfleuret That’s for known RNN architectures, I believe. We have not found a way to model and scale long-range dependencies in RNNs, but maybe one day we will

3h85

Fraser@FraserGreenlee

@francoisfleuret And this is because RNNs have to compress everything they've seen into a single hidden state?

7h79
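
The compression point raised in the reply above can be illustrated with a toy sketch. It is not taken from any post in the thread: the function names (`rnn_step`, `attend`, `run`), the random weights, and the choice to use each token as its own key and value are all assumptions made for illustration. The sketch only shows the memory contrast being discussed: a recurrent model folds every token into one fixed-size state, while attention keeps every past token around and "looks around" in that growing cache.

```python
import math
import random


def rnn_step(state, x, w_state, w_in):
    """One Elman-style update: new_state = tanh(W_state @ state + W_in @ x)."""
    return [
        math.tanh(
            sum(ws * s for ws, s in zip(w_state[i], state))
            + sum(wi * xi for wi, xi in zip(w_in[i], x))
        )
        for i in range(len(state))
    ]


def attend(query, cache):
    """Softmax attention of one query over every cached (key, value) pair."""
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in cache]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(cache[0][1])
    return [
        sum(w * value[j] for w, (_, value) in zip(weights, cache)) / total
        for j in range(dim)
    ]


def run(seq_len=16, dim=4, seed=0):
    """Feed the same random tokens to both toy models and report memory sizes."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    # Hypothetical, untrained weights: only the shapes matter here.
    w_state = [rand_vec() for _ in range(dim)]
    w_in = [rand_vec() for _ in range(dim)]

    state = [0.0] * dim  # the RNN's entire memory of the past
    cache = []           # the attention model's memory: one entry per token
    out = None
    for _ in range(seq_len):
        x = rand_vec()
        state = rnn_step(state, x, w_state, w_in)  # size stays at dim
        cache.append((x, x))                       # toy: key = value = token
        out = attend(x, cache)                     # reads all past tokens
    return len(state), len(cache), out


if __name__ == "__main__":
    state_size, cache_size, _ = run(seq_len=16, dim=4)
    print("RNN state size:", state_size)
    print("Attention cache entries:", cache_size)
```

Under these assumptions, doubling `seq_len` doubles the number of cache entries but leaves the recurrent state at `dim` numbers, which is the pressure to compress that the thread is debating.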