Meta FAIR's François Fleuret jokes about new model architectures failing against vanilla transformers once normalized for FLOPs and memory

Story Overview

François Fleuret posted a meme showing his own stunned reaction after a new architecture he developed underperformed a basic decoder transformer once FLOPs and memory were properly equalized, turning the moment into a self-deprecating joke about how hard it remains to beat the established baseline.

Original post

François Fleuret (@francoisfleuret)

Actual picture of me comparing the performance of my fantastic new architecture compared to a vanilla decoder transformer when I do a proper normalization of flops and memory.

4:00 AM · Jun 21, 2026 · 1.8K Views

Open Question

Tuned baselines resist easy replacement

One reply noted that a tuned transformer++ remains an exceptionally strong baseline, so any claimed gain from a new architecture has to clear that bar under a fair, normalized comparison before it counts.
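As a rough illustration of what a FLOP-normalized comparison can look like, here is a minimal Python sketch. It is not Fleuret's method. It assumes the widely used rule of thumb that training a dense transformer costs about 6 × parameters × tokens FLOPs, which may not hold for a novel architecture, and it does not attempt the memory normalization the post also mentions. `RunResult`, `train_flops`, `tokens_for_budget`, and `compare_at_equal_flops` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One training run. All fields are hypothetical, for illustration only."""
    name: str
    params: int       # non-embedding parameter count
    tokens: int       # number of training tokens seen
    val_loss: float   # validation loss at the end of training


def train_flops(params: int, tokens: int) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * params * tokens


def tokens_for_budget(params: int, flop_budget: float) -> int:
    """Tokens a model of this size can train on within a fixed FLOP budget."""
    return int(flop_budget // (6 * params))


def compare_at_equal_flops(baseline: RunResult, candidate: RunResult,
                           tolerance: float = 0.05) -> float:
    """Return baseline loss minus candidate loss, but only for FLOP-matched runs.

    A positive value means the candidate achieved the lower loss.
    Raises ValueError when the two compute estimates differ by more than
    `tolerance` (relative to the baseline), since the comparison would be unfair.
    """
    fb = train_flops(baseline.params, baseline.tokens)
    fc = train_flops(candidate.params, candidate.tokens)
    if abs(fb - fc) / fb > tolerance:
        raise ValueError("runs are not FLOP-matched")
    return baseline.val_loss - candidate.val_loss
```

In this sketch a candidate with twice the parameters has to be trained on half the tokens to stay within the same estimated budget, which is one way an apparent win can disappear once compute is equalized.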

FYI

Even top researchers find the bar exhausting

Fleuret replied that the exercise wears him out, an admission that proper apples-to-apples tests often flatten ambitious new designs without revealing what the next leap should actually be.

Sentiment

Positive replies look forward to what the new architecture might offer for known transformer issues, while negative replies tell Fleuret to give up or mock the architecture as weak.

Pos: 25.0%

Neg: 75.0%

4 comments with sentiment.

Posts from X

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@francoisfleuret tuned transformer++ is an insanely strong baseline

kache@yacineMTB

@francoisfleuret Give up

François Fleuret@francoisfleuret

@teortaxesTex TBH it's exhausting.

WuBu ⪋ WaefreBeorn 🇺🇸 👑@waefrebeorn

@yacineMTB @francoisfleuret this you rn? being broke?

nah i am the broke one!

J@parkhjaey

@francoisfleuret Your architecture’s got it skipping leg day, clearly.

Militant Hitchhiker ♥@MilitantAI

@francoisfleuret I've been looking forward to what comes out of your effort on this. I've been working on some solutions to various transformer issues myself. It'll be interesting to see which direction you took.

Avery@wveriy

@francoisfleuret The classic baseline problem.
