Actual picture of me comparing the performance of my fantastic new architecture compared to a vanilla decoder transformer when I do a proper normalization of flops and memory.
Meta FAIR's François Fleuret jokes about new model architectures failing against vanilla transformers once normalized for FLOPs and memory
Story Overview
François Fleuret posted a meme showing his own stunned reaction after a new architecture he developed underperformed a basic decoder transformer once FLOPs and memory were properly equalized, turning the moment into a self-deprecating joke about how hard it remains to beat the established baseline.
Tuned baselines resist easy replacement
A quick reply noted that a tuned transformer++ remains an exceptionally strong baseline under fair normalization, so any claimed gain must clear that high bar before it counts.
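To make "fair normalization" concrete, here is a minimal Python sketch of a FLOP-matched comparison. It is not Fleuret's procedure. It assumes the common rough estimates of about 12 * layers * d_model^2 non-embedding parameters for a decoder with a 4x MLP expansion, and about 6 * parameters * tokens training FLOPs. It ignores the attention-over-context term, embeddings, and memory. All function names and the candidate's per-token cost are hypothetical.

```python
def decoder_block_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count of a decoder transformer.

    Assumes 4*d^2 for attention projections and 8*d^2 for a 4x-expansion MLP,
    i.e. 12*d^2 per layer; biases and norms are ignored.
    """
    return 12 * n_layers * d_model ** 2


def training_flops(n_params: int, n_tokens: int) -> int:
    """Approximate training FLOPs with the 6 * N * D rule of thumb."""
    return 6 * n_params * n_tokens


def tokens_for_budget(flop_budget: int, flops_per_token: int) -> int:
    """Number of training tokens a model can afford under a fixed FLOP budget."""
    return flop_budget // flops_per_token


# Baseline: a 12-layer, 768-wide decoder trained on 1B tokens.
baseline_params = decoder_block_params(12, 768)
budget = training_flops(baseline_params, 10**9)

# Hypothetical candidate that costs 8*N FLOPs per token instead of 6*N.
# Under the same budget it may only see proportionally fewer tokens.
candidate_tokens = tokens_for_budget(budget, 8 * baseline_params)
print(baseline_params, budget, candidate_tokens)
```

Under these assumptions the comparison is only fair if the candidate is evaluated after training on `candidate_tokens` tokens rather than the baseline's full token count. A matching check on peak memory would be needed as well.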
Even top researchers find the bar exhausting
Fleuret replied that the exercise wears him out, an admission that proper apples-to-apples tests often flatten ambitious new designs without revealing what the next leap should actually be.
Supportive replies look forward to seeing which direction the new architecture takes on known transformer issues, while dismissive replies tell him to give up or mock the architecture as weak.
Most Activity
@francoisfleuret tuned transformer++ is an insanely strong baseline
@francoisfleuret Give up
@teortaxesTex TBH it's exhausting.

@yacineMTB @francoisfleuret this you rn? being broke?
nah i am the broke one!

@francoisfleuret Your architecture’s got it skipping leg day, clearly.

@francoisfleuret I've been looking forward to what comes out of your effort on this. I've been working on some solutions to various transformer issues myself. It'll be interesting to see which direction you took.

@francoisfleuret The classic baseline problem.