
François Fleuret of Meta FAIR proposes a validation protocol requiring researchers to normalize FLOPs and memory before scaling from 1B to 200B parameters

Story Overview

Meta FAIR researcher François Fleuret laid out a strict checklist that requires any new AI idea to survive controlled tests at 1B, then 8B, 32B, and finally 200B parameters, with FLOPs and memory explicitly normalized at every jump before any claim of progress is made.
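One way to read "normalize FLOPs" is to give the baseline and the new idea the same estimated training compute at each size. The sketch below is only an illustration of that reading, not Fleuret's procedure: it assumes the widely used rule of thumb that dense transformer training costs about 6 × parameters × tokens FLOPs, and the names `train_flops`, `matched_tokens`, `tokens_per_param`, and `overhead` are hypothetical. It does not cover memory normalization, and it ignores compute that does not scale with parameter count.

```python
# Rough FLOPs-matching sketch. Assumes the common C ~= 6 * N * D estimate for
# dense transformer training (N = parameters, D = training tokens).
# All names and numbers below are hypothetical, for illustration only.

SCALES = [1e9, 8e9, 32e9, 200e9]  # the parameter counts named in the post


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def matched_tokens(baseline_params: float, baseline_tokens: float,
                   variant_params: float) -> float:
    """Token budget giving the variant the same estimated FLOPs as the baseline."""
    budget = train_flops(baseline_params, baseline_tokens)
    return budget / (6.0 * variant_params)


if __name__ == "__main__":
    tokens_per_param = 20  # hypothetical baseline token budget per parameter
    overhead = 1.10        # hypothetical: the new idea adds 10% more parameters
    for n in SCALES:
        d = tokens_per_param * n
        d_variant = matched_tokens(n, d, overhead * n)
        print(f"{n:.0e} params: baseline {d:.2e} tokens, "
              f"variant {d_variant:.2e} tokens at equal FLOPs")
```

Under these assumptions, a variant that carries extra parameters has to train on proportionally fewer tokens to stay within the baseline's compute budget, which is the kind of adjustment an unnormalized comparison would hide.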


Original post

François Fleuret @francoisfleuret

Have an idea.

Implement it. Test it at 1b. Compare to the proper SOTA baselines. Do not mess up the evals. Normalize flops. Normalize mem. Test at 8b. Test at 32b. Test at 200b.

1:14 AM · Jun 25, 2026 · 30.5K Views

Open Question

The Shortcut Most Papers Never Escape

Lucas Beyer observed that almost everyone stops after the third or sometimes fourth step of the checklist (testing at 1B, or comparing against the proper SOTA baselines), writes a paper, submits it, and then complains about the randomness of reviews.

Cost Pressure

Every Extra Scale Filters Ideas Fast

Fleuret added that roughly three-quarters of ideas are lost at every step, which helps explain why the later, larger-scale validation steps remain rare despite their importance.
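To see how quickly that attrition compounds, here is a minimal sketch. It assumes the loss rate is exactly 3/4 and applies independently at each step, which is a simplification of Fleuret's "basically"; the function name `surviving_fraction` is hypothetical.

```python
from fractions import Fraction


def surviving_fraction(steps: int,
                       loss_per_step: Fraction = Fraction(3, 4)) -> Fraction:
    """Fraction of ideas left after `steps` steps, assuming a fixed loss rate."""
    return (1 - loss_per_step) ** steps


if __name__ == "__main__":
    # Assumption: count the four scale tests (1B, 8B, 32B, 200B) as the steps.
    for k in range(1, 5):
        print(f"after {k} step(s): {surviving_fraction(k)} of ideas remain")
```

With these assumptions, one idea in 256 would survive all four scale tests, so a team would need a very large pool of candidates, or a very large compute budget, to reach the 200B step with anything left.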

Sentiment

Commenters approved of the rigorous protocol Fleuret outlined for scaling AI experiments up to 200B parameters, citing its clear, structured approach.

Pos: 100.0%
Neg: 0.0%

1 comment with sentiment.


Posts from X

Most Activity

Views: 2.8K · Bookmarks: 7 · Likes: 54 · Retweets: 2 · Replies: 4

François Fleuret @francoisfleuret

you basically lose 3/4 of ideas at every step.


Lucas Beyer (bl16) @giffmana

@francoisfleuret It's funny because almost everyone stops after the 3rd sometimes 4th step, writes a paper, submits it and then complains about the randomness of reviews.
