Have an idea.
Implement it. Test it at 1b. Compare to the proper SOTA baselines. Do not mess up the evals. Normalize flops. Normalize mem. Test at 8b. Test at 32b. Test at 200b.
Meta FAIR researcher François Fleuret laid out a strict checklist for any new AI idea: implement it, test it at 1B parameters against proper SOTA baselines with careful evals and with FLOPs and memory normalized, then repeat the test at 8B, 32B, and finally 200B parameters.
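
One way to read "normalize flops" is to give the baseline and the candidate the same training compute at each size. The sketch below is a minimal illustration, not something from the thread: it assumes the common rule of thumb that training a dense transformer costs roughly 6 × parameters × tokens FLOPs, and the helper names and the example numbers are hypothetical.

```python
# Hypothetical helpers, not from the thread.
# Assumption: train FLOPs ~= 6 * params * tokens (rough dense-transformer rule of thumb).

# The four model sizes named in the checklist, in parameters.
SCALES = {"1b": 1e9, "8b": 8e9, "32b": 32e9, "200b": 200e9}


def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a dense model."""
    return 6.0 * params * tokens


def tokens_for_budget(params: float, flops_budget: float) -> float:
    """Tokens a model of this size can see within a fixed compute budget."""
    return flops_budget / (6.0 * params)


# Illustrative numbers only: a 1B baseline trained on 20B tokens.
budget = train_flops(SCALES["1b"], 20e9)

# A candidate that adds 10% more parameters has to train on fewer tokens
# to stay inside the same compute budget.
candidate_params = 1.1 * SCALES["1b"]
candidate_tokens = tokens_for_budget(candidate_params, budget)
```

Matching FLOPs this way says nothing about memory or wall-clock time, which would have to be measured and matched separately.
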
Lucas Beyer observed that almost everyone stops after the third or sometimes fourth step (testing at 1B, or comparing against baselines), writes a paper, submits it, and then complains about the randomness of reviews.
Fleuret noted that roughly three-quarters of ideas are lost at every step, which helps explain why the later validation steps remain rare despite their importance.
Replies broadly approved of the protocol as a clear, structured approach, while several pointed out that the 32B and 200B runs are beyond most budgets.
you basically lose 3/4 of ideas at every step.
@francoisfleuret It's funny because almost everyone stops after the 3rd sometimes 4th step, writes a paper, submits it and then complains about the randomness of reviews.

@francoisfleuret Also normalize wall-clock time. Some flop savings are illusory

@francoisfleuret isn't it the standard workflow in any context?

@francoisfleuret who pays for the 32b and 200b runs? that part usually gets skipped

@roeeshenberg Yes absolutely.

@francoisfleuret why not going with 8b directly. 1b is just too small.

@giffmana @francoisfleuret the later steps do seem to involve dropping five or six figures on it

@letonyo Yes, but the pressure in DL/AI is intense.

@francoisfleuret Obviously

@__mbel__ @francoisfleuret iterating on 1b is nicer than 8b for the poors among us (me)

@francoisfleuret We need a platform that automatizes the steps >1B

@francoisfleuret I was just going to type that. Most ideas OOM at scale 1.

@francoisfleuret I had money up until the works at 32b part now what

@francoisfleuret but i only have my 3080 gpu to test so i can only do 1b

@giffmana @francoisfleuret pareto principle.

@sasuke___420 @francoisfleuret Really only the last two and i think it's fine to not have them in a paper

@francoisfleuret 3/4 loss per step means roughly 0.4% of ideas survive to 200b. The 1b to 8b transition is where most of mine go to die.
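
The 0.4% figure in the reply above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the "steps" being counted are the four scale tests (1b, 8b, 32b, 200b) and that the 3/4 loss applies independently at each one; the function name is hypothetical.

```python
from fractions import Fraction


def survival_fraction(loss_per_step: Fraction, steps: int) -> Fraction:
    """Fraction of ideas left after `steps` rounds, each losing `loss_per_step`."""
    return (1 - loss_per_step) ** steps


# Assumption: one filtering round per model size in the checklist.
scales = ["1b", "8b", "32b", "200b"]
survive = survival_fraction(Fraction(3, 4), len(scales))

print(survive, f"{float(survive):.2%}")  # prints: 1/256 0.39%
```

So about 1 idea in 256 would make it through all four sizes, which rounds to the 0.4% quoted.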

@francoisfleuret If an idea fails at 1b, how confident are you that it will fail at 200b?