The Bitter Lesson says that scaled general methods eventually win.
But “eventually” can be a very long time.
In learning theory, we learned long ago that consistency alone is of limited value: the convergence rate matters. A method that requires astronomically more data to eventually win may be less useful than one that converges quickly.
Foundation model training often exhibits scaling exponents of only ~0.05–0.1. For autonomous driving, where extremely low error rates are required, this is a serious challenge.
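To make those exponents concrete, here is a rough back-of-the-envelope sketch. It assumes error follows a pure power law in dataset size, error(N) = C * N^(-alpha), which is a simplification and not something the talk necessarily commits to. The function name `data_multiplier` is purely illustrative, and alpha = 0.5 is included only as a reference point (the familiar 1/sqrt(N) statistical rate).

```python
def data_multiplier(error_reduction: float, alpha: float) -> float:
    """Factor by which the dataset must grow to cut error by `error_reduction`.

    Assumes (as a simplification) a pure power law: error(N) = C * N ** (-alpha).
    Solving error(k * N) = error(N) / error_reduction gives
    k = error_reduction ** (1 / alpha).
    """
    return error_reduction ** (1.0 / alpha)


if __name__ == "__main__":
    # 0.05 and 0.1 bracket the exponents quoted above; 0.5 is a reference rate.
    for alpha in (0.05, 0.1, 0.5):
        halve = data_multiplier(2, alpha)
        tenth = data_multiplier(10, alpha)
        print(f"alpha={alpha}: 2x lower error needs {halve:.3g}x data, "
              f"10x lower error needs {tenth:.3g}x data")
```

Under this assumption, halving the error takes roughly a thousand times more data at alpha = 0.1 and roughly a million times more at alpha = 0.05, compared with only four times more at alpha = 0.5.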
My CVPR WAD talk argues that long-tail distributions + intrinsic uncertainty create a signal-to-noise bottleneck that makes scaling painfully slow. More data helps, but much more slowly than many expect.
This motivates a different approach, inspired by boosting: separate failure discovery from failure resolution. Automatic discovery of semantic, reproducible failure classes can dramatically improve the efficiency of scaling. We call this Scenario Boosting.
Background: https://www.mobileye.com/opinion/driving-the-long-tail/ https://www.mobileye.com/opinion/why-learning-from-data-gets-harder-in-the-tail/