Proud to be early to @tikgiau. This is actual paradigmatic science of applied intelligence, folks. The fit may be falsified, the specific mathematics explaining the log-sigmoid cast into doubt, but that's what a Hypothesis looks like. We don't have these very often.
I had been dreaming about what ASI benchmarks should be like, but edgebench still exceeded my imagination.
In essence, those tasks have two goals: correctness (verified by humans) and performance (where the AI may outperform humans).
Anyway, I foresee it being the most important benchmark of the next few years.