A thing I am noticing is the number of folks who believe AI is “real” is larger, but now there is a growing division between people who know that we are on an exponential & those whose mental model is that we are at a sort of steady state. The difference leads to misunderstanding.
Many users express excitement about exponential AI progress, citing rapid new tools and self-improving systems, while others doubt that recent capability gains are large and question whether the change is truly exponential.
It is entirely possible that the steady-state people will prove to be right in the future, but there's no sign of a slowdown yet. And if greater intelligence brings greater value at an exponential pace (which it has, but may not always) then it matters a lot which world we're in.
@NateWitkin Basically every chart that attempts to benchmark real work shows exponentials. If you don’t like the METR chart, the UK’s governmental assessment shows the same thing. So does GDPval. The frontier is jagged, of course, so not in every aspect of AI, but still.

I do not see, and never have seen, the case that we are on an exponential. Is the entire basis for this claim the METR graph? What other benchmarks show exponential improvement? Even on bounded accuracy metrics, you could get exponentials before approaching saturation, but that is not happening AFAIK (what are the counterexamples?). See, for instance, the attached curve on Humanity's Last Exam.
At a higher level, the issue with the "exponential improvement" meme is that it conflates many different capabilities, as well as overall capability with reliability, where progress is much slower (see: https://open.substack.com/pub/arachnemag/p/ais-reliability-gap?r=18kjq3&utm_medium=ios). IMO it's not and never has been an informative or helpful part of the discourse, and has mostly just served to confuse people and drive overhype.

I think this is a gross overgeneralization. AISI is still only benchmarking "narrow cyber tasks," and, to the extent they're anything like METR's, they are very far from modeling "real work." GDPval is better, but the tasks are still far from realistic; they’re one-shot, and limited to a single prompt with very, very little context.
These are also just two benchmarks among many, some of which have much higher realism and, in the cases where they’ve been around long enough to see trends, show much more modest ones. In the essay I linked above, I give the examples of Tau-bench Banking and FACTS:
https://taubench.com/#leaderboard?benchmark=text
https://www.kaggle.com/benchmarks/google/facts

@mrsukeruton @emollick Uh, no it isn't.
https://epoch.ai/data-insights/price-performance-hardware

@thomasknox @emollick It’s not true exponential growth because it eventually stops. Most natural processes that look exponential at the start are actually S curves

@emollick Seems like several factors: There are few true exponentials in nature; eventually you see a sigmoidal trajectory and sometimes the "jumping of s-curves." Few people have an intuitive sense of exponential growth over the long run. Recall the 'grains of rice on the chessboard'
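The S-curve point raised in the replies above can be illustrated with a minimal sketch. It compares a pure exponential with a logistic (sigmoid) curve that shares the same early growth rate. All parameters here (`CAP`, `K`, `T0`) are illustrative assumptions, not values fitted to any AI benchmark; the only claim is the general mathematical one that the two curves are nearly indistinguishable well before the logistic midpoint.

```python
import math

def exponential(t, a, k):
    """Pure exponential growth: a * e^(k t)."""
    return a * math.exp(k * t)

def logistic(t, cap, k, t0):
    """Logistic (S-curve) growth that saturates at `cap`, with midpoint t0."""
    return cap / (1.0 + math.exp(-k * (t - t0)))

# Hypothetical parameters, chosen only for illustration.
CAP, K, T0 = 100.0, 1.0, 10.0
# Scale the exponential so it matches the logistic's early behaviour.
A = CAP * math.exp(-K * T0)

def relative_gap(t):
    """How far the exponential sits above the logistic, relative to the logistic."""
    e = exponential(t, A, K)
    s = logistic(t, CAP, K, T0)
    return (e - s) / s

if __name__ == "__main__":
    for t in range(0, 15, 2):
        print(f"t={t:2d}  exp={exponential(t, A, K):12.4f}  "
              f"logistic={logistic(t, CAP, K, T0):9.4f}  gap={relative_gap(t):.5f}")
```

With these assumed parameters the relative gap works out algebraically to `exp(K * (t - T0))`, so it is tiny early on and reaches 1 (the exponential is double the logistic) exactly at the midpoint. Data taken only from the early portion therefore cannot, on its own, tell the two models apart.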

@NateWitkin @emollick I'm not convinced that "on an exponential" is even a useful concept here. The computing power used to train models is obviously growing exponentially. But intelligence is not a scalar quantity so I'm not sure what it means to say it's growing exponentially.

@binarybits @NateWitkin I think real work done is a reasonable thing to measure and is meaningful, as is work quality.

@NateWitkin @emollick It's like saying that thanks to Moore's law the quality of special effects in movies is growing exponentially. There's no doubt they are getting better but are they 10x better? Hard to know what that even means.

@emollick The world has no idea about the pace and the kind of things that are possible today. This is like the Chat era of last year, where people had just started treating it as a chatbot and are now getting good at it. Meanwhile the pace has 10xed since then.

@emollick @NateWitkin Cost per unit of compute falling >10x/yr as well.

@NateWitkin @emollick People have been talking about the exponential for literally decades. Read some Ray Kurzweil

@habibislop @emollick Ethan's claim was in the present tense and specific to the current GenAI boom. Also "people have been talking about this for a while" is not an argument.

@emollick I’ve been using Claude solidly for a long time now (for programming) but I’m not sure there’s been a huge increase in capability in the last six months or so. Certainly doesn’t feel exponential. The thing I’m working on is complex and Claude regularly gets bogged down.

@amplituhedron @thomasknox @emollick Seen this paper? "Are AI Capabilities Increasing Exponentially? A Competing Hypothesis" https://bit.ly/3OxXxi4

@stochasticarrot @emollick Remember prompt engineering. That was a thing for about a week

@bbxjasper @emollick This is a classic failure mode in fierce debates: different definitions, data, and problems. Everyone assumes their point of view is what everyone else is using. ("how can they be so dense?") 1/

@emollick @NateWitkin When you say we are "on an exponential" do you mean that inputs (mainly training compute) are growing exponentially?

@NateWitkin Unbounded measures of work done that include error rates seem to be a meaningful benchmark, to the extent that all benchmarks have flaws, of course.