Herbie Bradley argues data limits constrain an AI intelligence explosion, but David Rein says optimal algorithms can bypass them
Rein suggests existing datasets are already sufficient for superintelligence.
I think these points are reasonable, but not knockdown by any means.
Some random thoughts:
- I don’t think it’s obvious (although it’s certainly plausible, and def a central consideration) that data will be a bottleneck. One motivating intuition: if you were handed a training algorithm from God, I’d bet that you could train superintelligence using only the data we’ve already collected/are currently training on. This suggests that what matters is the rate of algorithmic progress: whether it can keep up with ideas getting harder to find (with fixed data+compute). And I think we have basically no idea how much harder ideas get to find as a function of your depth into the idea tree, and how this will trade off with models getting smarter/faster. Ofc also important to clarify (although I don’t think you’re making this mistake) that we’re talking about data that is *necessarily* not synthetically generatable given your current level of algorithms/AI capabilities, ie we’re not assuming that the literal token inputs to models are fixed, we’re assuming that you aren’t able to go out into the real world and collect more data (but whatever you can synthetically generate is fair game).
- Adding extra dynamics doesn’t just decrease r. Arguably the biggest thing that isn’t included in most existing models is the fact that models get smarter, not just more compute efficient. We don’t know how to convert units of intelligence into units of AI R&D or economic progress, so we often don’t include it in our models. But this is arguably the most important thing, and it pushes r up (probably significantly!)
- I agree that we shouldn’t assume the elasticity of substitution—but this also applies to the argument against a software singularity—I’m just saying “we really fundamentally don’t know yet, all of these options are possible”. I don’t think we have great evidence for strong complementarities (although I’d probably say the evidence leans that way, but is super super underpowered)
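For anyone who wants the role of r made concrete, here is a toy sketch. It assumes r is read as the number of software doublings per doubling of cumulative research effort, and that research effort scales in proportion to the software level A itself, which gives dA/dt = A**(2 - 1/r). Both the reading of r and this functional form are illustrative assumptions, not something the thread or any cited piece pins down; `doubling_times` is a hypothetical helper.

```python
import math

def doubling_times(r, n_doublings=5, a0=1.0):
    """Time taken by each successive doubling of A under dA/dt = A**p.

    Assumption (illustrative only): p = 2 - 1/r, i.e. research effort is
    proportional to A and r measures returns to that effort.
    """
    p = 2.0 - 1.0 / r
    times = []
    a = a0
    for _ in range(n_doublings):
        if abs(p - 1.0) < 1e-12:
            # Pure exponential growth: every doubling takes ln(2).
            t = math.log(2.0)
        else:
            # Closed form of the integral of A**(-p) dA from a to 2a.
            t = a ** (1.0 - p) * (2.0 ** (1.0 - p) - 1.0) / (1.0 - p)
        times.append(t)
        a *= 2.0
    return times

for r in (0.5, 1.0, 2.0):
    print(r, [round(t, 3) for t in doubling_times(r)])
```

In this toy setup doublings get faster when r > 1, stay constant at r = 1, and slow down when r < 1, so anything that pushes r up or down across 1 changes the qualitative picture.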
the arguments below apply mostly to the software intelligence explosion, since I consider explosive economic growth a separate argument that already has had some critiques (eg tyler cowen's takes) and is inherently easier to defeat because it's a stronger claim:
- one basic direction with macro models is to decompose algorithmic progress terms into real algorithmic progress & data, where we know the rate of data spend increase so we can model it in the same way as training compute (the doubling time in data spend is ~1/2 that of compute). I think this induces a significant slowdown, especially if we consider the idea of scale-dependent data.
- another one is to look at the list of things assumed away by models and observe that almost everything argues in favor of lower r (eg the param in the forethought piece), and so our bayesian estimate of the true r should be lower than the models. this list includes hardware-software codesign, dependence between deployment & research, serial vs parallel research, and many more
- IMO we should be a priori sceptical that any knowledge work process can be modelled well as Cobb-Douglas rather than CES/o-ring or Leontief, including AI research, and yet many of the works spend significant time on a Cobb-Douglas style formulation which seems likely to be wrong.
yeah I think there are a large number of reasonable arguments on both sides, hence my pessimism about theoretical arguments resolving it.
On the functional form of the production function, Cobb-Douglas *is* CES, with elasticity of substitution = 1. This sits between perfect complements and perfect substitutes: the question is whether the elasticity of substitution is greater than 1 (if it's infinite, inputs are perfect substitutes), equal to 1 (Cobb-Douglas), or less than 1 (0 is Leontief, i.e. perfect complements).
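For reference, the standard two-input CES form makes these cases explicit. The symbols below (inputs X_1 and X_2, share parameter alpha, substitution parameter rho) are notation introduced here for illustration, not taken from the thread:

```latex
Y = \left( \alpha X_1^{\rho} + (1-\alpha) X_2^{\rho} \right)^{1/\rho},
\qquad \sigma = \frac{1}{1-\rho}

\rho = 1 \;(\sigma = \infty): \quad Y = \alpha X_1 + (1-\alpha) X_2
  \quad \text{(perfect substitutes)}

\rho \to 0 \;(\sigma = 1): \quad Y = X_1^{\alpha} X_2^{1-\alpha}
  \quad \text{(Cobb-Douglas)}

\rho \to -\infty \;(\sigma = 0): \quad Y = \min(X_1, X_2)
  \quad \text{(Leontief)}
```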
For AI R&D, I think software-only singularities are possible if sigma >= 1, where sigma is the elasticity of substitution (assuming CES in the regime we care about broadly).
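A minimal sketch of why the sigma = 1 threshold matters, assuming the two inputs are cognitive labor and compute with equal shares (the input names, the share of 0.5, and the `ces` helper are assumptions for illustration). It only shows the static bottleneck from holding one input fixed, not the full feedback dynamics of an intelligence explosion:

```python
import math

def ces(labor, compute, sigma, share=0.5):
    """Two-input CES output for elasticity of substitution sigma >= 0."""
    if sigma == 0:
        # Leontief: perfect complements.
        return min(labor, compute)
    if math.isinf(sigma):
        # Perfect substitutes: a weighted sum.
        return share * labor + (1 - share) * compute
    if sigma == 1:
        # Cobb-Douglas is the sigma -> 1 limit of CES.
        return labor ** share * compute ** (1 - share)
    rho = (sigma - 1) / sigma
    return (share * labor ** rho + (1 - share) * compute ** rho) ** (1 / rho)

# Hold compute fixed at 1 and scale cognitive labor up by factors of 1000.
for sigma in (0.5, 1.0, 2.0):
    outputs = [ces(labor, 1.0, sigma) for labor in (1e0, 1e3, 1e6, 1e9)]
    print(sigma, [round(y, 3) for y in outputs])
```

With sigma < 1 the output approaches a finite ceiling set by the fixed input no matter how much labor is added, while with sigma >= 1 it keeps growing without bound. That is the sense in which strong complementarity makes the conditions for a software-only singularity more stringent.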
good points:
- on data bottlenecks, I'm talking about the need to gather data from the real world (which is what data spend from labs is). I think a more important question than intelligence explosions right now is "to what degree is the world simulatable by LLMs", because this as you say determines the potential of synth data. my intuition is that it is very unsimulatable overall.
- on the "algorithm from god" point, I'm quite unsure whether this is true and factoring it into a model of intelligence explosion basically seems to be equivalent to "what if a miracle occurs?". this angle of argument also often neglects path-dependencies (eg Dario suggested that if LLMs don't create the datacenter of geniuses, one reason could be that humans are using a much more complex reward function that is hard to compute on GPUs).
- the history of ML suggests our prior on data bottlenecks should be very strong. if the magic algorithm exists, we should need data to invent it, and if any path-dependencies exist then that creates a lower r after we do invent it.
- reasonable point but if we're going to add models getting smarter, we should equally add that LLMs are a different shape of intelligence than humans and worse at novelty, which pushes r down again...
- I don't see how you're saying the elasticity of substitution thing applies to the argument against a singularity? It basically just makes the conditions for it more stringent, and all the points in favor of CES can be taken from microeconomics papers on knowledge work + basic knowledge of what tasks comprise AI research.

overall the picture I have intuitively is a huge stack of reasonable sounding objections on one side and a rather spherical cow looking set of assumptions on the other, and so I doubt this is resolvable by getting quantitative data except by waiting a long time.