This is a guest post by Ryan Carey.
We know that AI experiments have used much more computation over the last few years than they did previously. But just last month, an investigation by OpenAI made some initial estimates of just how fast this growth has been. Comparing AlphaGo Zero to AlexNet, they found that the largest experiment now is 300,000-fold larger than the largest experiment six years ago. In the intervening time, the largest experiment in each year has been growing exponentially, with a doubling time of 3.5 months.
The rate of growth of experiments according to this AI-Compute trend is astoundingly fast, and this deserves some analysis. In this piece, I explore two issues. The first is that if experiments keep growing so fast, they will quickly become unaffordable, and so the trend will have to draw to a close. Unless the economy is drastically reshaped, this trend can be sustained for at most 3.5-10 years, depending on spending levels and how the cost of compute evolves over time. The second issue is that if this trend is sustained for even 3.5 more years, the amount of compute used in an AI experiment will have passed some interesting milestones. Specifically, the compute used by an experiment will have passed the amount required to simulate, using spiking neurons, a human mind thinking for eighteen years. Very roughly speaking, we could say that the trend would surpass the level required to reach the level of intelligence of an adult human, given an equally efficient algorithm. In sections (1) and (2), I will explore these issues in turn, and then in section (3), I will discuss the limitations of this analysis and weigh how this work might bear on AGI forecasts.
1. How long can the AI-Compute trend be sustained?
To figure out how long the AI-Compute trend can be economically sustained, we need to know three things: the rate of growth of the cost of experiments, the cost of current experiments, and the maximum amount that can be spent on an experiment in the future.
The size of the largest experiments is increasing with a doubling time of 3.5 months (about an order of magnitude per year)1, while the cost per unit of computation is decreasing by an order of magnitude every 4-12 years (the long-run trend has improved costs by 10x every 4 years, whereas recent trends have improved costs by 10x every 12 years)2. So the cost of the largest experiments is increasing by an order of magnitude every 1.1-1.4 years.3
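The arithmetic behind this range can be sketched in a few lines of Python. This is a minimal sketch, not the author's own calculation: the function names are hypothetical, and the only inputs are the 3.5-month doubling time and the 4- and 12-year price-performance figures quoted above. It evaluates the expression given in the footnotes and, for comparison, an alternative (not from the post) that subtracts the two growth rates measured in orders of magnitude per year.

```python
import math

DOUBLING_TIME_MONTHS = 3.5  # doubling time of the AI-Compute trend, from the text


def years_per_10x_compute(doubling_months=DOUBLING_TIME_MONTHS):
    """Years for compute to grow tenfold at the given doubling time."""
    return doubling_months / math.log10(2) / 12


def years_per_10x_cost_footnote(price_10x_years):
    """Footnote expression: p / (p * a - 1), where a = years per 10x compute."""
    a = years_per_10x_compute()
    return price_10x_years / (price_10x_years * a - 1)


def years_per_10x_cost_net_rate(price_10x_years):
    """Alternative (not from the post): subtract rates in orders of magnitude per year."""
    compute_rate = 1 / years_per_10x_compute()  # OOM per year gained in compute
    price_rate = 1 / price_10x_years            # OOM per year saved by cheaper hardware
    return 1 / (compute_rate - price_rate)


print(f"10x compute every {years_per_10x_compute():.4f} years")
for p in (12, 4):  # years per 10x improvement in price-performance
    print(p, round(years_per_10x_cost_footnote(p), 3),
          round(years_per_10x_cost_net_rate(p), 3))
```

The footnote expression gives about 1.13 and 1.39 years per tenfold cost increase; subtracting the rates directly gives about 1.05 and 1.28 years, so the two approaches differ by roughly 10%.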
The largest current experiment, AlphaGo Zero, probably cost about $10M.4
The largest that experiments can get depends on who is performing them. The richest actor is probably the US government. Previously, the US spent 1% of annual GDP5 on the Manhattan Project, and ~0.5% of annual GDP on NASA during the Apollo program.6 So let’s suppose they could similarly spend at most 1% of GDP, or $200B, on one AI experiment. Given the growth of one order of magnitude per 1.1-1.4 years, and the initial experiment size of $10M, the AI-Compute trend predicts that we would see a $200B experiment in 5-6 years.7 So given a broadly similar economic situation to the present one, that would have to mark an end to the AI-Compute trend.
We can also consider how long the trend can last if government is not involved. Due to their smaller size, economic barriers hit a little sooner for private actors. The largest among these are tech companies: Amazon and Google have current research and development budgets of about $20B/yr each8, so we can suppose that the largest individual experiment outside of government is $20B. Then the private sector can keep pace with the AI-Compute trend for around ¾ as long as government, or ~3.5-4.5 years.910
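The government and private-sector horizons in the two paragraphs above follow from the same formula: the number of orders of magnitude between today's cost and the budget ceiling, multiplied by the years needed per order of magnitude. The sketch below assumes the $10M starting cost and the 1.129-1.391 year range from the footnotes; the function name is hypothetical.

```python
import math

START_COST = 10e6                    # rough AlphaGo Zero cost in USD, from the text
YEARS_PER_10X_COST = (1.129, 1.391)  # range taken from the footnotes


def years_until_budget(budget, years_per_10x, start_cost=START_COST):
    """Years until an experiment whose cost grows 10x per `years_per_10x` costs `budget`."""
    return years_per_10x * math.log10(budget / start_cost)


for label, budget in (("government, $200B", 200e9), ("private, $20B", 20e9)):
    low, high = (years_until_budget(budget, y) for y in YEARS_PER_10X_COST)
    print(f"{label}: {low:.2f} to {high:.2f} years")

# Ratio of private to government horizons (the "around 3/4 as long" in the text).
ratio = math.log10(20e9 / START_COST) / math.log10(200e9 / START_COST)
print(f"private/government ratio: {ratio:.3f}")
```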
On the other hand, the development of specialized hardware could cheapen computation, and thereby cause the trend to be sustainable for a longer period. If some new hardware cheapened compute by 1000x over and above price-performance Moore’s Law, then the economic barriers would bite a little later, after an extra 3-4 years.11
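A one-off 1000x cost reduction buys three extra orders of magnitude of growth, which can be converted into years in the same way. The sketch below is illustrative only; the function name is hypothetical, and the horizon figures are copied from the footnotes rather than recomputed.

```python
import math

YEARS_PER_10X_COST = (1.129, 1.391)      # range taken from the footnotes
GOVERNMENT_HORIZON_YEARS = (4.86, 5.98)  # years to a $200B experiment, from the footnotes


def extra_years(cost_reduction_factor, years_per_10x):
    """Extra years of trend bought by hardware that is `cost_reduction_factor` times cheaper."""
    return years_per_10x * math.log10(cost_reduction_factor)


extension = [extra_years(1000, y) for y in YEARS_PER_10X_COST]
print([round(x, 3) for x in extension])

# Longest horizon considered in this post: government spending plus 1000x cheaper hardware.
longest = GOVERNMENT_HORIZON_YEARS[1] + extension[1]
print(round(longest, 2))
```

The longest case, a little over ten years, is where the upper end of the 3.5-10 year range comes from.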
In order for the AI-Compute trend to be maintained for a really long time (more than about a decade), economic output would have to start growing by an order of magnitude or more per year. This is a really extreme scenario, but the main thing that would make it possible would presumably be some massive economic gains from some extremely powerful AI technology, that would also serve to justify the massive ongoing AI investment.
Of course, it’s important to be clear that these figures are upper bounds, and they do not preclude the possibility that the AI-Compute trend may halt sooner (e.g. if AI research proves less economically useful than expected) either in a sudden or more gradual fashion.
So we have shown one kind of conclusion from a rapid trend — that it cannot continue for very long, specifically, beyond 3.5-10 years.
2. When will the AI-Compute trend pass potentially AGI-relevant milestones?
The second conclusion that we can draw is that if the AI-Compute trend continues at its current rapid pace, it will pass some interesting milestones. If the AI-Compute trend continues for 3.5-10 more years, then the size of the largest experiment is projected to reach 10^7 to 5×10^13 Petaflop/s-days, and so the question is which milestones arrive below that level.12 Which milestones might allow the development of AGI is a controversial topic, but three candidates are:
- The amount of compute required to simulate a human brain for the duration of a human childhood
- The amount of compute required to simulate a human brain to play the number of Go games AlphaGo Zero required to become superhuman
- The amount of compute required to simulate the evolution of the human brain
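The projected experiment sizes quoted before this list can be reproduced from the doubling time and the horizons given in the footnotes. The sketch below rests on one loud assumption: the current size of AlphaGo Zero, taken here as roughly 1,800 petaflop/s-days, is not stated anywhere in this post and is only an approximate figure. The function names are hypothetical.

```python
DOUBLING_TIME_MONTHS = 3.5
# ASSUMPTION: approximate compute of AlphaGo Zero in petaflop/s-days; not given in this post.
ASSUMED_CURRENT_PFS_DAYS = 1800.0


def growth_factor(years, doubling_months=DOUBLING_TIME_MONTHS):
    """Fold increase in experiment size after `years` on the AI-Compute trend."""
    return 2 ** (years * 12 / doubling_months)


def projected_size(years, current=ASSUMED_CURRENT_PFS_DAYS):
    """Projected experiment size in petaflop/s-days after `years`."""
    return current * growth_factor(years)


shortest = 3.727        # private-sector horizon, from the footnotes
longest = 5.98 + 4.173  # government horizon plus 1000x cheaper hardware, from the footnotes

for years in (shortest, longest):
    print(f"{years:.2f} years: {growth_factor(years):.3g}-fold, "
          f"{projected_size(years):.2g} petaflop/s-days")
```

Under that assumed starting size, the two horizons give roughly 10^7 and 5×10^13 petaflop/s-days.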
Human-childhood milestone
One natural guess for the amount of computation required to create artificial intelligence is the amount of computation used by the human brain. Suppose an AI had (compared to a human):
- a similarly efficient algorithm for learning to perform diverse tasks (with respect to both compute and data),
- similar knowledge built in to its architecture,
- similar data, and
- enough computation to simulate a human brain running for eighteen years, at sufficient resolution to capture the intellectual performance of that brain.
Then this AI should be able to learn to solve a similarly wide range of problems as an eighteen-year-old can solve.13
There is a range of estimates for how many floating point operations per second are required to simulate a human brain for one second. Those collected by AI Impacts have a median of 10^18 FLOPS (corresponding roughly to a whole-brain simulation using Hodgkin-Huxley neurons14), and range from 3×10^13 FLOPS (Moravec’s estimate) to 1×10^25 FLOPS (simulating the metabolome). Running such simulations for eighteen years would correspond to a median of 7 million Petaflop/s-days (range 200 to 7×10^13 Petaflop/s-days).15
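The conversion from FLOPS to petaflop/s-days over an eighteen-year childhood is a single multiplication, sketched below. The dictionary keys and the function name are illustrative labels; the FLOPS values are the three estimates quoted above.

```python
# Brain-simulation estimates in FLOPS, as quoted in the text.
ESTIMATES_FLOPS = {
    "Moravec": 3e13,
    "median (Hodgkin-Huxley)": 1e18,
    "metabolome": 1e25,
}


def childhood_pfs_days(flops, years=18):
    """Petaflop/s-days needed to run a simulation at `flops` for `years` years."""
    flop_s_days = flops * 365 * years  # FLOP/s sustained for this many days
    return flop_s_days / 1e15          # 1 petaflop/s = 1e15 FLOP/s


for name, flops in ESTIMATES_FLOPS.items():
    print(f"{name}: {childhood_pfs_days(flops):.3g} petaflop/s-days")
```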
So for the shortest estimates, such as the Moravec estimate, we have already reached enough compute to pass the human-childhood milestone. For the median estimate, and the Hodgkin-Huxley estimates, we will have reached the milestone within 3.5 years if the trend continues. For the metabolome estimates, the required amount of compute cannot be reached within the coming ten-year window before the AI-Compute trend is halted by economic barriers. After the AI-Compute trend is halted, it’s worth noting that Moore’s Law could come back to the fore, and cause the size of experiments to continue to slowly grow. But on Moore’s Law, milestones like the metabolome estimate are still likely decades away.
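To see roughly when each estimate would be passed, the milestone sizes can be turned into years of continued trend. As a sketch only: the starting size of about 1,800 petaflop/s-days for AlphaGo Zero is an assumption not stated in this post, the milestone values are the rounded figures quoted above, and the function name is hypothetical.

```python
import math

DOUBLING_TIME_MONTHS = 3.5
# ASSUMPTION: approximate compute of AlphaGo Zero in petaflop/s-days; not given in this post.
ASSUMED_CURRENT_PFS_DAYS = 1800.0

# Human-childhood milestone in petaflop/s-days, rounded as in the text.
MILESTONES_PFS_DAYS = {"Moravec": 200.0, "median": 7e6, "metabolome": 7e13}


def years_until(target, current=ASSUMED_CURRENT_PFS_DAYS,
                doubling_months=DOUBLING_TIME_MONTHS):
    """Years of continued trend until experiments reach `target` (negative if already passed)."""
    return math.log2(target / current) * doubling_months / 12


for name, target in MILESTONES_PFS_DAYS.items():
    print(f"{name}: {years_until(target):+.2f} years")
```

A negative result means the milestone is already behind us. The metabolome figure lands just past ten years, so that conclusion is sensitive to the assumed starting size.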
AlphaGo Zero-games milestone
One objection to the human-childhood milestone is that AI systems presently are “slower learners” than humans. AlphaGo Zero used 2.5 million Go games to become superhuman16, which, if each game took an hour, would correspond to 300 years of Go games17. We might ask how long it would take to run something as complex as the human brain for 300 years, rather than just eighteen. In order for this milestone to be reached, the trend would have to continue for about 14 months longer than for the human-childhood milestone18.
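The 300-year and 14-month figures above can be checked with a short calculation. This sketch uses the one-hour-per-game assumption made in the text and the 0.9689 years per tenfold compute increase from the footnotes; the function names are hypothetical.

```python
import math

GAMES = 2.5e6                   # self-play games before AlphaGo Zero became superhuman
HOURS_PER_GAME = 1.0            # assumption made in the text
YEARS_PER_10X_COMPUTE = 0.9689  # from the footnotes


def years_of_play(games=GAMES, hours_per_game=HOURS_PER_GAME):
    """Years of continuous play needed to get through `games` games."""
    return games * hours_per_game / 24 / 365


def extra_trend_years(play_years, childhood_years=18,
                      years_per_10x=YEARS_PER_10X_COMPUTE):
    """Extra years of trend needed to simulate `play_years` instead of `childhood_years`."""
    return years_per_10x * math.log10(play_years / childhood_years)


play = years_of_play()
extra = extra_trend_years(play)
print(f"{play:.0f} years of Go; {extra:.2f} extra years ({extra * 12:.0f} months)")
```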
Brain-evolution milestone
A more conservative milestone is the amount of compute required to simulate all neural evolution. One approach, described by Shulman and Bostrom 2012, is to look at the cost of simulating the evolution of nervous systems. This entails simulating 10^25 neurons for one billion years.19 Shulman and Bostrom estimate the cost of simulating a neuron for one second at 1 to 10^10 floating point operations,20 and so the total cost for simulating evolution is 3×10^21 to 3×10^31 Petaflop/s-days21. This figure would not be reached until far beyond the time when the current AI-Compute trend must end. So the AI-Compute trend does not change the conclusion of Shulman and Bostrom that simulation of brain evolution on Earth is far away: even with a rapid increase in spending, this compute milestone would take many decades of advancement of Moore’s Law to be reached22.
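The size of the brain-evolution milestone follows from multiplying the number of neurons, the per-neuron cost, and the duration. The sketch below uses the neuron count, the one-billion-year duration, and the per-neuron cost range described above; the function name is hypothetical.

```python
NEURONS = 1e25          # neurons simulated
EVOLUTION_YEARS = 1e9   # duration of the simulated evolution


def evolution_pfs_days(flops_per_neuron):
    """Petaflop/s-days to simulate all neurons at `flops_per_neuron` FLOPS each."""
    total_flops = NEURONS * flops_per_neuron       # FLOP/s for the whole population
    flop_s_days = total_flops * 365 * EVOLUTION_YEARS
    return flop_s_days / 1e15                      # 1 petaflop/s = 1e15 FLOP/s


for cost in (1.0, 1e10):  # low and high per-neuron cost
    print(f"{cost:.0e} FLOPS per neuron: {evolution_pfs_days(cost):.3g} petaflop/s-days")
```

The results, about 3.65×10^21 and 3.65×10^31 petaflop/s-days, have the same orders of magnitude as the range quoted above.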
Overall, we can see that although the brain-evolution milestone is well beyond the AI-Compute trend, the others are not necessarily. For some estimates, especially metabolome estimates, the human-childhood and AlphaGo Zero-games milestones cannot be reached either. But some of the human-childhood and AlphaGo Zero-games milestones will be reached if the AI-Compute trend continues for the next few years.
3. Discussion and Limitations
In light of this analysis, a reasonable question to ask is: for the purpose of predicting AGI, which milestone should we care most about? This is very uncertain, but I would guess that building AGI is easier than the brain-evolution milestone would suggest, but that AGI could arrive either before, or after the AlphaGo Zero-games milestone is reached.
The first claim holds because the brain-evolution milestone assumes that the process of algorithm discovery must be performed by the AI itself. It seems more likely to me that the appropriate algorithm is provided (or mostly provided) by the human designers at no computational cost (or at hardly any cost compared to simulating evolution).
The second matter, evaluating the difficulty of AGI relative to the AlphaGo Zero-games milestone, is more complex. One reason for thinking that the AlphaGo Zero-games milestone makes AGI look too easy is that more training examples ought to be required to teach general intelligence than are required to learn the game of Go.23 In order to perform a wider range of tasks, it will be necessary to consider a larger range of dependencies and to learn a more intricate mapping from actions to utilities. This matter could be explored further by comparing the sample efficiency of various solved AI problems and extrapolating the sample efficiency of AGI based on how much more complicated general intelligence seems. However, there are also reasons the AlphaGo Zero-games milestone might make things look too hard. Firstly, AlphaGo Zero does not use any pre-existing knowledge, whereas AGI systems might. If we had looked instead at the original AlphaGo, this would have required an order of magnitude fewer games relative to AlphaGo Zero24, and further efficiency gains might be possible for more general learning tasks. Secondly, there might be one or more orders of magnitude of conservatism built into the approach of using simulations of the human brain. Simulating the human brain on current hardware may be a rather inefficient way to capture its computing function: that is, the human brain might only be using some fraction of the computation that is needed to simulate it. So it’s hard to judge whether the AlphaGo Zero-games milestone is too late or too soon for AGI.
There is another reason for some more assurance that AGI is more than six years away. We can simply look at the AI-Compute trend and ask ourselves: is AGI as close to AlphaGo Zero as AlphaGo Zero is to AlexNet? If we think that the difference (in terms of some combination of capabilities, compute, or AI research) between the first pair is larger than the second, then we should think that AGI is more than six years away.
In conclusion, we can see that the AI-Compute trend is an extraordinarily fast trend that economic forces (absent large increases in GDP) cannot sustain beyond 3.5-10 more years. Yet the trend is also fast enough that if it is sustained for even a few years from now, it will sweep past some compute milestones that could plausibly correspond to the requirements for AGI, including the amount of compute required to simulate a human brain thinking for eighteen years, using Hodgkin-Huxley neurons. However, other milestones will not be reached before economic factors halt the AI-Compute trend. For example, this analysis shows that we will not have enough compute to simulate the evolution of the human brain for (at least) decades.
Thanks to Jack Gallagher, Danny Hernandez, Jan Leike, and Carl Shulman for discussions that helped with this post.
- 3.5/(log10(2))/12 = 0.9689, the number of years over which each 10x increase in compute occurs.
- AI Impacts gives a recent-hardware-prices figure, and past quarter-century FLOPS/$ figure, and I use these as a range. “The cheapest hardware prices (for single precision FLOPS/$) appear to be falling by around an order of magnitude every 10-16 years. This rate is slower than the trend of FLOPS/$ observed over the past quarter century, which was an order of magnitude every 4 years. There is no particular sign of slowing between 2011 and 2017.” “Figure 4 shows the 95th percentile fits an exponential trendline quite well, with a doubling time of 3.7 years, for an order of magnitude every 12 years. This has been fairly consistent, and shows no sign of slowing by early 2017. This supports the 10-16 year time we estimated from the Wikipedia theoretical performance above.”
- 12/(12×.96889-1) = 1.129 years. 4/(4×.96889-1) = 1.391 years.
- $10M is a rough estimate of the costs of computation including pre-run experiments, based on two lines of evidence. First, the stated hardware costs. “During the development of Zero, Hassabis says the system was trained on hardware that cost the company as much as $35 million. The hardware is also used for other DeepMind projects.” If we allow for some non-AlphaGo usage of this hardware, and some other (e.g. electricity) costs for running AlphaGo, then the total compute cost for AlphaGo would be something like $10-30M. Second, the costs of the final training run, based on costs of public compute. If the TPU compute cost Google the $6.50/hr rate offered to the general public, then it would have cost $35M. Cloud compute is probably 10x more expensive than Google’s internal compute costs, so the cost of the final training run is probably around $3M, and the cost of the whole experiment something like $3-10M.
- Peak spending for the Manhattan project itself was around 1%, although total spending on nuclear weaponry in general was a little higher at 4.5% for several years following. “After getting fully underway in 1942, the Manhattan Project’s three-year cost of $2 billion (in 1940’s dollars) comprised nearly 1% of 1945 U.S. GDP. Extraordinary levels of spending and commitment of national resources to nuclear technology continued for many decades afterward. From 1947-1952, spending on nuclear weapons averaged 30% of total defense spending, which in 1952 was 15% of U.S. GDP.”
- NASA spending peaked at $4.5B/yr in 1966 (during the Apollo program), while US GDP was $815B https://history.nasa.gov/SP-4029/Apollo_18-16_Apollo_Program_Budget_Appropriations.htm
- 1.129×log10(200B/10M) = 4.86 years. log10(200B/10M)×1.391 = 5.98 years.
- Amazon and Google, the two largest players, have research and development budgets of $22.6B and $16.6B respectively. (Confirmed also from the primary source, “Research and development expenses 16,625 [in 2017]” on p36 of GOOG 10k).
- log10(20B/10M)/log10(200B/10M) = 0.767 times as long. 1.129×log10(2000) = 3.727 years. 1.391×log10(2000) = 4.591 years.
- Greg Brockman of OpenAI recently said of the AI-Compute trend at a House Committee Hearing: “We expect this to continue for the next 5 years, using only today’s proven hardware technologies and not assuming any breakthroughs like quantum or optical.” This is a bold prediction according to my models. After five years, private organizations have already become unable to keep pace with the trend, and government would be unable to keep pace for more than another year.
- 1.129 years×log10(1000) = 3.387 years. 1.391×log10(1000) = 4.173 years.
- 2^((3.727×12)/3.5) = 7025-fold growth of experiment.
2^(((5.98 + 4.173)×12)/3.5) = 3.01×10^10-fold growth of experiment.
- An implicit premise is that the amount of computation used by the human brain is less than the amount used to simulate the human brain. The idea here is that some fraction of the resources used to simulate a human brain are actually used for thinking.
- Table 9 of the WBE roadmap shows that a spiking neural network would use 1×10^11 neurons. Table 7 shows that a Hodgkin-Huxley model would use about 1.2 million floating point operations to simulate a neuron for one second. Their product yields the 1×10^18 figure.
- 3×10^13×365×18 = 1.97×10^17. 1×10^18×365×18 = 6.57×10^21. 1×10^25×365×18 = 6.57×10^28.
- From figure 3a in the AlphaGo Zero paper, reinforcement learning surpassed AlphaGo Lee (which in turn defeated Lee Sedol) around halfway through a series of 4.9 million games of self-play. This analysis is from a guesstimate model by Median Group.
- 2.5×10^6 / 24 / 365 = 285 years
- 0.9689×log10(285/18) = 1.16
- From How Hard is Artificial Intelligence? “Erring on the side of conservatively high, if we assigned all 10^19 insects fruit-fly numbers of neurons the total would be 10^24 insect neurons in the world. This could be augmented with an additional order of magnitude, to reflect aquatic copepods, birds, reptiles, mammals, etc., to reach 10^25”
- “The computational cost of simulating one neuron depends on the level of detail… Extremely simple neuron models use about 1,000 floating-point operations per second (FLOPS) to simulate one neuron (for one second of simulated time); an electrophysiologically realistic Hodgkin-Huxley model uses 1,200,000 FLOPS; a more detailed multicompartmental model would add another 3-4 orders of magnitude, while higher-level models that abstract systems of neurons could subtract 2-3 orders of magnitude from the simple models.” This range is 2-3 orders of magnitude lower than the per-neuron costs implied by the range of collated AI Impacts estimates for brain simulation.
- 10^25×365×10^9×1 = 3.65×10^36 FLOP/s-days. 10^25×365×10^9×10^10 = 3.65×10^46 FLOP/s-days
- This only slightly shortens the timelines compared to Shulman and Bostrom’s remarks: “The computing resources to match historical numbers of neurons in straightforward simulation of biological evolution on Earth are severely out of reach, even if Moore’s law continues for a century. The argument from evolutionary algorithms depends crucially on the magnitude of efficiency gains from clever search, with perhaps as many as thirty orders of magnitude required.”
- Analogously to the “Difficulty ratio” in the guesstimate model by Median Group.
- AlphaGo used hundreds of thousands, rather than single-digit millions of games. From Mastering the game of Go with deep neural networks and tree search: “We trained the policy network to classify positions according to expert moves played in the KGS data set. This data set contains 29.4 million positions from 160,000 games played by KGS 6 to 9 dan human players; 35.4% of the games are handicap games… The policy network was trained in this way for 10,000 minibatches of 128 games, using 50 GPUs, for one day”