- 372 years (2392), based on responses collected in Robin Hanson’s informal 2012-2017 survey.
- 36 years (2056), based on all responses collected in the 2016 Expert Survey on Progress in AI.
- 142 years (2162), based on the subset of respondents to the 2016 Expert Survey on Progress in AI who had been in their subfield for at least 20 years.
- 32 years (2052), based on the subset of responses to the 2016 Expert Survey on Progress in AI about progress in deep learning or machine learning as a whole rather than narrow subfields.

67% of respondents to the 2016 Expert Survey on Progress in AI and 44% of the respondents to Hanson’s informal survey who answered the question said that progress was accelerating.

One way of estimating how many years something will take is to estimate what fraction of progress toward it has been made over a fixed number of years, then to extrapolate the number of years needed for full progress. As suggested by Robin Hanson,^{1} this method can provide an estimate for when human-level AI will be developed, if we have data on what fraction of progress toward human-level AI has been made and whether it is proceeding at a constant rate.

We know of two surveys that ask about fractional progress and acceleration in specific AI subfields: an informal survey conducted by Robin Hanson in 2012 – 2017, and our 2016 Expert Survey on Progress in AI. We use them to extrapolate progress to human-level AI, assuming that:

- AI progresses at the average rate that people have observed so far.
- Human-level AI will be achieved when the median subfield reaches human-level.
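The extrapolation under these two assumptions can be sketched as follows. This is a minimal illustration, not the calculation used for the figures on this page: the function names, the `(survey_year, fraction_done, years_elapsed)` layout, and the three example responses are hypothetical, and adding the remaining years to the survey year is an assumption about how a calendar date is obtained.

```python
from statistics import median


def years_remaining(fraction_done, years_elapsed):
    """Years still needed if progress continues at the average observed rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    if years_elapsed <= 0:
        raise ValueError("years_elapsed must be positive")
    rate_per_year = fraction_done / years_elapsed
    return (1 - fraction_done) / rate_per_year


def median_arrival_year(responses):
    """Median implied arrival year across subfield responses.

    responses: iterable of (survey_year, fraction_done, years_elapsed).
    """
    return median(
        survey_year + years_remaining(fraction, elapsed)
        for survey_year, fraction, elapsed in responses
    )


# Hypothetical responses, for illustration only (not survey data).
example = [(2016, 0.50, 10), (2016, 0.10, 20), (2016, 0.25, 15)]
print(median_arrival_year(example))
```

With these made-up inputs the three implied dates are roughly 2026, 2196 and 2061, so the median lands at about 2061. A small reported fraction pushes the implied date out very quickly, which is why a handful of low answers can dominate the upper tail.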

The naive extrapolation method described above assumes that AI progresses at the average rate that people have observed so far, but some respondents perceived acceleration or deceleration. If this change in the rate of progress continues into the future, a truer extrapolation of each person’s observations would place human-level performance in their subfield either before or after the naively extrapolated date.

Both surveys asked respondents about fractional progress in their subfields. Extrapolating out these estimates to get to human-level performance gives some evidence for when AGI may come, but is not a perfect proxy. It may turn out that we get human-level performance in a small number of subfields much earlier than others, such that we count the resulting AI as ‘AGI’, or it may be the case that certain subfields important to AGI do not exist yet.

Hanson’s survey informally asked ~15 AI experts to estimate how far we’ve come in their own subfield of AI research in the last twenty years, compared to how far we have to go to reach human level abilities. The subfields represented were analogical reasoning, knowledge representation, computer-assisted training, natural language processing, constraint satisfaction, robotic grasping manipulation, early-human vision processing, constraint reasoning, and “no particular subfield”. Three respondents said the rate of progress was staying the same, four said it was getting faster, two said it was slowing down, and six did not answer (or may not have been asked).

The naive extrapolations^{2} of the answers from Hanson’s survey give a median time from 2020 to human-level AI (HLAI) of 372 years (2392). See the survey data and our calculations here.

The 2016 Expert Survey on Progress in AI (2016 ESPAI) asked machine learning researchers which subfield they were in, how long they had been in their subfield, and what fraction of the remaining path to human-level performance (in their subfield) they thought had been traversed in that time.^{3} 107 out of 111 responses were used in our calculation.^{4} 42 subfields were reported, including “Machine learning”, “Graphical models”, “Speech recognition”, “Optimization”, “Bayesian Learning”, and “Robotics”.^{5} Notably, Hanson’s survey included subfields that weren’t represented in 2016 ESPAI, including analogical reasoning and knowledge representation. Since 2016 ESPAI was restricted to machine learning researchers, it may exclude non-machine-learning subfields that turn out to be important to fully human-level capabilities.

67% of all respondents said progress in their subfield was accelerating (see Figure 1). Most respondents said progress in their subfield was accelerating in each of the subsets we look at below (ML vs narrow subfield, and time in field).

Most respondents think progress is accelerating. If this acceleration continues, our naively extrapolated estimates below may be overestimates for time to human-level performance.

We calculated estimated years from 2020 until human-level subfield performance by naively extrapolating the reported fractions of the subfield already traversed.^{6} Figure 2 below shows the implied estimates for time until human-level performance for all respondents’ answers. These estimates give a median time from 2020 until HLAI of 36 years (2056).

Some respondents reported broad ‘subfields’, which encompassed all of machine learning, in particular “Machine learning” or “Deep learning”, while others reported narrow subfields, e.g. “Natural language processing” or “Robotics”. We split the survey data based on this subfield narrowness, guessing that progress on machine learning overall may be a better proxy for AGI overall. Among the 69 respondents who gave answers corresponding to the entire field of machine learning, the median implied time was 32 years (2052). Among the 70 respondents who gave narrow answers, the median implied time was 44 years (2064). Figures 3 and 4 show these estimates.

Figure 3: Implied estimates for human-level performance based on respondents who specified broad answers, e.g. “Machine learning” when asked about their subfield. The last three responses are above 1000 but have been cut off.

The median implied estimate until human-level performance for machine learning broadly was 12 years sooner than the median estimate for specific subfields. This is counter to what we might expect, if human-level performance in machine learning broadly implies human-level performance on each individual subfield.

Robin Hanson has suggested that his survey may get longer implied forecasts than 2016 ESPAI because he asks exclusively people who have spent at least 20 years in their field.^{7} Filtering for people who have spent at least 20 years in their field, we have eight responses, and get a median implied time until HLAI of 142 years from 2020 (2162). Filtering for people who have spent at least 10 years in their field, we have 38 responses, and get a median implied time of 86 years (2106). Filtering for people who have spent less than 10 years in their field, we have 69 responses, and get a median implied time of 24 years (2044). Figures 5, 6 and 7 show estimates for each respondent, for each of these classes of time in field.

The median implied estimate from 2020 until human-level performance suggested by responses from 2016 ESPAI (36 years) is an order of magnitude smaller than the one suggested by the Hanson survey (372 years). This appears to be at least partly explained by more experienced researchers giving responses that imply longer estimates. Hanson asks exclusively people who have spent at least 20 years in their subfield, whereas the 2016 survey does not filter based on experience. If we filter 2016 survey respondents for researchers who have spent at least 20 years in their subfield we instead get a median estimate of 142 years.

More experienced researchers may generate longer implied estimates because the majority of progress has happened recently; many respondents think progress has accelerated, which is some evidence of this. It could also be that less-experienced researchers feel that progress is more significant than it actually is.

If AI research is accelerating and continues to accelerate until we reach human-level AI, HLAI may arrive sooner than these estimates suggest. If the current acceleration is not representative of what progress will look like in the future, the longer naive estimates from more experienced researchers may be more appropriate.

2016 ESPAI also asked people to estimate time until human-level machine intelligence (HLMI) by asking them how many years they would give until a 50% chance of HLMI. The median answer for this question in 2016 was 40 years, or 36 years from 2020 (2056), exactly the same as the median answer of 36 years implied by extrapolating fractional progress. The survey also asked about time to HLMI in other ways, which yielded less consistent answers.

*Primary author: Asya Bergal*

We looked at Geekbench 5,^{1} a benchmark for CPU performance. We combined Geekbench’s multi-core scores on its ‘Processor Benchmarks’ page^{2} with release dates and prices that we scraped from Wikichip and Wikipedia.^{3} All our data and plots can be found here.^{4} We then calculated score per dollar and adjusted for inflation using the consumer price index.^{5} For every year, we calculated the 95th percentile score per dollar. We then fit linear and exponential trendlines to those scores.
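The per-year calculation described above can be sketched roughly as below. The sketch assumes a nearest-rank definition of the percentile and a simple CPI ratio for the inflation adjustment; the page does not say which definitions were used. The function names and the tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank definition."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def yearly_p95_score_per_dollar(cpus, cpi, base_year):
    """95th-percentile benchmark score per inflation-adjusted dollar, by year.

    cpus: iterable of (release_year, score, nominal_price_usd).
    cpi:  dict mapping year -> consumer price index level.
    """
    by_year = defaultdict(list)
    for year, score, price in cpus:
        # Express the price in base-year dollars before dividing.
        real_price = price * cpi[base_year] / cpi[year]
        by_year[year].append(score / real_price)
    return {
        year: percentile_nearest_rank(ratios, 95)
        for year, ratios in sorted(by_year.items())
    }
```

Taking a high percentile rather than the maximum is one way to make the yearly figure less sensitive to a single mispriced or mislabelled entry in scraped data.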

Figure 1 shows all our data for Geekbench score per CPU price.

The data is well-described by a linear or an exponential trendline. Assuming an exponential trend,^{6} Geekbench score per CPU price grew by around 16% per year between 2006 and 2020, a rate that would yield a factor of ten every 16 years.^{7}
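One common way to fit an exponential trendline is ordinary least squares on the logarithm of the values; the sketch below assumes that approach, which may differ from the fitting procedure actually used here. The function names are hypothetical.

```python
import math


def fit_exponential(years, values):
    """Least-squares fit of log10(value) = intercept + slope * year.

    Returns (slope, intercept); slope is in orders of magnitude per year.
    """
    n = len(years)
    logs = [math.log10(v) for v in values]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def growth_and_tenfold_time(slope):
    """Convert a log10 slope into (annual growth rate, years per factor of ten)."""
    return 10 ** slope - 1, 1 / slope
```

A 16% annual growth rate corresponds to a slope of log10(1.16) ≈ 0.064, or about 15.5 years per factor of ten, which is consistent with the rounded figure of 16 years quoted above.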

This is a markedly slower growth rate than those observed for CPU price performance in the past. From 1940 to 2008, Sandberg and Bostrom found that CPU price performance grew by a factor of ten every 5.6 years when measured in MIPS per dollar, and by a factor of ten every 7.7 years when measured in FLOPS per dollar.^{8} However, since our estimate uses a different performance metric from any used earlier, it is unclear how similar one should expect the trends to be.

*Primary author: Asya Bergal*

In 2014, we found conjectures referenced on Wikipedia, and recorded the dates that they were proposed and resolved, if they were resolved. We updated this list of conjectures in 2020, marking any whose status had changed. We then used a Kaplan-Meier estimator^{1} to approximate the survivorship function.^{2}
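For readers unfamiliar with the Kaplan-Meier estimator, the sketch below shows the standard product-limit calculation in plain Python. It treats conjectures that are still open as right-censored at their current age. The function name and input layout are hypothetical, and this is not the code used to produce the results on this page.

```python
def kaplan_meier(durations, resolved):
    """Kaplan-Meier survival curve for time-to-resolution data.

    durations[i]: years from proposal to resolution, or to the last
                  observation if the conjecture is still open.
    resolved[i]:  True if resolved, False if still open (right-censored).
    Returns a list of (time, estimated probability of being unresolved).
    """
    event_times = sorted({t for t, r in zip(durations, resolved) if r})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everything not yet resolved or censored before t is still at risk.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, r in zip(durations, resolved) if r and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve
```

The estimator lets still-open conjectures contribute information for as long as they have been observed, instead of being discarded or treated as if they had just been resolved.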

The results of this exercise are recorded here.^{3} Figure 1 below shows the survivorship function for the mathematical conjectures we found. The data is fit closely by an exponential function with a half-life of 117 years.^{4}

We are using resolution times for remembered conjectures as a proxy for resolution times for all conjectures. Resolution times for remembered conjectures might be biased in several ways: old conjectures are perhaps more likely to be remembered if they are solved than if they are not, very recently solved conjectures are probably more likely to be remembered (though this only matters because the rate of conjecture posing has probably changed over time), and conjectures that were especially hard to solve might also be more notable. The last hundred years of the survivorship curve contain few data points, so that part of the estimate is particularly likely to be inaccurate.

To the extent that open theoretical problems in AI are similar to math problems, time to solve math problems may be informative for forming a prior on time to solve AI problems.

*Corresponding author: Asya Bergal*

DRAM, “dynamic random-access memory”, is a type of semiconductor memory. It is used as the main memory in modern computers and graphics cards.^{1}

We found two sources for historic pricing of DRAM. One was a dataset of DRAM prices and sizes from 1957 to 2018 collected by technologist and retired Computer Science professor^{2} John C. McCallum.^{3} The other dataset was extracted from a graph generated by Objective Analysis,^{4} a group that sells “third-party independent market research and data” to investors in the semiconductor industry.^{5} We have not checked where their data comes from and don’t have evidence about whether they are a trustworthy source.

Figure 1 shows McCallum’s data.^{6}

Figure 2 shows the average price per gigabyte of DRAM from 1991 to 2019, according to the Objective Analysis graph.^{8}

The two datasets appear to line up (see Figure 3 below),^{9} though we don’t know where the data in the Objective Analysis report came from. It could itself be referencing the McCallum dataset, or both could share data sources.

For both sources, the data appears to follow an exponential trendline. In the McCallum dataset, we calculate that the price / GB of DRAM has fallen at around 36% per year, for a factor of ten every 5.1 years, with the amount of DRAM per dollar doubling every 1.5 years on average. The Objective Analysis data is similar, with the price / GB of DRAM falling around 33% per year, for a factor of ten every 5.8 years and a doubling time of 1.7 years for DRAM per dollar.
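The conversion between an annual percentage decline and the time to fall by a given factor is a short calculation. The helper below is a hypothetical illustration of that arithmetic, assuming a constant rate of decline.

```python
import math


def years_to_fall_by_factor(annual_decline, factor):
    """Years for a price to fall by `factor` at a constant annual decline.

    annual_decline: fractional fall per year, e.g. 0.36 for 36%.
    factor:         10 for an order of magnitude, 2 for a halving.
    """
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be between 0 and 1")
    return math.log(factor) / -math.log(1 - annual_decline)


# The rounded rates quoted for the two DRAM datasets.
for rate in (0.36, 0.33):
    tenfold = years_to_fall_by_factor(rate, 10)
    halving = years_to_fall_by_factor(rate, 2)
    print(f"{rate:.0%}: tenfold in {tenfold:.1f} years, halving in {halving:.1f} years")
```

Because the rates above are rounded to whole percentages, the times this produces can differ from the quoted ones in the first decimal place.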

The 1.5 and 1.7 year doubling times are close to the rate at which, according to Moore’s law, the number of transistors in an integrated circuit doubles.^{10} It seems possible to us that cheaper and denser transistors following this law are what enabled cheaper DRAM, though we haven’t investigated this theory.^{11}

Both datasets show slower progress in recent years. From 2010 onwards, the McCallum dataset falls in price by only 15% a year, for a rate that would yield a factor of ten every 14 years, and the Objective Analysis dataset falls by 12% a year, for a rate that would yield a factor of ten every 18.5 years.

*Primary author: Asya Bergal*

- 17 years for single-precision FLOPS
- 10 years for half-precision FLOPS
- 5 years for half-precision fused multiply-add FLOPS

GPUs (graphics processing units) are specialized electronic circuits originally used for computer graphics.^{1} In recent years, they have been popularly used for machine learning applications.^{2} One measure of GPU performance is FLOPS, the number of operations on floating-point numbers a GPU can perform in a second.^{3} This page looks at the trends in GPU price / FLOPS of theoretical peak performance over the past 13 years. It does not include the cost of operating the GPUs, and it does not consider GPUs rented through cloud computing.

‘Theoretical peak performance’ numbers appear to be determined by adding together the theoretical performances of the processing components of the GPU, which are calculated by multiplying the clock speed of the component by the number of instructions it can perform per cycle.^{4} These numbers are given by the developer and may not reflect actual performance on a given application.^{5}
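As a rough illustration of this calculation, the sketch below sums clock speed times operations per cycle over groups of identical processing components. The function name and the example GPU are hypothetical. Counting a fused multiply-add as two floating-point operations per cycle is a convention assumed for the example, not something taken from the sources above.

```python
def theoretical_peak_flops(components):
    """Sum of (count * clock speed * operations per cycle) over components.

    components: iterable of (count, clock_hz, flops_per_cycle) tuples,
    one tuple per group of identical processing components.
    """
    return sum(count * clock_hz * per_cycle
               for count, clock_hz, per_cycle in components)


# Hypothetical GPU: 2,000 cores at 1.5 GHz, each assumed to complete one
# fused multiply-add (counted as 2 operations) per cycle.
peak = theoretical_peak_flops([(2000, 1_500_000_000, 2)])
print(peak / 1e12, "TFLOPS")
```

The result is an upper bound that assumes every component is busy on every cycle, which is one reason these figures may not reflect performance on a real workload.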

We collected data on multiple slightly different measures of GPU price and FLOPS performance.

GPU prices are divided into release prices, which reflect the manufacturer suggested retail prices at which GPUs are originally sold, and active prices, which are the prices at which GPUs are actually sold over time, often by resellers.

We expect that active prices better represent the prices available to hardware users, but we also collected release prices as supporting evidence.

Several varieties of ‘FLOPS’ can be distinguished based on the specifics of the operations they involve. Here we are interested in single-precision FLOPS, half-precision FLOPS, and half-precision fused multiply-add FLOPS.

‘Single-precision’ and ‘half-precision’ refer to the number of bits used to specify a floating point number.^{6} Using more bits to specify a number achieves greater precision at the cost of more computational steps per calculation. Our data suggests that GPUs have largely been improving in single-precision performance in recent decades,^{7} and half-precision performance appears to be increasingly popular because it is adequate for deep learning.^{8}
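The difference in precision can be seen by rounding the same number to each format. The sketch below uses Python’s standard `struct` module, whose `e` and `f` format codes correspond to 16-bit and 32-bit IEEE 754 floats; the helper name is hypothetical.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given binary float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


half = round_trip(0.1, "<e")    # 16 bits: half precision
single = round_trip(0.1, "<f")  # 32 bits: single precision

print("half-precision error:  ", abs(half - 0.1))
print("single-precision error:", abs(single - 0.1))
```

The half-precision value is noticeably further from 0.1 than the single-precision one, which illustrates the precision given up in exchange for cheaper arithmetic.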

Nvidia, the main provider of chips for machine learning applications,^{9} recently released a series of GPUs featuring Tensor Cores,^{10} which claim to deliver “groundbreaking AI performance”. Tensor Core performance is measured in FLOPS, but they perform exclusively certain kinds of floating-point operations known as fused multiply-adds (FMAs).^{11} Performance on these operations is important for certain kinds of deep learning performance,^{12} so we track ‘GPU price / FMA FLOPS’ as well as ‘GPU price / FLOPS’.

In addition to purely half-precision computations, Tensor Cores are capable of performing mixed-precision computations, where part of the computation is done in half-precision and part in single-precision.^{13} Since explicitly mixed-precision-optimized hardware is quite recent, we don’t look at the trend in mixed-precision price performance, and only look at the trend in half-precision price performance.

Any GPU that performs multiple kinds of computations (single-precision, half-precision, half-precision fused multiply-add) trades off performance on one for performance on the others, because there is limited space on the chip, and transistors must be allocated to one type of computation or another.^{14} All current GPUs that perform half-precision or Tensor Core fused multiply-add computations also do single-precision computations, so they are splitting their transistor budget. For this reason, our impression is that half-precision FLOPS could be much cheaper now if entire GPUs were dedicated to half-precision computation alone, rather than split between the two.

We collected data on theoretical peak performance (FLOPS), release date, and price from several sources, including Wikipedia.^{15} (Data is available in this spreadsheet). We found GPUs by looking at Wikipedia’s existing large lists^{16} and by Googling “popular GPUs” and “popular deep learning GPUs”. We included any hardware that was labeled as a ‘GPU’. We adjusted prices for inflation based on the consumer price index.^{17}

We were unable to find price and performance data for many popular GPUs and suspect that we are missing many from our list. In our search, we did not find any GPUs that beat our 2017 minimum of $0.03 (release price) / single-precision GFLOPS. We put out a $20 bounty on a popular Facebook group to find a cheaper GPU / FLOPS, and the bounty went unclaimed, so we are reasonably confident in this minimum.^{18}

Figure 1 shows our collected dataset for GPU price / single-precision FLOPS over time.^{19}

To find a clear trend for the prices of the cheapest GPUs / FLOPS, we looked at the running minimum prices every 10 days.^{20}
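A running minimum sampled at fixed intervals can be computed as in the sketch below. It assumes that “running minimum” means the cheapest price / FLOPS among all GPUs released on or before each sample date; the function name and the example points are hypothetical.

```python
from datetime import date, timedelta


def running_minimum(points, step_days=10):
    """Cheapest price/FLOPS seen so far, sampled every `step_days` days.

    points: iterable of (release_date, price_per_flops).
    Returns a list of (sample_date, running minimum at that date).
    """
    ordered = sorted(points)
    samples = []
    if not ordered:
        return samples
    step = timedelta(days=step_days)
    best = float("inf")
    i = 0
    current = ordered[0][0]
    while True:
        # Fold in every GPU released on or before the sample date.
        while i < len(ordered) and ordered[i][0] <= current:
            best = min(best, ordered[i][1])
            i += 1
        samples.append((current, best))
        if i == len(ordered):
            break
        current += step
    return samples


# Made-up release data, for illustration only.
example = [(date(2017, 1, 1), 0.10), (date(2017, 1, 15), 0.20),
           (date(2017, 1, 25), 0.05)]
for sample_date, cheapest in running_minimum(example):
    print(sample_date, cheapest)
```

A newly released GPU only changes the series if it is cheaper per FLOPS than everything before it, so the output is non-increasing by construction.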

Using release prices, the cheapest GPU price / FLOPS has not decreased since 2017. However, there was a similar period of stagnation between early 2009 and 2011, so this may not represent a slowing of the trend in the long run.

Based on the figures above, the running minimums seem to follow a roughly exponential trend. If we do not include the initial point in 2007 (which we suspect is not in fact the cheapest hardware at the time), we get that the cheapest GPU price / single-precision FLOPS fell by around 17% per year, for a factor of ten in ~12.5 years.^{21}

Figure 3 shows GPU price / half-precision FLOPS for all the GPUs in our search above for which we could find half-precision theoretical performance.^{22}

Again, we looked at the running minimums of this graph every 10 days, shown in Figure 4 below.^{23}

If we assume an exponential trend with noise,^{24} cheapest GPU price / half-precision FLOPS fell by around 26% per year, which would yield a factor of ten after ~8 years.^{25}

Figure 5 shows GPU price / half-precision FMA FLOPS for all the GPUs in our search above for which we could find half-precision FMA theoretical performance.^{26} (Note that this includes all of our half-precision data above, since those FLOPS could be used for fused multiply-adds in particular.) GPUs with Tensor Cores are marked in red.

Figure 6 shows the running minimums of GPU price / HP FMA FLOPS.^{27}

GPU price / Half-Precision FMA FLOPS appears to be following an exponential trend over the last four years, falling by around 46% per year, for a factor of ten in ~4 years.^{28}

GPU prices often go down from the time of release, and some popular GPUs are older ones that have gone down in price.^{29} Given this, it makes sense to look at active price data for the same GPU over time.

We collected data on peak theoretical performance in FLOPS from TechPowerUp^{30} and combined it with active GPU price data to get GPU price / FLOPS over time.^{31} Our primary source of historical pricing data was Passmark, though we also found a less trustworthy dataset on Kaggle which we used to check our analysis. We adjusted prices for inflation based on the consumer price index.^{32}

We scraped pricing data^{33} on GPUs between 2011 and early 2020 from Passmark.^{34} Where necessary, we renamed GPUs from Passmark to be consistent with TechPowerUp.^{35} The Passmark data consists of 38,138 price points for 352 GPUs. We guess that these represent most popular GPUs.

Looking at the ‘current prices’ listed on individual Passmark GPU pages, prices appear to be sourced from Amazon, Newegg, and Ebay. Passmark’s listed pricing data does not correspond to regular intervals. We don’t know if prices were pulled at irregular intervals, or if Passmark pulls prices regularly and then only lists major changes as price points. When we see a price point, we treat it as though the GPU is that price only at that time point, not indefinitely into the future.

The data contains several blips where a GPU is briefly sold unusually cheaply. Spot checks of some of these suggest that they correspond to single GPUs or small numbers of GPUs for sale. We are not interested in tracking these, because we are trying to predict AI progress, which presumably isn’t influenced by temporary discounts on tiny batches of GPUs.

This Kaggle dataset contains scraped data of GPU prices from price comparison sites PriceSpy.co.uk, PCPartPicker.com, Geizhals.eu from the years 2013 – 2018. The Kaggle dataset has 319,147 price points for 284 GPUs. Unfortunately, at least some of the data is clearly wrong, potentially because price comparison sites include pricing data from untrustworthy merchants.^{36} As such, we don’t use the Kaggle data directly in our analysis, but do use it as a check on our Passmark data. The data that we get from Passmark roughly appears to be a subset of the Kaggle data from 2013 – 2018,^{37} which is what we would expect if the price comparison engines picked up prices from the merchants Passmark looks at.

There are a number of reasons why we think this analysis may in fact not reflect GPU price trends:

- We effectively have just one source of pricing data, Passmark.
- Passmark appears to only look at Amazon, Newegg, and Ebay for pricing data.
- We are not sure, but we suspect that Passmark only looks at the U.S. versions of Amazon, Newegg, and Ebay, and pricing may be significantly different in other parts of the world (though we guess it wouldn’t be different enough to change the general trend much).
- As mentioned above, we are not sure if Passmark pulls price data regularly and only lists major price changes, or pulls price data irregularly. If the former is true, our data may be overrepresenting periods where the price changes dramatically.
- None of the price data we found includes quantities of GPUs which were available at that price, which means some prices may be for only a very limited number of GPUs.
- We don’t know how much the prices from these datasets reflect the prices that a company pays when buying GPUs in bulk, which we may be more interested in tracking.

A better version of this analysis might start with more complete data from price comparison engines (along the lines of the Kaggle dataset) and then filter out clearly erroneous pricing information in some principled way.

The original scraped datasets with cards renamed to match TechPowerUp can be found here. GPU price / FLOPS data is graphed on a log scale in the figures below. Price points for the same GPU are marked in the same color. We adjusted prices for inflation using the consumer price index.^{38} All points below are in 2019 dollars.

To try to filter out noisy prices that didn’t last or were only available in small numbers, we took out the lowest 5% of data in every several day period^{39} to get the 95th percentile cheapest hardware. We then found linear and exponential trendlines of best fit through the available hardware with the lowest GPU price / FLOPS every several days.^{40}
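The trimming step can be sketched as below. The exact rule is not spelled out here, so the sketch assumes fixed 10-day periods and drops the lowest 5% of price points in each period (rounded down) before taking the cheapest remaining one. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def cheapest_after_trimming(points, period_days=10, trim_fraction=0.05):
    """Cheapest price/FLOPS per period after discarding the lowest prices.

    points: iterable of (day_number, price_per_flops), where day_number
            counts days from an arbitrary start date.
    Returns {period_index: cheapest price/FLOPS left after trimming}.
    """
    buckets = defaultdict(list)
    for day, price in points:
        buckets[day // period_days].append(price)
    result = {}
    for period, prices in sorted(buckets.items()):
        prices.sort()
        # Number of suspiciously cheap points to discard (rounds down).
        n_drop = int(trim_fraction * len(prices))
        result[period] = prices[n_drop]
    return result
```

With this rounding rule a period needs at least 20 price points before anything is discarded, so sparse periods are left untrimmed. A different rounding choice would filter more aggressively.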

Figures 7-10 show the raw data, 95th percentile data, and trendlines for single-precision GPU price / FLOPS for the Passmark dataset. This folder contains plots of all our datasets, including the Kaggle dataset and combined Passmark + Kaggle dataset.^{41}

The cheapest 95th percentile data every 10 days appears to fit relatively well to both a linear and an exponential trendline. However, we assume that progress will follow an exponential trend, because previous progress has followed an exponential.

In the Passmark dataset, the exponential trendline suggests that from 2011 to 2020, 95th-percentile GPU price / single-precision FLOPS fell by around 13% per year, for a factor of ten in ~17 years^{45} (bootstrap^{46} 95% confidence interval 16.3 to 18.1 years).^{47} We believe the rise in price / FLOPS in 2017 corresponds to a rise in GPU prices due to increased demand from cryptocurrency miners.^{48} If we instead look at the trend from 2011 through 2016, before the cryptocurrency rise, we get that 95th-percentile GPU price / single-precision FLOPS fell by around 13% per year, for a factor of ten in ~16 years.^{49}
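A bootstrap confidence interval of this kind can be sketched as follows. The sketch assumes a percentile bootstrap that resamples individual (year, price) points with replacement and refits a log-linear trend each time; the procedure actually used may differ. All names are hypothetical.

```python
import math
import random


def tenfold_years(points):
    """Years for the price to fall tenfold, from a log-linear least-squares fit.

    points: list of (year, price_per_flops). Assumes a falling trend.
    """
    n = len(points)
    xs = [x for x, _ in points]
    ys = [math.log10(p) for _, p in points]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxx / sxy  # equals -1 / slope


def bootstrap_interval(points, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the tenfold time."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_resamples:
        sample = rng.choices(points, k=len(points))
        if len({x for x, _ in sample}) < 2:
            continue  # slope is undefined if every sampled year is identical
        estimates.append(tenfold_years(sample))
    estimates.sort()
    lo_index = int((1 - level) / 2 * n_resamples)
    hi_index = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_index], estimates[hi_index]
```

Resampling points independently ignores any correlation between successive price points for the same GPU, so an interval computed this way is probably best read as a lower bound on the real uncertainty.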

This is slower than the order of magnitude every ~12.5 years we found when looking at release prices. If we restrict the release price data to 2011 – 2019, we get an order of magnitude decrease every ~13.5 years instead,^{50} so part of the discrepancy can be explained because of the different start times of the datasets. To get some assurance that our active price data wasn’t erroneous, we spot checked the best active price at the start of 2011, which was somewhat lower than the best release price at the same time, and confirmed that its given price was consistent with surrounding pricing data.^{51} We think active prices are likely to be closer to the prices at which people actually bought GPUs, so we guess that ~17 years / order of magnitude decrease is a more accurate estimate of the trend we care about.

Figures 11-14 show the raw data, 95th percentile data, and trendlines for half-precision GPU price / FLOPS for the Passmark dataset. This folder contains plots of the Kaggle dataset and combined Passmark + Kaggle dataset.

If we assume the trend is exponential, the Passmark trend seems to suggest that from 2015 to 2020, 95th-percentile GPU price / half-precision FLOPS has fallen by around 21% per year, for a factor of ten over ~10 years^{55} (bootstrap^{56} 95% confidence interval 8.8 to 11 years).^{57} This is fairly close to the ~8 years / order of magnitude decrease we found when looking at release price data, but we treat active prices as a more accurate estimate of the actual prices at which people bought GPUs. As in our previous dataset, there is a noticeable rise in 2017, which we think is due to GPU prices increasing as a result of demand from cryptocurrency miners. If we look at the trend from 2015 through 2016, before this rise, we get that 95th-percentile GPU price / half-precision FLOPS has fallen by around 25% per year, which would yield a factor of ten over ~8 years.^{58}

Figures 15-18 show the raw data, 95th percentile data, and trendlines for half-precision GPU price / FMA FLOPS for the Passmark dataset. GPUs with Tensor Cores are marked in black. This folder contains plots of the Kaggle dataset and combined Passmark + Kaggle dataset.

If we assume the trend is exponential, the Passmark trend seems to suggest that 95th-percentile GPU price / half-precision FMA FLOPS has fallen by around 40% per year, which would yield a factor of ten in ~4.5 years,^{62} with a bootstrap^{63} 95% confidence interval of 4 to 5.2 years.^{64} This is fairly close to the ~4 years / order of magnitude decrease we found when looking at release price data, but we think active prices are a more accurate estimate of the actual prices at which people bought GPUs.

The figures above suggest that certain GPUs with Tensor Cores were a significant (~half an order of magnitude) improvement over existing GPU price / half-precision FMA FLOPS.

We summarize our results in the table below.

| Years per order-of-magnitude decrease | Release Prices | 95th-percentile Active Prices | 95th-percentile Active Prices (pre-crypto price rise) |
|---|---|---|---|
| Period covered | 11/2007 – 1/2020 | 3/2011 – 1/2020 | 3/2011 – 12/2016 |
| $ / single-precision FLOPS | 12.5 | 17 | 16 |
| Period covered | 9/2014 – 1/2020 | 1/2015 – 1/2020 | 1/2015 – 12/2016 |
| $ / half-precision FLOPS | 8 | 10 | 8 |
| $ / half-precision FMA FLOPS | 4 | 4.5 | n/a |

Release price data seems to generally support the trends we found in active prices, with the notable exception of trends in GPU price / single-precision FLOPS, which cannot be explained solely by the different start dates.^{65} We think the best estimate of the overall trend for prices at which people recently bought GPUs is the 95th-percentile active price data from 2011 – 2020, since release price data does not account for existing GPUs becoming cheaper over time. The pre-crypto trends are similar to the overall trends, suggesting that the trends we are seeing are not anomalous due to cryptocurrency.

Given that, we guess that GPU prices as a whole have fallen at rates that would yield an order of magnitude over roughly:

- 17 years for single-precision FLOPS
- 10 years for half-precision FLOPS
- 5 years for half-precision fused multiply-add FLOPS

Half-precision FLOPS seem to have become cheaper substantially faster than single-precision in recent years. This may be a “catching up” effect as more of the space on GPUs was allocated to half-precision computing, rather than reflecting more fundamental technological progress.

*Primary author: Asya Bergal*

We are comparing naturally evolved and engineered solutions to problems, to learn about regularities that might let us make inferences about artificial intelligence from what we know about naturally evolved intelligence.

Engineers and evolution have faced many similar design problems, for instance the problem of designing an efficient flying machine. Another design problem that engineers and evolution have both worked on is designing intelligent machines. We hope that by looking at other instances of engineers and evolution working on similar problems, we will be able to learn more about how future AI systems will compare to evolved intelligences.

We will collect examples of optimization problems on which both engineers and evolution would perform better if they could. Here are some candidate examples of such problems:

- Flying
- Hovering
- Swimming
- Running
- Traveling long distances
- Traveling quickly
- Jumping
- Balancing
- Height of structure
- Piercing
- Applying compressive force
- Striking
- Tensile strength
- Pumping blood
- Breathing
- Liver function
- Detecting light
- Recording light
- Producing light
- Detecting sound
- Recording sound
- Producing sound
- Heat insulation
- Determining chemical composition of a substance
- Detecting chemical composition in the air
- Adhesiveness
- Picking heavy things up
- Joint activation
- Elasticity
- Toxicity
- Extracting energy from sunlight
- Storing energy

We will then collect the best solutions we can readily find to these design problems, made by human engineers and by evolution respectively, and quantitative data on their performance. For engineered solutions, we will try to collect this data over time.

We will use the data to answer the following questions for different design problems:

- How long does it take engineers to halve, match, double, triple, etc. the performance of evolution’s current best designs?
- What does the shape of engineers’ performance curve look like around the point where engineers’ solutions first match evolution’s?
- How efficient (in terms of performance per energy or mass used) are the first solutions that match evolution’s performance compared to evolution’s best solutions?
- How long does it take engineers to find a more efficient solution after finding an equally good solution in terms of absolute performance?
- From a design perspective, how similar are engineers’ first equally good solutions to evolution’s best solutions?
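The first of these questions could be answered from a time series of engineered performance, roughly as sketched below. This is a minimal illustration only: the function names, the data points, and the benchmark value are all made up for the example and are not taken from the project's data.

```python
from typing import Dict, List, Optional, Tuple


def first_year_reaching(
    series: List[Tuple[int, float]], target: float
) -> Optional[int]:
    """Return the earliest year whose recorded performance is >= target."""
    for year, performance in sorted(series):
        if performance >= target:
            return year
    return None


def milestone_years(
    series: List[Tuple[int, float]],
    evolved_best: float,
    multiples: Tuple[float, ...] = (0.5, 1.0, 2.0, 3.0),
) -> Dict[float, Optional[int]]:
    """Map each multiple of evolution's best to the first year engineers reached it."""
    return {m: first_year_reaching(series, m * evolved_best) for m in multiples}


# Hypothetical record of the best engineered performance by year (arbitrary units).
engineered = [
    (1900, 10.0),
    (1920, 45.0),
    (1935, 60.0),
    (1950, 110.0),
    (1970, 240.0),
    (1990, 320.0),
]
EVOLVED_BEST = 100.0  # hypothetical benchmark for evolution's best design

milestones = milestone_years(engineered, EVOLVED_BEST)
# Years between reaching half of evolution's performance and matching it.
years_from_half_to_match = milestones[1.0] - milestones[0.5]
```

Differences between milestone years (for example, from matching to doubling) could then be compared across design problems.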

We will use patterns in the answers to these questions across technologies to make inferences about the answers for natural and artificial intelligence.

In general, the more similar the answers to these questions turn out to be across design problems, the more strongly we will expect the answers for problems addressed by future AI developments to fit the same patterns.

We expect to make the data publicly available, so that others can check our conclusions, investigate related questions, or use it in other investigations of technology and evolution.

Toby Walsh, professor of AI at the University of New South Wales and Technical University of Berlin, conducted a poll of AI experts, robotics experts, and non-experts from late January to early February 2017. The survey focused on the potential automation of various occupations and the arrival of high-level machine intelligence (HLMI).

There were 849 total survey respondents composing three separate groups: AI experts, robotics experts, and non-experts.

The AI experts consisted of 200 authors from two AI conferences: the 2015 meeting of the Association for the Advancement of AI (AAAI) and the 2011 International Joint Conference on AI (IJCAI).

The robotics experts consisted of 101 individuals who were either Fellows of the Institute of Electrical and Electronics Engineers (IEEE) Robotics & Automation Society or authors from the 2016 meeting of the IEEE Conference on Robotics & Automation (ICRA).

The non-experts consisted of 548 readers of an article about AI on the website The Conversation. While it seems data on their possible expertise in AI or robotics was not collected, Walsh writes that “it is reasonable to suppose that most are not experts in AI & robotics, and that they are unlikely to be publishing in the top venues in AI and robotics like IJCAI, AAAI or ICRA” (p. 635). Some additional demographic data was collected and reported (for this survey group only):

- **Geographic distribution:** 36% Australia, 29% United States, 7% United Kingdom, 4% Canada, and 24% rest of the world
- **Education:** 85% have an undergraduate degree or higher
- **Age:** >33% are 34 or under, 59% are under 44, and 11% are 65 or older
- **Employment status:** >66% are employed and 25% are in or about to enter higher education
- **Income:** 40% reported an annual income of >$100,000

The first seven survey questions (out of eight total) asked respondents to classify occupations as either at risk of automation in the next two decades or not (binary response). For each occupation, respondents were provided with information about the work involved and skills required. There were 70 total occupations, which came from a previous study that had used a machine learning (ML) classifier to rank them in terms of their risk for automation. These rankings were then used in the present survey: Each question had respondents classify 10 occupations, starting with the five most likely and five least likely at risk of automation according to the ML classifier. This continued through subsequent questions until respondents classified all 70 occupations.
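One reading of this allocation is that each question takes the next five occupations from the top of the ranking and the next five from the bottom, working inward. The sketch below illustrates that reading only; the function name and the placeholder occupation names are hypothetical, and the survey's actual ordering may differ in detail.

```python
from typing import List


def build_questions(ranked: List[str], per_end: int = 5) -> List[List[str]]:
    """Split occupations ranked from most to least automatable into questions.

    Each question takes the next `per_end` occupations from the top of the
    ranking and the next `per_end` from the bottom, working inward.
    """
    block = 2 * per_end
    if len(ranked) % block != 0:
        raise ValueError("ranking length must be a multiple of 2 * per_end")
    questions = []
    for k in range(len(ranked) // block):
        top = ranked[k * per_end:(k + 1) * per_end]
        end = len(ranked) - k * per_end
        bottom = ranked[end - per_end:end]
        questions.append(top + bottom)
    return questions


# Placeholder names standing in for the 70 ranked occupations.
occupations = [f"occupation_{i:02d}" for i in range(1, 71)]
questions = build_questions(occupations)
```

With 70 occupations and ten per question, this yields seven questions, matching the seven classification questions described above.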

The last survey question asked by what year there would be a 10%, 50%, and 90% chance of HLMI, which was defined as “when a computer might be able to carry out most human professions at least as well as a typical human” (p. 634). For each probability, respondents chose from among eight options: 2025, 2030, 2040, 2050, 2075, 2100, After 2100, and Never. Median responses were calculated by interpolating the cumulative distribution function between the two nearest dates.
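The interpolation step can be sketched as follows, assuming linear interpolation between the two dated options that bracket a cumulative share of 50%. The response counts are invented for illustration, and the sketch does not attempt to reproduce how the paper handles medians that fall after 2100.

```python
from typing import Dict, List, Optional

# Dated answer options from the survey question; "After 2100" and "Never"
# count toward the total but have no position on the year axis.
YEAR_OPTIONS: List[int] = [2025, 2030, 2040, 2050, 2075, 2100]


def interpolated_median(counts: Dict[str, int]) -> Optional[float]:
    """Estimate the median year by linear interpolation of the empirical CDF.

    Returns None when the CDF has not reached 0.5 by the last dated option.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    running = 0
    prev_year: Optional[int] = None
    prev_cdf = 0.0
    for year in YEAR_OPTIONS:
        running += counts.get(str(year), 0)
        cdf = running / total
        if cdf >= 0.5:
            if prev_year is None:
                return float(year)  # no earlier dated point to interpolate from
            weight = (0.5 - prev_cdf) / (cdf - prev_cdf)
            return prev_year + weight * (year - prev_year)
        prev_year, prev_cdf = year, cdf
    return None  # median falls after 2100 or in "Never"


# Hypothetical response counts (not survey data).
example_counts = {
    "2025": 5, "2030": 10, "2040": 20, "2050": 25,
    "2075": 20, "2100": 10, "After 2100": 5, "Never": 5,
}
median_year = interpolated_median(example_counts)
```

In this made-up example the cumulative share is 35% at 2040 and 60% at 2050, so the interpolated median lies six tenths of the way between them.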

Table 1 below summarizes the median responses and is reproduced here for convenience.

**Table 1**

| Probability of HLMI \ Group of survey respondents | AI experts | Robotics experts | Non-experts |
|---|---|---|---|
| 10% | 2035 | 2033 | 2026 |
| 50% | 2061 | 2065 | 2039 |
| 90% | 2109 | 2118 | 2060 |

Figures 1-3 below show the cumulative distribution functions (CDFs) for 10%, 50%, and 90% probability of HLMI (respectively) at different years.

**Figure 1**

**Figure 2**

**Figure 3**

Table 2 below contains descriptive statistics about the number of occupations (out of 70 total) classified as being at risk of automation in the next two decades. Confidence intervals (last column) are at the 95% level. It is unclear why the sample size for Non-experts is listed as 473 when earlier in the article the number reported is 548.

**Table 2**

The difference in means between the robotics experts (29.0) and the AI experts (31.1) was not statistically significant (two-sided t-test, p = 0.096), while the difference in means between each expert group and the non-expert group (36.5) was significant (two-sided t-test, both p’s < 0.0001).

Table 3 below lists some of the largest differences in the proportion of experts (AI and robotics combined) compared to non-experts who classified occupations as at risk for automation.

**Table 3**

| Occupation | Experts predicting risk of automation | Non-experts predicting risk of automation |
|---|---|---|
| Economist | 12% | 39% |
| Electrical engineer | 6% | 33% |
| Technical writer | 31% | 54% |
| Civil engineer | 6% | 30% |

Figure 4 below shows that respondents who predicted that HLMI would arrive earlier also classified more occupations as being at risk of automation (and vice versa).

**Figure 4**

Oren Etzioni, CEO of the Allen Institute for AI,^{1} reported on a survey in an MIT Technology Review article published on 20 Sep 2016.^{2} The rest of this article summarizes information from that source, except where noted.

In March 2016, on behalf of Etzioni, the Association for the Advancement of Artificial Intelligence (AAAI) sent out an anonymous survey to 193 of its Fellows (“individuals who have made significant, sustained contributions – usually over at least a ten-year period – to the field of artificial intelligence.”^{3}).

The survey contained one question:

“In his book, Nick Bostrom has defined Superintelligence as ‘an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.’ When do you think we will achieve Superintelligence?”

It seems that responses were entered by selecting one of four categories,^{4} although it is possible that they were entered as real numbers and then grouped.

There were 80 responses, for a response rate of 41%. They were:

- “In the next 10 years”: 0%
- “In the next 10-25 years”: 7.5%
- “In more than 25 years”: 67.5%
- “Never.”: 25%

The Artificial Intelligence Index reports on AI conference attendance, using data it collected directly from conferences.^{1} We extended its spreadsheet to measure precise growth rates and to visualize the data differently (data here). The spreadsheet is missing 2018 data for IROS, so while we include IROS in Figure 1, we exclude it from the growth rate calculations.

From their spreadsheet we calculate:

| Conference size (by 2018 participants) | Total participants |
|---|---|
| Large conferences (more than 2,000) | 27,396 |
| Small conferences (fewer than 2,000) | 4,754 |

This means large conferences have about 5.8 times as many participants as small conferences (27,396 / 4,754 ≈ 5.76).

According to this data, total large conference participation has grown by a factor of 3.76 between 2011 and 2019, which is equivalent to a factor of 1.21 per year during that period.
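The conversion from a total growth factor to a per-year factor is a geometric mean: take the n-th root, where n is the number of annual steps. The short sketch below (the function name is ours) shows the arithmetic. Note that a per-year factor of about 1.21 corresponds to seven annual steps; eight annual steps would give about 1.18. Which span the underlying spreadsheet covers is not confirmed here.

```python
def annual_growth_factor(total_factor: float, years: int) -> float:
    """Per-year growth factor implied by a total factor over `years` annual steps."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_factor ** (1.0 / years)


# Total growth factor quoted in the text.
TOTAL_FACTOR = 3.76

over_seven_steps = annual_growth_factor(TOTAL_FACTOR, 7)  # about 1.21
over_eight_steps = annual_growth_factor(TOTAL_FACTOR, 8)  # about 1.18
```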

Bradford DeLong has published estimates of historical world GDP, piecing together data on recent GDP, historical population estimates, and crude estimates of historical per capita GDP. We have not analyzed these estimates in depth, but they appear plausible. (Robin Hanson has raised concerns about the population estimates from before 10,000 BC, but our overall conclusions do not seem sensitive to these estimates.)

The raw data produced by DeLong, together with log-scale graphs of that data, are available here (augmented with one data point for 2013 found in the CIA World Factbook, population data from the US Census Bureau via Wikipedia, and the website usinflationcalculator). Note that brief periods of negative growth are not indicated, and that we have used what DeLong refers to as “ex-nordhaus” data, which neglects quality-of-life adjustments arising from improvements in the diversity of goods.

The data suggest that (proportional) rates of economic and population growth increase roughly linearly with the size of the world economy and population. Certainly, a constant rate of growth is a poor model for the data, as growth rates range over 5 orders of magnitude; rather, the data appear to be consistent with substantially superlinear returns to scale, such that doubling the size of the world multiplies the absolute rate of growth by 2^{1.5} to 2^{1.75} (as opposed to 2, which would be expected under exponential growth).

Extrapolating this model implies that at a time when the economy is growing 1% per year, growth will diverge to infinity after about 200 years. This outcome of course seems impossible, but this does suggest that the historical record is consistent with relatively large changes in growth rate, and in fact rates of economic growth experienced today are radically larger (even proportionally) than those experienced prior to the industrial revolution.
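One way to see where a figure of roughly 200 years could come from is to assume the model takes the form dY/dt = c·Y^a with a between 1.5 and 1.75. That functional form is our assumption, chosen to match the statement that doubling the size multiplies the absolute growth rate by 2^a. Under it, the time to divergence depends only on the current proportional growth rate and the exponent:

```python
def years_until_divergence(growth_rate: float, exponent: float) -> float:
    """Time until Y diverges under the assumed model dY/dt = c * Y**exponent.

    Solving the ODE gives Y(t)**-(a - 1) = Y0**-(a - 1) - (a - 1) * c * t,
    which reaches zero (so Y diverges) at t* = 1 / ((a - 1) * g0), where
    g0 = c * Y0**(a - 1) is the current proportional growth rate.
    """
    if exponent <= 1.0:
        raise ValueError("finite-time divergence requires exponent > 1")
    if growth_rate <= 0.0:
        raise ValueError("growth_rate must be positive")
    return 1.0 / ((exponent - 1.0) * growth_rate)


# Current proportional growth of 1% per year, at both ends of the exponent range.
at_lower_exponent = years_until_divergence(0.01, 1.5)   # about 200 years
at_upper_exponent = years_until_divergence(0.01, 1.75)  # about 133 years
```

Under this assumed form, the lower exponent of 1.5 reproduces the figure of about 200 years, while the upper exponent of 1.75 would shorten it to roughly 133 years.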

Fitting the model to data from around 0 to 500 CE predicts divergence between 1700 and 2000; data from 500 to 1000 CE predict divergence around 2100; and data from 1300 to 1950 predict divergence in the later part of the 20th century.

In fact, growth has fallen substantially behind this trend over the course of the 20th century: growth has continued, but its acceleration has slowed substantially (indeed reversing over the last 50 years). Moreover, it is unclear to us whether historically increasing returns to scale reflect returns to *economic* scale or to *population* scale. If the latter, a profound slowdown seems likely: population growth rates seem to fall robustly at very high levels of development, and in any case doubling times much shorter than 10-20 years would require radical changes in fertility patterns.^{1} That said, any such biologically contingent dynamics might be modified in a world where machine intelligence can substitute for human labor. Our impression is that this slowdown has been the subject of extensive inquiry by economists, but we have not reviewed this literature.

Overall, it seems unclear how much weight one should place on historical trends in predicting the future, and it seems unclear whether we should focus on very long-term trends of accelerating growth or short-term trends of stagnant growth (at least as measured by GDP). However, at a minimum it seems that extrapolation from history is consistent with extreme increases in the growth rate.
