We know of thirteen surveys on the predicted timing of human-level AI. If we collapse a few slightly different meanings of ‘human-level AI’, then:
- Median estimates for when there will be a 10% chance of human-level AI are all in the 2020s (from seven surveys), except for the 2016 ESPAI, which found median estimates ranging from 2013 to long after 2066, depending on question framing.
- Median estimates for when there will be a 50% chance of human-level AI range between 2035 and 2050 (from seven surveys), except for the 2016 ESPAI, which found median estimates ranging from 2056 to at least 2106, depending on question framing.
- Of three surveys in recent decades asking for predictions but not probabilities, two produced median estimates of when human-level AI will arrive in the 2050s, and one in 2085.
Participants appear to mostly be experts in AI or related areas, but with a large contingent of others. Several groups of survey participants seem likely to over-represent people who are especially optimistic about human-level AI being achieved soon.
List of surveys
These are the surveys that we know of on timelines to human-level AI:
- Michie (1973)
- FHI Winter Intelligence (2011)
- AI@50 (2006)
- AGI-09 (2009)
- Klein (2007)
- Hanson (2012 onwards)
- Kruel (2011-12)
- Bainbridge (2005)
- Müller and Bostrom: AGI-12, TOP100, EETN, PTAI (2012-2013)
- 2016 ESPAI (2016)
Ordered by year authors began surveying participants
| Year | Survey | # | 10% | 50% | 90% | Other key ‘Predictions’ | Participants | Response rate | Link to original document |
|---|---|---|---|---|---|---|---|---|---|
| 1972 | Michie | 67 | | | | Median 50y (2022) (vs 20 or >50) | AI, CS | – | link |
| 2006 | AI@50 | | | | | median >50y (2056) | AI conf | – | link |
| 2007 | Klein | 888 | | | | median 2030-2050 | Futurism? | – | link and link |
| 2009 | AGI-09 | 21 | 2020 | 2040 | 2075 | | AGI conf; AI | – | link |
| 2011 | FHI Winter Intelligence | 35 | 2028 | 2050 | 2150 | | AGI impacts conf; 44% related technical | 41% | link |
| 2011-2012 | Kruel interviews | 37 | 2025 | 2035 | 2070 | | AGI, AI | – | link |
| 2012 | FHI: AGI | 72 | 2022 | 2040 | 2065 | | AGI & AGI impacts conf; AGI, technical work | 65% | link |
| 2012 | FHI: PT-AI | 43 | 2023 | 2048 | 2080 | | Philosophy & theory of AI conf; not technical AI | 49% | link |
| 2012-present | Hanson | ~10 | | | | ≤ 10% progress to human level in past 20y | AI | – | link |
| 2013 | FHI: TOP100 | 29 | 2022 | 2040 | 2075 | | Top AI | 29% | link |
| 2013 | FHI: EETN | 26 | 2020 | 2050 | 2093 | | Greek assoc. for AI; AI | 10% | link |
Time to a 10% chance and a 50% chance of human-level AI
The FHI Winter Intelligence, Müller and Bostrom, AGI-09, Kruel, and 2016 ESPAI surveys asked for years when participants expected 10%, 50% and 90% probabilities of human-level AI (or a similar concept). All of these surveys were taken between 2009 and 2013, except the 2016 ESPAI.
Survey participants’ median estimates for when there will be a 10% chance of human-level AI are all in the 2020s or 2030s. Until the 2016 ESPAI survey, median estimates for when there will be a 50% chance of human-level AI ranged between 2035 and 2050. The 2016 ESPAI asked about human-level AI using both very similar questions to previous surveys, and a different style of question based on automation of specific human occupations. The former questions found median dates of at least 2056, and the latter question prompted median dates of at least 2106.
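As a quick check on these ranges, the following minimal sketch recomputes the spread of median estimates from the seven surveys whose 10%, 50% and 90% figures appear in the table above. The dictionary and function names are illustrative only, and the 2016 ESPAI is left out because its figures are not tabulated here.

```python
# Median estimates (years) copied from the table of surveys above.
# Tuple order: (10% chance, 50% chance, 90% chance).
# The names MEDIANS and column_range are hypothetical, chosen for this sketch.
MEDIANS = {
    "AGI-09": (2020, 2040, 2075),
    "FHI Winter Intelligence": (2028, 2050, 2150),
    "Kruel interviews": (2025, 2035, 2070),
    "FHI: AGI": (2022, 2040, 2065),
    "FHI: PT-AI": (2023, 2048, 2080),
    "FHI: TOP100": (2022, 2040, 2075),
    "FHI: EETN": (2020, 2050, 2093),
}


def column_range(index):
    """Return (earliest, latest) median year for one probability column."""
    values = [row[index] for row in MEDIANS.values()]
    return min(values), max(values)


ten_pct_range = column_range(0)
fifty_pct_range = column_range(1)

print("10% chance medians span:", ten_pct_range)
print("50% chance medians span:", fifty_pct_range)
```

For the tabulated surveys, the 10% medians all fall in the 2020s and the 50% medians fall between 2035 and 2050.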
Three surveys (Bainbridge, Klein, and AI@50) asked about predictions, rather than confidence levels. These produced median predictions of >2056 (AI@50), 2030-50 (Klein), and 2085 (Bainbridge). It is unclear how participants interpret the request to estimate when a thing will happen; these responses may mean the same as the 50% confidence estimate discussed above. These surveys together appear to contain a high density of people who don’t work in AI, compared to the other surveys.
Michie’s survey is unusual in being much earlier than the others (1972). In it, fewer than a third of participants expected human-level AI by 1992, almost another third estimated 2022, and the rest expected it later. Thus fewer than a third have been demonstrated wrong so far. Note also that the participants’ median expectation (50 years away) was further from their present time than those of contemporary survey participants. Both of these points conflict with a common perception that early AI predictions were shockingly optimistic, and quickly undermined.
Hanson’s survey is unusual in its methodology. Hanson informally asked some AI experts what fraction of the way to human-level capabilities their subfield had come in the past 20 years. He also asked about apparent acceleration. Around half of the answers were in the 5-10% range, and, setting aside subfields that had already passed human level, all answers but one were less than 10%. Of the six who reported on acceleration, only one saw positive acceleration.
These estimates suggest human-level capabilities in most fields will take more than 200 years, if progress proceeds as it has (i.e. if we progress at 10% per twenty years, it will take 200 years to get to 100%). This estimate is quite different from those obtained from most of the other surveys.
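The extrapolation above is simple linear arithmetic, sketched below under the assumption that progress continues at a constant rate. The function name is hypothetical, and percentages are passed as whole numbers to keep the arithmetic exact.

```python
def years_to_full_progress(percent_done, years_elapsed):
    """Total years needed to reach 100% at a constant rate of progress.

    The span is counted from the start of the observed period, so it
    includes the years that have already elapsed.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return years_elapsed * 100 / percent_done


# 10% of the way in 20 years implies 200 years in total.
at_ten_percent = years_to_full_progress(10, 20)

# Answers at the low end of the 5-10% range imply an even longer span.
at_five_percent = years_to_full_progress(5, 20)

print(at_ten_percent, at_five_percent)
```

Because most answers were below 10%, the 200-year figure acts as a lower bound under this linear assumption.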
The 2016 ESPAI attempted to replicate this methodology, and did not appear to find similarly long implied timelines; however, little attention has been paid to analyzing that data.
This methodology is discussed more in the methods section below.
In assessing the quality of predictions, we are interested in the expertise of the participants, the potential for biases in selecting them, and the degree to which a group of well-selected experts generally tend to make good predictions. We will leave the third issue to be addressed elsewhere, and here describe the participants’ expertise and the surveys’ biases. We will see that the participants have much expertise relevant to AI, but – relatedly – their views are probably biased toward optimism because of selection effects as well as normal human optimism about projects.
Summary of participant backgrounds
The FHI (2011), AGI-09, and one of the four FHI collection surveys are from AGI (artificial general intelligence) conferences, so will tend to include a lot of people who work directly on trying to create human-level intelligence, and others who are enthusiastic or concerned about that project. At least two of the aforementioned surveys draw some participants from the ‘impacts’ section of the AGI conference, which is likely to select for people who think the effects of human-level intelligence are worth thinking about now.
Kruel’s participants are not from the AGI conferences, but around half work in AGI. Klein’s participants are not known, except they are acquaintances of a person who is enthusiastic about AGI (his site is called ‘AGI-world’). Thus many participants either do AGI research, or think about the topic a lot.
Many more participants are AI researchers from outside AGI. Hanson’s participants are experts in narrow AI fields. Michie’s participants are computer scientists working close to AI. Müller and Bostrom’s surveys of the top 100 artificial intelligence researchers, and of members of the Greek Association for Artificial Intelligence, would be almost entirely AI researchers, and there is little reason to expect them to be in AGI. AI@50 seems to include a variety of academics interested in AI rather than those in the narrow field of AGI, though it also includes others, such as several dozen graduate and post-doctoral students. The 2016 ESPAI surveyed everyone publishing in two top machine learning conferences, so its participants are largely machine learning researchers.
The remaining participants appear to be mostly highly educated people from academia and other intellectual areas. The attendees at the 2011 Conference on Philosophy and Theory of AI appear to be a mixture of philosophers, AI researchers, and academics from related fields such as brain sciences. Bainbridge’s participants are contributors to ‘converging technology’ reports, on topics of nanotechnology, biotechnology, information technology, and cognitive science. From looking at what appears to be one of these reports, these seem to be mostly experts from government and national laboratories, academia, and the private sector. Few work in AI in particular. An arbitrary sample includes the Director of the Division of Behavioral and Cognitive Sciences at NSF, a person from the Defense Threat Reduction Agency, and a person from HP laboratories.
As noted above, many survey participants work in AGI – the project to create general intelligent agents, as opposed to narrow AI applications. In general, we might expect people working on a given project to be unusually optimistic about its success, for two reasons. First, those who are most optimistic initially will more likely find the project worth investing in. Secondly, people are generally observed to be especially optimistic about the time needed for their own projects to succeed. So we might expect AGI researchers to be biased toward optimism, for these reasons.
On the other hand, AGI researchers are working on projects most closely related to human-level AI, so probably have the most relevant expertise.
Other AI researchers
Just as AGI researchers work on topics closer to human-level AI than other AI researchers – and so may be more biased but also more knowledgeable – AI researchers work on more relevant topics than everyone else. Similarly, we might expect them to be both more accurate due to their additional expertise, and more biased due to selection effects and optimism about personal projects.
Hanson’s participants are experts in narrow AI fields, but are also reporting on progress in their own fields of narrow AI (rather than on general intelligence), so we might expect them to be more like the AGI researchers – especially expert and especially biased. On the other hand, Hanson asks about past progress rather than future expectations, which should diminish both the selection effect and the effect from the planning fallacy, so we might expect the bias to be weaker.
Definitions of human-level AI
A few different definitions of human-level AI are combined in this analysis.
The AGI-09 survey asked about four benchmarks; the one reported here is the Turing-test capable AI. Note that ‘Turing test capable’ seems to sometimes be interpreted as merely capable of holding a normal human discussion. It isn’t clear that the participants had the same definition in mind.
Kruel only asked that the AI be as good as humans at science, mathematics, engineering and programming, and asked conditional on favorable conditions continuing (e.g. no global catastrophes). Such an AI might be expected prior to fully human-level AI.
Even where people talk about ‘human-level’ AI, they can mean a variety of different things. For instance, it is not clear whether a machine must operate at human cost to be ‘human-level’, or to what extent it must resemble a human.
At least three surveys use the acronym ‘HLMI’, but it can stand for either ‘human-level machine intelligence’ or ‘high level machine intelligence’ and is defined differently in different surveys.
Here is a full list of exact descriptions of something like ‘human-level’ used in the surveys:
- Michie: ‘computing system exhibiting intelligence at adult human level’
- Bainbridge: ‘The computing power and scientific knowledge will exist to build machines that are functionally equivalent to the human brain’
- Klein: ‘When will AI surpass human-level intelligence?’
- AI@50: ‘When will computers be able to simulate every aspect of human intelligence?’
- FHI 2011: ‘Assuming no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of human-level machine intelligence? Feel free to answer ‘never’ if you believe such a milestone will never be reached.’
- Müller and Bostrom: ‘[machine intelligence] that can carry out most human professions at least as well as a typical human’
- Hanson: ‘human level abilities’ in a subfield (wording is probably not consistent, given the long term and informal nature of the poll)
- AGI-09: ‘Passing the Turing test’
- Kruel: Variants on, ‘Assuming beneficial political and economic development and that no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of artificial intelligence that is roughly as good as humans (or better, perhaps unevenly) at science, mathematics, engineering and programming?’
- 2016 ESPAI (our emboldening):
  - Say we have ‘high level machine intelligence’ when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.
  - Say an occupation becomes fully automatable when unaided machines can accomplish it better and more cheaply than human workers. Ignore aspects of occupations for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.
  - Say we have reached ‘full automation of labor’ when all occupations are fully automatable. That is, when for any occupation, machines could be built to carry out the task better and more cheaply than human workers.
Inside vs. outside view methods
Hanson’s survey was unusual in that it asked participants for their impressions of past rates of progress, from which extrapolation could be made (an ‘outside view’ estimate), rather than asking directly about expected future rates of progress (an ‘inside view’ estimate). It also produced much later median dates for human-level AI, suggesting that this outside view methodology in general produces much later estimates (rather than, for instance, Hanson’s low sample size and casual format just producing a noisy or biased estimate that happened to be late).
If so, this would be important because outside view estimates in general are often informative.
However the 2016 ESPAI included a set of questions similar to Hanson’s, and did not at a glance find similarly long implied timelines, though the data has not been carefully analyzed. This is some evidence against the outside view style methodology systematically producing longer timelines, though arguably not enough to overturn the hypothesis.
We might expect Hanson’s outside view method to be especially useful in AI forecasting. A key merit of the method is that asking people about the past means asking questions more closely related to their expertise, and the future of AI is arguably especially far from anyone’s expertise (relative to, say, asking a dam designer how long it will take for their dam to be constructed). On the other hand, AI researchers’ expertise may include a lot of information about AI other than how far we have come, and translating what they have seen into what fraction of the way we have come may be difficult and thus introduce additional error.