We know of twelve surveys on the predicted timing of human-level AI. If we collapse a few slightly different meanings of ‘human-level AI’:
- Median estimates for when there will be a 10% chance of human-level AI are all in the 2020s (from seven surveys).
- Median estimates for when there will be a 50% chance of human-level AI range between 2035 and 2050 (from seven surveys).
- Of three surveys in recent decades asking for predictions but not probabilities, two produced median estimates of when human-level AI will arrive in the 2050s, and one in 2085.
- One small, informal survey asking about how far we have come rather than how far we have to go implies over a century until human-level AI, at odds with the other surveys.
Participants appear to mostly be experts in AI or related areas, but with a large contingent of others. Several groups of survey participants seem likely to over-represent people who are especially optimistic about human-level AI being achieved soon.
List of surveys
These are the surveys that we know of on timelines to human-level AI:
- Michie (1973)
- FHI Winter Intelligence (2011)
- AI@50 (2006)
- AGI-09 (2009)
- Klein (2007)
- Hanson (2012 onwards)
- Kruel (2011-12)
- Bainbridge (2005)
- Müller and Bostrom: AGI-12, TOP100, EETN, PTAI (2012-2013)
Ordered by year authors began surveying participants
|Year||Survey||#||10%||50%||90%||Other key ‘Predictions’||Participants||Response rate||Link to original document|
|1972||Michie||67|| || || ||Median 50y (2022) (vs 20 or >50)||AI, CS||–||link|
|2006||AI@50|| || || || ||median >50y (2056)||AI conf||–||link|
|2007||Klein||888|| || || ||median 2030-2050||Futurism?||–||link and link|
|2009||AGI-09||21||2020||2040||2075|| ||AGI conf; AI||–||link|
|2011||FHI Winter Intelligence||35||2028||2050||2150|| ||AGI impacts conf; 44% related technical||41%||link|
|2011-2012||Kruel interviews||37||2025||2035||2070|| ||AGI, AI||–||link|
|2012||FHI: AGI||72||2022||2040||2065|| ||AGI & AGI impacts conf; AGI, technical work||65%||link|
|2012||FHI: PT-AI||43||2023||2048||2080|| ||Philosophy & theory of AI conf; not technical AI||49%||link|
|2012-present||Hanson||~10|| || || ||≤ 10% progress to human level in past 20y||AI||–||link|
|2013||FHI: TOP100||29||2022||2040||2075|| ||Top AI||29%||link|
|2013||FHI: EETN||26||2020||2050||2093|| ||Greek assoc. for AI; AI||10%||link|
Time to a 10% chance and a 50% chance of human-level AI
The FHI Winter Intelligence, Müller and Bostrom, AGI-09, and Kruel surveys asked for years when participants expected 10%, 50% and 90% probabilities of human-level AI (or a similar concept). All of these surveys were taken between 2009 and 2013.
Survey participants’ median estimates for when there will be a 10% chance of human-level AI are all in the 2020s. Their median estimates for when there will be a 50% chance of human-level AI range between 2035 and 2050. All but one median estimate for when there will be a 90% chance of human-level AI fell between 2065 and 2093, with the last one in 2150.
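The ranges quoted above can be checked directly against the table. The following is a minimal sketch: the years are copied from the seven table rows that report 10%/50%/90% medians, while the names `medians` and `median_range` are illustrative and not part of any of the surveys.

```python
# Median years per survey, copied from the table above:
# (10% chance, 50% chance, 90% chance) of human-level AI.
medians = {
    "AGI-09": (2020, 2040, 2075),
    "FHI Winter Intelligence": (2028, 2050, 2150),
    "Kruel interviews": (2025, 2035, 2070),
    "FHI: AGI": (2022, 2040, 2065),
    "FHI: PT-AI": (2023, 2048, 2080),
    "FHI: TOP100": (2022, 2040, 2075),
    "FHI: EETN": (2020, 2050, 2093),
}


def median_range(index):
    """Return (earliest, latest) median year for one confidence level.

    index 0 is the 10% column, 1 the 50% column, 2 the 90% column.
    """
    years = [row[index] for row in medians.values()]
    return min(years), max(years)


print(median_range(0))  # (2020, 2028)
print(median_range(1))  # (2035, 2050)
print(median_range(2))  # (2065, 2150)

# Surveys whose 90% median falls after 2093 (the single outlier).
late = [name for name, row in medians.items() if row[2] > 2093]
print(late)  # ['FHI Winter Intelligence']
```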
Three recent surveys (Bainbridge, Klein, and AI@50) asked about predictions, rather than confidence levels. These produced median predictions of >2056 (AI@50), 2030-50 (Klein), and 2085 (Bainbridge). It is unclear how participants interpret the request to estimate when a thing will happen; these responses may mean the same as the 50% confidence estimate discussed above. These surveys together appear to contain a high density of people who don’t work in AI, compared to the other surveys.
Michie’s survey is unusual in being much earlier than the others (1972). In it, fewer than a third of participants expected human-level AI by 1992, almost another third estimated 2022, and the rest expected it later. Thus fewer than a third have been demonstrated wrong so far. Note also that the participants’ median expectation (50 years away) was further from their present time than those of participants in the more recent surveys. Both of these points conflict with a common perception that early AI predictions were shockingly optimistic and quickly undermined.
Hanson’s survey is unusual in its methodology. Hanson informally asked some AI experts what fraction of the way to human-level capabilities their subfield had come in the past 20 years. He also asked about apparent acceleration. Around half of the answers were in the 5-10% range and, among subfields that had not already passed human level, all answers but one were less than 10%. Of the six experts who reported on acceleration, only one saw positive acceleration.
These estimates suggest human-level capabilities in most fields will take more than 200 years, if progress proceeds as it has (i.e. if we progress at 10% per twenty years, it will take 200 years to get to 100%). This estimate is quite different from those obtained from the other surveys. This is discussed more in the methods section below.
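The extrapolation behind the 200-year figure is simple arithmetic, sketched below. It assumes progress continues at the same constant rate reported for the past 20 years, which is the text's own assumption; the function name and its parameters are hypothetical, chosen only for illustration.

```python
def years_to_human_level(percent_done, period_years):
    """Linearly extrapolate from progress made over a past period.

    percent_done: percent of the way to human level covered in the period.
    period_years: length of that period in years.
    Returns (total_years, remaining_years), assuming a constant rate.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total = period_years * 100 / percent_done
    remaining = total - period_years
    return total, remaining


# 10% of the way in 20 years: 200 years in total at that rate.
print(years_to_human_level(10, 20))  # (200.0, 180.0)

# 5% of the way in 20 years, the low end of the most common answers.
print(years_to_human_level(5, 20))   # (400.0, 380.0)
```

Because most answers were below 10%, the 200-year total is a lower bound under this linear assumption. The second returned value subtracts the 20 years already counted in the survey period.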
In assessing the quality of predictions, we are interested in the expertise of the participants, the potential for biases in selecting them, and the degree to which a group of well-selected experts tends to make good predictions. We will leave the third issue to be addressed elsewhere, and here describe the participants’ expertise and the surveys’ biases. We will see that the participants have much expertise relevant to AI but, relatedly, their views are probably biased toward optimism because of selection effects as well as normal human optimism about projects.
Summary of participant backgrounds
The FHI (2011), AGI-09, and one of the four FHI collection surveys are from AGI (artificial general intelligence) conferences, so will tend to include a lot of people who work directly on trying to create human-level intelligence, and others who are enthusiastic or concerned about that project. At least two of the aforementioned surveys draw some participants from the ‘impacts’ section of the AGI conference, which is likely to select for people who think the effects of human-level intelligence are worth thinking about now.
Kruel’s participants are not from the AGI conferences, but around half work in AGI. Klein’s participants are not known, except they are acquaintances of a person who is enthusiastic about AGI (his site is called ‘AGI-world’). Thus many participants either do AGI research, or think about the topic a lot.
Many more participants are AI researchers from outside AGI. Hanson’s participants are experts in narrow AI fields. Michie’s participants are computer scientists working close to AI. Müller and Bostrom’s surveys of the top 100 artificial intelligence researchers, and of members of the Greek Association for Artificial Intelligence, would be almost entirely AI researchers, and there is little reason to expect them to be in AGI. AI@50 seems to include a variety of academics interested in AI rather than those in the narrow field of AGI, though it also includes others, such as several dozen graduate and post-doctoral students.
The remaining participants appear to be mostly highly educated people from academia and other intellectual areas. The attendees at the 2011 Conference on Philosophy and Theory of AI appear to be a mixture of philosophers, AI researchers, and academics from related fields such as brain sciences. Bainbridge’s participants are contributors to ‘converging technology’ reports, on topics of nanotechnology, biotechnology, information technology, and cognitive science. From looking at what appears to be one of these reports, these seem to be mostly experts from government and national laboratories, academia, and the private sector. Few work in AI in particular. An arbitrary sample includes the Director of the Division of Behavioral and Cognitive Sciences at NSF, a person from the Defense Threat Reduction Agency, and a person from HP laboratories.
As noted above, many survey participants work in AGI, the project to create generally intelligent agents, as opposed to narrow AI applications. In general, we might expect people working on a given project to be unusually optimistic about its success, for two reasons. First, those who are most optimistic initially are more likely to find the project worth investing in. Second, people are generally observed to be especially optimistic about the time needed for their own projects to succeed. So we might expect AGI researchers to be biased toward optimism.
On the other hand, AGI researchers are working on projects most closely related to human-level AI, so probably have the most relevant expertise.
Other AI researchers
Just as AGI researchers work on topics closer to human-level AI than other AI researchers (and so may be more biased but also more knowledgeable), AI researchers work on more relevant topics than everyone else. Similarly, we might expect them to be both more accurate due to their additional expertise and more biased due to selection effects and optimism about personal projects.
Hanson’s participants are experts in narrow AI fields, but are also reporting on progress in their own fields of narrow AI (rather than on general intelligence), so we might expect them to be more like the AGI researchers – especially expert and especially biased. On the other hand, Hanson asks about past progress rather than future expectations, which should diminish both the selection effect and the effect from the planning fallacy, so we might expect the bias to be weaker.
Definitions of human-level AI
A few different definitions of human-level AI are combined in this analysis.
The AGI-09 survey asked about four benchmarks; the one reported here is the Turing-test capable AI. Note that ‘Turing test capable’ seems to sometimes be interpreted as merely capable of holding a normal human discussion. It isn’t clear that the participants had the same definition in mind.
Kruel only asked that the AI be as good as humans at science, mathematics, engineering and programming, and asked conditional on favorable conditions continuing (e.g. no global catastrophes). An AI meeting this description might be expected prior to fully human-level AI.
Even where people talk about ‘human-level’ AI, they can mean a variety of different things. For instance, it is not clear whether a machine must operate at human cost to be ‘human-level’, or to what extent it must resemble a human.
Here is a full list of exact descriptions of something like ‘human-level’ used in the surveys:
- Michie: ‘computing system exhibiting intelligence at adult human level’
- Bainbridge: ‘The computing power and scientific knowledge will exist to build machines that are functionally equivalent to the human brain’
- Klein: ‘When will AI surpass human-level intelligence?’
- AI@50: ‘When will computers be able to simulate every aspect of human intelligence?’
- FHI 2011: ‘Assuming no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of human-level machine intelligence? Feel free to answer ‘never’ if you believe such a milestone will never be reached.’
- Müller and Bostrom: ‘[machine intelligence] that can carry out most human professions at least as well as a typical human’
- Hanson: ‘human level abilities’ in a subfield (wording is probably not consistent, given the long term and informal nature of the poll)
- AGI-09: ‘Passing the Turing test’
- Kruel: Variants on, ‘Assuming beneficial political and economic development and that no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of artificial intelligence that is roughly as good as humans (or better, perhaps unevenly) at science, mathematics, engineering and programming?’
Inside vs. outside view methods
As noted above, estimates made from Hanson’s survey are quite different from those obtained from the other surveys. The main difference in methodology between them is that Hanson asks about progress to date, and other surveys ask about expected future progress.
It is not obvious which type of methodology should be more reliable. On one hand, outside view estimates – extrapolating from past experience – are often considered more accurate than direct (‘inside view’) estimates of future performance. Hanson’s method might also have the merit that it asks people questions more closely related to their expertise. On the other hand, AI researchers’ expertise may include a lot of information about AI other than how far we have come, and translating what they have seen into what fraction of the way we have come may be difficult and thus introduce additional error.