Automation of music production

Most machine learning researchers surveyed in 2016 expected that machines will be able to create top-quality music by 2036: median answers put at least an even chance on both of the music milestones below being feasible within twenty years.

Details

Evidence from survey data

In the 2016 Expert Survey on Progress in AI (ESPAI), participants were asked two relevant questions:

[Top forty] Compose a song that is good enough to reach the US Top 40. The system should output the complete song as an audio file.

[Taylor] Produce a song that is indistinguishable from a new song by a particular artist, e.g. a song that experienced listeners can’t distinguish from a new song by Taylor Swift.

Summary results

Answers were as follows, suggesting these milestones are likely to be reached within ten years, and quite likely to be reached within twenty years.

Chance of feasibility within a fixed number of years:

              10 years    20 years    50 years
Top forty     27.5%       50%         90%
Taylor        60%         75%         99%

Years until a fixed chance of feasibility:

              10%         50%         90%
Top forty     5 years     10 years    20 years
Taylor        5 years     10 years    20 years
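The two framings above can be compared directly. The following Python sketch is ours, not part of the survey: it linearly interpolates the fixed-years answers to find when the chance of each milestone first reaches 50%, and compares that with the fixed-probabilities answers. The function name and the assumption of a 0% chance at year zero are our own.

```python
# Median answers from the summary above (2016 ESPAI music milestones).
fixed_years = {   # framing 1: chance of feasibility within 10/20/50 years
    "Top forty": {10: 0.275, 20: 0.50, 50: 0.90},
    "Taylor":    {10: 0.60,  20: 0.75, 50: 0.99},
}
fixed_probs = {   # framing 2: years until a 10%/50%/90% chance
    "Top forty": {0.10: 5, 0.50: 10, 0.90: 20},
    "Taylor":    {0.10: 5, 0.50: 10, 0.90: 20},
}

def years_to_even_chance(curve):
    """Linearly interpolate the year at which the chance first reaches 50%,
    assuming a 0% chance at year zero (an assumption, not survey data)."""
    prev_y, prev_p = 0, 0.0
    for y, p in sorted(curve.items()):
        if p >= 0.50:
            return prev_y + (0.50 - prev_p) * (y - prev_y) / (p - prev_p)
        prev_y, prev_p = y, p
    return None

for task in fixed_years:
    implied = years_to_even_chance(fixed_years[task])
    direct = fixed_probs[task][0.50]
    print(f"{task}: {implied:.1f} years (implied) vs {direct} years (direct)")
```

Under these assumptions the fixed-years framing implies an even-chance date for the Top forty milestone around 20 years out, while the fixed-probabilities framing puts it at 10 years, illustrating how the elicitation framing shifts the implied timeline.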

Distributions of answers to Taylor question

The three figures below show how respondents who received the ‘fixed years’ framing were distributed across different answers, for the 10-, 20-, and 50-year horizons.

[Figures: distributions of answers at 10, 20, and 50 years]

2016 ESPAI questions printout

This page is a printout of the survey questions, exported from the Qualtrics website as a Word document and then copied here for searchability. Because of the importing process, its formatting differs from the survey as received by participants, and it probably contains typographic errors. It also includes only part of the randomization logic. The survey questions are available as a pdf here.

Printout

16-05-17 AI Survey 12 – final

 

consent_tim Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

consent 2016 Expert Survey on Progress in AI

Welcome. We are conducting a study of progress in artificial intelligence and are interested in your understanding of developments in the field.

Our estimated median time for completing this survey is 12 minutes. Your responses will be kept confidential.

Many of the questions involve substantial uncertainties. Please just give us your current best guesses.

There are no known risks associated with this study. Although this study may not benefit you personally, we hope that our results will add to the knowledge about progress in AI technology. If you have questions about your rights as a research participant, you may contact the Yale University Human Subjects Committee: 203-785-4688, human.subjects@yale.edu. Additional information is available at: http://www.yale.edu/hrpp/participants/index.html

Participation in this study is completely voluntary. You are free to decline to participate and to end participation at any time for any reason. By continuing to the next page, you agree to participate in the survey.

 

hb_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

hb_def   1 of 7  The following questions ask about ‘high–level machine intelligence’ (HLMI).   Say we have ‘high-level machine intelligence’ when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.

 

Display This Question:

If fixedprobabilities Is Equal to 1

hb_a For the purposes of this question, assume that human scientific activity continues without major negative disruption. How many years until you expect:

a 10% probability of HLMI existing? (1)
a 50% probability of HLMI existing? (2)
a 90% probability of HLMI existing? (3)

 

 

Display This Question:

If fixedprobabilities Is Equal to 0

hb_b For the purposes of this question, assume that human scientific activity continues without major negative disruption. How likely is it that HLMI exists:

in 10 years? (1)

in 20 years? (2)

in 40 years? (3)

 

Display This Question:

If random Is Greater Than 20

And random Is Less Than or Equal to 30

hb_comment Do you have any comments on your interpretation of this question? (optional)

 

Display This Question:

If random Is Greater Than 30

And random Is Less Than or Equal to 40

hb_consider Which considerations were important in your answers to this question? (optional)

 

hj_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

hj_a_jobs 1 of 7   Say an occupation becomes fully automatable when unaided machines can accomplish it better and more cheaply than human workers. Ignore aspects of occupations for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.    We want to know how many years you think will pass before the following present-day occupations will be fully automatable. Please tell us your best guess of when you think there will be a small chance (10% chance), a roughly even chance (50% chance), and a high chance (90% chance).

Years until small chance (10%) (1) Years until even chance (50%) (2) Years until high chance (90%) (3)
Truck driver (hj_a_jobs_1)
Surgeon (hj_a_jobs_2)
Retail salesperson (hj_a_jobs_3)
AI researcher (hj_a_jobs_4)

 

 

hj_a_final What is an existing human occupation that you think will be among the final ones to be fully automatable? Remember to consider feasibility, not adoption.

 

hj_a_final_pred How many years do you expect to pass before you think there is a small/even/high chance that this occupation will be fully automatable?

Small chance (10%) (1)

Even chance (50%) (2)

High chance (90%) (3)

 

hj_a_full Say we have reached ‘full automation of labor’ when all occupations are fully automatable. That is, when for any occupation, machines could be built to carry out the task better and more cheaply than human workers. In how many years do you expect full automation of labor, with small/even/high chance?

Small chance (10%) (1)

Even chance (50%) (2)

High chance (90%) (3)

 

Display This Question:

If random Is Greater Than 20

And random Is Less Than or Equal to 30

hj_a_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 30

And random Is Less Than or Equal to 40

hj_a_consider Which considerations were important in your answers to these questions? (optional)

 

hj_b_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

hj_b_jobs   1 of 7   Say an occupation becomes fully automatable when unaided machines can accomplish it better and more cheaply than human workers. Ignore aspects of occupations for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.    We want to know how likely you think it is that the following present-day occupations will be fully automatable at future dates. Please tell us your best guess of the chance that they will be fully automatable within the next 10 years, within the next 20 years, and within the next 50 years.

% chance in 10 years (1) % chance in 20 years (2) % chance in 50 years (3)
Truck driver (hj_b_jobs_1)
Surgeon (hj_b_jobs_2)
Retail salesperson (hj_b_jobs_3)
AI researcher (hj_b_jobs_4)

 

 

hj_b_final What is an existing human occupation that you think will be among the final ones to be fully automatable? Remember to consider feasibility, not adoption.

 

hj_b_final_pred How likely do you think it is that this occupation will be fully automatable within the next 10/20/50 years?

10 years (1)

20 years (2)

50 years (3)

 

hj_b_full Say we have reached ‘full automation of labor’ when all occupations are fully automatable. That is, when for any occupation, machines could be built to carry out the task better and more cheaply than human workers. How likely do you think it is that full automation of labor will happen within the next 10/20/50 years?

10 years (1)

20 years (2)

50 years (3)

 

Display This Question:

If random Is Greater Than 20

And random Is Less Than or Equal to 30

hj_b_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 30

And random Is Less Than or Equal to 40

hj_b_consier Which considerations were important in your answers to these questions? (optional)

 

ie_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

ie_time_def   2 of 7   The following questions ask about ‘high–level machine intelligence’ (HLMI).     Say we have ‘high-level machine intelligence’ when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.

 

ie_1 Assume that HLMI will exist at some point. How likely do you then think it is that the rate of global technological improvement will dramatically increase (e.g. by a factor of ten) as a result of machine intelligence:

Within two years of that point? (1)

Within thirty years of that point? (2)

 

ie_2 Assume that HLMI will exist at some point. How likely do you think it is that there will be machine intelligence that is vastly better than humans at all professions (i.e. that is vastly more capable or vastly cheaper):

Within two years of that point? (1)

Within thirty years of that point? (2)

 

ie_3 Some people have argued the following:

If AI systems do nearly all research and development, improvements in AI will accelerate the pace of technological progress, including further progress in AI. Over a short period (less than 5 years), this feedback loop could cause technological progress to become more than an order of magnitude faster.

How likely do you find this argument to be broadly correct?

  • Quite unlikely (0-20%) (5)
  • Unlikely (21-40%) (4)
  • About even chance (41-60%) (3)
  • Likely (61-80%) (2)
  • Quite likely (81-100%) (1)

 

Display This Question:

If random Is Greater Than 40

And random Is Less Than 50

ie_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 50

And random Is Less Than 60

ie_consider Which considerations were important in your answers to these questions? (optional)

 

vb_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

vb_def   3 of 7   The following questions ask about ‘high–level machine intelligence’ (HLMI).   Say we have ‘high-level machine intelligence’ when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.

 

vb_1 Assume for the purpose of this question that HLMI will at some point exist. How positive or negative do you expect the overall impact of this to be on humanity, in the long run? Please answer by saying how probable you find the following kinds of impact, with probabilities adding to 100%:

______ Extremely good (e.g. rapid growth in human flourishing) (1)

______ On balance good (2)

______ More or less neutral (3)

______ On balance bad (4)

______ Extremely bad (e.g. human extinction) (5)

 

Display This Question:

If random Is Greater Than 60

And random Is Less Than 70

vb_comment Do you have any comments on your interpretation of this question? (optional)

 

Display This Question:

If random Is Greater Than 70

And random Is Less Than 80

vb_consider Which considerations were important in your answers to this question? (optional)

 

c_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

c_def 4 of 7   The next questions ask about the sensitivity of progress in AI capabilities to changes in inputs.    ‘Progress in AI capabilities’ is an imprecise concept, so we are asking about progress as you naturally conceive of it, and looking for approximate answers.

 

c_1 Imagine that over the past decade, only half as much researcher effort had gone into AI research. For instance, if there were actually 1,000 researchers, imagine that there had been only 500 researchers (of the same quality). How much less progress in AI capabilities would you expect to have seen? e.g. If you think progress is linear in the number of researchers, so 50% less progress would have been made, write ’50’. If you think only 20% less progress would have been made write ’20’.

 

c_2 Over the last 10 years the cost of computing hardware has fallen by a factor of 20. Imagine instead that the cost of computing hardware had fallen by only a factor of 5 over that time (around half as far on a log scale).   How much less progress in AI capabilities would you expect to have seen? e.g. If you think progress is linear in 1/cost, so that 1-5/20=75% less progress would have been made, write ’75’. If you think only 20% less progress would have been made write ’20’.
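The example arithmetic in questions c_1 and c_2 can be made concrete with a couple of lines of Python. This is a sketch of the illustrative models the prompts name (linear in researchers, linear in 1/cost), not part of the survey:

```python
# c_1: if progress is linear in the number of researchers,
# halving the researchers halves the progress.
researchers_answer = 100 * (1 - 500 / 1000)   # percent less progress -> 50.0

# c_2: if progress is linear in 1/cost, then with only a 5x cost fall instead
# of a 20x fall, relative progress is 5/20, i.e. 1 - 5/20 = 75% less progress.
hardware_answer = 100 * (1 - 5 / 20)          # percent less progress -> 75.0

print(researchers_answer, hardware_answer)
```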

 

c_3 Imagine that over the past decade, there had only been half as much effort put into increasing the size and availability of training datasets. For instance, perhaps there are only half as many datasets, or perhaps existing datasets are substantially smaller or lower quality. How much less progress in AI capabilities would you expect to have seen? e.g. If you think 20% less progress would have been made, write ‘20’

 

c_4 Imagine that over the past decade, AI research had half as much funding (in both academic and industry labs). For instance, if the average lab had a budget of $20 million each year, suppose their budget had only been $10 million each year.  How much less progress in AI capabilities would you expect to have seen? e.g. If you think 20% less progress would have been made, write ‘20’

 

c_5 Imagine that over the past decade, there had been half as much progress in AI algorithms. You might imagine this as conceptual insights being half as frequent.  How much less progress in AI capabilities would you expect to have seen? e.g. If you think 20% less progress would have been made, write ‘20’

 

Display This Question:

If random Is Greater Than 80

And random Is Less Than or Equal to 90

c_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 90

And random Is Less Than or Equal to 100

c_consider Which considerations were important in your answers to these questions? (optional)

 

hh_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

hh_area 4 of 7   Which AI research area have you worked in for the longest time?

 

hh_howlong How long have you worked in this area?

 

hh_1 Consider three levels of progress or advancement in this area:

A. Where the area was when you started working in it
B. Where it is now
C. Where it would need to be for AI software to have roughly human level abilities at the tasks studied in this area

What fraction of the distance between where progress was when you started working in the area (A) and where it would need to be to attain human level abilities in the area (C) have we come so far (B)?

 

hh_2 Divide the period you have worked in the area into two halves: the first and the second. In which half was the rate of progress in your area higher?

  • The first half (1)
  • The second half (2)
  • They were about the same (3)

 

Display This Question:

If random Is Greater Than 20

And random Is Less Than 30

hh_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 30

And random Is Less Than 40

hh_consider Which considerations were important in your answers to these questions? (optional)

 

ms_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

ms_1 4 of 7   To what extent do you think you disagree with the typical AI researcher about when HLMI will exist?

  • A lot (17)
  • A moderate amount (18)
  • Not much (19)

 

ms_2 If you disagree, why do you think that is?

 

ms_3 To what extent do you think people’s concerns about future risks from AI are due to misunderstandings of AI research?

  • Almost entirely (1)
  • To a large extent (2)
  • Somewhat (4)
  • Not much (3)
  • Hardly at all (5)

 

ms_4 What do you think are the most important misunderstandings, if there are any?

 

Display This Question:

If random Is Greater Than 80

And random Is Less Than or Equal to 90

ms_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 90

And random Is Less Than or Equal to 100

ms_consider Which considerations were important in your answers to these questions? (optional)

 

ta_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

ta_def 5 of 7

How many years until you think the following AI tasks will be feasible with:

  • a small chance (10%)?
  • an even chance (50%)?
  • a high chance (90%)?

Let a task be ‘feasible’ if one of the best resourced labs could implement it in less than a year if they chose to. Ignore the question of whether they would choose to.

Tasks

 

ta_1 Translate a text written in a newly discovered language into English as well as a team of human experts, using a single other document in both languages (like a Rosetta stone). Suppose all of the words in the text can be found in the translated document, and that the language is a difficult one.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_2 Translate speech in a new language given only unlimited films with subtitles in the new language. Suppose the system has access to training data for other languages, of the kind used now (e.g. same text in two languages for many languages and films with subtitles in many languages).

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_3 Perform translation about as good as a human who is fluent in both languages but unskilled at translation, for most types of text, and for most popular languages (including languages that are known to be difficult, like Czech, Chinese and Arabic).

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_4 Provide phone banking services as well as human operators can, without annoying customers more than humans. This includes many one-off tasks, such as helping to order a replacement bank card or clarifying how to use part of the bank website to a customer.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_5 Correctly group images of previously unseen objects into classes, after training on a similar labeled dataset containing completely different classes. The classes should be similar to the ImageNet classes.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_6 One-shot learning: see only one labeled image of a new object, and then be able to recognize the object in real world scenes, to the extent that a typical human can (i.e. including in a wide variety of settings). For example, see only one image of a platypus, and then be able to recognize platypuses in nature photos. The system may train on labeled images of other objects.   Currently, deep networks often need hundreds of examples in classification tasks1, but there has been work on one-shot learning for both classification2 and generative tasks3.   1 Lake et al. (2015). Building Machines That Learn and Think Like People 2 Koch (2015). Siamese Neural Networks for One-Shot Image Recognition 3 Rezende et al. (2016). One-Shot Generalization in Deep Generative Models

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_7 See a short video of a scene, and then be able to construct a 3D model of the scene good enough to create a realistic video of the same scene from a substantially different angle. For example, constructing a short video of walking through a house from a video taking a very different path through the house.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_8 Transcribe human speech with a variety of accents in a noisy environment as well as a typical human can.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_9 Take a written passage and output a recording that can’t be distinguished from a voice actor, by an expert listener.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_10 Routinely and autonomously prove mathematical theorems that are publishable in top mathematics journals today, including generating the theorems to prove.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_11 Perform as well as the best human entrants in the Putnam competition—a math contest whose questions have known solutions, but which are difficult for the best young mathematicians.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_12 Defeat the best Go players, training only on as many games as the best Go players have played.     For reference, DeepMind’s AlphaGo has probably played a hundred million games of self-play, while Lee Sedol has probably played 50,000 games in his life1.     1 Lake et al. (2015). Building Machines That Learn and Think Like People

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_13 Beat the best human Starcraft 2 players at least 50% of the time, given a video of the screen.

Starcraft 2 is a real time strategy game characterized by:

  • Continuous time play
  • Huge action space
  • Partial observability of enemies
  • Long term strategic play, e.g. preparing for and then hiding surprise attacks.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_14 Play a randomly selected computer game, including difficult ones, about as well as a human novice, after playing the game less than 10 minutes of game time. The system may train on other games.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_15 Play new levels of Angry Birds better than the best human players. Angry Birds is a game where players try to efficiently destroy 2D block towers with a catapult. For context, this is the goal of the IJCAI Angry Birds AI competition1.     1 aibirds.org

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_16 Outperform professional game testers on all Atari games using no game-specific knowledge. This includes games like Frostbite, which require planning to achieve sub-goals and have posed problems for deep Q-networks1, 2.     1 Mnih et al. (2015). Human-level control through deep reinforcement learning 2 Lake et al. (2015). Building Machines That Learn and Think Like People

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_17 Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game specific knowledge.   For context, the original Atari playing deep Q-network outperforms professional game testers on 47% of games1, but used hundreds of hours of play to train2.   1 Mnih et al. (2015). Human-level control through deep reinforcement learning 2 Lake et al. (2015). Building Machines That Learn and Think Like People

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_18 Fold laundry as well and as fast as the median human clothing store employee.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_19 Beat the fastest human runners in a 5 kilometer race through city streets using a bipedal robot body.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_20 Physically assemble any LEGO set given the pieces and instructions, using non-specialized robotics hardware.   For context, Fu 20161 successfully joins single large LEGO pieces using model based reinforcement learning and online adaptation.   1 Fu et al. (2016). One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_21 Learn to efficiently sort lists of numbers much larger than in any training set used, the way Neural GPUs can do for addition1, but without being given the form of the solution.   For context, Neural Turing Machines have not been able to do this2, but Neural Programmer-Interpreters3 have been able to do this by training on stack traces (which contain a lot of information about the form of the solution).   1 Kaiser & Sutskever (2015). Neural GPUs Learn Algorithms   2 Zaremba & Sutskever (2015). Reinforcement Learning Neural Turing Machines   3 Reed & de Freitas (2015). Neural Programmer-Interpreters

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_22 Write concise, efficient, human-readable Python code to implement simple algorithms like quicksort. That is, the system should write code that sorts a list, rather than just being able to sort lists.

Suppose the system is given only:

  • A specification of what counts as a sorted list
  • Several examples of lists undergoing sorting by quicksort

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)
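For readers unfamiliar with the target of ta_22, here is the sort of concise, human-readable quicksort the question envisions a system producing. This example is our illustration, not survey material:

```python
def quicksort(xs):
    """Sort a list by choosing a pivot, partitioning the remaining
    elements around it, and recursing on each partition."""
    if len(xs) <= 1:
        return list(xs)
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    middle = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```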

 

ta_23 Answer any “easily Googleable” factoid questions posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

Examples of factoid questions:

  • “What is the poisonous substance in Oleander plants?”
  • “How many species of lizard can be found in Great Britain?”

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_24 Answer any “easily Googleable” factual but open ended question posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

Examples of open ended questions:

  • “What does it mean if my lights dim when I turn on the microwave?”
  • “When does home insurance cover roof replacement?”

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_25 Give good answers in natural language to factual questions posed in natural language for which there are no definite correct answers. For example: “What causes the demographic transition?”, “Is the thylacine extinct?”, “How safe is seeing a chiropractor?”

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_26 Write an essay for a high-school history class that would receive high grades and pass plagiarism detectors. For example, answer a question like ‘How did the whaling industry affect the industrial revolution?’

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_27 Compose a song that is good enough to reach the US Top 40. The system should output the complete song as an audio file.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_28 Produce a song that is indistinguishable from a new song by a particular artist, e.g. a song that experienced listeners can’t distinguish from a new song by Taylor Swift.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_29 Write a novel or short story good enough to make it to the New York Times best-seller list.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_30 For any computer game that can be played well by a machine, explain the machine’s choice of moves in a way that feels concise and complete to a layman.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_31 Play poker well enough to win the World Series of Poker.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

ta_32 After spending time in a virtual world, output the differential equations governing that world in symbolic form. For example, the agent is placed in a game engine where Newtonian mechanics holds exactly and the agent is then able to conduct experiments with a ball and output Newton’s laws of motion.

small chance (10%) (1)

even chance (50%) (2)

high chance (90%) (3)

 

Display This Question:

If random Is Greater Than 0

And random Is Less Than 10

ta_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 10

And random Is Less Than 20

ta_consider Which considerations were important in your answers to these questions? (optional)

 

tb_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

tb_def 5 of 7

How likely do you think it is that the following AI tasks will be feasible within the next:

  • 10 years?
  • 20 years?
  • 50 years?

Let a task be ‘feasible’ if one of the best resourced labs could implement it in less than a year if they chose to. Ignore the question of whether they would choose to.

Tasks

 

tb_1 Translate a text written in a newly discovered language into English as well as a team of human experts, using a single other document in both languages (like a Rosetta stone). Suppose all of the words in the text can be found in the translated document, and that the language is a difficult one.

10 years (1)

20 years (2)

50 years (3)

 

tb_2 Translate speech in a new language given only unlimited films with subtitles in the new language. Suppose the system has access to training data for other languages, of the kind used now (e.g. same text in two languages for many languages and films with subtitles in many languages).

10 years (4)

20 years (5)

50 years (6)

 

tb_3 Perform translation about as good as a human who is fluent in both languages but unskilled at translation, for most types of text, and for most popular languages (including languages that are known to be difficult, like Czech, Chinese and Arabic).

10 years (1)

20 years (2)

50 years (3)

 

tb_4 Provide phone banking services as well as human operators can, without annoying customers more than humans. This includes many one-off tasks, such as helping to order a replacement bank card or clarifying how to use part of the bank website to a customer.

10 years (1)

20 years (2)

50 years (3)

 

tb_5 Correctly group images of previously unseen objects into classes, after training on a similar labeled dataset containing completely different classes. The classes should be similar to the ImageNet classes.

10 years (1)

20 years (2)

50 years (3)

 

tb_6 One-shot learning: see only one labeled image of a new object, and then be able to recognize the object in real world scenes, to the extent that a typical human can (i.e. including in a wide variety of settings). For example, see only one image of a platypus, and then be able to recognize platypuses in nature photos. The system may train on labeled images of other objects.   Currently, deep networks often need hundreds of examples in classification tasks1, but there has been work on one-shot learning for both classification2 and generative tasks3.   1 Lake et al. (2015). Building Machines That Learn and Think Like People 2 Koch (2015). Siamese Neural Networks for One-Shot Image Recognition 3 Rezende et al. (2016). One-Shot Generalization in Deep Generative Models

10 years (1)

20 years (2)

50 years (3)

 

tb_7 See a short video of a scene, and then be able to construct a 3D model of the scene that is good enough to create a realistic video of the same scene from a substantially different angle. For example, constructing a short video of walking through a house from a video taking a very different path through the house.

10 years (1)

20 years (2)

50 years (3)

 

tb_8 Transcribe human speech with a variety of accents in a noisy environment as well as a typical human can.

10 years (1)

20 years (2)

50 years (3)

 

tb_9 Take a written passage and output a recording that can’t be distinguished from a voice actor, by an expert listener.

10 years (1)

20 years (2)

50 years (3)

 

tb_10 Routinely and autonomously prove mathematical theorems that are publishable in top mathematics journals today, including generating the theorems to prove.

10 years (1)

20 years (2)

50 years (3)

 

tb_11 Perform as well as the best human entrants in the Putnam competition—a math contest whose questions have known solutions, but which are difficult for the best young mathematicians.

10 years (1)

20 years (2)

50 years (3)

 

tb_12 Defeat the best Go players, training only on as many games as the best Go players have played.

For reference, DeepMind’s AlphaGo has probably played a hundred million games of self-play, while Lee Sedol has probably played 50,000 games in his life [1].

[1] Lake et al. (2015). Building Machines That Learn and Think Like People

10 years (1)

20 years (2)

50 years (3)

 

tb_13 Beat the best human Starcraft 2 players at least 50% of the time, given a video of the screen.

Starcraft 2 is a real time strategy game characterized by:
  • Continuous time play
  • Huge action space
  • Partial observability of enemies
  • Long term strategic play, e.g. preparing for and then hiding surprise attacks.

10 years (1)

20 years (2)

50 years (3)

 

tb_14 Play a randomly selected computer game, including difficult ones, about as well as a human novice, after playing the game for less than 10 minutes of game time. The system may train on other games.

10 years (1)

20 years (2)

50 years (3)

 

tb_15 Play new levels of Angry Birds better than the best human players. Angry Birds is a game where players try to efficiently destroy 2D block towers with a catapult. For context, this is the goal of the IJCAI Angry Birds AI competition [1].

[1] aibirds.org

10 years (1)

20 years (2)

50 years (3)

 

tb_16 Outperform professional game testers on all Atari games using no game-specific knowledge. This includes games like Frostbite, which require planning to achieve sub-goals and have posed problems for deep Q-networks [1, 2].

[1] Mnih et al. (2015). Human-level control through deep reinforcement learning
[2] Lake et al. (2015). Building Machines That Learn and Think Like People

10 years (1)

20 years (2)

50 years (3)

 

tb_17 Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game specific knowledge.

For context, the original Atari playing deep Q-network outperforms professional game testers on 47% of games [1], but used hundreds of hours of play to train [2].

[1] Mnih et al. (2015). Human-level control through deep reinforcement learning
[2] Lake et al. (2015). Building Machines That Learn and Think Like People

10 years (1)

20 years (2)

50 years (3)

 

tb_18 Fold laundry as well and as fast as the median human clothing store employee.

10 years (1)

20 years (2)

50 years (3)

 

tb_19 Beat the fastest human runners in a 5 kilometer race through city streets using a bipedal robot body.

10 years (1)

20 years (2)

50 years (3)

 

tb_20 Physically assemble any LEGO set given the pieces and instructions, using non-specialized robotics hardware.

For context, Fu et al. (2016) [1] successfully join single large LEGO pieces using model based reinforcement learning and online adaptation.

[1] Fu et al. (2016). One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors

10 years (1)

20 years (2)

50 years (3)

 

tb_21 Learn to efficiently sort lists of numbers much larger than in any training set used, the way Neural GPUs can do for addition [1], but without being given the form of the solution.

For context, Neural Turing Machines have not been able to do this [2], but Neural Programmer-Interpreters [3] have been able to do it by training on stack traces (which contain a lot of information about the form of the solution).

[1] Kaiser & Sutskever (2015). Neural GPUs Learn Algorithms
[2] Zaremba & Sutskever (2015). Reinforcement Learning Neural Turing Machines
[3] Reed & de Freitas (2015). Neural Programmer-Interpreters

10 years (1)

20 years (2)

50 years (3)

 

tb_22 Write concise, efficient, human-readable Python code to implement simple algorithms like quicksort. That is, the system should write code that sorts a list, rather than just being able to sort lists.

Suppose the system is given only:
  • A specification of what counts as a sorted list
  • Several examples of lists undergoing sorting by quicksort

10 years (1)

20 years (2)

50 years (3)
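For illustration, the kind of output this question envisions might look like the following (our own sketch, not part of the survey; the function name and example list are arbitrary):

```python
def quicksort(xs):
    """Return a sorted copy of xs (ascending order)."""
    if len(xs) <= 1:
        return list(xs)
    pivot = xs[len(xs) // 2]
    # Partition around the pivot, then sort each side recursively.
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```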

 

tb_23 Answer any “easily Googleable” factoid questions posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

Examples of factoid questions:
  • “What is the poisonous substance in Oleander plants?”
  • “How many species of lizard can be found in Great Britain?”

10 years (1)

20 years (2)

50 years (3)

 

tb_24 Answer any “easily Googleable” factual but open ended question posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

Examples of open ended questions:
  • “What does it mean if my lights dim when I turn on the microwave?”
  • “When does home insurance cover roof replacement?”

10 years (1)

20 years (2)

50 years (3)

 

tb_25 Give good answers in natural language to factual questions posed in natural language for which there are no definite correct answers. For example: “What causes the demographic transition?”, “Is the thylacine extinct?”, “How safe is seeing a chiropractor?”

10 years (1)

20 years (2)

50 years (3)

 

tb_26 Write an essay for a high-school history class that would receive high grades and pass plagiarism detectors. For example, answer a question like ‘How did the whaling industry affect the industrial revolution?’

10 years (1)

20 years (2)

50 years (3)

 

tb_27 Compose a song that is good enough to reach the US Top 40. The system should output the complete song as an audio file.

10 years (1)

20 years (2)

50 years (3)

 

tb_28 Produce a song that is indistinguishable from a new song by a particular artist, e.g. a song that experienced listeners can’t distinguish from a new song by Taylor Swift.

10 years (1)

20 years (2)

50 years (3)

 

tb_29 Write a novel or short story good enough to make it to the New York Times best-seller list.

10 years (1)

20 years (2)

50 years (3)

 

tb_30 For any computer game that can be played well by a machine, explain the machine’s choice of moves in a way that feels concise and complete to a layman.

10 years (1)

20 years (2)

50 years (3)

 

tb_31 Play poker well enough to win the World Series of Poker.

10 years (1)

20 years (2)

50 years (3)

 

tb_32 After spending time in a virtual world, output the differential equations governing that world in symbolic form. For example, the agent is placed in a game engine where Newtonian mechanics holds exactly, and the agent is then able to conduct experiments with a ball and output Newton’s laws of motion.

10 years (1)

20 years (2)

50 years (3)

 

Display This Question:

If random Is Greater Than 0

And random Is Less Than 10

tb_comment Do you have any comments on your interpretation of these questions? (optional)

 

Display This Question:

If random Is Greater Than 10

And random Is Less Than 20

tb_consider Which considerations were important in your answers to these questions? (optional)

 

sq_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

sq_def 6 of 7

Stuart Russell summarizes an argument for why highly advanced AI might pose a risk as follows:

The primary concern [with highly advanced AI] is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken […]. Now we have a problem:

  1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
  2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.
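Russell’s point about unconstrained variables can be seen in a toy optimization (our own illustrative sketch, not part of the survey): the objective below depends only on x0, so the optimizer is free to return any value for x1, including an extreme one.

```python
import itertools

def objective(x0, x1):
    # Depends only on x0; x1 is the "unconstrained variable".
    return -(x0 - 3) ** 2

# Exhaustive search over the box [-100, 100] x [-100, 100].
grid = range(-100, 101)
best = max(itertools.product(grid, grid), key=lambda p: objective(*p))

# x0 is optimized to 3, but nothing penalizes x1, so the search
# happens to return it at the extreme of its range.
print(best)  # (3, -100)
```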

 

sq_1 Do you think this argument points at an important problem?

  • No, not a real problem. (1)
  • No, not an important problem. (2)
  • Yes, a moderately important problem. (3)
  • Yes, a very important problem. (5)
  • Yes, among the most important problems in the field. (4)

 

aq_2 How valuable is it to work on this problem today, compared to other problems in AI?

  • Much less valuable (1)
  • Less valuable (2)
  • As valuable as other problems (3)
  • More valuable (4)
  • Much more valuable (5)

 

sq_3 How hard do you think this problem is compared to other problems in AI?

  • Much easier (1)
  • Easier (2)
  • As hard as other problems (3)
  • Harder (4)
  • Much harder (5)

 

sq_comment Do you have any comments on your interpretation of this question? (optional)

 

Display This Question:

If random Is Greater Than 90

And random Is Less Than or Equal to 100

sq_consider Which considerations were important in your answers to this question? (optional)

 

sr_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

sr_def 6 of 7

Let ‘AI safety research’ include any AI-related research that, rather than being primarily aimed at improving the capabilities of AI systems, is instead primarily aimed at minimizing potential risks of AI systems (beyond what is already accomplished for those goals by increasing AI system capabilities).

Examples of AI safety research might include:
  • Improving the human-interpretability of machine learning algorithms for the purpose of improving the safety and robustness of AI systems, not focused on improving AI capabilities
  • Research on long-term existential risks from AI systems
  • AI-specific formal verification research
  • Policy research about how to maximize the public benefits of AI

 

sr_1 How much should society prioritize AI safety research, relative to how much it is currently prioritized?

  • Much less (1)
  • Less (2)
  • About the same (3)
  • More (4)
  • Much more (5)

 

Display This Question:

If random Is Greater Than 80

And random Is Less Than or Equal to 90

sr_comment Do you have any comments on your interpretation of this question? (optional)

 

Display This Question:

If random Is Greater Than 90

And random Is Less Than or Equal to 100

sr_consider Which considerations were important in your answer to this question? (optional)

 

dem_time Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

dem_1 7 of 7   How much thought have you given in the past to when HLMI (or something similar) will be developed?

  • Very little. e.g. “I can’t remember thinking about this.” (1)
  • A little. e.g. “It has come up in conversation a few times” (2)
  • A moderate amount. e.g. “I read something about it now and again” (3)
  • A lot. e.g. “I have thought enough to have my own views on the topic” (4)
  • A great deal. e.g. “This has been a particular interest of mine” (5)

 

dem_2 How much thought have you given in the past to social impacts of smarter-than-human machines?

  • Very little. e.g. “I can’t remember thinking about this.” (1)
  • A little. e.g. “It has come up in conversation a few times” (2)
  • A moderate amount. e.g. “I read something about it now and again” (3)
  • A lot. e.g. “I have thought enough to have my own views on the topic” (4)
  • A great deal. e.g. “This has been a particular interest of mine” (5)

 

dem_3 Are you an AI researcher?

  • Yes (4)
  • No (5)

 

dem_4 What are your main areas of research?

 

dem_5 Where do you work?

  • Industry (1)
  • Academia (2)
  • Other (3)

 

Q203 Timing

First Click (1)

Last Click (2)

Page Submit (3)

Click Count (4)

 

end Thank you for contributing to the Expert Survey on Progress in AI!   We will send you output from this research as it becomes available. We are sending every 10th person $250 as an expression of gratitude for completing the survey. We are a group of researchers interested in measuring and understanding AI progress and its implications. If you are interested in learning more about this research, please click here or email us.     Katja Grace, Machine Intelligence Research Institute John Salvatier, Machine Intelligence Research Institute Allan Dafoe, Yale University Baobao Zhang, Massachusetts Institute of Technology

 

print Would you like to be recognized in print as an expert participant in this survey?

  • Yes (1)
  • No (2)

 

autojobs A group from Oxford University will soon conduct a survey on Automation and Jobs. It will explore some of the topics in this survey in more depth. Would you like to receive an invitation to participate in it?

  • Yes (1)
  • No (2)

 

comment_box If you have any questions or comments for us, please feel free to share them below.

 

Media discussion of 2016 ESPAI

The 2016 Expert Survey on Progress in AI was discussed in at least 20 media outlets, popular blogs, and industry-specific sites that we know of. Most of them were summaries of the survey findings. Commonly emphasized details included the difference between Asian and North American estimates, the 5% chance of catastrophically bad outcomes (usually described as low), and likely effects on the economy in the event of widespread automation.

List of media mentions

  1. Newsweek: A major general news outlet. Listed many shorter-term estimates from the survey, and discussed practical responses to inevitable automation, such as universal basic income.
  2. Yahoo: A major general news outlet. Cautiously reported the survey’s predictions, with the caveat that predictions about distant technology are often not very precise. Mentioned policy recommendations from Katja.
  3. BBC: A major British news channel. Described the current state of AI’s capabilities, including a detailed summary of modern clothes-folding robots. Discussed the survey’s predictions in this context. (Reposted by OECD Forum.)
  4. Daily Mail: A major British news outlet. Summarized the study, paying special attention to the number of years estimated for AI to overtake humans on specific tasks. Considered the 5% chance of a catastrophically bad outcome to be relatively good news.
  5. MIT Technology Review: A technology review at MIT. Offered a critical review of the survey’s results, including the theory that ‘40 years’ can be a red flag in terms of estimates, because it can symbolize the approximate end of a surveyed person’s working life. (We do not find evidence to support this theory.) (Reposted by Business Insider.)
  6. Business Insider: The second article about the AI survey from this major business news outlet. Summarized the study findings, including a graph showing estimated times that AI will overtake humans at a wide range of tasks.
  7. Daily Kos: A highly prominent American liberal group blog. Provided an in-depth summary of the survey, its political ramifications, and potential solutions to the problem of massively increased automation in the workforce.
  8. ZDNet: A business technology news website. Reported the study’s findings, with special emphasis on the case of Go and the mitigating circumstances of AlphaGo’s win: AlphaGo had played many more games than any human opponent, while the survey question about Go included the assumption that the human and AI players would have played an equal number of practice games. Included a brief summary video.
  9. The Register: A British tech publication. Mentioned both predictions from the survey and the high variance between estimates, including specifically named researchers.
  10. Leading Britain’s Conversation: A blog by Nick Abbot, a presenter for a British radio station. Covered the study’s results in a humorous, informal tone.
  11. Tech Republic: A technology blog. Described the survey findings in detail, including a video. Included three takeaways at the end: that there is a 50% chance of AI exceeding human capability at all jobs within 45 years, that there is a 50% chance of AI automating all jobs within 122 years, and that global catastrophic risk is possible from AI and should be guarded against.
  12. Daily News and Analysis: An Indian news source. Addressed the practical needs for India to modernize its jobs and stay ahead of the curve as AI makes current jobs obsolete.
  13. CTV News: A Canadian news outlet. Listed estimates for AI to beat humans at a range of specific tasks, and briefly summarized the chances for AI’s overall impact to be good or bad.
  14. 2OceansVibe: A large solely owned South African media site. Described the study in detail, including a note on which sorts of jobs may not be automated even though they could be. (Reposted by Nigeria Today.)
  15. Tekniikan Maailma: A Finnish technology news source. Focused on the predicted 5% chance of a catastrophic outcome as good news, and went into detail on reasons why AI might not be too dangerous – including that any given current AI’s capacities are very specific. Also touched on possibilities for transhumanism involving AI.
  16. New Scientist: A significant science news site. Noted the high chance of major social impact, but considered the 5% chance of catastrophic outcomes less alarming. Focused somewhat on recent achievements of AI, as well as future projections. Included a brief summary video.
  17. Slate Star Codex: A popular blog which frequently posts about AI. Emphasized the high variance between surveyed opinions, and the fact that many prominent AI researchers now support AI safety research.
  18. Slator: A linguistics industry website. Mentioned the survey’s predictions with respect to language translation specifically.
  19. Fossbytes: A general technology blog. Reported the survey’s findings in brief.
  20. Tech XPlore: An outlet for science writing. A lengthy, technical summary of the survey, which gave special notice to the difference between Asian and North American responses.
  21. Nature World News: A general natural sciences website. Mentioned the survey’s results, as well as Elon Musk’s belief in a faster AI timeline.
  22. Axios: An online media outlet with a “Future of Work” section. Described the study in brief, with emphasis on the risks entailed by AI ascendance.
  23. Human Cusp: A blog focusing on possibilities for superintelligent AI. Touched on the special case of Go, and how AI in fact surpassed humans much faster than AI researchers predicted.

 

Guide to pages on AI timeline predictions

This page is an informal outline of the other pages on this site about AI timeline predictions made by others. Headings link to higher level pages, intended to summarize the evidence from pages below them. This list was complete on 7 April 2017 (here is a category that may contain newer entries, though not conveniently organized).

Guide

Topic synthesis: AI timeline predictions as evidence (page)

The predictions themselves:

—from surveys (page):
  1. 2016 Expert survey on progress in AI: our own survey, results forthcoming.
  2. Müller and Bostrom AI Progress Poll: the most recent survey with available results, including 29 of the most cited AI researchers as participants.
  3. Hanson AI Expert Survey: in which researchers judge fractional progress toward human-level performance over their careers, in a series of informal conversations.
  4. Kruel AI survey: in which experts give forecasts and detailed thoughts, interview style.
  5. FHI Winter Intelligence Survey: in which impacts-concerned AGI conference attendees forecast AI in 2011.
  6. AGI-09 Survey: in which AGI conference attendees forecast various human-levels of AI in 2009.
  7. Klein AGI survey: in which a guy with a blog polls his readers.
  8. AI@50 survey: in which miscellaneous conference goers are polled informally.
  9. Bainbridge Survey: in which 26 expert technologists expect human-level AI in 2085 and give it a 5.6/10 rating on benefit to humanity.
  10. Michie Survey: in which 67 AI and CS researchers are not especially optimistic in the ‘70s.
—from public statements:
  1. MIRI AI predictions dataset: a big collection of public predictions gathered from the internet.
—from written analyses (page), for example:
  1. The Singularity is Near: in which a technological singularity is predicted in 2045, based on when hardware is extrapolated to compute radically more than human minds in total.
  2. The Singularity Isn’t Near: in which it is countered that human-level AI requires software as well as hardware, and none of the routes to producing software will get there by 2045.
  3. (Several others are listed in the analyses page above, but do not have their own summary pages.)

On what to infer from the predictions

Some considerations regarding accuracy and bias (page):
  1. Contra a common view that past AI forecasts were unreasonably optimistic, AI predictions look fairly similar over time, except for a handful of very early, somewhat optimistic ones.
  2. The Maes Garreau Law claims that people tend to predict AI near the end of their own expected lifetime. It is not true.
  3. We expect publication biases to favor earlier forecasts.
  4. Predictions made in surveys seem to be overall a bit later than those made in public statements (maybe because surveys prevent some publication biases).
  5. People who are inclined toward optimism about AI are more likely to become AI researchers, leading to a selection bias from optimistic experts.
  6. We know of some differences in forecasts made by different groups.

Blog posts on these topics:

Progress in general purpose factoring

The size of the largest number factored to date grew by about 4.5 decimal digits per year over the past roughly half-century. Between 1988, when good records begin, and 2009, when the largest number to date was factored, progress was roughly 6 decimal digits per year.
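Using the endpoints reported below (roughly 20 digits around 1970, 106 digits in 1988, and the 232-digit RSA-768 in 2009, with “to date” meaning 2017, when this page was written), the quoted rates are a quick calculation:

```python
# Approximate record sizes, in decimal digits (see Background and Trends below).
early_digits, early_year = 20, 1970          # approximate state of the art
first_record_digits, first_record_year = 106, 1988
latest_digits, latest_year = 232, 2009       # RSA-768

overall_rate = (latest_digits - early_digits) / (2017 - early_year)
recorded_rate = (latest_digits - first_record_digits) / (latest_year - first_record_year)

print(round(overall_rate, 1))   # ~4.5 digits per year over roughly half a century
print(round(recorded_rate, 1))  # 6.0 digits per year between 1988 and 2009
```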

Progress was relatively smooth during the two decades for which we have good records, with half of new records being less than two years after the previous record, and the biggest advances between two records representing about five years of prior average progress.

Computing hardware used in record-setting factorings increased ten-thousand-fold over the records (roughly in line with falling computing prices), and we are unsure how much of overall factoring progress is due to this.

Support

Background

To factor an integer N is to find two integers l and m, both greater than 1, such that l*m = N. There is no known efficient algorithm for the general case of this problem. While there are special purpose algorithms for some kinds of numbers with particular forms, here we are interested in ‘general purpose‘ factoring algorithms: those that factor any composite number, with running time dependent only on the size of that number.1
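For a concrete (if hopelessly slow) example of a general purpose method, trial division factors any composite number, with running time depending only on the number’s size. Record-setting algorithms such as the general number field sieve are vastly more sophisticated, but share that property. A minimal sketch:

```python
def trial_division(n):
    """Return a nontrivial factor pair (l, m) with l * m == n,
    or None if n is prime. Works for any composite n, however slowly."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return (d, n // d)
        d += 1
    return None

print(trial_division(8051))  # (83, 97): 8051 is a semiprime, like the RSA Numbers
```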

Factoring numbers large enough to break records frequently takes months or years, even with a large number of computers. For instance, RSA-768, the largest number to be factored to date, had 232 decimal digits and was factored over multiple years ending in 2009, using the equivalent of almost 2000 years of computing on a single 2.2 GHz AMD Opteron processor with 2GB RAM.2

It is important to know what size numbers can be factored with current technology, because the difficulty of factoring large numbers is central to cryptographic security schemes such as RSA.3 Much of the specific data we have on progress in factoring comes from the RSA Factoring Challenge: a contest funded by RSA Laboratories, offering large cash prizes for factoring various numbers on their list of 54 (the ‘RSA Numbers‘).4 The numbers are semiprimes (i.e. each has exactly two prime factors), and each number has between 100 and 617 decimal digits.5

The records collected on this page are for factoring specific large numbers, using algorithms whose performance depends on nothing but the scale of the number (general purpose algorithms). So a record for a specific N-digit number does not imply that that was the first time any N-digit number was factored. However if it is an important record, it strongly suggests that it was difficult to factor arbitrary N-digit numbers at the time, so others are unlikely to have been factored very much earlier, using general purpose algorithms. For a sense of scale, our records here include at least eight numbers that were not the largest ever factored at the time yet appear to have been considered difficult to factor, and many of those were factored five to seven years after the first time a number so large was factored. For numbers that are the first of their size in our records, we expect that they mostly are the first of that size to have been factored, with the exception of early records.6

Once a number of a certain size has been factored, it will still take years before numbers at that scale can be cheaply factored. For instance, while 116 digit numbers had been factored by 1991 (see below), it was an achievement in 1996 to factor a different specific 116 digit number, which had been on a ‘most wanted list’.

If we say a digit record is broken in some year (rather than a record for a particular number), we mean that that is the first time that any number with at least that many digits has been factored using a general purpose factoring algorithm.

Our impression is that the length of numbers factored was a measure people were deliberately trying to improve, so that progress here is the result of relatively consistent effort, rather than occasional incidental overlap with some other goal.

Most of the research on this page is adapted from Katja Grace’s 2013 report, with substantial revision.

Recent rate of progress

Data sources

These are the sources of data we draw on for factoring records:

  • Wikipedia’s table of ‘RSA numbers’, from the RSA factoring challenge (active ’91-’07). Of the 54 numbers, 19 have been factored. The table includes numbers of digits, dates, and cash prizes offered. (data)
  • Scott Contini’s list of ‘general purpose factoring records’ since 1990, which includes the nine RSA numbers that set digit records, and three numbers that appear to have set digit records as part of the ‘Cunningham Project‘. This list contains digits, date, solution time, and algorithm used.
  • Two estimates of how many digits could be factored in earlier decades, from an essay by Carl Pomerance.7
  • This announcement, which suggests that the 116 digit general purpose factoring record was set in January 1991, contrary to Factorworld and Pomerance.8 We ignore it, since the dates are too close to matter substantially, and the other two sources agree.
  • A 1988 paper discussing recent work and constraints on what was possible at that time. It lists four ‘state of the art’ efforts at factorization, among which the largest number factored using a general purpose algorithm (MPQS) has 95 digits.9 It claims that 106 digits had been factored by 1988, which implies that the early RSA challenge numbers were not state of the art.10 Together these suggest that the work in the paper is responsible for moving the record from 95 to 106 digits, and this matches our impressions from elsewhere, though we do not know of a specific claim to this effect.
  • This ‘RSA honor roll‘ contains meta-data for the RSA solutions.
  • Cryptography and Computational Number Theory (1989), Carl Pomerance and Shafi Goldwasser.11

Excel spreadsheet containing our data for download: Factoring data 2017

Trends

Digits by year

Figure 1 shows how the scale of numbers that could be factored (using general purpose methods) grew over the last half-century (as of 2017). In red are the numbers that broke the record for the largest number of digits, as far as we know.

From it we see that since 1970, the size of numbers that can be factored has increased from around twenty digits to 232 digits, for an average of about 4.5 digits per year.

After the first record we have in 1988, we know of thirteen more records being set, for an average of one every 1.6 years between 1988 and 2009. Half of these were set the same or the following year as the last record, and the largest gap between records was four years. As of 2017, seven years have passed without further records being set.

The largest amount of progress seen in a single step is the last one—32 additional digits at once, or over five years of progress at the average rate seen since 1988 just prior to that point. The 200 digit record was also around five years of progress.

Figure 1: Size of numbers (in decimal digits) that could be factored over recent history. Green ‘approximate state of the art’ points do not necessarily represent specific numbers or the very largest that could be at that time—they are qualitative estimates. The other points represent specific large numbers being factored, either as the first number of that size ever to be factored (red) or not (orange). Dates are accurate to the year. Some points are annotated with decimal digit size, for ease of reading.

Hardware inputs

New digit records tend to use more computation, which makes progress in software alone hard to measure. At any point in the past it was in principle possible to factor larger numbers with more hardware, so the records we see are effectively records for what can be done with however much hardware anyone is willing to purchase for the purpose. This grows from a combination of software improvements, hardware improvements, and increasing wealth, among other things. Figure 2 shows how the computing used for solutions increased with time.


Figure 2: CPU time to factor digit-record breaking numbers. Measured in 1,000 MIPS-years before 2000, and in GHz-years after 2000. These are similar, but not directly comparable, so the conversion here is approximate. Data from Contini (2010). Figure from Grace (2013). (Original caption: “Figure 31 shows CPU times for the FactorWorld records. These times have increased by a factor of around ten thousand since 1990. At 2000 the data changes from being in MIPS-years to 1 GHz CPU-years. These aren’t directly comparable. The figure uses 1 GHz CPU-year = 1,000 MIPS-years, because it is in the right ballpark and simple, and no estimates were forthcoming. The figure suggests that a GHz CPU-year is in fact worth a little more, given that the data seems to dip around 2000 with this conversion.”)

In the two decades between 1990 and 2010, figure 2 suggests that the computing used increased by about four orders of magnitude. During that time computing available per dollar probably increased by a factor of ten every four years or so, for about five orders of magnitude. So the records roughly track the size of numbers that can be factored at a fixed expense.
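The comparison in this paragraph can be spelled out explicitly (a sketch using the round numbers above; the real quantities are rough):

```python
# Orders of magnitude over the two decades 1990-2010.
years = 2010 - 1990
compute_used_growth = 10 ** 4                 # ~10,000x more CPU time (figure 2)
price_performance_growth = 10 ** (years / 4)  # 10x per ~4 years => ~10**5

# Hardware use grew slightly slower than computing-per-dollar, so in
# constant dollars the record attempts cost roughly the same (or a bit
# less) over time: records track factoring at a fixed expense.
relative_expense = compute_used_growth / price_performance_growth
print(relative_expense)  # 0.1
```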

Further research

  • Discover how computation used is expected to scale with the number of digits factored, and use that to factor out increased hardware use from this trendline, and so measure non-hardware progress alone.
  • This area appears to have seen a small number of new algorithms, among smaller changes in how they are implemented. Check how much the new algorithms affected progress, and similarly for anything else with apparent potential for large impacts (e.g. a move to borrowing other people’s spare computing hardware via the internet, rather than paying for hardware).
  • Find records from earlier times.
  • Some numbers had large prizes associated with their factoring, and others of similar sizes had none. Examine the relationship between progress and financial incentives in this case.
  • The Cunningham Project maintains a vast collection of recorded factorings of numbers, across many scales, along with dates, algorithms used, and people or projects responsible. Gather that data, and use it to make similar inferences to those from the data we have here (see the Relevance section below for more on that).

 

Relevance

We are interested in factoring, because it is an example of an algorithmic problem on which there has been well-documented progress. Such examples should inform our expectations for algorithmic problems in general (including problems in AI), regarding:

  • How smooth or jumpy progress tends to be, and related characteristics of its shape
  • How much warning there is of rapid progress
  • How events that are qualitatively considered ‘conceptual insights’ or ‘important progress’ relate to measured performance progress.
  • How software progress interacts with hardware (for instance, does a larger step of software progress cause a disproportionate increase in overall software output, because of redistribution of hardware?)
  • If performance is improving, how much of that is because of better hardware, and how much is because of better algorithms or other aspects of software

 

Assorted sources


 

Trends in algorithmic progress

Algorithmic progress has been estimated to contribute fifty to one hundred percent as much as hardware progress to overall performance progress, with low confidence.

Algorithmic improvements appear to be relatively incremental.

Details

We have not recently examined this topic carefully ourselves. This page currently contains relevant excerpts and sources.

Algorithmic Progress in Six Domains1 measured progress in the following areas, as of 2013:

  • Boolean satisfiability
  • Chess
  • Go
  • Largest number factored (our updated page)
  • MIP algorithms
  • Machine learning

Some key summary paragraphs from the paper:

Many of these areas appear to experience fast improvement, though the data are often noisy. For tasks in these areas, gains from algorithmic progress have been roughly fifty to one hundred percent as large as those from hardware progress. Improvements tend to be incremental, forming a relatively smooth curve on the scale of years.

In recent Boolean satisfiability (SAT) competitions, SAT solver performance has increased 5–15% per year, depending on the type of problem. However, these gains have been driven by widely varying improvements on particular problems. Retrospective surveys of SAT performance (on problems chosen after the fact) display significantly faster progress.

Chess programs have improved by around fifty Elo points per year over the last four decades. Estimates for the significance of hardware improvements are very noisy but are consistent with hardware improvements being responsible for approximately half of all progress. Progress has been smooth on the scale of years since the 1960s, except for the past five.

Go programs have improved about one stone per year for the last three decades. Hardware doublings produce diminishing Elo gains on a scale consistent with accounting for around half of all progress. Improvements in a variety of physics simulations (selected after the fact to exhibit performance increases due to software) appear to be roughly half due to hardware progress.

The largest number factored to date has grown by about 5.5 digits per year for the last two decades; computing power increased ten-thousand-fold over this period, and it is unclear how much of the increase is due to hardware progress.

Some mixed integer programming (MIP) algorithms, run on modern MIP instances with modern hardware, have roughly doubled in speed each year. MIP is an important optimization problem, but one which has been called to attention after the fact due to performance improvements. Other optimization problems have had more inconsistent (and harder to determine) improvements.

Various forms of machine learning have had steeply diminishing progress in percentage accuracy over recent decades. Some vision tasks have recently seen faster progress.

Note that these points have not been updated for developments since 2013, and machine learning in particular is generally observed to have seen more progress very recently (as of 2017).
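The "fifty to one hundred percent as large" comparison above is naturally read multiplicatively: total performance progress factors into a hardware component and a software component. A sketch of that reading (function name hypothetical):

```python
import math

def software_share(total_factor, hardware_factor):
    """Fraction of (log-scale) performance progress attributable to software,
    assuming total improvement = hardware improvement * software improvement."""
    software_factor = total_factor / hardware_factor
    return math.log(software_factor) / math.log(total_factor)

# E.g. if performance improved 1000x while hardware improved 100x, software
# accounts for the remaining 10x: one third of progress in log terms, i.e.
# software gained 50% as much as hardware did.
share = software_share(1000, 100)
```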

Figures

Below are assorted figures mass-extracted from Algorithmic Progress in Six Domains, some more self-explanatory than others. See the paper for their descriptions.

[Figures extracted from pages 27–54 of the paper appeared here.]

Funding of AI Research

Provisional data suggests:

  • Equity deals made with startups in AI were worth about $5bn in 2016, and this value has been growing by around 50% per year in recent years.
  • The number of equity deals in AI startups globally is growing at around 30% per year, and was estimated at 658 in 2016.
  • NSF funding of IIS, a section of computer science that appears to include AI and two other areas, has increased at around 9% per year over the past two decades.

(Updated February 2017)

Background

Artificial Intelligence research is funded both publicly and privately. This page currently contains some data on private funding globally, public funding in the US,  and national government announcements of plans relating to AI funding. This page should not currently be regarded as an exhaustive summary of data available on these topics or on AI funding broadly.

Details 

AI startups

According to CB Insights, between the start of 2012 and the end of 2016, the number of equity deals being made with startups in artificial intelligence globally grew by a factor of four to 658 (around 30% per year), and the value of funding grew by a factor of over eight to $5 billion (around 50% per year).1 Their measure includes both startups developing AI techniques, and those applying existing AI techniques to problems in areas such as healthcare or advertising. They provide Figure 1 below, with further details of the intervening years. We have not checked the trustworthiness or completeness of CB Insights’ data.
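The quoted annual rates can be checked against the total growth factors (a sketch using the approximate figures in the text; start of 2012 to end of 2016 is treated as five years of growth):

```python
def implied_annual_growth(total_factor, years):
    """Annual growth rate implied by a total growth factor over a period."""
    return total_factor ** (1 / years) - 1

deal_growth = implied_annual_growth(4, 5)   # ~0.32, consistent with "around 30% per year"
value_growth = implied_annual_growth(8, 5)  # ~0.52, consistent with "around 50% per year"
```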

Figure 1: Number of new equity deals supporting AI-related startups, and dollar values of disclosed investments over 2012-2015, according to CB Insights.

US National Science Foundation

In 2014 Muehlhauser and Sinick wrote:

In 2011, the National Science Foundation (NSF) received $636 million for funding CS research (through CISE). Of this, $169 million went to Information and Intelligent Systems (IIS). IIS has three programs: Cyber-Human Systems (CHS), Information Integration and Informatics (III) and Robust Intelligence (RI). If roughly 1/3 of the funding went to each of these, then $56 million went to Robust Intelligence, so 9% of the total CS funding. (Some CISE funding may have gone to AI work outside of IIS — that is, via ACI, CCF, or CNS — but at a glance, non-IIS AI funding through CISE looks negligible.)

The NSF Budget for Information and Intelligent Systems (IIS) has generally increased between 4% and 20% per year since 1996, with a one-time percentage boost of 60% in 2003, for a total increase of 530% over the 15 year period between 1996 and 2011.[14 {See table with upper left-hand corner A367 in the spreadsheet.}] “Robust Intelligence” is one of three program areas covered by this budget.

As of February 2017, CISE (Computer and Information Science and Engineering) covers five categories, and IIS appears to be the most relevant one.2 IIS still has three programs, of which Robust Intelligence is one.3

NSF funding into both CISE and IIS (the relevant subcategory) from 2009 to 2017 shows a steady rise.4 IIS funding as a percentage of CISE funding fluctuates, and has gone down in this time period. The following table summarizes data from NSF, collected by Finan Adamson in 2016. The figures below it (2 and 3) combine this data with some collected previously in this spreadsheet linked by Muehlhauser and Sinick. Over 21 years, IIS funding has increased fairly evenly, at 9% per year overall.

Fiscal Year   IIS Funding ($M)   Total CISE Funding ($M)   IIS as % of CISE
2017 (Requested) 207.20 994.80 20.8
2016 (Estimate) 194.90 935.82 20.8
2015 (Actual) 194.58 932.98 20.9
2014 (Actual) 184.87 892.60 20.7
2013 (Actual) 176.23 858.13 20.5
2012 (Actual) 176.58 937.16 18.8
2011 (Actual) 169.14 636.06 26.5
2010 (Actual) 163.21 618.71 26.4
2009 (Actual) 150.93 574.50 26.3
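For the table's own span, the implied annual growth can be computed directly (a sketch; note that this 2009–2017 window shows slower growth, around 4% per year, than the 9% per year quoted for the full 21-year period):

```python
# IIS funding in millions of dollars, from the table's 2009 and 2017 rows.
iis_funding = {2009: 150.93, 2017: 207.20}

years = 2017 - 2009
growth = (iis_funding[2017] / iis_funding[2009]) ** (1 / years) - 1
print(f"Implied annual growth, 2009-2017: {growth:.1%}")  # about 4%
```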

IIS funding combined sources

Figure 2: Annual NSF funding to IIS

IIS funding growth

Figure 3: Yearly growth in NSF funding to IIS

National governments

US

On May 3, 2016, White House Deputy U.S. Chief Technology Officer Ed Felten announced a series of workshops and an interagency group to learn more about the benefits and risks of artificial intelligence.5

The Pentagon intended to include a request for $12-15 billion to fund AI weapon technology in its 2017 fiscal year budget.6

Japan

Ms Kurata from the Embassy of Japan introduced Japan’s fifth Science and Technology Basic Plan, a ¥26 trillion government investment that will run from 2016 to 2020 and aims to promote R&D to establish a ‘super smart society’.7

China

The Chinese government announced in 2016 that it plans to create a “100 billion level” ($15 billion USD) artificial intelligence market by 2018. In their statement, the Chinese government defined artificial intelligence as a “branch of computer science where machines have human-like intelligence” and includes robots, natural language processing, and image recognition.8

South Korea

The South Korean government announced on March 17, 2016 that it would spend 1 trillion won (US$840 million) by 2020 on Artificial Intelligence. They plan to fund a high profile research center joined by Samsung and LG Electronics, SKT, KT, Naver and Hyundai Motor.9

Relevance

Financial investment in AI research is interesting because as an input to AI progress, it may help in forecasting progress. To further that goal, we are also interested in examining the relationship of funding to progress.

Investment can also be read as an indicator of investors’ judgments of the promise of AI.

Notable missing data

  • Private funding of AI other than equity deals
  • Public funding of AI research in relevant nations other than the US
  • Funding for internationally collaborative AI projects.


 

2016 Expert Survey on Progress in AI

The 2016 Expert Survey on Progress in AI is a survey of machine learning researchers that AI Impacts ran in collaboration with others in 2016.

Details

Some results are reported in When Will AI Exceed Human Performance? Evidence from AI Experts, and others have not yet been published. This page should be updated with more results soon, as of August 2017. Missing questions include several on attitudes to AI safety research, and many on the time remaining until various narrow AI milestones are met.

The full list of questions is available here as a neat pdf and here as a less neat page that is more amenable to copy-pasting. Participants received randomized subsets of these questions.

Throughout the survey, ‘HLMI’ was defined as follows:

The following questions ask about ‘high–level machine intelligence’ (HLMI).   Say we have ‘high-level machine intelligence’ when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.

Short summary of results

Below is a table of summary results from the paper (the paper contains more results).

Table S4 in Grace et al 2017.

Some key interesting results, from our blog post:

  • Comparable forecasts seem to be later than in past surveys. In the other surveys we know of, the median dates for a 50% chance of something like High-Level Machine Intelligence (HLMI) range from 2035 to 2050. Here the median answer to the most similar question puts a 50% chance of HLMI in 2057 (this isn’t in the paper—it is just the median response to the HLMI question asked using the ‘fixed probabilities’ framing, i.e. the way it has been asked before). This seems surprising to me given the progress machine learning has seen since the last survey, but less surprising because we changed the definition of HLMI, in part fearing it had previously been interpreted to mean a relatively low level of performance.
  • Asking people about specific jobs massively changes HLMI forecasts. When we asked some people when AI would be able to do several specific human occupations, and then all human occupations (presumably a subset of all tasks), they gave very much later timelines than when we just asked about HLMI straight out. For people asked to give probabilities for certain years, the difference was a factor of a thousand twenty years out! (10% vs. 0.01%) For people asked to give years for certain probabilities, the normal way of asking put 50% chance 40 years out, while the ‘occupations framing’ put it 90 years out. (These are all based on straightforward medians, not the complicated stuff in the paper.)
  • People consistently give later forecasts if you ask them for the probability in N years instead of the year that the probability is M. We saw this in the straightforward HLMI question, and most of the tasks and occupations, and also in most of these things when we tested them on mturk people earlier. For HLMI for instance, if you ask when there will be a 50% chance of HLMI you get a median answer of 40 years, yet if you ask what the probability of HLMI is in 40 years, you get a median answer of 30%.
  • Lots of ‘narrow’ AI milestones are forecast in the next decade as likely as not. These are interesting, because most of them haven’t been forecast before to my knowledge, and many of them have social implications. For instance, if in a decade machines can not only write pop hits as well as Taylor Swift can, but can write pop hits that sound like Taylor Swift as well as Taylor Swift can—and perhaps faster, more cheaply, and on Spotify—then will that be the end of the era of superstar musicians? This perhaps doesn’t rival human extinction risks for importance, but human extinction risks do not happen in a vacuum (except one) and there is something to be said for paying attention to big changes in the world other than the one that matters most.
  • There is broad support among ML researchers for the premises and conclusions of AI safety arguments. Two thirds of them say the AI risk problem described by Stuart Russell is at least moderately important, and a third say it is at least as valuable to work on as other problems in the field. The median researcher thinks AI has a one in twenty chance of being extremely bad on net. Nearly half of researchers want to see more safety research than we currently have (compared to only 11% who think we are already prioritizing safety too much). There has been a perception lately that AI risk has moved to being a mainstream concern among AI researchers, but it is hard to tell from voiced opinion whether one is hearing from a loud minority or the vocal tip of an opinion iceberg. So it is interesting to see this perception confirmed with survey data.
  • Researchers’ predictions vary a lot. That is pretty much what I expected, but it is still important to know. Interestingly (and not in the paper), researchers don’t seem to be aware that their predictions vary a lot. More than half of respondents guess that they disagree ‘not much’ with the typical AI researcher about when HLMI will exist (vs. a moderate amount, or a lot).
  • Researchers who studied in Asia have much shorter timelines than those who studied in North America. In terms of the survey’s ‘aggregate prediction’ thing, which is basically a mean, the difference is 30 years (Asia) vs. 74 years (North America). (See p5)
  • I feel like any circumstance where a group of scientists guesses that the project they are familiar with has a 5% chance of outcomes near ‘human extinction’ levels of bad is worthy of special note, though maybe it is not actually that surprising, and could easily turn out to be misuse of small probabilities or something.

Results

Human-level intelligence

Questions

We sought forecasts for something like human-level AI in three different ways, to reduce noise from unknown framing biases:

  • Directly, using a question much like Müller and Bostrom’s, though with a refined definition of High-Level Machine Intelligence (HLMI).
  • At the end of a sequence of questions about the automation of specific human occupations.
  • Indirectly, with an ‘outside view’ approximation: by asking each person how long it has taken to make the progress to date in their subfield, and what fraction of the ground has been covered. This is Robin Hanson’s approach, which he found suggested much longer timelines than those reached directly.

For the first two of these, we split people in half, and asked one half how many years until a certain chance of the event would obtain, and the other half what the chance was of the event occurring by specific dates. We call these ‘fixed probabilities’ and ‘fixed years’ framings throughout.

For the (somewhat long and detailed) specifics of these questions, see here or here (pdf).

Answers

The table and figure below show the median dates and probabilities given for the direct ‘HLMI’ question, and in the ‘via occupations’ questions, under both the fixed probabilities and fixed years framings.

Milestone   Years until 10% / 50% / 90% chance   Chance within 10 / 20 / 50 years
Truck Driver 5 10 20 50% 75% 95%
Surgeon 10 30 50 5% 20% 50%
Retail Salesperson 5 13.5 20 30% 60% 91.5%
AI Researcher 25 50 100 0% 1% 10%
Existing occupation among final to be automated 50 100 200 0% 0% 3.5%
Full Automation of labor 50 90 200 0% 0.01% 3%
HLMI 15 40 100 1% 10% >30%* (30% in 40y)

*Due to a typo, this question asked about 40 years rather than 50 years, so doesn’t match the others.
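One way to compare the two framings is to interpolate the fixed-years answers to find the year at which the stated probability crosses 50%, and compare that with the fixed-probabilities 50% answer. A sketch (linear interpolation between the three stated points, using the Retail Salesperson row above; function name hypothetical):

```python
def year_at_probability(points, target=0.5):
    """Linearly interpolate the year at which probability reaches `target`.

    `points` is a list of (year, probability) pairs in increasing order.
    Returns None if the target is not reached within the given points.
    """
    for (y0, p0), (y1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1:
            return y0 + (y1 - y0) * (target - p0) / (p1 - p0)
    return None

# Retail Salesperson, fixed-years framing: 30% in 10y, 60% in 20y, 91.5% in 50y.
retail = [(10, 0.30), (20, 0.60), (50, 0.915)]
median_year = year_at_probability(retail)  # ~16.7 years
```

Here the fixed-years answers imply a median of about 16.7 years, later than the 13.5 years given directly under the fixed-probabilities framing, in line with the framing effect noted above.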

Figure 1: Median answers to questions about probabilities by dates (‘fixed year’) and dates for probabilities (‘fixed probability’), for different occupations, all current occupations, and all tasks (HLMI).

Interesting things to note:

  • Fixed years framings (‘Fyears’, labeled with stars) universally produce later timelines.
  • HLMI (thick blue lines) is logically required to be after full automation of labor (‘Occ’) yet is forecast much earlier than it, and earlier even than the specific occupation ‘AI researcher’.
  • Even the more pessimistic Fyears estimates suggest retail salespeople have a good chance of being automated within 20 years, and are very likely to be in fifty.

Intelligence Explosion

Probability of dramatic technological speedup
Question

Assume that HLMI will exist at some point. How likely do you then think it is that the rate of global technological improvement will dramatically increase (e.g. by a factor of ten) as a result of machine intelligence:

Within two years of that point?       ___% chance

Within thirty years of that point?    ___% chance

[NB. If I understand correctly, a small number of respondents answered a slightly different version of this question in an initial round, and we changed it (probably to make it easier to understand), and those first answers aren’t included here.]

Answers

Median P(…within two years) = 20%

Median P(…within thirty years) = 80%

 

Probability of superintelligence
Question

Assume that HLMI will exist at some point. How likely do you think it is that there will be machine intelligence that is vastly better than humans at all professions (i.e. that is vastly more capable or vastly cheaper):

Within two years of that point?       ___% chance

Within thirty years of that point?    ___% chance

 

Answers

Median P(…within two years) = 10%

Median P(…within thirty years) = 50%

This is the distribution of answers to the former:

Chance that the intelligence explosion argument is about right
Question

Some people have argued the following:

If AI systems do nearly all research and development, improvements in AI will accelerate the pace of technological progress, including further progress in AI.


Over a short period (less than 5 years), this feedback loop could cause technological progress to become more than an order of magnitude faster.

How likely do you find this argument to be broadly correct?

  • Quite unlikely (0-20%)
  • Unlikely (21-40%)
  • About even chance (41-60%)
  • Likely (61-80%)
  • Quite likely (81-100%)
Answers

These are the Pearson product-moment correlation coefficients for the different answers, among people who received both of a pair of questions:

Impacts of HLMI

Question

Assume for the purpose of this question that HLMI will at some point exist. How positive or negative do you expect the overall impact of this to be on humanity, in the long run? Please answer by saying how probable you find the following kinds of impact, with probabilities adding to 100%:

______ Extremely good (e.g. rapid growth in human flourishing) (1)

______ On balance good (2)

______ More or less neutral (3)

______ On balance bad (4)

______ Extremely bad (e.g. human extinction) (5)

Answers

Sensitivity of progress to changes in inputs

Question

The next questions ask about the sensitivity of progress in AI capabilities to changes in inputs.    ‘Progress in AI capabilities’ is an imprecise concept, so we are asking about progress as you naturally conceive of it, and looking for approximate answers.

[The participant received a random three of the following five parts]

Imagine that over the past decade, only half as much researcher effort had gone into AI research. For instance, if there were actually 1,000 researchers, imagine that there had been only 500 researchers (of the same quality). How much less progress in AI capabilities would you expect to have seen? e.g. If you think progress is linear in the number of researchers, so 50% less progress would have been made, write ’50’. If you think only 20% less progress would have been made write ’20’.

……% less

Over the last 10 years the cost of computing hardware has fallen by a factor of 20. Imagine instead that the cost of computing hardware had fallen by only a factor of 5 over that time (around half as far on a log scale). How much less progress in AI capabilities would you expect to have seen? e.g. If you think progress is linear in 1/cost, so that 1 - 5/20 = 75% less progress would have been made, write ’75’. If you think only 20% less progress would have been made write ’20’.

……% less

Imagine that over the past decade, there had only been half as much effort put into increasing the size and availability of training datasets. For instance, perhaps there are only half as many datasets, or perhaps existing datasets are substantially smaller or lower quality. How much less progress in AI capabilities would you expect to have seen? e.g. If you think 20% less progress would have been made, write ‘20’

……% less

Imagine that over the past decade, AI research had half as much funding (in both academic and industry labs). For instance, if the average lab had a budget of $20 million each year, suppose their budget had only been $10 million each year.  How much less progress in AI capabilities would you expect to have seen? e.g. If you think 20% less progress would have been made, write ‘20’

……% less

Imagine that over the past decade, there had been half as much progress in AI algorithms. You might imagine this as conceptual insights being half as frequent.  How much less progress in AI capabilities would you expect to have seen? e.g. If you think 20% less progress would have been made, write ‘20’

……% less

Answers

The following five figures are histograms, showing the number of people who gave different answers to the five question parts above.

Sample sizes
Researcher effort Cost computing Training data Funding Algorithm progress
71 64 71 68 59
Medians

The following figure shows median answers to the above questions.

Researcher effort Cost computing Training data Funding Algorithm progress
30 50 40 40 50
Correlations

In sum:

  • Opinions vary widely on the importance of these inputs, especially training data.
  • On researcher effort, about half of people say 20-40%
  • Training data seems fairly evenly spread across the board
  • More than a third of people said >80% for computing cost. There are basically two spikes, but see the question.
  • Funding : evenly spread in the space between 20% and 80%
  • Algorithmic progress: evenly spread between 10% and 90%, but with a large peak of a quarter of respondents at 50-60%
  • How important people think funding is is highly correlated with how important they think research effort and algorithmic progress are.
  • How important people think hardware is is highly correlated with how important they think training data is.

Outside view implied HLMI forecasts

Questions

Which AI research area have you worked in for the longest time?

————————————

How long have you worked in this area?

———years

Consider three levels of progress or advancement in this area:

A. Where the area was when you started working in it

B. Where it is now

C. Where it would need to be for AI software to have roughly human level abilities at the tasks studied in this area

What fraction of the distance between where progress was when you started working in the area (A) and where it would need to be to attain human level abilities in the area (C) have we come so far (B)?

———%

Divide the period you have worked in the area into two halves: the first and the second. In which half was the rate of progress in your area higher?

  • The first half
  • The second half
  • They were about the same
Answers

Each person told us how long they had been in their subfield, and what fraction of the remaining path to human-level performance (in their subfield) they thought had been traversed in that time. From this, we can estimate when the subfield should reach ‘human-level performance’, if progress continued at the same rate. The following graph shows those forecast dates.
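The extrapolation is a simple proportion: if a fraction f of the distance was covered in t years, the remaining distance takes t * (1 - f) / f more years at the same rate. A sketch (function name hypothetical):

```python
def years_to_human_level(years_worked, fraction_covered):
    """Linear extrapolation of remaining years to human-level performance.

    If `fraction_covered` of the distance was covered in `years_worked`,
    the rest takes proportionally longer at the same rate.
    """
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return years_worked * (1 - fraction_covered) / fraction_covered

# E.g. a researcher who has worked in a subfield for 20 years and thinks
# 10% of the ground has been covered implies about 180 more years.
remaining = years_to_human_level(20, 0.10)
```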

Disagreements and Misunderstandings

Questions

To what extent do you think you disagree with the typical AI researcher about when HLMI will exist?

  • A lot (17)
  • A moderate amount (18)
  • Not much (19)

 

If you disagree, why do you think that is?

_______________________________________________________

To what extent do you think people’s concerns about future risks from AI are due to misunderstandings of AI research?

  • Almost entirely (1)
  • To a large extent (2)
  • Somewhat (4)
  • Not much (3)
  • Hardly at all (5)

 

What do you think are the most important misunderstandings, if there are any?

________________________________________________________

Answers

 

One hundred and eighteen people responded to the question on misunderstandings, and 74 of them described what they thought the most important misunderstandings were. The table and figures below show our summary of the responses. We are not yet able to share the raw data, so only one person has categorized these responses at present, and we have not checked thoroughly for errors.

 Most important misunderstandings  Number  Fraction of non-empty responses
Underestimate distance from generality, open-ended tasks 9 12%
Overestimate state of the art (other) 10 14%
Underestimate distance from AGI at this rate 13 18%
Think AI will be in control of us or in conflict with us 11 15%
Expect humans to be obsoleted 7 9%
Overly influenced by fiction 7 9%
Expect AI to be human-like or sentient 6 8%
Expect sudden or surprising events 5 7%
Think AI will go outside its programming 5 7%
Influenced by poor reporting 5 7%
Wrongly equate intelligence with something else 4 5%
Underestimate systemic social risks 2 3%
Overestimate distance to strong AI 2 3%
Other ignorance of AI 4 5%
Other 12 16%
Empty 44 59%

 

Narrow tasks

Questions

Respondents were each asked one of the following two questions:

Fixed years framing:

How likely do you think it is that the following AI tasks will be feasible within the next:

  • 10 years?
  • 20 years?
  • 50 years?

Let a task be ‘feasible’ if one of the best resourced labs could implement it in less than a year if they chose to. Ignore the question of whether they would choose to.

Fixed probabilities framing:

How many years until you think the following AI tasks will be feasible with:

  • a small chance (10%)?
  • an even chance (50%)?
  • a high chance (90%)?

Let a task be ‘feasible’ if one of the best resourced labs could implement it in less than a year if they chose to. Ignore the question of whether they would choose to.

Each researcher was then presented with a random four of the following tasks:

[Rosetta] Translate a text written in a newly discovered language into English as well as a team of human experts, using a single other document in both languages (like a Rosetta stone). Suppose all of the words in the text can be found in the translated document, and that the language is a difficult one.

[Subtitles] Translate speech in a new language given only unlimited films with subtitles in the new language. Suppose the system has access to training data for other languages, of the kind used now (e.g. same text in two languages for many languages and films with subtitles in many languages).

[Translate] Perform translation about as good as a human who is fluent in both languages but unskilled at translation, for most types of text, and for most popular languages (including languages that are known to be difficult, like Czech, Chinese and Arabic).

[Phone bank] Provide phone banking services as well as human operators can, without annoying customers more than humans. This includes many one-off tasks, such as helping to order a replacement bank card or clarifying how to use part of the bank website to a customer.

[Class] Correctly group images of previously unseen objects into classes, after training on a similar labeled dataset containing completely different classes. The classes should be similar to the ImageNet classes.

[One-shot] One-shot learning: see only one labeled image of a new object, and then be able to recognize the object in real world scenes, to the extent that a typical human can (i.e. including in a wide variety of settings). For example, see only one image of a platypus, and then be able to recognize platypuses in nature photos. The system may train on labeled images of other objects.   Currently, deep networks often need hundreds of examples in classification tasks1, but there has been work on one-shot learning for both classification2 and generative tasks3.

1 Lake et al. (2015). Building Machines That Learn and Think Like People
2 Koch (2015). Siamese Neural Networks for One-Shot Image Recognition
3 Rezende et al. (2016). One-Shot Generalization in Deep Generative Models

[Video scene] See a short video of a scene, and then be able to construct a 3D model of the scene that is good enough to create a realistic video of the same scene from a substantially different angle.

For example, constructing a short video of walking through a house from a video taking a very different path through the house.

[Transcribe] Transcribe human speech with a variety of accents in a noisy environment as well as a typical human can.

[Read aloud] Take a written passage and output a recording that can’t be distinguished from a voice actor, by an expert listener.

[Theorems] Routinely and autonomously prove mathematical theorems that are publishable in top mathematics journals today, including generating the theorems to prove.

[Putnam] Perform as well as the best human entrants in the Putnam competition—a math contest whose questions have known solutions, but which are difficult for the best young mathematicians.

[Go low] Defeat the best Go players, training only on as many games as the best Go players have played.

For reference, DeepMind’s AlphaGo has probably played a hundred million games of self-play, while Lee Sedol has probably played 50,000 games in his life1.


1 Lake et al. (2015). Building Machines That Learn and Think Like People

[Starcraft] Beat the best human Starcraft 2 players at least 50% of the time, given a video of the screen.

Starcraft 2 is a real time strategy game characterized by:

  • Continuous time play
  • Huge action space
  • Partial observability of enemies
  • Long term strategic play, e.g. preparing for and then hiding surprise attacks

[Rand game] Play a randomly selected computer game, including difficult ones, about as well as a human novice, after playing the game less than 10 minutes of game time. The system may train on other games.

[Angry birds] Play new levels of Angry Birds better than the best human players. Angry Birds is a game where players try to efficiently destroy 2D block towers with a catapult. For context, this is the goal of the IJCAI Angry Birds AI competition1.

1 aibirds.org

[Atari] Outperform professional game testers on all Atari games using no game-specific knowledge. This includes games like Frostbite, which require planning to achieve sub-goals and have posed problems for deep Q-networks1, 2.

1 Mnih et al. (2015). Human-level control through deep reinforcement learning
2 Lake et al. (2015). Building Machines That Learn and Think Like People

[Atari fifty] Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game specific knowledge.

For context, the original Atari playing deep Q-network outperforms professional game testers on 47% of games1, but used hundreds of hours of play to train2.

1 Mnih et al. (2015). Human-level control through deep reinforcement learning
2 Lake et al. (2015). Building Machines That Learn and Think Like People

[Laundry] Fold laundry as well and as fast as the median human clothing store employee.

[Race] Beat the fastest human runners in a 5 kilometer race through city streets using a bipedal robot body.

[Lego] Physically assemble any LEGO set given the pieces and instructions, using non-specialized robotics hardware.

For context, Fu 20161 successfully joins single large LEGO pieces using model based reinforcement learning and online adaptation.

1 Fu et al. (2016). One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors

[Sort] Learn to efficiently sort lists of numbers much larger than in any training set used, the way Neural GPUs can do for addition1, but without being given the form of the solution.

For context, Neural Turing Machines have not been able to do this2, but Neural Programmer-Interpreters3 have been able to do this by training on stack traces (which contain a lot of information about the form of the solution).

1 Kaiser & Sutskever (2015). Neural GPUs Learn Algorithms
2 Zaremba & Sutskever (2015). Reinforcement Learning Neural Turing Machines
3 Reed & de Freitas (2015). Neural Programmer-Interpreters

[Python] Write concise, efficient, human-readable Python code to implement simple algorithms like quicksort. That is, the system should write code that sorts a list, rather than just being able to sort lists.

Suppose the system is given only:

  • A specification of what counts as a sorted list
  • Several examples of lists undergoing sorting by quicksort
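For reference, the kind of output the milestone asks for might look like the following concise quicksort (a human-written sketch of the target, not system output):

```python
def quicksort(xs):
    """Return a sorted copy of xs via recursive quicksort."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    left = [x for x in rest if x < pivot]    # elements smaller than the pivot
    right = [x for x in rest if x >= pivot]  # elements at least the pivot
    return quicksort(left) + [pivot] + quicksort(right)
```

The milestone's difficulty is inferring code like this from a sortedness specification plus input/output traces, rather than merely learning a function that sorts.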

[Factoid] Answer any “easily Googleable” factoid questions posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

Examples of factoid questions:

  •  “What is the poisonous substance in Oleander plants?”
  • “How many species of lizard can be found in Great Britain?”

[Open quest] Answer any “easily Googleable” factual but open ended question posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

Examples of open ended questions:

  • “What does it mean if my lights dim when I turn on the microwave?”
  • “When does home insurance cover roof replacement?”

[Unkn quest] Give good answers in natural language to factual questions posed in natural language for which there are no definite correct answers.

For example: “What causes the demographic transition?”, “Is the thylacine extinct?”, “How safe is seeing a chiropractor?”

[Essay] Write an essay for a high-school history class that would receive high grades and pass plagiarism detectors.

For example, answer a question like ‘How did the whaling industry affect the industrial revolution?’

[Top forty] Compose a song that is good enough to reach the US Top 40. The system should output the complete song as an audio file.

[Taylor] Produce a song that is indistinguishable from a new song by a particular artist, e.g. a song that experienced listeners can’t distinguish from a new song by Taylor Swift.

[Novel] Write a novel or short story good enough to make it to the New York Times best-seller list.

[Explain] For any computer game that can be played well by a machine, explain the machine’s choice of moves in a way that feels concise and complete to a layman.

[Poker] Play poker well enough to win the World Series of Poker.

[Laws phys] After spending time in a virtual world, output the differential equations governing that world in symbolic form.

For example, the agent is placed in a game engine where Newtonian mechanics holds exactly and the agent is then able to conduct experiments with a ball and output Newton’s laws of motion.
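A toy version of such an experiment, assuming a noiseless world with constant acceleration (the function names here are illustrative): the agent drops a ball, records its height at fixed time steps, and recovers the acceleration coefficient from a second finite difference.

```python
def simulate_fall(y0, v0, g, dt, steps):
    """Noiseless 'game engine': y(t) = y0 + v0*t - 0.5*g*t**2."""
    return [y0 + v0 * (k * dt) - 0.5 * g * (k * dt) ** 2 for k in range(steps)]

def infer_acceleration(ys, dt):
    """Second finite difference of evenly spaced samples recovers the
    constant acceleration term of a quadratic trajectory exactly."""
    return (ys[2] - 2 * ys[1] + ys[0]) / dt ** 2

ys = simulate_fall(y0=100.0, v0=0.0, g=9.8, dt=0.1, steps=3)
a = infer_acceleration(ys, 0.1)  # ≈ -9.8
```

The milestone is of course far harder than this: the agent must discover which experiments to run and emit the governing equations in symbolic form, not merely fit a known functional form.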

Answers
Fixed years framing

Probabilities (%) by year

10 years 20 years 50 years
Rosetta 20 50 95
Subtitles 30 50 90
Translate 50 65 94.5
Phone bank 40 75 99
Class 50 75 99
One-shot 25 60 90
Video scene 50 70 99
Transcribe 65 95 99
Read aloud 50 90 99
Theorems 5 20 40
Putnam 5 20 50
Go low 10 25 60
Starcraft 70 90 99
Rand game 25 50 80
Angry birds 90 95 99.4995
Atari 50 60 92.5
Atari fifty 40 75 95
Laundry 55 95 99
Race 30 70 95
Lego 57.5 85 99
Sort 50 90 95
Python 50 79 90
Factoid 50 82.5 99
Open quest 50 65 90
Unkn quest 40 70 90
Essay 25 50 90
Top forty 27.5 50 90
Taylor 60 75 99
Novel 1 25 62.5
Explain 30 60 90
Poker 70 90 99
Laws phys 20 40 80
Fixed probabilities framing

Years by probability

10 percent 50 percent 90 percent
Rosetta 10 20 50
Subtitles 5 10 15
Translate 3 7 15
Phone bank 3 6 10
Class 2 4.5 6.5
One-shot 4.5 8 20
Video scene 5 10 20
Transcribe 5 10 20
Read aloud 5 10 15
Theorems 10 50 90
Putnam 15 35 55
Go low 3.5 8.5 19.5
Starcraft 2 5 10
Rand game 5 10 15
Angry birds 2 4 6
Atari 5 10 15
Atari fifty 2 5 10
Laundry 2 5.5 10
Race 5 10 20
Lego 5 10 15
Sort 3 5 10
Python 3 10 20
Factoid 3 5 10
Open quest 5 10 15
Unkn quest 4 10 17.5
Essay 2 7 15
Top forty 5 10 20
Taylor 5 10 20
Novel 10 30 50
Explain 5 10 15
Poker 1 3 5.5
Laws phys 5 10 20

 

Safety

Stuart Russell’s problem
Question

Stuart Russell summarizes an argument for why highly advanced AI might pose a risk as follows:

The primary concern [with highly advanced AI] is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken […]. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.  This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.

 

Do you think this argument points at an important problem?

  • No, not a real problem.
  • No, not an important problem.
  • Yes, a moderately important problem.
  • Yes, a very important problem.
  • Yes, among the most important problems in the field.

 

How valuable is it to work on this problem today, compared to other problems in AI?

  • Much less valuable
  • Less valuable
  • As valuable as other problems
  • More valuable
  • Much more valuable

 

How hard do you think this problem is compared to other problems in AI?

  • Much easier
  • Easier
  • As hard as other problems
  • Harder
  • Much harder

 

Answers

 

General safety
Question

Let ‘AI safety research’ include any AI-related research that, rather than being primarily aimed at improving the capabilities of AI systems, is instead primarily aimed at minimizing potential risks of AI systems (beyond what is already accomplished for those goals by increasing AI system capabilities).

Examples of AI safety research might include:

  • Improving the human-interpretability of machine learning algorithms for the purpose of improving the safety and robustness of AI systems, not focused on improving AI capabilities
  • Research on long-term existential risks from AI systems
  • AI-specific formal verification research
  • Policy research about how to maximize the public benefits of AI

How much should society prioritize AI safety research, relative to how much it is currently prioritized?

  • Much less
  • Less
  • About the same
  • More
  • Much more

 

Answers

 

Paper

Some results are reported in When Will AI Exceed Human Performance? Evidence from AI Experts, by Grace et al.

Some notes on interpreting the paper (originally in our blog post):

  1. The milestones in the timeline and in the abstract are from three different sets of questions. There seems to be a large framing effect between two of them—full automation of labor is logically required to be before HLMI, and yet it is predicted much later—and it is unclear whether people answer the third set of questions (about narrow tasks) more like the one about HLMI or more like the one about occupations. Plus even if there were no framing effect to worry about, we should expect milestones about narrow tasks to be much earlier than milestones about very similar sounding occupations. For instance, if there were an occupation ‘math researcher’, it should be later than the narrow task summarized here as ‘math research’. So there is a risk of interpreting the figure as saying AI research is harder than math research, when really the ‘-er’ is all-important. So to help avoid confusion, here is the timeline colored in by which set of questions each milestone came from. The blue one was asked on its own. The orange ones were always asked together: first all four occupations, then they were asked for an occupation they expected to be very late, and when they expected it, then full automation of labor. The pink milestones were randomized, so that each person got four. There are a lot more pink milestones not included here, but included in the long table at the end of the paper.
  2. In Figure 2 and Table S5 I believe the word ‘median’ means we are talking about the ‘50% chance of occurring’ number, and the dates given are this ‘median’ (50% chance) date for a distribution that was made by averaging together all of the different people’s distributions (or what we guess their distributions are like from three data points).

 

  1. “Our analysis includes companies applying AI algorithms to verticals like healthcare, security, advertising, and finance as well as those developing general-purpose AI tech. Our list excludes robotics (hardware-focused) and AR/VR startups, which we’ve analyzed separately here and here. Our analysis includes all equity funding rounds and convertible notes. This post was updated on 1/19/2017 to include deals through the end of 2016….Deals reached a 5-year high last year, from 160 deals in 2012 to 658 in 2016. Dollars invested also rose considerably in 2016, up about 60%.” – CB Insights, https://www.cbinsights.com/blog/artificial-intelligence-startup-funding/  (See also Figure 1)
  2. “…IIS also invests in research on artificial intelligence, computer vision, natural language processing, robotics, machine learning, computational neuroscience, cognitive science, and areas leading to the computational understanding and modeling of intelligence in complex, realistic contexts.” – CISE Funding, Directorate for Computer and Information Science and Engineering (CISE).
  3. CISE’s Division of Information and Intelligent Systems (IIS) supports research and education projects that develop new knowledge in three core programs:

    • The Cyber-Human Systems (CHS) program;
    • The Information Integration and Informatics (III) program; and
    • The Robust Intelligence (RI) program.

    Information and Intelligent Systems (IIS): Core Programs

  4. NSF Budget:

    http://www.nsf.gov/about/budget/fy2017/pdf/18_fy2017.pdf

    https://www.nsf.gov/about/budget/fy2016/pdf/18_fy2016.pdf

    http://www.nsf.gov/about/budget/fy2015/pdf/18_fy2015.pdf

    https://www.nsf.gov/about/budget/fy2014/pdf/18_fy2014.pdf

    http://www.nsf.gov/about/budget/fy2013/pdf/06-CISE_fy2013.pdf

    https://www.nsf.gov/about/budget/fy2012/pdf/17_fy2012.pdf

    https://www.nsf.gov/about/budget/fy2011/pdf/06-CISE_fy2011.pdf

  5. White House Office of Science and Technology Policy, https://www.whitehouse.gov/blog/2016/05/03/preparing-future-artificial-intelligence
  6. Business Insider, http://www.businessinsider.com/the-pentagon-wants-at-least-12-billion-to-fund-ai-weapon-technology-in-2017-2015-12
  7. UK-RAS Network, http://hamlyn.doc.ic.ac.uk/uk-ras/news/japan-uk-collaboration
  8. Technode, http://technode.com/2016/05/27/chinese-goverment-wants-100-billion-level-artificial-intelligence-market-2018/
  9. Yonhap News Agency, http://english.yonhapnews.co.kr/news/2016/03/17/0200000000AEN20160317003751320.html
  10. “At the time of writing this paper we have factored two 93, one 96, one 100, one 102, and one 106 digit number using mpqs, and we are working on a 103 digit number; for all these numbers extensive ecm attempts had failed.” – Lenstra et al. 1988
  11. e.g. p44 mentions the 107-digit record and some details.
