Guide to pages on AI timeline predictions

This page is an informal outline of the other pages on this site about AI timeline predictions made by others. Headings link to higher-level pages, intended to summarize the evidence from the pages below them. This list was complete as of 7 April 2017 (here is a category that may contain newer entries, though not conveniently organized).

Guide

Topic synthesis: AI timeline predictions as evidence (page)

The predictions themselves:

—from surveys (page):
  1. 2016 Expert Survey on Progress in AI: our own survey; results forthcoming.
  2. Müller and Bostrom AI Progress Poll: the most recent survey with available results, including 29 of the most cited AI researchers as participants.
  3. Hanson AI Expert Survey: in which researchers judge fractional progress toward human-level performance over their careers, in a series of informal conversations.
  4. Kruel AI survey: in which experts give forecasts and detailed thoughts, interview style.
  5. FHI Winter Intelligence Survey: in which impacts-concerned AGI conference attendees forecast AI in 2011.
  6. AGI-09 Survey: in which AGI conference attendees forecast various human-levels of AI in 2009.
  7. Klein AGI survey: in which a guy with a blog polls his readers.
  8. AI@50 survey: in which miscellaneous conference goers are polled informally.
  9. Bainbridge Survey: in which 26 expert technologists expect human-level AI in 2085 and give it a 5.6/10 rating on benefit to humanity.
  10. Michie Survey: in which 67 AI and CS researchers are not especially optimistic in the ‘70s.
—from public statements:
  1. MIRI AI predictions dataset: a big collection of public predictions gathered from the internet.
—from written analyses (page), for example:
  1. The Singularity is Near: in which a technological singularity is predicted in 2045, based on when hardware is extrapolated to compute radically more than human minds in total.
  2. The Singularity Isn’t Near: in which it is countered that human-level AI requires software as well as hardware, and none of the routes to producing software will get there by 2045.
  3. (Several others are listed in the analyses page above, but do not have their own summary pages.)

On what to infer from the predictions

Some considerations regarding accuracy and bias (page):
  1. Contra a common view that past AI forecasts were unreasonably optimistic, AI predictions look fairly similar over time, except for a handful of very early, somewhat optimistic ones.
  2. The Maes Garreau Law claims that people tend to predict AI near the end of their own expected lifetime. It is not true.
  3. We expect publication biases to favor earlier forecasts.
  4. Predictions made in surveys seem to be somewhat later overall than those made in public statements (perhaps because surveys avoid some publication biases).
  5. People who are inclined toward optimism about AI are more likely to become AI researchers, leading to a selection bias from optimistic experts.
  6. We know of some differences in forecasts made by different groups.

Blog posts on these topics:

Progress in general purpose factoring

The largest number factored to date grew by about 4.5 decimal digits per year over roughly the past half-century. Between 1988, when good records begin, and 2009, when the current record was set, progress was roughly 6 decimal digits per year.

Progress was relatively smooth during the two decades for which we have good records: half of new records came less than two years after the previous record, and the biggest advances between two records represented about five years of progress at the prior average rate.

Computing hardware used in record-setting factorings increased ten-thousand-fold over the period covered by these records (roughly in line with falling computing prices), and we are unsure how much of overall factoring progress is due to this.

Support

Background

To factor an integer N is to find two integers l and m, each greater than 1, such that l*m = N. There is no known efficient algorithm for the general case of this problem. While there are special purpose algorithms for numbers with particular forms, here we are interested in 'general purpose' factoring algorithms: those that can factor any composite number, with running time depending only on the size of that number.1
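As a minimal illustration (not part of the original analysis), trial division is the simplest method that meets this definition in the worst case: it will factor any composite N using at most roughly sqrt(N) divisions, regardless of N's structure, though it is hopelessly slow at the sizes discussed below. Record-setting factorings instead use sieve algorithms such as MPQS and the general number field sieve.

```python
# Illustrative only: naive trial division as a 'general purpose' factoring method.
def trial_division(n: int):
    """Return a nontrivial factor pair (l, m) with l * m == n, or None if n is prime."""
    d = 2
    while d * d <= n:        # at most ~sqrt(n) candidate divisors
        if n % d == 0:
            return d, n // d
        d += 1
    return None

print(trial_division(8051))    # (83, 97): a small semiprime
print(trial_division(104729))  # None: 104729 is prime
```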

Factoring numbers large enough to break records frequently takes months or years, even with a large number of computers. For instance, RSA-768, the largest number to be factored to date, had 232 decimal digits and was factored over multiple years ending in 2009, using the equivalent of almost 2000 years of computing on a single 2.2 GHz AMD Opteron processor with 2GB RAM.2

It is important to know what size numbers can be factored with current technology, because factoring large numbers is central to cryptographic security schemes such as RSA.3 Much of the specific data we have on progress in factoring comes from the RSA Factoring Challenge: a contest funded by RSA Laboratories, offering large cash prizes for factoring various numbers on their list of 54 (the 'RSA Numbers').4 These numbers are semiprimes (i.e. each has exactly two prime factors), and each has between 100 and 617 decimal digits.5

The records collected on this page are for factoring specific large numbers, using algorithms whose performance depends only on the scale of the number (general purpose algorithms). So a record for a specific N-digit number does not imply that this was the first time any N-digit number was factored. However, an important record strongly suggests that arbitrary N-digit numbers were still difficult to factor at the time, so others are unlikely to have been factored very much earlier using general purpose algorithms. For a sense of scale, our records here include at least eight numbers that were not the largest ever factored at the time yet appear to have been considered difficult to factor, and many of those were factored five to seven years after the first time a number of that size was factored. For numbers that are the first of their size in our records, we expect that they are mostly the first of that size to have been factored, with the exception of early records.6

Once a number of a certain size has been factored, it can still take years before numbers of that size can be cheaply factored. For instance, while 116-digit numbers had been factored by 1991 (see below), it was an achievement in 1996 to factor a different specific 116-digit number, which had been on a 'most wanted' list.

When we say a digit record was broken in some year (rather than a record for a particular number), we mean that this was the first time any number with at least that many digits was factored using a general purpose factoring algorithm.

Our impression is that the size of numbers factored is a measure people were actively trying to improve, so that progress here is the result of relatively consistent effort, rather than occasional incidental overlap with some other goal.

Most of the research on this page is adapted from Katja Grace’s 2013 report, with substantial revision.

Recent rate of progress

Data sources

These are the sources of data we draw on for factoring records:

  • Wikipedia's table of 'RSA numbers', from the RSA Factoring Challenge (active 1991-2007). Of the 54 numbers listed, 19 have been factored. The table includes numbers of digits, dates, and cash prizes offered. (data)
  • Scott Contini's list of 'general purpose factoring records' since 1990, which includes the nine RSA numbers that set digit records, and three numbers that appear to have set digit records as part of the 'Cunningham Project'. This list contains digits, dates, solution times, and algorithms used.
  • Two estimates of how many digits could be factored in earlier decades, from an essay by Carl Pomerance.7
  • This announcement, which suggests that the 116-digit general purpose factoring record was set in January 1991, contrary to FactorWorld and Pomerance.8 We ignore it, since the dates are too close to matter substantially and the other two sources agree.
  • A 1988 paper discussing recent work and the constraints on what was possible at the time. It lists four 'state of the art' efforts at factorization, among which the largest number factored using a general purpose algorithm (MPQS) has 95 digits.9 It claims that 106 digits had been factored by 1988, which implies that the early RSA challenge numbers were not state of the art.10 Together these suggest that the work in the paper is responsible for moving the record from 95 to 106 digits; this matches our impressions from elsewhere, though we do not know of a specific claim to this effect.
  • This 'RSA honor roll', which contains metadata for the RSA solutions.
  • Cryptography and Computational Number Theory (1989), Carl Pomerance and Shafi Goldwasser.11

Excel spreadsheet containing our data for download: Factoring data 2017

Trends

Digits by year

Figure 1 shows how the scale of numbers that could be factored (using general purpose methods) grew over the last half-century (as of 2017). In red are the numbers that broke the record for the largest number of digits, as far as we know.

From it we see that since 1970, the size of numbers that can be factored has increased from around twenty digits to 232 digits, an average of about 4.5 digits per year.

After the first record we have in 1988, we know of thirteen more records being set, for an average of one every 1.6 years between 1988 and 2009. Half of these were set the same or the following year as the last record, and the largest gap between records was four years. As of 2017, seven years have passed without further records being set.

The largest single step of progress is the most recent one: 32 additional digits at once, or over five years of progress at the average rate seen between 1988 and that point. The 200-digit record also represented around five years of progress.
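The arithmetic behind these rates can be reconstructed roughly as follows (a sketch using the approximate record sizes quoted above, not the underlying dataset):

```python
# Approximate record sizes (decimal digits) taken from the text above.
records = {1970: 20, 1988: 106, 2009: 232, 2017: 232}  # no new records 2009-2017

long_run = (records[2017] - records[1970]) / (2017 - 1970)
recent = (records[2009] - records[1988]) / (2009 - 1988)
print(f"~{long_run:.1f} digits/year since 1970")           # ~4.5
print(f"~{recent:.1f} digits/year between 1988 and 2009")  # ~6.0
```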

Figure 1: Size of numbers (in decimal digits) that could be factored over recent history. Green ‘approximate state of the art’ points do not necessarily represent specific numbers or the very largest that could be at that time—they are qualitative estimates. The other points represent specific large numbers being factored, either as the first number of that size ever to be factored (red) or not (orange). Dates are accurate to the year. Some points are annotated with decimal digit size, for ease of reading.

Hardware inputs

New digit records tend to use more computation, which makes progress in software alone hard to measure. At any point in the past it was in principle possible to factor larger numbers with more hardware, so the records we see are effectively records for what can be done with however much hardware anyone is willing to purchase for the purpose. That amount grows from a combination of software improvements, hardware improvements, and increasing wealth, among other things. Figure 2 shows how the computing used for solutions increased with time.


Figure 2: CPU time to factor digit-record breaking numbers. Measured in 1,000 MIPS-years before 2000, and in GHz-years after 2000. These are similar, but not directly comparable, so the conversion here is approximate. Data from Contini (2010). Figure from Grace (2013). (Original caption: “Figure 31 shows CPU times for the FactorWorld records. These times have increased by a factor of around ten thousand since 1990. At 2000 the data changes from being in MIPS-years to 1 GHz CPU-years. These aren’t directly comparable. The figure uses 1 GHz CPU-year = 1,000 MIPS-years, because it is in the right ballpark and simple, and no estimates were forthcoming. The figure suggests that a GHz CPU-year is in fact worth a little more, given that the data seems to dip around 2000 with this conversion.”)

In the two decades between 1990 and 2010, figure 2 suggests that the computing used increased by about four orders of magnitude. During that time, computing available per dollar probably increased by a factor of ten every four years or so, for about five orders of magnitude. So the records roughly track the size of number that can be factored at a fixed expense.
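Making that comparison explicit (a back-of-the-envelope sketch using the rough figures above, not precise measurements):

```python
years = 20                               # 1990-2010
hardware_used_oom = 4                    # growth in CPU time used, in orders of magnitude
price_performance_oom = years / 4        # ~10x more computing per dollar every ~4 years

spending_change_oom = hardware_used_oom - price_performance_oom
print(f"implied change in spending: about {spending_change_oom:+.0f} order(s) of magnitude")
# about -1: hardware used grew slightly slower than computing per dollar,
# so the records correspond to a roughly fixed (if anything, falling) budget.
```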

Further research

  • Discover how computation used is expected to scale with the number of digits factored, and use that to factor out increased hardware use from this trendline, and so measure non-hardware progress alone.
  • This area appears to have seen a small number of new algorithms, among smaller changes in how they are implemented. Check how much the new algorithms affected progress, and similarly for anything else with apparent potential for large impacts (e.g. a move to borrowing other people’s spare computing hardware via the internet, rather than paying for hardware).
  • Find records from earlier times.
  • Some numbers had large prizes associated with their factoring, and others of similar sizes had none. Examine the relationship between progress and financial incentives in this case.
  • The Cunningham Project maintains a vast collection of recorded factorings of numbers across many scales, along with dates, algorithms used, and the people or projects responsible. Gather that data, and use it to make inferences similar to those we draw from the data here (see the Relevance section below for more on that).

 

Relevance

We are interested in factoring because it is an example of an algorithmic problem on which there has been well-documented progress. Such examples should inform our expectations for algorithmic problems in general (including problems in AI), regarding:

  • How smooth or jumpy progress tends to be, and related characteristics of its shape
  • How much warning there is of rapid progress
  • How events that are qualitatively considered ‘conceptual insights’ or ‘important progress’ relate to measured performance progress.
  • How software progress interacts with hardware (for instance, does a larger step of software progress cause a disproportionate increase in overall software output, because of redistribution of hardware?)
  • If performance is improving, how much of that is because of better hardware, and how much is because of better algorithms or other aspects of software

 

Assorted sources


 

Trends in algorithmic progress

Algorithmic progress has been estimated to contribute fifty to one hundred percent as much as hardware progress to overall performance progress, with low confidence.

Algorithmic improvements appear to be relatively incremental.

Details

We have not recently examined this topic carefully ourselves. This page currently contains relevant excerpts and sources.

Algorithmic Progress in Six Domains1 measured progress in the following areas, as of 2013:

  • Boolean satisfiability
  • Chess
  • Go
  • Largest number factored (our updated page)
  • MIP algorithms
  • Machine learning

Some key summary paragraphs from the paper:

Many of these areas appear to experience fast improvement, though the data are often noisy. For tasks in these areas, gains from algorithmic progress have been roughly fifty to one hundred percent as large as those from hardware progress. Improvements tend to be incremental, forming a relatively smooth curve on the scale of years

In recent Boolean satisfiability (SAT) competitions, SAT solver performance has increased 5–15% per year, depending on the type of problem. However, these gains have been driven by widely varying improvements on particular problems. Retrospective surveys of SAT performance (on problems chosen after the fact) display significantly faster progress.

Chess programs have improved by around fifty Elo points per year over the last four decades. Estimates for the significance of hardware improvements are very noisy but are consistent with hardware improvements being responsible for approximately half of all progress. Progress has been smooth on the scale of years since the 1960s, except for the past five.

Go programs have improved about one stone per year for the last three decades. Hardware doublings produce diminishing Elo gains on a scale consistent with accounting for around half of all progress. Improvements in a variety of physics simulations (selected after the fact to exhibit performance increases due to software) appear to be roughly half due to hardware progress.

The largest number factored to date has grown by about 5.5 digits per year for the last two decades; computing power increased ten-thousand-fold over this period, and it is unclear how much of the increase is due to hardware progress.

Some mixed integer programming (MIP) algorithms, run on modern MIP instances with modern hardware, have roughly doubled in speed each year. MIP is an important optimization problem, but one which has been called to attention after the fact due to performance improvements. Other optimization problems have had more inconsistent (and harder to determine) improvements.

Various forms of machine learning have had steeply diminishing progress in percentage accuracy over recent decades. Some vision tasks have recently seen faster progress.

Note that these points have not been updated for developments since 2013, and machine learning in particular is generally observed to have seen more progress very recently (as of 2017).

Figures

Below are assorted figures mass-extracted from Algorithmic Progress in Six Domains, some more self-explanatory than others. See the paper for their descriptions.

[23 figures, extracted from pages 27-54 of Algorithmic Progress in Six Domains.]

Funding of AI Research

Provisional data suggests:

  • Equity deals made with startups in AI were worth about $5bn in 2016, and this value has been growing by around 50% per year in recent years.
  • The number of equity deals in AI startups globally is growing at around 30% per year, and was estimated at 658 in 2016.
  • NSF funding of IIS, a section of computer science that appears to include AI and two other areas, has increased at around 9% per year over the past two decades.

(Updated February 2017)

Background

Artificial intelligence research is funded both publicly and privately. This page currently contains some data on private funding globally, public funding in the US, and national government announcements of plans relating to AI funding. It should not currently be regarded as an exhaustive summary of the data available on these topics or on AI funding broadly.

Details 

AI startups

According to CB Insights, between the start of 2012 and the end of 2016, the number of equity deals being made with startups in artificial intelligence globally grew by a factor of four to 658 (around 30% per year), and the value of funding grew by a factor of over eight to $5 billion (around 50% per year).1 Their measure includes both startups developing AI techniques, and those applying existing AI techniques to problems in areas such as healthcare or advertising. They provide Figure 1 below, with further details of the intervening years. We have not checked the trustworthiness or completeness of CB Insights’ data.
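As a rough check of the quoted growth rates (treating 'start of 2012 to end of 2016' as five years of growth; the growth factors are those quoted above, not recomputed from CB Insights' data):

```python
years = 5
for name, factor in [("number of deals", 4), ("value of funding", 8)]:
    annual = factor ** (1 / years) - 1
    print(f"{name}: ~{annual:.0%} per year")
# number of deals: ~32% per year; value of funding: ~52% per year --
# consistent with the ~30% and ~50% figures above.
```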

Figure 1: Number of new equity deals supporting AI-related startups, and dollar values of disclosed investments over 2012-2015, according to CB Insights.

US National Science Foundation

In 2014 Muehlhauser and Sinick wrote:

In 2011, the National Science Foundation (NSF) received $636 million for funding CS research (through CISE). Of this, $169 million went to Information and Intelligent Systems (IIS). IIS has three programs: Cyber-Human Systems (CHS), Information Integration and Informatics (III) and Robust Intelligence (RI). If roughly 1/3 of the funding went to each of these, then $56 million went to Robust Intelligence, so 9% of the total CS funding. (Some CISE funding may have gone to AI work outside of IIS — that is, via ACI, CCF, or CNS — but at a glance, non-IIS AI funding through CISE looks negligible.)

The NSF Budget for Information and Intelligent Systems (IIS) has generally increased between 4% and 20% per year since 1996, with a one-time percentage boost of 60% in 2003, for a total increase of 530% over the 15 year period between 1996 and 2011.[14 {See table with upper left-hand corner A367 in the spreadsheet.}] “Robust Intelligence” is one of three program areas covered by this budget.

As of February 2017, CISE (Computer and Information Science and Engineering) covers five categories, and IIS appears to be the most relevant one.2 IIS still has three programs, of which Robust Intelligence is one.3

NSF funding for both CISE and IIS (the relevant subcategory) rose steadily from 2009 to 2017.4 IIS funding as a percentage of CISE funding fluctuates, and has fallen over this period. The following table summarizes data from NSF, collected by Finan Adamson in 2016. The figures below it (2 and 3) combine this data with some collected previously in the spreadsheet linked by Muehlhauser and Sinick. Over the 21 years from 1996 to 2017, IIS funding has increased fairly evenly, at about 9% per year overall.

Fiscal Year | IIS (Information and Intelligent Systems) Funding ($M) | Total CISE (Computer and Information Science and Engineering) Funding ($M) | IIS Funding as % of Total CISE Funding
2017 (Requested) | 207.20 | 994.80 | 20.8
2016 (Estimate) | 194.90 | 935.82 | 20.8
2015 (Actual) | 194.58 | 932.98 | 20.9
2014 (Actual) | 184.87 | 892.60 | 20.7
2013 (Actual) | 176.23 | 858.13 | 20.5
2012 (Actual) | 176.58 | 937.16 | 18.8
2011 (Actual) | 169.14 | 636.06 | 26.5
2010 (Actual) | 163.21 | 618.71 | 26.4
2009 (Actual) | 150.93 | 574.50 | 26.3
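A rough consistency check of that 9% figure (this reads the quoted '530% increase over 1996-2011' as IIS funding in 2011 being about 5.3 times its 1996 level, which is an interpretive assumption):

```python
iis_2011 = 169.14            # $M, from the table above
iis_2017 = 207.20            # $M, 2017 request
factor_1996_to_2011 = 5.3    # assumed reading of the quoted 530% increase

overall_factor = factor_1996_to_2011 * (iis_2017 / iis_2011)
annual = overall_factor ** (1 / 21) - 1
print(f"implied average growth, 1996-2017: ~{annual:.0%} per year")  # ~9%
```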


Figure 2: Annual NSF funding to IIS


Figure 3: Yearly growth in NSF funding to IIS

National governments

US

On May 3, 2016, White House Deputy U.S. Chief Technology Officer Ed Felten announced a series of workshops and an interagency group to learn more about the benefits and risks of artificial intelligence.5

The Pentagon intended to include a request for $12-15 billion to fund AI weapon technology in its 2017 fiscal year budget.6

Japan

Ms. Kurata from the Embassy of Japan introduced Japan's fifth Science and Technology Basic Plan, a ¥26 trillion government investment that runs from 2016 to 2020 and aims to promote R&D to establish a 'super smart society'.7

China

The Chinese government announced in 2016 that it plans to create a "100 billion level" (roughly $15 billion USD) artificial intelligence market by 2018. In the statement, the Chinese government defined artificial intelligence as a "branch of computer science where machines have human-like intelligence", a definition that includes robots, natural language processing, and image recognition.8

South Korea

The South Korean government announced on March 17, 2016 that it would spend 1 trillion won (US$840 million) on artificial intelligence by 2020. It plans to fund a high-profile research center joined by Samsung and LG Electronics, SKT, KT, Naver and Hyundai Motor.9

Relevance

Financial investment in AI research is interesting because, as an input to AI progress, it may help in forecasting progress. To further that goal, we are also interested in examining the relationship between funding and progress.

Investment can also be read as an indicator of investors’ judgments of the promise of AI.

Notable missing data

  • Private funding of AI other than equity deals
  • Public funding of AI research in relevant nations other than the US
  • Funding for internationally collaborative AI projects.


 

2016 Expert Survey on Progress in AI

The 2016 Expert Survey on Progress in AI is a survey of machine learning researchers that AI Impacts ran in 2016.

Details

Results have not yet been published.

The full list of questions is available here. Participants received randomized subsets of these questions.

 

Concrete AI tasks for forecasting

This page contains a list of relatively well specified AI tasks designed for forecasting. Currently all entries were used in the 2016 Expert Survey on Progress in AI.

List

  1. Translate a text written in a newly discovered language into English as well as a team of human experts, using a single other document in both languages (like a Rosetta stone). Suppose all of the words in the text can be found in the translated document, and that the language is a difficult one.
  2. Translate speech in a new language given only unlimited films with subtitles in the new language. Suppose the system has access to training data for other languages, of the kind used now (e.g. same text in two languages for many languages and films with subtitles in many languages).
  3. Perform translation about as well as a human who is fluent in both languages but unskilled at translation, for most types of text, and for most popular languages (including languages that are known to be difficult, like Czech, Chinese and Arabic).
  4. Provide phone banking services as well as human operators can, without annoying customers more than humans. This includes many one-off tasks, such as helping to order a replacement bank card or clarifying how to use part of the bank website to a customer.
  5. Correctly group images of previously unseen objects into classes, after training on a similar labeled dataset containing completely different classes. The classes should be similar to the ImageNet classes.
  6. One-shot learning: see only one labeled image of a new object, and then be able to recognize the object in real world scenes, to the extent that a typical human can (i.e. including in a wide variety of settings). For example, see only one image of a platypus, and then be able to recognize platypuses in nature photos. The system may train on labeled images of other objects. Currently, deep networks often need hundreds of examples in classification tasks1, but there has been work on one-shot learning for both classification2 and generative tasks3.
  7. See a short video of a scene, and then be able to construct a 3D model of the scene good enough to create a realistic video of the same scene from a substantially different angle.
    For example, constructing a short video of walking through a house from a video taken along a very different path through the house.
  8. Transcribe human speech with a variety of accents in a noisy environment as well as a typical human can.
  9. Take a written passage and output a recording that can’t be distinguished from a voice actor, by an expert listener.
  10. Routinely and autonomously prove mathematical theorems that are publishable in top mathematics journals today, including generating the theorems to prove.
  11. Perform as well as the best human entrants in the Putnam competition—a math contest whose questions have known solutions, but which are difficult for the best young mathematicians.
  12. Defeat the best Go players, training only on as many games as the best Go players have played.
    For reference, DeepMind’s AlphaGo has probably played a hundred million games of self-play, while Lee Sedol has probably played 50,000 games in his life1.
  13. Beat the best human Starcraft 2 players at least 50% of the time, given a video of the screen.
    Starcraft 2 is a real time strategy game characterized by:

    • Continuous time play
    • Huge action space
    • Partial observability of enemies
    • Long term strategic play, e.g. preparing for and then hiding surprise attacks.
  14. Play a randomly selected computer game, including difficult ones, about as well as a human novice, after playing the game for less than 10 minutes of game time. The system may train on other games.
  15. Play new levels of Angry Birds better than the best human players. Angry Birds is a game where players try to efficiently destroy 2D block towers with a catapult. For context, this is the goal of the IJCAI Angry Birds AI competition1.
  16. Outperform professional game testers on all Atari games using no game-specific knowledge. This includes games like Frostbite, which require planning to achieve sub-goals and have posed problems for deep Q-networks1, 2.
  17. Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game specific knowledge.

    For context, the original Atari playing deep Q-network outperforms professional game testers on 47% of games1, but used hundreds of hours of play to train2.

  18. Fold laundry as well and as fast as the median human clothing store employee.
  19. Beat the fastest human runners in a 5 kilometer race through city streets using a bipedal robot body.
  20. Physically assemble any LEGO set given the pieces and instructions, using non-specialized robotics hardware.

    For context, Fu 20161 successfully joins single large LEGO pieces using model based reinforcement learning and online adaptation.
  21. Learn to efficiently sort lists of numbers much larger than in any training set used, the way Neural GPUs can do for addition1, but without being given the form of the solution.

    For context, Neural Turing Machines have not been able to do this2, but Neural Programmer-Interpreters3 have been able to do this by training on stack traces (which contain a lot of information about the form of the solution).

  22. Write concise, efficient, human-readable Python code to implement simple algorithms like quicksort. That is, the system should write code that sorts a list, rather than just being able to sort lists.

    Suppose the system is given only:

    • A specification of what counts as a sorted list
    • Several examples of lists undergoing sorting by quicksort
  23. Answer any “easily Googleable” factoid questions posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

    Examples of factoid questions: “What is the poisonous substance in Oleander plants?” “How many species of lizard can be found in Great Britain?”

  24. Answer any “easily Googleable” factual but open ended question posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

    Examples of open ended questions: “What does it mean if my lights dim when I turn on the microwave?” “When does home insurance cover roof replacement?”

  25. Give good answers in natural language to factual questions posed in natural language for which there are no definite correct answers.

    For example: "What causes the demographic transition?", "Is the thylacine extinct?", "How safe is seeing a chiropractor?"

  26. Write an essay for a high-school history class that would receive high grades and pass plagiarism detectors.

    For example answer a question like ‘How did the whaling industry affect the industrial revolution?’

  27. Compose a song that is good enough to reach the US Top 40. The system should output the complete song as an audio file.
  28. Produce a song that is indistinguishable from a new song by a particular artist, e.g. a song that experienced listeners can’t distinguish from a new song by Taylor Swift.
  29. Write a novel or short story good enough to make it to the New York Times best-seller list.
  30. For any computer game that can be played well by a machine, explain the machine’s choice of moves in a way that feels concise and complete to a layman.
  31. Play poker well enough to win the World Series of Poker.
  32. After spending time in a virtual world, output the differential equations governing that world in symbolic form.

    For example, the agent is placed in a game engine where Newtonian mechanics holds exactly and the agent is then able to conduct experiments with a ball and output Newton’s laws of motion.

Conversation with Tom Griffiths

Participants

  • Professor Tom Griffiths, Director of the Computational Cognitive Science Lab and the Institute of Cognitive and Brain Sciences at the University of California, Berkeley.
  • Finan Adamson, AI Impacts

Note: These notes were compiled by AI Impacts and give an overview of the major points made by Professor Tom Griffiths.

They are available as a pdf here.

Summary

Professor Tom Griffiths answered questions about the intersection between cognitive science and AI. Topics include how studying human brains has helped with the development of AI and how it might help in the future.

How has cognitive science helped with the development of AI in the past?

AI and cognitive science were actually siblings, born at around the same time with the same parents. Arguably the first AI system, the Logic Theorist, was developed by Herb Simon and Allen Newell and was a result of thinking about the cognitive processes that human mathematicians use when developing proofs. Simon and Newell presented that work at a meeting at MIT in 1956 that many regard as the birth of cognitive science: it was a powerful demonstration of how thinking in computational terms could make theories of cognition precise enough that they could be tested rigorously. But it was also a demonstration of how trying to understand the ways that people solve complex problems can inspire the development of AI systems.

How is cognitive science helping with the development of AI presently?

When I think about this relationship, I imagine a positive feedback loop where cognitive science helps support AI and AI helps support cognitive science. Human beings remain the best examples of systems that can solve many of the problems that we want our AI systems to solve. As a consequence, insights that we get from studying human cognition can inform strategies that we take in developing AI systems. At the same time, progress in AI gives us new tools that we can use to formalize aspects of human cognition that we previously didn’t understand. As a consequence, we can rigorously study a wider range of questions about the mind.

How can cognitive science help with the development of AI in the future?

Deep Learning Systems

Deep learning systems are mastering a variety of basic perceptual and learning tasks, and the challenges that these systems now face look a lot like the first important stages of cognitive development in human children: identifying objects, formulating goals, and generating high-level conceptual representations. I think understanding how children do these things is potentially very relevant to making progress.

Efficient Strategies

One of the things that people have to be good at, given the limited computational capacity of our minds, is developing efficient strategies for solving problems given limited resources. That’s exactly the kind of thing that AI systems need to be able to do to operate in the real world.

What are the challenges to progress in studying brains as they relate to AI?

Birds and Planes

One important thing to keep in mind is that there are different levels at which we might see a correspondence between human minds/brains and AI systems. Critics of the idea that AI researchers can learn something from human cognition sometimes point out that the way jet airplanes work has little relationship to how birds fly, and in fact trying to mimic birds held back the development of planes. However, this analogy misses the fact that there is something important that both jets and birds share: they both have to grapple with aerodynamics. Ultimately, we can see them both as solutions to the same underlying physical problem, constrained by the same mathematical principles.

It isn't clear which aspects of human brains have the best insights that could cross over to AI. Examples of places to look include the power of neurons as computational units, the efficiency of particular cognitive strategies, or the structure of the computational problem that is being solved. This last possibility — looking at abstract computational problems and their ideal solutions — is the place where I think we're likely to find the equivalent of aerodynamics for intelligent systems.

What blind spots does the field of AI have that could be addressed by studying cognitive science?

I don't think they're blind spots; they are problems that everybody is aware are hard. Things like forming high-level actions for reinforcement learning, formulating goals, reasoning about the intentions of others, developing high-level conceptual representations, learning language from linguistic input alone, learning from very small amounts of data, discovering causal relationships through observation and experimentation, forming effective cognitive strategies, and managing your cognitive resources are all cases where we can potentially learn a lot from studying human cognition.

How does cognitive science relate to AI value alignment?

Theory of Mind

Inferring the preferences or goals of another person from their behavior is something that human children begin to do in infancy and gradually develop in greater sophistication over the first few years of life. This is part of a broader piece of cognitive machinery that developmental psychologists have studied extensively.

What risks might be mitigated by greater collaboration between those who study human brains and those building AI?

We're already surrounded by autonomous agents that have the capacity to destroy all human life, but most of the time operate completely safely. Those autonomous agents are, of course, human beings. So that raises an interesting question: how is it that we're able to create human-compatible humans? Answering this question might give us some insights that are relevant to building human-compatible AI systems. It's certainly not going to give us all the answers, since many of the issues in AI safety arise because of concerns about super-human intelligence and a capacity for self-modification that goes beyond the human norm, but I think it's an interesting avenue to pursue.

Returns to scale in research

When universities or university departments produce research outputs—such as published papers—they sometimes experience increasing returns to scale, sometimes constant returns to scale, and sometimes decreasing returns to scale. At the level of nations, however, R&D tends to see increasing returns to scale. These results are preliminary.

Background

"Returns to scale" refers to the responsiveness of a process's outputs when all inputs (e.g. researcher hours, equipment) are increased by a certain proportion. If all outputs (e.g. published papers, citations, patents) increase by that same proportion, the process is said to exhibit constant returns to scale. Increasing and decreasing returns to scale refer to situations where outputs still increase, but by a higher or lower proportion, respectively.
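For a toy illustration (not drawn from the studies below), suppose outputs follow a simple power law in inputs; the exponent then determines which regime applies:

```python
def output_multiplier(alpha: float, input_scale: float = 2.0) -> float:
    """Factor by which output grows when all inputs are scaled by input_scale,
    assuming output = inputs ** alpha (a purely illustrative functional form)."""
    return input_scale ** alpha

for alpha, regime in [(1.2, "increasing"), (1.0, "constant"), (0.8, "decreasing")]:
    print(f"alpha = {alpha}: doubling inputs multiplies outputs by "
          f"{output_multiplier(alpha):.2f}x ({regime} returns to scale)")
```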

Assessing returns to scale in research may be useful in predicting certain aspects of the development of artificial intelligence, in particular the dynamics of an intelligence explosion.

Results

The conclusions in this article are drawn from an incomplete review of academic literature assessing research efficiency, presented in Table 1. These papers assess research in terms of its direct outputs such as published papers, citations, and patents. The broader effects of the research are not considered.

Most of the papers listed below use the Data Envelopment Analysis (DEA) technique, a linear-programming method commonly used to assess the efficiency of universities and research activities. It is capable of isolating the scale efficiency of the individual departments, universities or countries being studied.
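For readers unfamiliar with DEA, the following is a minimal sketch (not code from any of the studies in Table 1, and with invented data) of how input-oriented efficiency scores and scale efficiency can be computed with an off-the-shelf linear programming solver:

```python
# Illustrative DEA sketch: input-oriented CCR (constant returns to scale) and
# BCC (variable returns to scale) efficiency, with scale efficiency as their ratio.
import numpy as np
from scipy.optimize import linprog

# Rows are decision-making units (e.g. university departments).
# Hypothetical inputs: researcher FTEs, budget ($M). Outputs: papers, citations.
X = np.array([[10.0, 2.0], [20.0, 5.0], [40.0, 8.0]])
Y = np.array([[30.0, 100.0], [55.0, 180.0], [120.0, 400.0]])

def efficiency(o: int, variable_returns: bool = False) -> float:
    """Input-oriented efficiency of unit o relative to the whole sample."""
    n = X.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0                                # minimize theta; variables are [theta, lambda_1..n]
    A_ub, b_ub = [], []
    for i in range(X.shape[1]):               # sum_j lambda_j * x_ij <= theta * x_io
        A_ub.append(np.concatenate(([-X[o, i]], X[:, i])))
        b_ub.append(0.0)
    for r in range(Y.shape[1]):               # sum_j lambda_j * y_rj >= y_ro
        A_ub.append(np.concatenate(([0.0], -Y[:, r])))
        b_ub.append(-Y[o, r])
    A_eq = [np.concatenate(([0.0], np.ones(n)))] if variable_returns else None
    b_eq = [1.0] if variable_returns else None  # BCC adds the convexity constraint
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    return res.fun

for o in range(X.shape[0]):
    crs, vrs = efficiency(o), efficiency(o, variable_returns=True)
    print(f"unit {o}: CRS {crs:.2f}, VRS {vrs:.2f}, scale efficiency {crs / vrs:.2f}")
```

A unit with a CRS score well below its VRS score is technically efficient but operating at an inefficient scale; this ratio is the "scale efficiency" the studies in Table 1 report.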

Paper | Level of comparison | Activities assessed | Results pertaining to returns to scale
Wang & Huang 2007 | Countries' overall R&D activities | Research | Increasing returns to scale in research are exhibited by more than two-thirds of the sample
Kocher, Luptacik & Sutter 2006 | Countries' R&D in economics | Research | Increasing returns to scale are found in all countries in the sample except the US
Cherchye & Abeele 2005 | Dutch universities' research in Economics and Business Management | Research | Returns to scale vary between decreasing, constant and increasing depending on each university's specialization
Johnes & Johnes 1993 | UK universities' research in economics | Research | Constant returns to scale are found in the sample as a whole
Avkiran 2001 | Australian universities | Research, education | Constant returns to scale found in most sampled universities
Ahn 1988 | US universities | Research, education | Decreasing returns to scale on average
Johnes 2006 | English universities | Research, education | Close to constant returns to scale exhibited by most universities sampled
Kao & Hung 2008 | Departments of a Taiwanese university | Research, education | Increasing returns to scale exhibited by the five most scale-inefficient departments. However, no aggregate measure of returns to scale within the sample is presented.

Table 1: Sample of studies of research efficiency that assess returns to scale
Note: This table only identifies increasing/constant/decreasing returns to scale, rather than the size of this effect. Although DEA can measure the relative size of the effect for individual departments/universities/countries within a sample, such results cannot be readily compared between samples/studies.

Discussion of results

Of the studies listed in Table 1, the first four are the most relevant to this article, since they focus solely on research inputs and outputs. While the remaining four include educational inputs and outputs, they can still yield worthwhile insights.

Table 1 implies a difference between country-level and university-level returns to scale in research.

  • The two studies assessing R&D efficiency at the country level, Wang & Huang (2007) and Kocher, Luptacik & Sutter (2006), both identify increasing returns to scale.
  • The two university-level studies that assessed the scale efficiency of research alone found mixed results. Concretely, Johnes & Johnes (1993) concluded that returns to scale are constant among UK universities, and Cherchye & Abeele (2005) concluded that returns to scale vary among Dutch universities. This ambiguity is echoed by the remainder of the studies listed above, which assess education and research simultaneously and which find evidence of constant, decreasing and increasing returns to scale in different contexts.

Such differences are consistent with the possibility that scale efficiency may be influenced by scale (size) itself. In this framework, as an organisation increases its size, it may experience increasing returns to scale initially, resulting in increased efficiency. However, the efficiency gains from growth may not continue indefinitely; after passing a certain threshold the organisation may experience decreasing returns to scale. The threshold would represent the point of scale efficiency, at which returns to scale are constant and efficiency is maximized with respect to size.

Under this framework, size will influence whether increasing, constant or decreasing returns to scale are experienced. Applying this to research activities, the observation of different returns to scale between country-level and university-level research may mean that the size of a country’s overall research effort and the typical size of its universities are not determined by similar factors. For example, if increasing returns to scale at the country level and decreasing returns to scale at the university level are observed, this may indicate that the overall number of universities is smaller than needed to achieve scale efficiency, but that most of these universities are individually too large to be scale efficient.

Other factors may also contribute to the differences between university-level and country-level observations.

  • The country level studies use relatively aggregated data, capturing some of the non-university research and development activities in the countries sampled.
  • Country level research effort is not necessarily subject to some of the constraints which may cause decreasing returns to scale in large universities, such as excessive bureaucracy.
  • Results may be arbitrarily influenced by differences in the available input and output metrics at the university versus country level.

Limitations to conclusions drawn

One limitation of this article is the small scope of the literature review. A more comprehensive review may reveal a different range of conclusions.

Another limitation is that the research outputs studied—published papers, citations, and patents, inter alia—cannot be assumed to correspond directly to incremental knowledge or productivity. This point is expanded upon under “Topics for further investigation” below.

Further limitations arise due to the DEA technique used by most of the studies in Table 1.

  • DEA is sensitive to the choice of inputs and outputs, and to measurement errors.
  • Statistical hypothesis tests are difficult within the DEA framework, making it more difficult to separate signal from noise in interpreting results.
  • DEA identifies relative efficiency (composed of scale efficiency and also “pure technical efficiency”) within the sample, meaning that at least one country, university, or department is always identified as fully efficient (including exhibiting full scale efficiency or constant returns to scale). Of course, in practice, no university, organisation or production process is perfectly efficient. Therefore, conclusions drawn from DEA analysis are likely to be more informative for countries, universities, or departments that are not identified as fully efficient.
  • It may be questionable whether such a framework—where an optimal scale of production exists, past which decreasing returns to scale are experienced—is a good reflection of the dynamics of research activities. However, the frequent use of the DEA framework in assessing research activities would suggest that it is appropriate.

Topics for further investigation

The scope of this article is limited to direct research outputs (such as published papers, citations, and patents). While this is valuable, stronger conclusions could be drawn if this analysis were combined with further work investigating the following:

  • The impact of other sources of new knowledge apart from universities or official R&D expenditure. For example, innovations in company management discovered through “learning by doing” rather than through formal research may be an important source of improvement in economic productivity.
  • The translation of research outputs (such as published papers, citations, and patents) into incremental knowledge, and the translation of incremental knowledge into extra productive capacity. Assessment of this may be achievable through consideration of the economic returns to research, or of the value of patents generated by research.

Implications for AI

The scope for an intelligence explosion is likely to be greater if the returns to scale in research are greater. In particular, an AI system capable of conducting research into the improvement of AI may be able to be scaled up faster and more cheaply than the training of human researchers, for example through deployment on additional hardware. In addition, in the period before any intelligence explosion, a scaling-up of AI research may be observed, especially if the resultant technology were seen to have commercial applications.

This review is one component of a larger project to quantitatively model an intelligence explosion. This project, in addition to drawing upon the conclusions in this article, will also consider inter alia the effect of intelligence on research productivity, and actual increases in artificial intelligence that are plausible from research efforts.
