Metasurvey: predict the predictors

As I mentioned earlier, we’ve been making a survey for AI researchers.

The survey asks when AI will be able to do things like build a lego kit according to the instructions, be a surgeon, or radically accelerate global technological development. It also asks about things like intelligence explosions, safety research, how hardware hastens AI progress, and what kinds of disagreement AI researchers have with each other about timelines.

We wanted to tell you more about the project before actually surveying people, to make criticism more fruitful. However, it turned out that we wanted to start sending out the survey soon even more than we wanted that feedback first, so we did. We did still get an abundance of private feedback, including from readers of this blog, for which we are grateful.

We have some responses so far, and still have about a thousand people to ask. Before anyone (else) sees the results though, I thought it might be amusing to guess what they look like. That way, you can know whether you should be surprised when you see the results, and we can know more about whether running surveys like this might actually change anyone’s beliefs about anything.

So we made a second copy of the survey to act as a metasurvey, in which you can informally register your predictions.

If you want to play, here is how it works:

  1. Go to the survey here.
  2. Instead of answering the questions as they are posed, guess what the median answer given by our respondents is for each question.
  3. If you want to guess something other than the median given by our other respondents, do so, then write what you are predicting in the box for comments at the end. (e.g. maybe you want to predict the mode, or the interquartile range, or what the subset of respondents who are actually AI researchers say).
  4. If you want your predictions to be identifiable to you, give us your name and email at the end. This will, for instance, let us alert you if we notice that you are surprisingly excellent at predicting. We won’t make names or emails public.
  5. At the end, you should be redirected to a printout of your answers, which you can save somewhere if you want to be able to demonstrate later how right you were about stuff. There is a tiny pdf export button in the top right corner.
  6. You will only get a random subset of questions to predict, because that’s how the survey works. If you want to make more predictions, the printout has all of the questions.
  7. We might publish the data or summaries of it, other than names and email addresses, in what we think is an unidentifiable form.

Some facts about the respondents, to help predict them:

  • They are NIPS 2015/ICML 2015 authors (so a decent fraction are not AI researchers)
  • There are about 1600 of them, before we exclude people who don’t have real email addresses etc.

John Salvatier points out to me that the PhilPapers survey did something like this (I think more formally). It appears to have been interesting—they find that ‘philosophers have substantially inaccurate sociological beliefs about the views of their peers’, and that ‘In four cases [of thirty], the community gets the leading view wrong…In three cases, the community predicts a fairly close result when in fact a large majority supports the leading view’. If it turned out that people thinking about the future of AI were that wrong about the AI community’s views, I think that would be good to know about.


Featured image: By DeFacto (Own work) [CC BY-SA 4.0], via Wikimedia Commons 

Concrete AI tasks bleg

We’re making a survey. I hope to write soon about our general methods and plans, so anyone kind enough to criticize them has the chance. Before that though, we have a different request: we want a list of concrete tasks that AI can’t do yet, but may achieve sometime between now and surpassing humans at everything. For instance, ‘beat a top human Go player in a five game match’ would have been a good example until recently. We are going to ask AI researchers to predict a subset of these tasks, to better chart the murky path ahead.

We hope to:

  1. Include tasks from across the range of AI subfields
  2. Include tasks from across the range of time (i.e. some things we can nearly do, some things that are really hard)
  3. Have the tasks relate relatively closely to narrowish AI projects, to make them easier to think about (e.g. winning a 5k bipedal race is fairly close to existing projects, whereas winning an interpretive dance-off would require a broader mixture of skills, so is less good for our purposes)
  4. Have the tasks relate to specific hard technical problems (e.g. one-shot learning or hierarchical planning)
  5. Have the tasks relate to large changes in the world (e.g. replacing all drivers would viscerally change things)

Here are some that we have:

  • Win a 5km race over rough terrain against the best human 5k runner.
  • Physically assemble any LEGO set given the pieces and instructions.
  • Be capable of winning an International Mathematics Olympiad Gold Medal (ignoring entry requirements). That is, solve mathematics problems with known solutions that are hard for the best high school students in the world, and do so better than those students can.
  • Watch a human play any computer game a small number of times (say 5), then perform as well as human novices at the game without training more on the game. (The system can train on other games).
  • Beat the best human players at Starcraft, with a human-like limit on moves per second.
  • Translate a new language using unlimited films with subtitles in the new language, but the kind of training data we have now for other languages (e.g. same text in two languages for many languages and films with subtitles in many languages).
  • Be about as good as unskilled human translation for most popular languages (including difficult languages like Czech, Chinese and Arabic).
  • Answer tech support questions as well as humans can.
  • Train to do image classification on half a dataset (say, ImageNet), then take the other half of the images, containing previously unseen objects, and separate them into the correct groupings (without the correct labels, of course).
  • See a small number of examples of a new object (say 10), then be able to recognize it in novel scenes as well as humans can.
  • Reconstruct a 3d scene from a 2d image as reliably as a human can.
  • Transcribe human speech with a variety of accents in a quiet environment as well as humans can.
  • Routinely and autonomously prove mathematical theorems that are publishable in mathematics journals today.

Can you think of any interesting ones?

Mysteries of global hardware

This blog post summarizes recent research on our Global Computing Capacity page. See that page for full citations and detailed reasoning.

We recently investigated this intriguing puzzle:

FLOPS (then) apparently performed by all of the world’s computing hardware: 3 x 10^22 – 3 x 10^24

(Support: Vipul Naik estimated 10^22 – 10^24 IPS by February 2014, and reports a long term growth rate of 85% per year for application-specific computing hardware, which made up 97% of hardware by 2007, suggesting total hardware should have tripled by now. FLOPS are roughly equivalent to IPS).

Price of FLOPS: $3 x 10^-9

(Support: See our page on it)

Implied value of global hardware: $10^14 – $10^16

(Support: (3 x 10^22 to 3 x 10^24) * $3 x 10^-9 = $10^14 – $10^16)

Estimated total global wealth: $2.5 x 10^14

(Support: see for instance Credit Suisse)

Implication: 40%-4,000% of global wealth is in the form of computing hardware.
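The arithmetic behind this implication can be checked directly. A minimal sketch, using only the figures quoted above (the exact products come to roughly 36%–3,600%, which rounds to the 40%–4,000% quoted):

```python
# Estimated FLOPS performed by all the world's computing hardware.
flops_low, flops_high = 3e22, 3e24

# Price per FLOPS, in dollars.
price_per_flops = 3e-9

# Implied replacement value of the global hardware stock.
value_low = flops_low * price_per_flops    # 9e13, i.e. ~$10^14
value_high = flops_high * price_per_flops  # 9e15, i.e. ~$10^16

# Estimated total global wealth.
global_wealth = 2.5e14

# Fraction of global wealth apparently tied up in computing hardware.
share_low = value_low / global_wealth      # 0.36
share_high = value_high / global_wealth    # 36

print(f"{share_low:.0%} to {share_high:.0%} of global wealth")  # 36% to 3600%
```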

Question: What went wrong?


Could most hardware be in large-scale, unusually cheap, projects? Probably not – our hardware price figures include supercomputing prices. Also, Titan is a supercomputer made from GPUs and CPUs, and doesn’t seem to be cheaper per computation than the component GPUs and CPUs.

Could the global wealth figure be off? We get roughly the same anomaly when comparing global GDP figures and the value of computation used annually.

Our solution

We think the estimate of global hardware is the source of the anomaly. We think this because the amount that people apparently spend on hardware each year doesn’t seem like it would buy nearly this much hardware.

Annual hardware revenue seems to be around $300bn-$1,500bn recently.1 Based on the prices of FLOPS (and making some assumptions, e.g. about how long hardware lasts), this suggests the total global stock of hardware can perform around 7.5 x 10^19 – 1.5 x 10^21 FLOPS. However the lower end of this range is below a relatively detailed estimate of global hardware made in 2007. It seems unlikely that the hardware base actually shrank in recent years, so we push our estimate up to 2 x 10^20 – 1.5 x 10^21 FLOPS.

This is about 0.3%-1.9% of global GDP—a more plausible number, we think—so resolves the original problem. But a big reason Naik gave such high estimates for global hardware was that the last time someone measured it—between 1986 and 2007—computing hardware was growing very fast. General purpose computing was growing at 61% per year, and the application specific computers studied (such as GPUs) were growing at 86% per year. Application specific computers made up the vast majority too, so we might expect growth to progress at close to 86% per year.
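The GDP comparison can be reproduced roughly as follows. This is a sketch; the ~$78 trillion gross world product figure is an assumption of ours for around 2015, not one of the figures above, so the endpoints come out slightly different:

```python
# Estimated annual hardware spending (see footnote 1).
spend_low, spend_high = 300e9, 1500e9

# Assumed gross world product, roughly $78 trillion around 2015
# (an assumption for this sketch).
gwp = 78e12

# Hardware spending as a fraction of GWP.
share_low = spend_low / gwp    # ~0.004
share_high = spend_high / gwp  # ~0.019

print(f"{share_low:.1%} to {share_high:.1%} of GWP")  # 0.4% to 1.9%
```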

However if global hardware is as low as we estimate, the growth rate of total computing hardware since 2007 has been 25% or less, much lower than in the previous 21 years. Which would present us with another puzzle: what happened?
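The implied growth rate can be back-calculated from the figures above. A sketch, assuming a nine-year window from the 2007 measurement to early 2016; the lower end of our current estimate would imply roughly zero growth, so this is the most favorable case:

```python
# Detailed 2007 estimate of global hardware (Hilbert & Lopez).
flops_2007 = 2e20

# Upper end of our current estimate.
flops_now_high = 1.5e21

# Assumed elapsed time: 2007 to early 2016.
years = 9

# Compound annual growth rate implied by the upper estimate.
cagr = (flops_now_high / flops_2007) ** (1 / years) - 1
print(f"{cagr:.0%}")  # 25% per year, at most
```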

We aren’t sure, but this is still our best guess for the solution to the original puzzle. Hopefully we will have time to look into this puzzle too, but for now I’ll leave interested readers to speculate.


Added March 11 2016: Assuming the 2007 hardware figures are right, how much of the world’s wealth was in hardware in 2007? Back then, GWP was probably about $66T (in 2007 dollars). According to Hilbert & Lopez, the world could then perform 2 x 10^20 IPS, which is 2 x 10^14 MIPS. According to Muehlhauser & Rieber, hardware cost roughly $5 x 10^-3/MIPS in 2007. Thus the total value of hardware would have been around $5 x 10^-3/MIPS x 2 x 10^14 MIPS = $10^12 (a trillion dollars), or 1.5% of GWP.
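Spelled out in code, using only the figures quoted in the addendum:

```python
# Hilbert & Lopez: 2 x 10^20 IPS in 2007, i.e. 2 x 10^14 MIPS.
mips_2007 = 2e14

# Muehlhauser & Rieber: roughly $5 x 10^-3 per MIPS in 2007.
price_per_mips = 5e-3

# Gross world product, in 2007 dollars.
gwp_2007 = 66e12

# Total value of hardware: about a trillion dollars.
hardware_value = mips_2007 * price_per_mips  # 1e12

# Fraction of GWP.
share = hardware_value / gwp_2007            # ~0.015

print(f"{share:.1%} of GWP")  # 1.5%
```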


Titan Supercomputer. By an employee of the Oak Ridge National Laboratory (public domain).


Recently at AI Impacts

We’ve been working on a few longer term projects lately, so here’s an update in the absence of regular page additions.

New researchers

Stephanie Zolayvar and John Salvatier have recently joined us, to try out research here.

Stephanie recently moved to Berkeley from Seattle, where she was a software engineer at Google. She is making sense of a recent spate of interviews with AI researchers (more below), and investigating purported instances of discontinuous progress. She also just made this glossary of AI risk terminology.

John also recently moved to Berkeley from Seattle, where he was a software engineer at Amazon. He has been interviewing AI researchers with me, helping to design a new survey on AI progress, and evaluating different research avenues.

I’ve also been working on several smaller scale collaborative projects with other researchers.

AI progress survey

We are making a survey, to help us ask AI researchers about AI progress and timelines. We hope to get answers that are less ambiguous and more current than past timelines surveys. We also hope to learn about the landscape of progress in more detail than we have, to help guide our research.

AI researcher interviews

We have been having in-depth conversations with AI researchers about AI progress and predictions of the future. This is partly to inform the survey, but mostly because there are lots of questions where we want elaborate answers from at least one person, instead of hearing everybody’s one word answers to potentially misunderstood questions. We plan to put up notes on these conversations soon.

Bounty submissions

Ten people have submitted many more entries to our bounty experiment. We are investigating these, but have yet to verify that any of them deserve a bounty. Our request was for examples of discontinuous progress, or very early action on a risk. So far the more lucrative former question has been substantially more popular.


We just put up a glossary of AI safety terms. Having words for things often helps in thinking about them, so we hope to help in the establishment of words for things. If you notice important words without entries, or concepts without words, please send them our way.

  1. “In 2012, the worldwide computing hardware spending is expected at 418 billion U.S. dollars.” – Statista

    Statista’s figure of ‘Forecast hardware spendings worldwide from 2013 to 2019 (in billion U.S. dollars)’ reports a 2013 figure of $987bn, increasing to $1075bn in 2015. It is unclear why these spending forecasts differ so much from Statista’s reported 2012 spending.

    Statista also reports a prediction of 2016 hardware revenue at $409bn Euro, which is around $447bn USD. It looks like the prediction was made in 2012. Note that revenue is not identical to spending, but is probably a reasonable proxy.

    For 2009, Reuters reports a substantially lower revenue figure than Statista, suggesting Statista figures may be systematically high, e.g. by being relatively inclusive:

    “The global computer hardware market had total revenue of $193.2 billion in 2009, representing a compound annual growth rate (CAGR) of 5.4% for the period spanning 2005-2009.” – Research and Markets press release, Reuters

    Statista’s figure indicates revenue of 296 billion Euros, or around $320 billion USD in 2009 (this is the same figure as for 2007, which may be the only number you can see without a subscription—so while it may look like we made an error here, we do have the figure for the correct year). This is around 50% more than the Research and Markets press release.

    From these figures we estimate that spending on hardware in 2015 was $300bn-$1,500bn.

  2. See our page on this topic for all the citations and calculations.