Concrete AI tasks bleg

We’re making a survey. I hope to write soon about our general methods and plans, so anyone kind enough to criticize them has the chance. Before that though, we have a different request: we want a list of concrete tasks that AI can’t do yet, but may achieve sometime between now and surpassing humans at everything. For instance, ‘beat a top human Go player in a five game match’ would have been a good example until recently. We are going to ask AI researchers to predict a subset of these tasks, to better chart the murky path ahead.

We hope to:

  1. Include tasks from across the range of AI subfields
  2. Include tasks from across the range of time (i.e. some things we can nearly do, some things that are really hard)
  3. Have the tasks relate relatively closely to narrowish AI projects, to make them easier to think about (e.g. winning a 5k bipedal race is fairly close to existing projects, whereas winning an interpretive dance-off would require a broader mixture of skills, so is less good for our purposes)
  4. Have the tasks relate to specific hard technical problems (e.g. one-shot learning or hierarchical planning)
  5. Have the tasks relate to large changes in the world (e.g. replacing all drivers would viscerally change things)

Here are some that we have:

  • Win a 5km race over rough terrain against the best human 5k runner.
  • Physically assemble any LEGO set given the pieces and instructions.
  • Be capable of winning an International Mathematical Olympiad Gold Medal (ignoring entry requirements). That is, solve mathematics problems with known solutions, of the kind that challenge the best high school students in the world, better than those students can.
  • Watch a human play any computer game a small number of times (say 5), then perform as well as human novices at the game without training more on the game. (The system can train on other games).
  • Beat the best human players at Starcraft, with a human-like limit on moves per second.
  • Translate a new language using unlimited films with subtitles in the new language, but the kind of training data we have now for other languages (e.g. same text in two languages for many languages and films with subtitles in many languages).
  • Be about as good as unskilled human translation for most popular languages (including difficult languages like Czech, Chinese and Arabic).
  • Answer tech support questions as well as humans can.
  • Train an image classifier on half a dataset (say, ImageNet), then take the other half of the images, containing previously unseen objects, and separate them into the correct groupings (without the correct labels, of course).
  • See a small number of examples of a new object (say 10), then be able to recognize it in novel scenes as well as humans can.
  • Reconstruct a 3d scene from a 2d image as reliably as a human can.
  • Transcribe human speech with a variety of accents in a quiet environment as well as humans can.
  • Routinely and autonomously prove mathematical theorems that are publishable in mathematics journals today.

Can you think of any interesting ones?

Mysteries of global hardware

This blog post summarizes recent research on our Global Computing Capacity page. See that page for full citations and detailed reasoning.

We recently investigated this intriguing puzzle:

FLOPS (then) apparently performed by all of the world’s computing hardware: 3 × 10^22 – 3 × 10^24

(Support: Vipul Naik estimated 10^22 – 10^24 IPS by February 2014, and reports a long-term growth rate of 85% per year for application-specific computing hardware, which made up 97% of hardware by 2007, suggesting total hardware should have tripled by now. FLOPS are roughly equivalent to IPS.)

Price of FLOPS: $3 × 10^-9

(Support: See our page on it)

Implied value of global hardware: $10^14 – $10^16

(Support: (3 × 10^22 to 3 × 10^24 FLOPS) × $3 × 10^-9/FLOPS ≈ $10^14 – $10^16)

Estimated total global wealth: $2.5 × 10^14

(Support: see for instance Credit Suisse)

Implication: 40%-4,000% of global wealth is in the form of computing hardware.
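The arithmetic behind this implication can be checked in a few lines of Python, using only the figures quoted above:

```python
# Check the puzzle arithmetic using the figures quoted above.
flops_low, flops_high = 3e22, 3e24   # estimated global hardware, FLOPS
price_per_flops = 3e-9               # price of FLOPS, $ per FLOPS
global_wealth = 2.5e14               # estimated total global wealth, $

# Implied value of global hardware.
value_low = flops_low * price_per_flops    # 9e13, roughly $10^14
value_high = flops_high * price_per_flops  # 9e15, roughly $10^16

# Share of global wealth apparently tied up in computing hardware.
share_low = value_low / global_wealth      # 0.36, i.e. ~40%
share_high = value_high / global_wealth    # 36, i.e. ~4,000%
```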

Question: What went wrong?


Could most hardware be in large-scale, unusually cheap, projects? Probably not – our hardware price figures include supercomputing prices. Also, Titan is a supercomputer made from GPUs and CPUs, and doesn’t seem to be cheaper per computation than the component GPUs and CPUs.

Could the global wealth figure be off? We get roughly the same anomaly when comparing global GDP figures and the value of computation used annually.

Our solution

We think the estimate of global hardware is the source of the anomaly. We think this because the amount that people apparently spend on hardware each year doesn’t seem like it would buy nearly this much hardware.

Annual hardware revenue seems to be around $300bn-$1,500bn recently.[1] Based on the prices of FLOPS (and making some assumptions, e.g. about how long hardware lasts), this suggests the total global stock of hardware can perform around 7.5 × 10^19 – 1.5 × 10^21 FLOPS.[2] However the lower end of this range is below a relatively detailed estimate of global hardware made in 2007. It seems unlikely that the hardware base actually shrank in recent years, so we push our estimate up to 2 × 10^20 – 1.5 × 10^21 FLOPS.
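As a rough sketch of how the revenue-based stock estimate works: multiply annual spending by a hardware lifetime to get the value of the installed stock, then divide by the price of computation. The lifetime used here is a hypothetical round number chosen only for illustration; the actual assumptions and citations are on the linked page.

```python
# Sketch of a revenue-based estimate of the global hardware stock.
# NOTE: lifetime_years is a hypothetical illustrative value, not the
# figure used on the AI Impacts page.
spend_low, spend_high = 3e11, 1.5e12  # annual hardware spending, $
price_per_flops = 3e-9                # $ per FLOPS
lifetime_years = 1.5                  # hypothetical replacement cycle

stock_low = spend_low * lifetime_years / price_per_flops    # 1.5e20 FLOPS
stock_high = spend_high * lifetime_years / price_per_flops  # 7.5e20 FLOPS
```

These illustrative endpoints land inside the 7.5 × 10^19 – 1.5 × 10^21 range; a different lifetime assumption shifts them proportionally.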

This is about 0.3%-1.9% of global GDP—a more plausible number, we think—so resolves the original problem. But a big reason Naik gave such high estimates for global hardware was that the last time someone measured it—between 1986 and 2007—computing hardware was growing very fast. General-purpose computing was growing at 61% per year, and the application-specific computers studied (such as GPUs) were growing at 86% per year. Application-specific computers made up the vast majority too, so we might expect growth to progress at close to 86% per year.

However if global hardware is as low as we estimate, the growth rate of total computing hardware since 2007 has been 25% or less, much lower than in the previous 21 years. This would present us with another puzzle: what happened?

We aren’t sure, but this is still our best guess for the solution to the original puzzle. Hopefully we will have time to look into this puzzle too, but for now I’ll leave interested readers to speculate.


Added March 11 2016: Assuming the 2007 hardware figures are right, how much of the world’s wealth was in hardware in 2007? Back then, GWP was probably about $66T (in 2007 dollars). According to Hilbert & Lopez, the world could then perform 2 × 10^20 IPS, which is 2 × 10^14 MIPS. According to Muehlhauser & Rieber, hardware cost roughly $5 × 10^-3/MIPS in 2007. Thus the total value of hardware would have been around $5 × 10^-3/MIPS × 2 × 10^14 MIPS = $10^12 (a trillion dollars), or 1.5% of GWP.
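The 2007 back-of-the-envelope above works out as follows:

```python
# 2007 back-of-the-envelope calculation from the note above.
ips_2007 = 2e20               # Hilbert & Lopez: world hardware, IPS
mips_2007 = ips_2007 / 1e6    # = 2e14 MIPS (1 MIPS = 10^6 IPS)
price_per_mips = 5e-3         # Muehlhauser & Rieber: $ per MIPS
gwp_2007 = 6.6e13             # GWP, ~$66T in 2007 dollars

hardware_value = mips_2007 * price_per_mips  # $1e12, a trillion dollars
share_of_gwp = hardware_value / gwp_2007     # ~0.015, i.e. ~1.5%
```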



Titan Supercomputer. By an employee of the Oak Ridge National Laboratory.


Recently at AI Impacts

We’ve been working on a few longer term projects lately, so here’s an update in the absence of regular page additions.

New researchers

Stephanie Zolayvar and John Salvatier have recently joined us, to try out research here.

Stephanie recently moved to Berkeley from Seattle, where she was a software engineer at Google. She is making sense of a recent spate of interviews with AI researchers (more below), and investigating purported instances of discontinuous progress. She also just made this glossary of AI risk terminology.

John also recently moved to Berkeley from Seattle, where he was a software engineer at Amazon. He has been interviewing AI researchers with me, helping to design a new survey on AI progress, and evaluating different research avenues.

I’ve also been working on several smaller scale collaborative projects with other researchers.

AI progress survey

We are making a survey, to help us ask AI researchers about AI progress and timelines. We hope to get answers that are less ambiguous and more current than past timelines surveys. We also hope to learn about the landscape of progress in more detail than we have, to help guide our research.

AI researcher interviews

We have been having in-depth conversations with AI researchers about AI progress and predictions of the future. This is partly to inform the survey, but mostly because there are lots of questions where we want elaborate answers from at least one person, instead of hearing everybody’s one word answers to potentially misunderstood questions. We plan to put up notes on these conversations soon.

Bounty submissions

Ten people have submitted many more entries to our bounty experiment. We are investigating these, but have yet to verify that any of them deserve a bounty. Our request was for examples of discontinuous progress, or very early action on a risk. So far the more lucrative former question has been substantially more popular.


We just put up a glossary of AI safety terms. Having words for things often helps in thinking about them, so we hope to help establish words for things. If you notice important words without entries, or concepts without words, please send them our way.

AI timelines and strategies

AI Impacts sometimes invites guest posts from fellow thinkers on the future of AI. These are not intended to relate closely to our current research, nor to necessarily reflect our views. However we think they are worthy contributions to the discussion of AI forecasting and strategy.

This is a guest post by Sarah Constantin.

One frame of looking at AI risk is the “geopolitical” stance. Who are the major players who might create risky strong AI? How could they be influenced or prevented from producing existential risks? How could safety-minded institutions gain power or influence over the future of AI? What is the correct strategy for reducing AI risk?

The correct strategy depends sharply on the timeline for when strong AI is likely to be developed. Will it be in 10 years, 50 years, 100 years or more? This has implications for AI safety research. If a basic research program on AI safety takes 10-20 years to complete and strong AI is coming in 10 years, then research is relatively pointless. If basic research takes 10-20 years and strong AI is coming more than 100 years from now (if at all), then research can wait. If basic research takes 10-20 years and strong AI is coming in around 50 years, then research is a good idea.

Another relevant issue for AI timelines and strategies is the boom-and-bust cycle in AI. Funding for AI research and progress on AI has historically fluctuated since the 1960s, with roughly 15 years between “booms.” The timeline between booms may change in the future, but fluctuation in investment, research funding, and popular attention seems to be a constant in scientific/technical fields.

Each AI boom has typically focused on a handful of techniques (GOFAI in the 1970’s, neural nets and expert systems in the 1980’s) which promised to deliver strong AI but eventually ran into limits and faced a collapse of funding and investment. The current AI boom is primarily focused on massively parallel processing and machine learning, particularly deep neural nets.

This is relevant because institutional and human capital is lost between booms. While leading universities can survive for centuries, innovative companies are usually only at their peak for a decade or so. It is unlikely that the tech companies doing the most innovation in AI during one boom will be the ones leading subsequent booms. (We don’t usually look to 1980’s expert systems companies for guidance on AI today.) If there were to be a Pax Googleiana lasting 50 years, it might make sense for people concerned with AI safety to just do research and development within Google. But the history of the tech industry suggests that’s not likely. Which means that any attempt to influence long-term AI risk will need to survive the collapse of current companies and the end of the current wave of popularity of AI.

The “extremely short-term AI risk scenario” (of strong AI arising within a decade) is not a popular view among experts; in most contemporary surveys, AI researchers predict that strong AI will arise sometime in the mid-to-late 21st century. If we take the view that strong AI in the 2020’s is vanishingly unlikely (which is more “conservative” than the results of most AI surveys, but may be more representative of the mainstream computer science view), then this has various implications for AI risk strategy that seem to be rarely considered explicitly.

In the “long-term AI risk scenario”, there will be at least one “AI winter” before strong AI is developed. We can expect a period (or multiple periods) in the future where AI will be poorly funded and popularly discredited. We can expect that there are one or more jumps in innovation that will need to occur before human-level AI will be possible. And, given the typical life cycle of corporations, we can expect that if strong AI is developed, it will probably be developed by an institution that does not exist yet.

In the “long-term AI risk scenario”, there will probably be time to develop at least some theory of AI safety and the behavior of superintelligent agents. Basic research in computer science (and perhaps neuroscience) may well be beneficial in general from an AI risk perspective. If research on safety can progress during “AI winters” while progress on AI in general halts, then winters are particularly good news for safety. In this long-term scenario, there is no short-term imperative to cease progress on “narrow AI”, because contemporary narrow AI is almost certainly not risky.

In the “long-term AI risk scenario”, another important goal besides basic research is to send a message to the future. Today’s leading tech CEOs will not be facing decisions about strong AI; the critical decisionmakers may be people who haven’t been born yet, or people who are currently young and just starting their careers. Institutional cultures are rarely built to last decades. What can we do today to ensure that AI safety will be a priority decades from now, long after the current wave of interest in AI has come to seem faddish and misguided?

The mid- or late 21st century may be a significantly different place than the early 21st century. Economic and political situations fluctuate. The US may no longer be the world’s largest economy. Corporations and universities may look very different. Imagine someone speculating about artificial intelligence in 1965 and trying to influence the world of 2015. Trying to pass laws or influence policy at leading corporations in 1965 might not have had a lasting effect (this would be a useful historical topic to investigate in more detail).

And what if the next fifty years looks more like the cataclysmic first half of the 20th century than the comparatively stable second half of the 20th century? How could a speculative thinker of 1895 hope to influence the world of 1945?

Educational and cultural goals, broadly speaking, seem relevant in this scenario. It will be important to have a lasting influence on the intellectual culture of future generations.

For instance: if fields of theoretical computer science relevant for AI risk are developed and included in mainstream textbooks, then the CS majors of 2050 who might grow up to build strong AI will know about the concerns being raised today as more than a forgotten historical curiosity. Of course, they might not be CS majors, and perhaps they won’t even be college students. We have to think about robust transmission of information.

In the “long-term AI risk scenario”, the important task is preparing future generations of AI researchers and developers to avoid dangerous strong AI. This means performing and disseminating and teaching basic research in new theoretical fields necessary for understanding the behavior of superintelligent agents.

A “geopolitical” approach is extremely difficult if we don’t know who the players will be. We’d like the future institutions that will eventually develop strong AI to be run and staffed by people who will incorporate AI safety into their plans. This means that a theory of AI safety needs to be developed and disseminated widely.

Ultimately, long-term AI strategy bifurcates, depending on whether the future of AI is more “centralized” or “decentralized.”

In a “centralized” future, a small number of individuals, perhaps researchers themselves, contribute most innovation in AI, and the important mission is to influence them to pursue research in helpful rather than harmful directions.

In a “decentralized” future, progress in AI is spread over a broad population of institutions, and the important mission is to develop something like “industry best practices” — identifying which engineering practices are dangerous and instituting broadly shared standards that avoid them. This may involve producing new institutions focused on safety.

Basic research is an important prerequisite for both the “centralized” and “decentralized” strategies, because currently we do not know what kinds of progress in AI (if any) are dangerous.

The “centralized” strategy means promoting something like an intellectual culture, or philosophy, among the strongest researchers of the future; it is something like an educational mission. We would like future generations of AI researchers to have certain habits of mind: in particular, the ability to reason about the dramatic practical consequences of abstract concepts. The discoverers of quantum mechanics were able to understand that the development of the atomic bomb would have serious consequences for humanity, and to make decisions accordingly. We would like the future discoverers of major advances in AI to understand the same. This means that today, we will need to communicate (through books, schools, and other cultural institutions, traditional and new) certain intellectual and moral virtues, particularly to the brightest young people.

The “decentralized” strategy will involve taking the theoretical insights from basic AI research and making them broadly implementable. Are some types of “narrow AI” particularly likely to lead to strong AI? Are there some precautions which, on the margin, make harmful strong AI less likely? Which kinds of precautions are least costly in immediate terms and most compatible with the profit and performance needs of the tech industry? To the extent that AI progress is decentralized and incremental, the goal is to ensure that it is difficult to go very far in the wrong direction. Once we know what we mean by a “wrong direction”, this is a matter of building long-term institutions and incentives that shape AI progress towards beneficial directions.

The assumption that strong AI is a long-term rather than a short-term risk affects strategy significantly. Influencing current leading players is not particularly important; promoting basic research is very important; disseminating information and transmitting culture to future generations, as well as building new institutions, is the most effective way to prepare for AI advances decades from now.

In the event that AI never becomes a serious risk, developing institutions and intellectual cultures that can successfully reason about AI is still societally valuable. The skill (in institutions and individuals) of taking theoretical considerations seriously and translating them into practical actions for the benefit of humanity is useful for civilizational stability in general. What’s important is recognizing that this is a long-term strategy — i.e. thinking more than ten years ahead. Planning for future decades looks different from taking advantage of the current boom in funding and attention for AI and locally hill-climbing.

Sarah Constantin blogs at Otium. She recently graduated from Yale with a PhD in mathematics.

  1. “In 2012, the worldwide computing hardware spending is expected at 418 billion U.S. dollars.” – Statista

    Statista’s figure of ‘Forecast hardware spendings worldwide from 2013 to 2019 (in billion U.S. dollars)’ reports a 2013 figure of $987bn, increasing to $1075bn in 2015. It is unclear why these spending forecasts differ so much from Statista’s reported 2012 spending.

    Statista also reports a prediction of 2016 hardware revenue at $409bn Euro, which is around $447bn USD. It looks like the prediction was made in 2012. Note that revenue is not identical to spending, but is probably a reasonable proxy.

    For 2009, Reuters reports a substantially lower revenue figure than Statista, suggesting Statista figures may be systematically high, e.g. by being relatively inclusive:

    “The global computer hardware market had total revenue of $193.2 billion in 2009, representing a compound annual growth rate (CAGR) of 5.4% for the period spanning 2005-2009.” – Research and Markets press release, Reuters.

    Statista’s figure indicates revenue of 296 billion Euros, or around $320 billion USD, in 2009 (this is the same figure as for 2007, which may be the only number you can see without a subscription—so while it may look like we made an error here, we do have the figure for the correct year). This is around 50% more than the Research and Markets press release.

    From these figures we estimate that spending on hardware in 2015 was $300bn-$1,500bn.

  2. See our page on this topic for all the citations and calculations.