Friendly AI as a global public good

A public good, in the economic sense, can be (roughly) characterized as a desirable good that is likely to be undersupplied, or not supplied at all, by private companies. It generally falls to the government to supply such goods. Examples include infrastructure networks, or a country’s military. See here for a more detailed explanation of public goods.

The provision of public goods by governments can work quite well at the national level. However, at the international level, there is no global government with the power to impose arbitrary legislation on countries and enforce it. As a result, many global public goods, such as carbon emission abatement, disease eradication, and existential risk mitigation, are partially provided or not provided.

Scott Barrett, in his excellent book Why Cooperate? The Incentive to Supply Global Public Goods, explains that not all global public goods are created equal. He develops a categorization scheme (Table 1), identifying important characteristics that influence whether they are likely to be provided, and what tools can be used to improve their likelihood of provision.

For example:

  • Climate change mitigation is classified as an “aggregate effort” global public good, since its provision depends on the aggregate of all countries’ CO2eq emissions. Provision is difficult, as countries each individually face strong incentives to pollute.
  • Defense against large Earth-bound asteroids is classified as a “single best effort” global public good, since provision requires actions by only one country (or coalition of countries). Providing this global public good unilaterally is likely to be in the interests and within the capabilities of at least one individual country, and so it is likely to be provided.
  • Nuclear non-proliferation is classified as a “mutual restraint” public good, since it is provided by countries refraining from doing something. Provision is difficult as many countries individually face strong incentives to maintain a nuclear deterrent (despite the associated economic cost).
Single best effort Weakest link Aggregate effort Mutual restraint Coordination
Supply depends on… The single best (unilateral or collective) effort The weakest individual effort The total effort of all countries Countries not doing something Countries doing the same thing
Examples Asteroid defense, knowledge, peacekeeping, suppressing an infectious disease outbreak at its source, geoengineering Disease eradication, preventing emergence of resistance and new diseases, securing nuclear materials, vessel reflagging Climate change mitigation, ozone layer protection Non-use of nuclear weapons, non-proliferation, bans on nuclear testing and biotechnology research Standards for the measurement of time, for oil tankers, and for automobiles
International cooperation needed? Yes, in many cases, to determine what should be done, and which countries should pay Yes, to establish universal minimum standards Yes, to determine the individual actions needed to achieve an overall outcome Yes, to agree on what countries should not do Yes, to choose a common standard
Financing and cost sharing needed? Yes, when the good is provided collectively Yes, in some cases Yes, with industrialized countries helping developing countries No No
Enforcement of agreement challenging? Not normally Yes, except when provision requires only coordination Yes Yes No, though participation will need to pass a threshold
International institutions for provision Treaties in some cases; international organizations, such as the UN, in other cases Consensus (World Health Assembly) or Security Council resolutions, customary law Treaties Treaties, norms, customary law Non-binding resolutions; treaties in some cases

Table 1: Simple Taxonomy of Global Public Goods
Source: Scott Barrett (2010), Why Cooperate? The Incentive to Supply Global Public Goods (location 520 of Kindle edition)

Applying the Barrett framework to friendly AI

Artificial Intelligence (AI) technology is likely to progress until the eventual creation of AI that vastly surpasses human cognitive capabilities—artificial superintelligence (ASI). The possibility of an intelligence explosion means that the first ASI system, or those that control it, might possess an unprecedented ability to shape the world according to their preferences. This event could define our entire species, leading rapidly to the full realization of humanity’s potential or causing our extinction. Since “friendly AI”—safe ASI deployed for the benefit of humanity—is a global public good, it may be informative to apply Barrett’s global public good classification scheme to analyse the different facets of this challenge.

Since this framework focuses on the incentives faced by national governments, it is most relevant to situations where ASI development is largely driven by governments, which will therefore be the focus of this article. This government-led scenario is distinct from the current situation of technology industry-led development of AI. Governments might achieve this high level of control through large-scale state-sponsored projects and regulation of private activities.

As with many global public goods, the development of friendly AI can be broken down into many components, each of which may conform to a different category within Barrett’s taxonomy. Here I will focus on those that I believe are most important for long term safety.

Arguably, one of the most concerning problems in the government-led scenario is the potential for the benefits of ASI to be captured by some subset of humanity. Humans are unfortunately much more strongly motivated by self-interest than by the common good, and this is reflected in national and international politics. This mean that, given the chance, leaders whose governments control the development of ASI might seek to capture the benefits for their country only, or some subset of their country such as their political allies, or other groups. This could be achieved by instilling values in the ASI system that favor such groups, or through the direct exertion of control over the ASI system. Protection against this possibility constitutes a “mutual restraint” public good, since its provision relies upon countries refraining from doing so. Failing to prevent this possibility may, depending on the preferences of those that control ASI, cause an existential catastrophe, for example in the form of “flawed realization” or “shriek”.

Because of this, and given the current anarchical state of international relations, any ASI-developing country is likely to be perceived as a significant security threat by other countries. Fears that any country succeeding at creating ASI would gain a large strategic advantage over other countries could readily lead to an ASI development race. In this scenario, speed may be prioritized at the expense of safety measures, for example those necessary to solve the value-loading problem (Ch. 12) and the control problem (Ch. 9). This would compound the risks of misuse of ASI explored in the previous paragraph by increasing the possibility of humanity losing control of this creation. The likelihood of an ASI development race is somewhat supported by Chalmers 2010 (footnote, p. 29).

Further, given that ASI may only be achievable on a timescale of decades, the global order prevailing when ASI is within reach may be truly multi-polar. For example, this timescale may allow both China and India to far surpass the USA in terms of economic weight, and may allow countries such as Brazil and Russia to rival the influence of the USA. With a diverse mix of world powers with differing national values, attempts at coordination and restraint could easily be undermined by mistrust.

Another facet of the global public good of friendly AI is the aforementioned technical challenges, including the value-loading problem and the control problem, which currently receive much attention in discussions of long-term AI safety. In isolation, these technical challenges can be considered a “single best effort” global public good in Barrett’s taxonomy, similar to asteroid defense or geoengineering, where it is often in the interests of some countries to unilaterally provide the good. Therefore, a substantial attempt would probably be made to solve these challenges in the government-led scenario, if race dynamics were not present. In reality, any additional advance work on this technical front is likely to be highly beneficial.

What can be done?

Without aiming to present a robust solution, this section briefly explores some of the available options, informed by insights presented by Barrett regarding mutual restraint global public goods.

A “silver bullet” solution to these institutional challenges could be achieved through the emergence of a world government capable of providing global public goods. Although this may eventually be possible, it seems unlikely within the timeframe in which ASI may be developed. Supporting progression towards this outcome may help to provide the global public goods identified above, but such action is probably insufficient alone.

In relation to mutual restraint public goods generally, Barrett identifies treaties, norms and customary law as institutional tools for provision. If a treaty requiring the necessary restraint could be enforced—Shulman mentions (p. 3) some ways in which one might be—it could be effective. However, this would still rely on countries’ willingness to voluntarily join the agreement.

Norms and custom can help achieve mutual restraint. In his book, Barrett analyses (location 2506 of Kindle edition) an important example; the taboo on the use of nuclear weapons. Thanks to strong aversion towards any destructive use of nuclear weapons, such use has not occurred since 1945. This has occurred despite numerous situations in which it would have been militarily advantageous to use a nuclear weapon, e.g. when a nuclear power was at war with a non-nuclear state. In the presence of such attitudes, any benefits to a country from using nuclear weaponry must be weighed against the costs of severe loss of international reputation, or in the extreme, the end of the taboo and consequent nuclear war.

The taboo on the use of nuclear weapons was not inevitable, but arose partly because of mutual understanding of the seriousness of the threat of nuclear war. If the potential effects of ASI are similarly well understood by all powers seeking to develop it, it is possible that a similar taboo could be created, perhaps with the help of a carefully designed treaty between those countries with meaningful ASI development capabilities. The purpose of such an arrangement would be not only to mandate the adoption of proper safety measures, but also to ensure that the benefits of ASI would be spread fairly amongst all of humanity.


To achieve positions of power, all political leaders depend heavily on their ability to amass resources and influence. Upon learning of the huge potential of ASI, such individuals may instinctively attempt to capture control of its power. They will also expect their rivals to do the same, and will strategize accordingly. Therefore, in the event of government-led ASI development, mutual restraint by ASI-developing nations would be needed to avoid attempts to capture the vast benefits of ASI for a small subset of humanity, and to avoid the harmful effects of a race to develop ASI.

Error in Armstrong and Sotala 2012

Can AI researchers say anything useful about when strong AI will arrive?

Back in 2012, Stuart Armstrong and Kaj Sotala weighed in on this question in a paper called ‘How We’re Predicting AI—or Failing To‘. They looked at a dataset of predictions about AI timelines, and concluded that predictions made by AI experts were indistinguishable from those of non-experts. (Which might suggest that AI researchers don’t have additional information).

As far as I can tell—and Armstrong and Sotala agree—this finding is based on an error. Not a fundamental philosophical error, but a spreadsheet construction and interpretation error.

The main clue that there has been a mistake is that their finding is about experts and non-experts, and their public dataset does not contain any division of people into experts and non-experts. (Hooray for publishing data!)

As far as we can tell, the column that was interpreted as ‘is this person an expert?’ was one of eight tracking ‘by what process did this person arrive at a prediction?’ The possible answers are ‘outside view’, ‘noncausal model’, ‘causal model’, ‘philosophical argument’, ‘expert authority’, ‘non-expert authority’, ‘restatement’ and ‘unclear’.

Based on comments and context, ‘expert authority’ appears to mean here that either the person who made the prediction is an expert who consulted their own intuition on something without providing further justification, or that the predictor is a non-expert who used expert judgments to inform their opinion. So the predictions not labeled ‘expert authority’ are a mixture of predictions made by experts using something other than their intuition—e.g. models and arguments—and predictions made by non-experts which are based on anything other than reference to experts. Plus restatements and unclarity that don’t involve any known expert intuition.

The reasons to think that the ‘expert authority’ column was misintepreted as an ‘expert’ column are A) that there doesn’t seem to be any other plausible expert column, B) that the number of predictions labeled with ‘expert authority’ is 62, the same as the number of experts Armstrong and Sotala claimed to have compared (and the rest of the set is 33, the number of non-experts they report), and C) Sotala suggests this is what must have happened.

How bad a problem is this? How badly does using unexplained expert opinion as a basis for prediction align with actually being an expert?

Even without knowing exactly what an expert is, we can tell the two aren’t all that well aligned because Armstrong and Sotala’s dataset contains many duplicates: multiple records of the same person making predictions in different places. All of these people appear at least twice, at least once relying ‘expert authority’ and at least once not: Rodney Brooks, Ray Kurzweil, Jürgen Schmidhuber, I. J. Good, Hans Moravec. It is less surprising that experts and non-experts have similar predictions when they are literally the same people! But multiple entries of the same people listed as experts and non-experts only accounts for a little over 10% of their data, so this is not the main thing going on.

I haven’t checked the data carefully and assessed people’s expertise, but here are other names that look to me like they fall in the wrong buckets if we intend ‘expert’ to mean something like ‘works/ed in the field of artificial intelligence’: Ben Goertzel (not ‘expert authority’), Marcus Hutter (not ‘expert authority’), Nick Bostrom (‘expert authority’), Kevin Warwick (not ‘expert authority’), Brad Darrach (‘expert authority’).

Expertise and ‘expert authority’ seem to be fairly related (there are only about 10 obviously dubious entries, out of 95—though 30 are dubious for other reasons), but not enough to take the erroneous result as much of a sign about experts I think.

On the other hand, it seems Armstrong and Sotala have a result they did not intend: predictions based on expert authority look much like those not based on expert authority. Which sounds interesting, though given the context is probably not surprising: whether someone cites reasons with their prediction is probably fairly random, as indicated by several people basing their predictions on expert authority half of the time. e.g. Whether Kurzweil mentions hardware extrapolation on a given occasion doesn’t vary his prediction much. A worse problem is that the actual categorization is ‘most non-experts’ + ‘experts who give reasons for their judgments’ vs. ‘experts who don’t mention reasons’ + ‘non-experts who listen to experts’, which is pretty random, and so hard to draw useful conclusions from.

We don’t have time right now to repeat this analysis after actually classifying people as experts or not, even though it looks straightforward. We delayed some in the hope of doing that, but it looks like we won’t get to it soon, and it seems best to publish this post sooner to avoid anyone relying on the erroneous finding.

In the meantime, here is our graph again of predictions from AI researchers, AGI researchers, futurists and other people—the best proxy we have of ‘expert vs. non-expert’. We think they look fairly different, though they are from the same dataset that Armstrong and Sotala used (though an edited version).

Predictions made by different groups since 2000 from the MIRI AI dataset.

Predictions made by different groups since 2000 from the MIRI AI predictions dataset.


Stuart Armstrong adds the following analysis, in the style of the graphs on figure 18 of their paper that it could replace:

Also please forgive the colour hideousness of the following graph:
image (15)
Here I did a bar chart of “time to AI after” for the four groups (and for all of them together), in 5-year bar increments (the last bar has all the 75 year+ predictions, not just 75-80). The data is incredibly sparse, but a few patterns do emerge: AGI are optimistic (and pretty similar to futurists), Others are pessimistic.
However, to within the limits of the data, I’d say that all groups (apart from “other”) still have a clear tendency to predict 10-25 years in the future more often than other dates. Here’s the % predictions in 10-25 years, and over 75 years:
%10-25 %>75
54% 8%  AGI
27% 23%  AI
47% 20%  Futurist
13% 50%  Other
36% 22%



Metasurvey: predict the predictors

As I mentioned earlier, we’ve been making a survey for AI researchers.

The survey asks when AI will be able to do things like build a lego kit according to the instructions, be a surgeon, or radically accelerate global technological development. It also asks about things like intelligence explosions, safety research, how hardware hastens AI progress, and what kinds of disagreement AI researchers have with each other about timelines.

We wanted to tell you more about the project before actually surveying people, to make criticism more fruitful. However it turned out that we wanted to start sending out the survey soon even more than that, so we did. We did get an abundance of private feedback, including from readers of this blog, for which we are grateful.

We have some responses so far, and still have about a thousand people to ask. Before anyone (else) sees the results though, I thought it might be amusing to guess what they look like. That way, you can know whether you should be surprised when you see the results, and we can know more about whether running surveys like this might actually change anyone’s beliefs about anything.

So we made a second copy of the survey to act as metasurvey, in which you can informally register your predictions.

If you want to play, here is how it works:

  1. Go to the survey here.
  2. Instead of answering the questions as they are posed, guess what the median answer given by our respondents is for each question.
  3. If you want to guess something other than the median given by our other respondents, do so, then write what you are predicting in the box for comments at the end. (e.g. maybe you want to predict the mode, or the interquartile range, or what the subset of respondents who are actually AI researchers say).
  4. If you want your predictions to be identifiable to you, give us your name and email at the end.  This will for instance let us alert you if we notice that you are surprisingly excellent at predicting. We won’t make names or emails public.
  5. At the end, you should be redirected to a printout of your answers, which you can save somewhere if you want to be able to demonstrate later how right you were about stuff. There is a tiny pdf export button in the top right corner.
  6. You will only get a random subset of questions to predict, because that’s how the survey works. If you want to make more predictions, the printout has all of the questions.
  7. We might publish the data or summaries of it, other than names and email addresses, in what we think is an unidentifiable form.

Some facts about the respondents, to help predict them:

  • They are NIPS 2015/ICML 2015 authors (so a decent fraction are not AI researchers)
  • There are about 1600 of them, before we exclude people who don’t have real email addresses etc.

John Salvatier points out to me that the Philpapers survey did something like this (I think more formally). It appears to have been interesting—they find that ‘philosophers have substantially inaccurate sociological beliefs about the views of their peers’, and that ‘In four cases [of thirty], the community gets the leading view wrong…In three cases, the community predicts a fairly close result when in fact a large majority supports the leading view’. If it turned out that people thinking about the future of AI were that wrong about the AI community’s views, I think that would be good to know about.


Featured image: By DeFacto (Own work) [CC BY-SA 4.0], via Wikimedia Commons 

Concrete AI tasks bleg

We’re making a survey. I hope to write soon about our general methods and plans, so anyone kind enough to criticize them has the chance. Before that though, we have a different request: we want a list of concrete tasks that AI can’t do yet, but may achieve sometime between now and surpassing humans at everything. For instance, ‘beat a top human Go player in a five game match’ would have been a good example until recently. We are going to ask AI researchers to predict a subset of these tasks, to better chart the murky path ahead.

We hope to:

  1. Include tasks from across the range of AI subfields
  2. Include tasks from across the range of time (i.e. some things we can nearly do, some things that are really hard)
  3. Have the tasks relate relatively closely to narrowish AI projects, to make them easier to think about (e.g. winning a 5k bipedal race is fairly close to existing projects, whereas winning an interpretive dance-off would require a broader mixture of skills, so is less good for our purposes)
  4. Have the tasks relate to specific hard technical problems (e.g. one-shot learning or hierarchical planning)
  5. Have the tasks relate to large changes in the world (e.g. replacing all drivers would viscerally change things)

Here are some that we have:

  • Win a 5km race over rough terrain against the best human 5k runner.
  • Physically assemble any LEGO set given the pieces and instructions.
  • Be capable of winning an International Mathematics Olympiad Gold Medal (ignoring entry requirements). That is, solve mathematics problems with known solutions that are hard for the best high school students in the world, better than those students can solve them.
  • Watch a human play any computer game a small number of times (say 5), then perform as well as human novices at the game without training more on the game. (The system can train on other games).
  • Beat the best human players at Starcraft, with a human-like limit on moves per second.
  • Translate a new language using unlimited films with subtitles in the new language, but the kind of training data we have now for other languages (e.g. same text in two languages for many languages and films with subtitles in many languages).
  • Be about as good as unskilled human translation for most popular languages (including difficult languages like Czech, Chinese and Arabic).
  • Answer tech support questions as well as humans can.
  • Train to do image classification on half a dataset (say, ImageNet) then take the other half of the images, containing previously unseen objects, and separate them into the correct groupings (without the correct labels of course)
  • See a small number of examples of a new object (say 10), then be able to recognize it in novel scenes as well as humans can.
  • Reconstruct a 3d scene from a 2d image as reliably as a human can.
  • Transcribe human speech with a variety of accents in a quiet environment as well as humans can
  • Routinely and autonomously prove mathematical theorems that are publishable in mathematics journals today

Can you think of any interesting ones?