Tom Griffiths on Cognitive Science and AI

This is a guest post by Finan Adamson

Prof. Tom Griffiths is the director of the Computational Cognitive Science Lab and the Institute of Cognitive and Brain Sciences at UC Berkeley. He studies human cognition and is involved with the Center for Human Compatible Artificial Intelligence. I asked him for insight into the intersection of cognitive science and AI. He offers his thoughts on the historical interaction of the fields and what aspects of human cognition might be relevant to developing AI in the future.

The conversation notes are here (pdf).

What if you turned the world’s hardware into AI minds?

In a classic ‘AI takes over the world’ scenario, one of the first things an emerging superintelligence wants to do is steal most of the world’s computing hardware and repurpose it to run the AI’s own software. This step takes one from ‘super-proficient hacker’ levels of smart to ‘my brain is one of the main things happening on Planet Earth’ levels of smart. There is quite a bit of hardware in the world, so this step in the takeover plan is kind of terrifying.

How terrifying exactly depends on A) how much computing hardware there is in the world at the time, and B) how efficiently hardware can be turned into AI at the time. We have some tentative answers to A): probably at least a couple of hundred exaFLOPS now, growing somewhere between not at all and very fast. However, B) is harder, in the absence of any idea how to get the efficiency of hardware-to-general-AI conversions above zero. Nonetheless, I think there are a couple of interesting reference points we can look at.

The one I’ll discuss now is the efficiency of the human brain. What if we could use about as much hardware as the human brain represents (in some sense) to run AI about as smart as a human brain? This is an interesting point to look at for a few reasons. We know brains are somewhere in the range of efficiency with which hardware can produce intelligent behavior, because they are an instance of that. And looking at one datapoint in the range is better than none. Also, for some means of building artificial intelligence—most obviously, brain emulation—we might expect to get something roughly as efficient as a human brain, give or take some.

So, we can think of the human brain as representing a pile of (fairly application specific) computing hardware. And we can estimate its computing power, in terms of FLOPS. People have done this, though very inaccurately: their estimates are twelve orders of magnitude apart, but running through this calculation with such an uncertain number still seems informative. According to different sources, a brain seems to be worth between about 3 x 10^13 FLOPS and 10^25 FLOPS. The median estimate is 10^18 FLOPS.

So we can ask, if you turned all of the world’s two hundred exaFLOPS or more of computing hardware into brains, how many brains would you get?

This graph shows the answers over time, for a variety of assumptions about brain FLOPS, world FLOPS, and global computing hardware growth rates. Probably the most plausible line is the lower green one (brains median, world hardware high).

Figure: Projected number of human brains equivalent to global hardware under various assumptions. For brains, ‘small’ = 3 x 10^13, ‘median’ = 10^18, ‘large’ = 10^25. For ‘world hardware’, ‘low’ = 2 x 10^20, ‘high’ = 1.5 x 10^21. ‘Growth’ is growth in computing hardware; the unlabeled default used in most projections is 25% per annum (our estimate above), and ‘high’ = 86% per annum (which would mean shifting to the highest growth rate we know of for related hardware, that of ASIC hardware in around 2007, which does not plausibly persist).

The basic answer is, if you turned all of the world’s computing hardware into AI as efficient as human brains right now, you would get less than a hundred million extra brains, or 1% of the population of the world. Probably a whole lot less. For the median estimates of brain computing power, you would get about a hundred or a thousand extra brains worth of AI.
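The arithmetic behind these figures is just a division, and it can be sketched in a few lines of Python. This is a rough illustration rather than the calculation actually used for the graph: it takes the brain estimates quoted above, and treats 2 x 10^20 FLOPS as the lower world-hardware figure and 1.5 x 10^21 FLOPS as the higher one. The function and variable names are made up for the example.

```python
# Rough sketch: how many brain-equivalents is the world's hardware worth?
# All figures are the estimates quoted in the post; names are illustrative.

BRAIN_FLOPS = {"small": 3e13, "median": 1e18, "large": 1e25}
WORLD_FLOPS = {"low": 2e20, "high": 1.5e21}


def brain_equivalents(world_flops, brain_flops):
    """Number of brains you get if all world hardware ran brain-efficient AI."""
    return world_flops / brain_flops


# Print the brain count for every combination of assumptions.
for brain_label, brain in BRAIN_FLOPS.items():
    for world_label, world in WORLD_FLOPS.items():
        n = brain_equivalents(world, brain)
        print(f"brain {brain_label:>6}, hardware {world_label:>4}: {n:.3g} brains")
```

Even the most generous combination (small brain, high hardware) stays under a hundred million, and the median brain estimate gives somewhere in the hundreds to low thousands.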

That means, for instance, that if we figured out how to make uploads right now, and they were roughly as efficient as the median brains estimate, and then someone acquired all of the hardware in the world for them, they would only have about as many additional minds as a project willing to spend a few hundred million dollars per year on wages, e.g. Facebook. Which would really be something. But not something overwhelmingly outscaling everything else going on in the world.

If you trust the projections of hardware growth fifty years into the future at all (which you shouldn’t, but suppose you did) the most plausible (median brain size, low growth) lines don’t even reach the world population line by then, though they would certainly make for an incredible AI research project, if that was the direction to which the additional mental effort was directed.
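To get a feel for how far away the world population line is, here is a minimal sketch of the compound-growth calculation, assuming hardware grows at a constant annual rate forever (which, as noted, you shouldn’t really believe). The world population figure of 7.5 billion is my own round-number assumption, not something taken from the graph, and the function name is invented for the example.

```python
import math

# Assumption for illustration only: a round-number world population.
WORLD_POPULATION = 7.5e9


def years_to_reach(target_brains, world_flops, brain_flops, annual_growth):
    """Years of constant compound growth until world hardware is worth
    `target_brains` brain-equivalents. Returns 0.0 if already there."""
    current = world_flops / brain_flops
    if current >= target_brains:
        return 0.0
    return math.log(target_brains / current) / math.log(1.0 + annual_growth)


# Median brain estimate (10^18 FLOPS), low and high world hardware,
# at the default 25% growth and the 'high' 86% growth.
for world_label, world in (("low", 2e20), ("high", 1.5e21)):
    for growth in (0.25, 0.86):
        years = years_to_reach(WORLD_POPULATION, world, 1e18, growth)
        print(f"hardware {world_label:>4}, growth {growth:.0%}: {years:.0f} years")
```

On these assumptions the 25% growth cases take well over fifty years to reach one brain-equivalent per person, which matches what the median lines on the graph suggest.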

Remember, all of this is very sketchy and probably inaccurate and you should maybe think about it a bit more if your decisions depend on it much (or ask us nicely to). But I strongly favor sketchy projections over none.

Image: Planetary Brain, Adrian Kenyon, some rights reserved.

Friendly AI as a global public good

A public good, in the economic sense, can be (roughly) characterized as a desirable good that is likely to be undersupplied, or not supplied at all, by private companies. It generally falls to the government to supply such goods. Examples include infrastructure networks, or a country’s military. See here for a more detailed explanation of public goods.

The provision of public goods by governments can work quite well at the national level. However, at the international level, there is no global government with the power to impose arbitrary legislation on countries and enforce it. As a result, many global public goods, such as carbon emission abatement, disease eradication, and existential risk mitigation, are only partially provided or not provided at all.

Scott Barrett, in his excellent book Why Cooperate? The Incentive to Supply Global Public Goods, explains that not all global public goods are created equal. He develops a categorization scheme (Table 1), identifying important characteristics that influence whether they are likely to be provided, and what tools can be used to improve their likelihood of provision.

For example:

  • Climate change mitigation is classified as an “aggregate effort” global public good, since its provision depends on the aggregate of all countries’ CO2eq emissions. Provision is difficult, as countries each individually face strong incentives to pollute.
  • Defense against large Earth-bound asteroids is classified as a “single best effort” global public good, since provision requires actions by only one country (or coalition of countries). Providing this global public good unilaterally is likely to be in the interests and within the capabilities of at least one individual country, and so it is likely to be provided.
  • Nuclear non-proliferation is classified as a “mutual restraint” public good, since it is provided by countries refraining from doing something. Provision is difficult as many countries individually face strong incentives to maintain a nuclear deterrent (despite the associated economic cost).

| | Single best effort | Weakest link | Aggregate effort | Mutual restraint | Coordination |
| --- | --- | --- | --- | --- | --- |
| Supply depends on… | The single best (unilateral or collective) effort | The weakest individual effort | The total effort of all countries | Countries not doing something | Countries doing the same thing |
| Examples | Asteroid defense, knowledge, peacekeeping, suppressing an infectious disease outbreak at its source, geoengineering | Disease eradication, preventing emergence of resistance and new diseases, securing nuclear materials, vessel reflagging | Climate change mitigation, ozone layer protection | Non-use of nuclear weapons, non-proliferation, bans on nuclear testing and biotechnology research | Standards for the measurement of time, for oil tankers, and for automobiles |
| International cooperation needed? | Yes, in many cases, to determine what should be done, and which countries should pay | Yes, to establish universal minimum standards | Yes, to determine the individual actions needed to achieve an overall outcome | Yes, to agree on what countries should not do | Yes, to choose a common standard |
| Financing and cost sharing needed? | Yes, when the good is provided collectively | Yes, in some cases | Yes, with industrialized countries helping developing countries | No | No |
| Enforcement of agreement challenging? | Not normally | Yes, except when provision requires only coordination | Yes | Yes | No, though participation will need to pass a threshold |
| International institutions for provision | Treaties in some cases; international organizations, such as the UN, in other cases | Consensus (World Health Assembly) or Security Council resolutions, customary law | Treaties | Treaties, norms, customary law | Non-binding resolutions; treaties in some cases |

Table 1: Simple Taxonomy of Global Public Goods
Source: Scott Barrett (2010), Why Cooperate? The Incentive to Supply Global Public Goods (location 520 of Kindle edition)
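One way to read the “Supply depends on…” row is as a rule for combining individual countries’ efforts into the amount of the good actually supplied. The sketch below is my own illustration of that reading for the first three categories, not something taken from Barrett’s book; the country names and effort numbers are entirely hypothetical. Mutual restraint and coordination depend on what countries refrain from or agree on, so they don’t reduce to a one-line aggregation and are left out.

```python
# Illustrative only: three of the categories read as ways of aggregating
# individual country efforts. Effort values are hypothetical.


def single_best_effort(efforts):
    """Supply is set by the single best effort."""
    return max(efforts)


def weakest_link(efforts):
    """Supply is set by the weakest individual effort."""
    return min(efforts)


def aggregate_effort(efforts):
    """Supply is set by the total effort of all countries."""
    return sum(efforts)


# Hypothetical effort levels for three made-up countries.
efforts = {"Country A": 9, "Country B": 2, "Country C": 5}
levels = list(efforts.values())

print("single best effort:", single_best_effort(levels))
print("weakest link:      ", weakest_link(levels))
print("aggregate effort:  ", aggregate_effort(levels))
```

The same set of efforts gives very different outcomes depending on the category, which is one way to see why a single capable country can provide asteroid defense while a single laggard can undermine disease eradication.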

Applying the Barrett framework to friendly AI

Artificial Intelligence (AI) technology is likely to progress until the eventual creation of AI that vastly surpasses human cognitive capabilities—artificial superintelligence (ASI). The possibility of an intelligence explosion means that the first ASI system, or those that control it, might possess an unprecedented ability to shape the world according to their preferences. This event could define our entire species, leading rapidly to the full realization of humanity’s potential or causing our extinction. Since “friendly AI”—safe ASI deployed for the benefit of humanity—is a global public good, it may be informative to apply Barrett’s global public good classification scheme to analyse the different facets of this challenge.

Since this framework focuses on the incentives faced by national governments, it is most relevant to situations where ASI development is largely driven by governments, which will therefore be the focus of this article. This government-led scenario is distinct from the current situation of technology industry-led development of AI. Governments might achieve this high level of control through large-scale state-sponsored projects and regulation of private activities.

As with many global public goods, the development of friendly AI can be broken down into many components, each of which may conform to a different category within Barrett’s taxonomy. Here I will focus on those that I believe are most important for long term safety.

Arguably, one of the most concerning problems in the government-led scenario is the potential for the benefits of ASI to be captured by some subset of humanity. Humans are unfortunately much more strongly motivated by self-interest than by the common good, and this is reflected in national and international politics. This means that, given the chance, leaders whose governments control the development of ASI might seek to capture the benefits for their country only, or some subset of their country such as their political allies, or other groups. This could be achieved by instilling values in the ASI system that favor such groups, or through the direct exertion of control over the ASI system. Protection against this possibility constitutes a “mutual restraint” public good, since its provision relies upon countries refraining from such capture. Failing to prevent this possibility may, depending on the preferences of those that control ASI, cause an existential catastrophe, for example in the form of “flawed realization” or “shriek”.

Because of this, and given the current anarchical state of international relations, any ASI-developing country is likely to be perceived as a significant security threat by other countries. Fears that any country succeeding at creating ASI would gain a large strategic advantage over other countries could readily lead to an ASI development race. In this scenario, speed may be prioritized at the expense of safety measures, for example those necessary to solve the value-loading problem (Ch. 12) and the control problem (Ch. 9). This would compound the risks of misuse of ASI explored in the previous paragraph by increasing the possibility of humanity losing control of this creation. The likelihood of an ASI development race is somewhat supported by Chalmers 2010 (footnote, p. 29).

Further, given that ASI may only be achievable on a timescale of decades, the global order prevailing when ASI is within reach may be truly multi-polar. For example, this timescale may allow both China and India to far surpass the USA in terms of economic weight, and may allow countries such as Brazil and Russia to rival the influence of the USA. With a diverse mix of world powers with differing national values, attempts at coordination and restraint could easily be undermined by mistrust.

Another facet of the global public good of friendly AI is the aforementioned technical challenges, including the value-loading problem and the control problem, which currently receive much attention in discussions of long-term AI safety. In isolation, these technical challenges can be considered a “single best effort” global public good in Barrett’s taxonomy, similar to asteroid defense or geoengineering, where it is often in the interests of some countries to unilaterally provide the good. Therefore, a substantial attempt would probably be made to solve these challenges in the government-led scenario, if race dynamics were not present. In reality, any additional advance work on this technical front is likely to be highly beneficial.

What can be done?

Without aiming to present a robust solution, this section briefly explores some of the available options, informed by insights presented by Barrett regarding mutual restraint global public goods.

A “silver bullet” solution to these institutional challenges could be achieved through the emergence of a world government capable of providing global public goods. Although this may eventually be possible, it seems unlikely within the timeframe in which ASI may be developed. Supporting progression towards this outcome may help to provide the global public goods identified above, but such action is probably insufficient alone.

In relation to mutual restraint public goods generally, Barrett identifies treaties, norms and customary law as institutional tools for provision. If a treaty requiring the necessary restraint could be enforced—Shulman mentions (p. 3) some ways in which one might be—it could be effective. However, this would still rely on countries’ willingness to voluntarily join the agreement.

Norms and custom can help achieve mutual restraint. In his book, Barrett analyses (location 2506 of Kindle edition) an important example: the taboo on the use of nuclear weapons. Thanks to strong aversion towards any destructive use of nuclear weapons, such use has not occurred since 1945. This restraint has held despite numerous situations in which it would have been militarily advantageous to use a nuclear weapon, e.g. when a nuclear power was at war with a non-nuclear state. In the presence of such attitudes, any benefits to a country from using nuclear weaponry must be weighed against the costs of severe loss of international reputation, or in the extreme, the end of the taboo and consequent nuclear war.

The taboo on the use of nuclear weapons was not inevitable, but arose partly because of mutual understanding of the seriousness of the threat of nuclear war. If the potential effects of ASI are similarly well understood by all powers seeking to develop it, it is possible that a similar taboo could be created, perhaps with the help of a carefully designed treaty between those countries with meaningful ASI development capabilities. The purpose of such an arrangement would be not only to mandate the adoption of proper safety measures, but also to ensure that the benefits of ASI would be spread fairly amongst all of humanity.


To achieve positions of power, all political leaders depend heavily on their ability to amass resources and influence. Upon learning of the huge potential of ASI, such individuals may instinctively attempt to capture control of its power. They will also expect their rivals to do the same, and will strategize accordingly. Therefore, in the event of government-led ASI development, mutual restraint by ASI-developing nations would be needed to avoid attempts to capture the vast benefits of ASI for a small subset of humanity, and to avoid the harmful effects of a race to develop ASI.

Error in Armstrong and Sotala 2012

Can AI researchers say anything useful about when strong AI will arrive?

Back in 2012, Stuart Armstrong and Kaj Sotala weighed in on this question in a paper called ‘How We’re Predicting AI—or Failing To’. They looked at a dataset of predictions about AI timelines, and concluded that predictions made by AI experts were indistinguishable from those of non-experts. (This might suggest that AI researchers don’t have additional information.)

As far as I can tell—and Armstrong and Sotala agree—this finding is based on an error. Not a fundamental philosophical error, but a spreadsheet construction and interpretation error.

The main clue that there has been a mistake is that their finding is about experts and non-experts, and their public dataset does not contain any division of people into experts and non-experts. (Hooray for publishing data!)

As far as we can tell, the column that was interpreted as ‘is this person an expert?’ was one of eight tracking ‘by what process did this person arrive at a prediction?’ The possible answers are ‘outside view’, ‘noncausal model’, ‘causal model’, ‘philosophical argument’, ‘expert authority’, ‘non-expert authority’, ‘restatement’ and ‘unclear’.

Based on comments and context, ‘expert authority’ appears to mean here that either the person who made the prediction is an expert who consulted their own intuition on something without providing further justification, or that the predictor is a non-expert who used expert judgments to inform their opinion. So the predictions not labeled ‘expert authority’ are a mixture of predictions made by experts using something other than their intuition—e.g. models and arguments—and predictions made by non-experts which are based on anything other than reference to experts. Plus restatements and unclarity that don’t involve any known expert intuition.

The reasons to think that the ‘expert authority’ column was misinterpreted as an ‘expert’ column are A) that there doesn’t seem to be any other plausible expert column, B) that the number of predictions labeled with ‘expert authority’ is 62, the same as the number of experts Armstrong and Sotala claimed to have compared (and the rest of the set is 33, the number of non-experts they report), and C) that Sotala suggests this is what must have happened.

How bad a problem is this? How badly does using unexplained expert opinion as a basis for prediction align with actually being an expert?

Even without knowing exactly what an expert is, we can tell the two aren’t all that well aligned because Armstrong and Sotala’s dataset contains many duplicates: multiple records of the same person making predictions in different places. All of these people appear at least twice, at least once relying on ‘expert authority’ and at least once not: Rodney Brooks, Ray Kurzweil, Jürgen Schmidhuber, I. J. Good, Hans Moravec. It is less surprising that experts and non-experts have similar predictions when they are literally the same people! But multiple entries of the same people listed as experts and non-experts only account for a little over 10% of their data, so this is not the main thing going on.

I haven’t checked the data carefully and assessed people’s expertise, but here are other names that look to me like they fall in the wrong buckets if we intend ‘expert’ to mean something like ‘works/ed in the field of artificial intelligence’: Ben Goertzel (not ‘expert authority’), Marcus Hutter (not ‘expert authority’), Nick Bostrom (‘expert authority’), Kevin Warwick (not ‘expert authority’), Brad Darrach (‘expert authority’).

Expertise and ‘expert authority’ seem to be fairly related (there are only about 10 obviously dubious entries out of 95, though 30 are dubious for other reasons), but not enough to take the erroneous result as much of a sign about experts, I think.

On the other hand, it seems Armstrong and Sotala have a result they did not intend: predictions based on expert authority look much like those not based on expert authority. This sounds interesting, though given the context it is probably not surprising: whether someone cites reasons with their prediction is probably fairly random, as indicated by several people basing their predictions on expert authority half of the time. For example, whether Kurzweil mentions hardware extrapolation on a given occasion doesn’t vary his prediction much. A worse problem is that the actual categorization is ‘most non-experts’ + ‘experts who give reasons for their judgments’ vs. ‘experts who don’t mention reasons’ + ‘non-experts who listen to experts’, which is pretty random, and so hard to draw useful conclusions from.

We don’t have time right now to repeat this analysis after actually classifying people as experts or not, even though it looks straightforward. We delayed some in the hope of doing that, but it looks like we won’t get to it soon, and it seems best to publish this post sooner to avoid anyone relying on the erroneous finding.

In the meantime, here is our graph again of predictions from AI researchers, AGI researchers, futurists and other people—the best proxy we have of ‘expert vs. non-expert’. We think they look fairly different, though they are from the same dataset that Armstrong and Sotala used (though an edited version).

Predictions made by different groups since 2000 from the MIRI AI predictions dataset.


Stuart Armstrong adds the following analysis, in the style of the graphs on figure 18 of their paper that it could replace:

Also please forgive the colour hideousness of the following graph:
[Image: bar chart of “time to AI after” for the four groups and for all of them together]
Here I did a bar chart of “time to AI after” for the four groups (and for all of them together), in 5-year bar increments (the last bar has all the 75 year+ predictions, not just 75-80). The data is incredibly sparse, but a few patterns do emerge: AGI are optimistic (and pretty similar to futurists), Others are pessimistic.
However, to within the limits of the data, I’d say that all groups (apart from “other”) still have a clear tendency to predict 10-25 years in the future more often than other dates. Here’s the % predictions in 10-25 years, and over 75 years:
| Group | %10-25 | %>75 |
| --- | --- | --- |
| AGI | 54% | 8% |
| AI | 27% | 23% |
| Futurist | 47% | 20% |
| Other | 13% | 50% |
| All groups | 36% | 22% |