2016 Expert Survey on Progress in AI

The 2016 Expert Survey on Progress in AI is a survey of machine learning researchers that AI Impacts ran in 2016.


Results have not yet been published.

The full list of questions is available here. Participants received randomized subsets of these questions.


Concrete AI tasks for forecasting

This page contains a list of relatively well-specified AI tasks designed for forecasting. Currently, all entries are tasks that were used in the 2016 Expert Survey on Progress in AI.


  1. Translate a text written in a newly discovered language into English as well as a team of human experts, using a single other document in both languages (like a Rosetta stone). Suppose all of the words in the text can be found in the translated document, and that the language is a difficult one.
  2. Translate speech in a new language given only unlimited films with subtitles in the new language. Suppose the system has access to training data for other languages, of the kind used now (e.g. same text in two languages for many languages and films with subtitles in many languages).
  3. Perform translation about as well as a human who is fluent in both languages but unskilled at translation, for most types of text, and for most popular languages (including languages that are known to be difficult, like Czech, Chinese and Arabic).
  4. Provide phone banking services as well as human operators can, without annoying customers more than humans. This includes many one-off tasks, such as helping to order a replacement bank card or clarifying how to use part of the bank website to a customer.
  5. Correctly group images of previously unseen objects into classes, after training on a similar labeled dataset containing completely different classes. The classes should be similar to the ImageNet classes.
  6. One-shot learning: see only one labeled image of a new object, and then be able to recognize the object in real world scenes, to the extent that a typical human can (i.e. including in a wide variety of settings). For example, see only one image of a platypus, and then be able to recognize platypuses in nature photos. The system may train on labeled images of other objects. Currently, deep networks often need hundreds of examples in classification tasks1, but there has been work on one-shot learning for both classification2 and generative tasks3.
  7. See a short video of a scene, and then be able to construct a 3D model of the scene good enough to create a realistic video of the same scene from a substantially different angle.
    For example, constructing a short video of walking through a house from a video that took a very different path through the house.
  8. Transcribe human speech with a variety of accents in a noisy environment as well as a typical human can.
  9. Take a written passage and output a recording that can’t be distinguished from a voice actor, by an expert listener.
  10. Routinely and autonomously prove mathematical theorems that are publishable in top mathematics journals today, including generating the theorems to prove.
  11. Perform as well as the best human entrants in the Putnam competition—a math contest whose questions have known solutions, but which are difficult for the best young mathematicians.
  12. Defeat the best Go players, training only on as many games as the best Go players have played.
    For reference, DeepMind’s AlphaGo has probably played a hundred million games of self-play, while Lee Sedol has probably played 50,000 games in his life1.
  13. Beat the best human Starcraft 2 players at least 50% of the time, given a video of the screen.
    Starcraft 2 is a real time strategy game characterized by:

    • Continuous time play
    • Huge action space
    • Partial observability of enemies
    • Long term strategic play, e.g. preparing for and then hiding surprise attacks.
  14. Play a randomly selected computer game, including difficult ones, about as well as a human novice, after playing the game less than 10 minutes of game time. The system may train on other games.
  15. Play new levels of Angry Birds better than the best human players. Angry Birds is a game where players try to efficiently destroy 2D block towers with a catapult. For context, this is the goal of the IJCAI Angry Birds AI competition1.
  16. Outperform professional game testers on all Atari games using no game-specific knowledge. This includes games like Frostbite, which require planning to achieve sub-goals and have posed problems for deep Q-networks1, 2.
  17. Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game specific knowledge.

    For context, the original Atari playing deep Q-network outperforms professional game testers on 47% of games1, but used hundreds of hours of play to train2.

  18. Fold laundry as well and as fast as the median human clothing store employee.
  19. Beat the fastest human runners in a 5 kilometer race through city streets using a bipedal robot body.
  20. Physically assemble any LEGO set given the pieces and instructions, using non-specialized robotics hardware.

    For context, Fu 20161 successfully joins single large LEGO pieces using model based reinforcement learning and online adaptation.
  21. Learn to efficiently sort lists of numbers much larger than in any training set used, the way Neural GPUs can do for addition1, but without being given the form of the solution.

    For context, Neural Turing Machines have not been able to do this2, but Neural Programmer-Interpreters3 have been able to do this by training on stack traces (which contain a lot of information about the form of the solution).

  22. Write concise, efficient, human-readable Python code to implement simple algorithms like quicksort. That is, the system should write code that sorts a list, rather than just being able to sort lists.

    Suppose the system is given only:

    • A specification of what counts as a sorted list
    • Several examples of lists undergoing sorting by quicksort
  23. Answer any “easily Googleable” factoid questions posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

    Examples of factoid questions: “What is the poisonous substance in Oleander plants?” “How many species of lizard can be found in Great Britain?”

  24. Answer any “easily Googleable” factual but open ended question posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

    Examples of open ended questions: “What does it mean if my lights dim when I turn on the microwave?” “When does home insurance cover roof replacement?”

  25. Give good answers in natural language to factual questions posed in natural language for which there are no definite correct answers.

    For example: “What causes the demographic transition?”, “Is the thylacine extinct?”, “How safe is seeing a chiropractor?”

  26. Write an essay for a high-school history class that would receive high grades and pass plagiarism detectors.

    For example, answer a question like ‘How did the whaling industry affect the industrial revolution?’

  27. Compose a song that is good enough to reach the US Top 40. The system should output the complete song as an audio file.
  28. Produce a song that is indistinguishable from a new song by a particular artist, e.g. a song that experienced listeners can’t distinguish from a new song by Taylor Swift.
  29. Write a novel or short story good enough to make it to the New York Times best-seller list.
  30. For any computer game that can be played well by a machine, explain the machine’s choice of moves in a way that feels concise and complete to a layman.
  31. Play poker well enough to win the World Series of Poker.
  32. After spending time in a virtual world, output the differential equations governing that world in symbolic form.

    For example, the agent is placed in a game engine where Newtonian mechanics holds exactly and the agent is then able to conduct experiments with a ball and output Newton’s laws of motion.

Conversation with Tom Griffiths


  • Professor Tom Griffiths, Director of the Computational Cognitive Science Lab and the Institute of Cognitive and Brain Sciences at the University of California, Berkeley.
  • Finan Adamson, AI Impacts

Note: These notes were compiled by AI Impacts and give an overview of the major points made by Professor Tom Griffiths.

They are available as a pdf here.


Professor Tom Griffiths answered questions about the intersection between cognitive science and AI. Topics include how studying human brains has helped with the development of AI and how it might help in the future.

How has cognitive science helped with the development of AI in the past?

AI and cognitive science were actually siblings, born at around the same time with the same parents. Arguably the first AI system, the Logic Theorist, was developed by Herb Simon and Allen Newell and was a result of thinking about the cognitive processes that human mathematicians use when developing proofs. Simon and Newell presented that work at a meeting at MIT in 1956 that many regard as the birth of cognitive science; it was a powerful demonstration of how thinking in computational terms could make theories of cognition precise enough that they could be tested rigorously. But it was also a demonstration of how trying to understand the ways that people solve complex problems can inspire the development of AI systems.

How is cognitive science helping with the development of AI presently?

When I think about this relationship, I imagine a positive feedback loop where cognitive science helps support AI and AI helps support cognitive science. Human beings remain the best examples of systems that can solve many of the problems that we want our AI systems to solve. As a consequence, insights that we get from studying human cognition can inform strategies that we take in developing AI systems. At the same time, progress in AI gives us new tools that we can use to formalize aspects of human cognition that we previously didn’t understand. As a consequence, we can rigorously study a wider range of questions about the mind.

How can cognitive science help with the development of AI in the future?

Deep Learning Systems

Deep learning systems are mastering a variety of basic perceptual and learning tasks, and the challenges that these systems now face look a lot like the first important stages of cognitive development in human children: identifying objects, formulating goals, and generating high-level conceptual representations. I think understanding how children do these things is potentially very relevant to making progress.

Efficient Strategies

One of the things that people have to be good at, given the limited computational capacity of our minds, is developing efficient strategies for solving problems given limited resources. That’s exactly the kind of thing that AI systems need to be able to do to operate in the real world.

What are the challenges to progress in studying brains as they relate to AI?

Birds and Planes

One important thing to keep in mind is that there are different levels at which we might see a correspondence between human minds/brains and AI systems. Critics of the idea that AI researchers can learn something from human cognition sometimes point out that the way jet airplanes work has little relationship to how birds fly, and in fact trying to mimic birds held back the development of planes. However, this analogy misses the fact that there is something important that both jets and birds share: they both have to grapple with aerodynamics. Ultimately, we can see them both as solutions to the same underlying physical problem, constrained by the same mathematical principles.

It isn’t clear which aspects of human brains have the best insights that could cross over to AI. Examples of places to look include the power of neurons as computational units, the efficiency of particular cognitive strategies, or the structure of the computational problem that is being solved. This last possibility, looking at abstract computational problems and their ideal solutions, is the place where I think we’re likely to find the equivalent of aerodynamics for intelligent systems.

What blind spots does the field of AI have that could be addressed by studying cognitive science?

I don’t think they’re blind spots; they are problems that everybody is aware are hard. Things like forming high-level actions for reinforcement learning, formulating goals, reasoning about the intentions of others, developing high-level conceptual representations, learning language from linguistic input alone, learning from very small amounts of data, discovering causal relationships through observation and experimentation, forming effective cognitive strategies, and managing your cognitive resources are all cases where we can potentially learn a lot from studying human cognition.

How does cognitive science relate to AI value alignment?

Theory of Mind

Inferring the preferences or goals of another person from their behavior is something that human children begin to do in infancy and gradually develop with greater sophistication over the first few years of life. This is part of a broader piece of cognitive machinery that developmental psychologists have studied extensively.

What risks might be mitigated by greater collaboration between those who study human brains and those building AI?

We’re already surrounded by autonomous agents that have the capacity to destroy all human life, but most of the time operate completely safely. Those autonomous agents are of course human beings. So that raises an interesting question: how is it that we’re able to create human-compatible humans? Answering this question might give us some insights that are relevant to building human-compatible AI systems. It’s certainly not going to give us all the answers; many of the issues in AI safety arise because of concerns about super-human intelligence and a capacity for self-modification that goes beyond the human norm. But I think it’s an interesting avenue to pursue.

Returns to scale in research

When universities or university departments produce research outputs—such as published papers—they sometimes experience increasing returns to scale, sometimes constant returns to scale, and sometimes decreasing returns to scale. At the level of nations, however, R&D tends to see increasing returns to scale. These results are preliminary.


“Returns to scale” refers to the responsiveness of a process’ outputs when all inputs (e.g. researcher hours, equipment) are increased by a certain proportion. If all outputs (e.g. published papers, citations, patents) increase by that same proportion, the process is said to exhibit constant returns to scale. Increasing returns to scale and decreasing returns to scale refer to situations where outputs still increase, but by a higher or lower proportion, respectively.
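As a toy illustration of this definition (the function name and figures below are ours, not from any of the studies cited):

```python
def classify_returns(input_scale, output_ratio, tol=1e-9):
    """Classify returns to scale from how outputs respond when all
    inputs are multiplied by `input_scale` (> 1)."""
    if abs(output_ratio - input_scale) <= tol * input_scale:
        return "constant"
    return "increasing" if output_ratio > input_scale else "decreasing"

# Doubling all inputs (researcher hours, equipment)...
print(classify_returns(2.0, 2.0))  # outputs also double -> "constant"
print(classify_returns(2.0, 2.5))  # outputs more than double -> "increasing"
print(classify_returns(2.0, 1.6))  # outputs less than double -> "decreasing"
```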

Assessing returns to scale in research may be useful in predicting certain aspects of the development of artificial intelligence, in particular the dynamics of an intelligence explosion.


The conclusions in this article are drawn from an incomplete review of academic literature assessing research efficiency, presented in Table 1. These papers assess research in terms of its direct outputs such as published papers, citations, and patents. The broader effects of the research are not considered.

Most of the papers listed below use the Data Envelopment Analysis (DEA) technique, which is a quantitative technique commonly used to assess the efficiency of universities and research activities. It is capable of isolating the scale efficiency of the individual departments, universities or countries being studied.
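As a rough sketch of the DEA idea (not the full multi-input linear-programming formulation used in the studies below), here is the constant-returns efficiency calculation for the special single-input, single-output case; the departments and figures are hypothetical:

```python
def dea_ccr_efficiency(inputs, outputs):
    """Single-input, single-output CCR (constant-returns) DEA:
    each unit's efficiency is its output/input ratio divided by the
    best ratio in the sample, so at least one unit always scores 1.0."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical departments: (researcher FTEs, papers published)
staff = [10, 20, 40]
papers = [15, 40, 60]
print(dea_ccr_efficiency(staff, papers))  # [0.75, 1.0, 0.75]
```

Note that, as discussed under "Limitations" below, DEA measures efficiency relative to the sample: the best-performing unit is defined to be fully efficient.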

Paper | Level of comparison | Activities assessed | Results pertaining to returns to scale
Wang & Huang 2007 | Countries’ overall R&D activities | Research | Increasing returns to scale in research are exhibited by more than two-thirds of the sample
Kocher, Luptacik & Sutter 2006 | Countries’ R&D in economics | Research | Increasing returns to scale are found in all countries in the sample except the US
Cherchye & Abeele 2005 | Dutch universities’ research in Economics and Business Management | Research | Returns to scale vary between decreasing, constant and increasing depending on each university’s specialization
Johnes & Johnes 1993 | UK universities’ research in economics | Research | Constant returns to scale are found in the sample as a whole
Avkiran 2001 | Australian universities | Research, education | Constant returns to scale found in most sampled universities
Ahn 1988 | US universities | Research, education | Decreasing returns to scale on average
Johnes 2006 | English universities | Research, education | Close to constant returns to scale exhibited by most universities sampled
Kao & Hung 2008 | Departments of a Taiwanese university | Research, education | Increasing returns to scale exhibited by the five most scale-inefficient departments. However, no aggregate measure of returns to scale within the sample is presented.

Table 1: Sample of studies of research efficiency that assess returns to scale
Note: This table only identifies increasing/constant/decreasing returns to scale, rather than the size of this effect. Although DEA can measure the relative size of the effect for individual departments/universities/countries within a sample, such results cannot be readily compared between samples/studies.

Discussion of results

Of the studies listed in Table 1, the first four are the most relevant to this article, since they focus solely on research inputs and outputs. While the remaining four include educational inputs and outputs, they can still yield worthwhile insights.

Table 1 implies a difference between country-level and university-level returns to scale in research.

  • The two studies assessing R&D efficiency at the country level, Wang & Huang (2007) and Kocher, Luptacik & Sutter (2006), both identify increasing returns to scale.
  • The two university-level studies that assessed the scale efficiency of research alone found mixed results. Concretely, Johnes & Johnes (1993) concluded that returns to scale are constant among UK universities, and Cherchye & Abeele (2005) concluded that returns to scale vary among Dutch universities. This ambiguity is echoed by the remainder of the studies listed above, which assess education and research simultaneously and which find evidence of constant, decreasing and increasing returns to scale in different contexts.

Such differences are consistent with the possibility that scale efficiency may be influenced by scale (size) itself. In this framework, as an organisation increases its size, it may experience increasing returns to scale initially, resulting in increased efficiency. However, the efficiency gains from growth may not continue indefinitely; after passing a certain threshold the organisation may experience decreasing returns to scale. The threshold would represent the point of scale efficiency, at which returns to scale are constant and efficiency is maximized with respect to size.

Under this framework, size will influence whether increasing, constant or decreasing returns to scale are experienced. Applying this to research activities, the observation of different returns to scale between country-level and university-level research may mean that the size of a country’s overall research effort and the typical size of its universities are not determined by similar factors. For example, if increasing returns to scale at the country level and decreasing returns to scale at the university level are observed, this may indicate that the overall number of universities is smaller than needed to achieve scale efficiency, but that most of these universities are individually too large to be scale efficient.

Other factors may also contribute to the differences between university-level and country-level observations.

  • The country level studies use relatively aggregated data, capturing some of the non-university research and development activities in the countries sampled.
  • Country level research effort is not necessarily subject to some of the constraints which may cause decreasing returns to scale in large universities, such as excessive bureaucracy.
  • Results may be arbitrarily influenced by differences in the available input and output metrics at the university versus country level.

Limitations to conclusions drawn

One limitation of this article is the small scope of the literature review. A more comprehensive review may reveal a different range of conclusions.

Another limitation is that the research outputs studied—published papers, citations, and patents, inter alia—cannot be assumed to correspond directly to incremental knowledge or productivity. This point is expanded upon under “Topics for further investigation” below.

Further limitations arise due to the DEA technique used by most of the studies in Table 1.

  • DEA is sensitive to the choice of inputs and outputs, and to measurement errors.
  • Statistical hypothesis tests are difficult within the DEA framework, making it more difficult to separate signal from noise in interpreting results.
  • DEA identifies relative efficiency (composed of scale efficiency and also “pure technical efficiency”) within the sample, meaning that at least one country, university, or department is always identified as fully efficient (including exhibiting full scale efficiency or constant returns to scale). Of course, in practice, no university, organisation or production process is perfectly efficient. Therefore, conclusions drawn from DEA analysis are likely to be more informative for countries, universities, or departments that are not identified as fully efficient.
  • It may be questionable whether such a framework—where an optimal scale of production exists, past which decreasing returns to scale are experienced—is a good reflection of the dynamics of research activities. However, the frequent use of the DEA framework in assessing research activities would suggest that it is appropriate.

Topics for further investigation

The scope of this article is limited to direct research outputs (such as published papers, citations, and patents). While this is valuable, stronger conclusions could be drawn if this analysis were combined with further work investigating the following:

  • The impact of other sources of new knowledge apart from universities or official R&D expenditure. For example, innovations in company management discovered through “learning by doing” rather than through formal research may be an important source of improvement in economic productivity.
  • The translation of research outputs (such as published papers, citations, and patents) into incremental knowledge, and the translation of incremental knowledge into extra productive capacity. Assessment of this may be achievable through consideration of the economic returns to research, or of the value of patents generated by research.

Implications for AI

The scope for an intelligence explosion is likely to be greater if the returns to scale in research are greater. In particular, an AI system capable of conducting research into the improvement of AI may be scaled up faster and more cheaply than human researchers can be trained, for example through deployment on additional hardware. In addition, in the period before any intelligence explosion, a scaling-up of AI research may be observed, especially if the resultant technology were seen to have commercial applications.

This review is one component of a larger project to quantitatively model an intelligence explosion. This project, in addition to drawing upon the conclusions in this article, will also consider inter alia the effect of intelligence on research productivity, and actual increases in artificial intelligence that are plausible from research efforts.

Global computing capacity

Computing capacity worldwide was probably around 2 x 10^20 – 1.5 x 10^21 FLOPS at around the end of 2015.


We are not aware of recent, plausible estimates for hardware capacity.

Vipul Naik estimated global hardware capacity in February 2014, based on Hilbert & Lopez’s estimates for 1986-2007. He calculated that if all computers ran at full capacity, they would perform 10-1000 zettaFLOPS, i.e. 10^22 – 10^24 FLOPS.1 We think these are substantial overestimates, because producing so much computing hardware would cost more than 10% of gross world product (GWP), which is implausibly high. The most cost-efficient computing hardware we are aware of today are GPUs, which still cost about $3/GFLOPS, or $1/GFLOPS-year if we assume hardware is used for around three years. This means maintaining hardware capable of 10^22 – 10^24 FLOPS would cost at least $10^13 – $10^15 per year. Yet GWP is only around $8 x 10^13, so this would imply hardware spending constitutes 13% – 1300% of GWP. Even the lower end of this range seems implausible.2
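The overestimate argument above is a few lines of arithmetic, using the figures just quoted:

```python
gwp = 8e13                   # gross world product, ~$8 x 10^13 per year
cost_per_flops_year = 1e-9   # $1/GFLOPS-year = $1e-9 per FLOPS-year

for capacity in (1e22, 1e24):  # Naik's implied range of hardware capacity
    annual_cost = capacity * cost_per_flops_year
    share = annual_cost / gwp  # ~13% of GWP at the low end, ~1300% at the high end
    print(f"{capacity:.0e} FLOPS would cost {share:.1%} of GWP per year")
```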

One way to estimate global hardware capacity ourselves is based on annual hardware spending. This is slightly complicated because hardware lasts for several years. So to calculate how much hardware exists in 2016, we would ideally like to know how much was bought in every preceding year, and also how much of each annual hardware purchase has already been discarded. To simplify matters, we will instead assume that hardware lasts for around three years.

It appears that very roughly $300bn-$1,500bn was spent on hardware in 2015.3 We previously estimated that the cheapest available hardware (in April 2015) was around $3/GFLOPS. So if humanity spent $300bn-$1,500bn on hardware in 2015, and it was mostly the cheapest hardware, then the hardware we bought should perform around 10^20 – 5 x 10^20 FLOPS. If we multiply this by three to account for the previous two years’ hardware purchases still being around, we have about 3 x 10^20 – 1.5 x 10^21 FLOPS.
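The spending-based estimate above, as a sketch (all figures are the rough ones from the text):

```python
spend_low, spend_high = 3e11, 1.5e12  # 2015 hardware spending, dollars
price = 3e-9                          # cheapest hardware, ~$3/GFLOPS = $3e-9/FLOPS
lifetime = 3                          # assumed hardware lifetime, years

# FLOPS purchased in 2015: ~1e20 to 5e20
bought_2015 = (spend_low / price, spend_high / price)

# Total stock, assuming the previous two years' purchases are still in use:
# ~3e20 to 1.5e21 FLOPS
stock = tuple(f * lifetime for f in bought_2015)
```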

This estimate is rough, and could be improved in several ways. Most likely, more hardware is bought each year than in the previous year, so approximating the previous years’ hardware purchases by this year’s will yield too much hardware. In particular, the faster global hardware is growing, the closer the total is to whatever humanity bought this year (that is, counterintuitively, if you think hardware is growing faster, you should suppose that there is less of it, by this particular method of estimation). Furthermore, perhaps a lot of hardware is not the cheapest, for various reasons. This too suggests there is less hardware than we estimated.

On the other hand, hardware may often last for more than three years (we don’t have a strong basis for our assumption there). And our prices are from early 2015, so hardware is likely somewhat cheaper now (in early 2016). Our guess is that overall these considerations mean our estimate should be lower, but probably by less than a factor of four in total. This suggests 7.5 x 10^19 – 1.5 x 10^21 FLOPS of hardware.

However Hilbert & Lopez (2012) estimated that in 2007 the world’s computing capacity was already around 2 x 10^20 IPS (similar to FLOPS), after constructing a detailed inventory of technologies.4 Their estimate does not appear to conflict with data about the global economy at the time.5 Growth is unlikely to have been negative since 2007, though Hilbert & Lopez may have overestimated. So we revise our estimate to 2 x 10^20 – 1.5 x 10^21 FLOPS for the end of 2015.

This still suggests that in the last nine years, the world’s hardware has grown by a factor of 1-7.5, implying a growth rate of 0%-25%. Even 25% would be quite low compared to growth rates between 1986 and 2007 according to Hilbert & Lopez (2012), which were 61% for general purpose computing and 86% for the set of ASICs they studied (which in 2007 accounted for 32 times as much computing as general purpose computers).6 However if we are to distrust estimates which imply hardware is a large fraction of GWP, then we must expect hardware growth has slowed substantially in recent years. For comparison, our estimates are around 2-15% of Naik’s lower bound, and suggest that hardware constitutes around 0.3%-1.9% of GWP.
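The implied growth rates follow from compounding over the nine-year interval:

```python
h2007 = 2e20                     # Hilbert & Lopez's 2007 estimate (IPS ~ FLOPS)
low_2015, high_2015 = 2e20, 1.5e21  # our estimate for end of 2015
years = 9

growth_low = (low_2015 / h2007) ** (1 / years) - 1    # 0% per year
growth_high = (high_2015 / h2007) ** (1 / years) - 1  # ~25% per year
```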

Such large changes in the long run growth rate are surprising to us, and—if they are real—we are unsure what produced them. One possibility is that hardware prices have stopped falling so fast (i.e. Moore’s Law is ending for the price of computation). Another is that spending on hardware decreased for some reason, for instance because people stopped enjoying large returns from additional hardware. We think this question deserves further research.


Global computing capacity in terms of human brains

According to different estimates, the human brain performs the equivalent of between 3 x 10^13 and 10^25 FLOPS. The median estimate we know of is 10^18 FLOPS. According to that median estimate and our estimate of global computing hardware, if the world’s entire computing capacity could be directed at running minds around as efficient as those of humans, we would have the equivalent of 200-1500 extra human minds.7 That is, turning all of the world’s hardware into human-efficiency minds at present would increase the world’s population of minds by at most about 0.00002%. If we select the most favorable set of estimates for producing large numbers, turning all of the world’s computing hardware into minds as efficient as humans’ would produce around 50 million extra minds, increasing the world’s effective population by about 1%.8
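The brain-equivalents arithmetic above, spelled out (population figure is a rough 2015 value):

```python
world_low, world_high = 2e20, 1.5e21  # global hardware estimate, FLOPS
brain_median = 1e18                   # median brain estimate, FLOPS
brain_small = 3e13                    # smallest brain estimate, FLOPS
population = 7.4e9                    # rough world population, 2015

# Median estimate: 200 to 1500 extra human-equivalent minds
minds_median = (world_low / brain_median, world_high / brain_median)
share = minds_median[1] / population  # ~2e-7, i.e. ~0.00002% of all minds

# Most favorable estimates: ~5e7, i.e. ~50 million extra minds
minds_favorable = world_high / brain_small
```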


Figure: Projected number of human brains equivalent to global hardware under various assumptions. For brains, ‘small’ = 3 x 10^13, ‘median’ = 10^18, ‘large’ = 10^25. For ‘world hardware’, ‘low’ = 2 x 10^20, ‘high’ = 1.5 x 10^21. ‘Growth’ is growth in computing hardware; the unlabeled default used in most projections is 25% per annum (our estimate above), ‘high’ = 86% per annum (the apparent growth rate in ASIC hardware around 2007).
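The projections in the figure can be reproduced with a simple compounding model; the defaults below follow the ‘low’ hardware, ‘median’ brain, and default growth assumptions above:

```python
def brains_equivalent(years_ahead, hardware=2e20, growth=0.25, brain=1e18):
    """Global hardware in human-brain equivalents after compounding
    hardware growth for `years_ahead` years."""
    return hardware * (1 + growth) ** years_ahead / brain

print(brains_equivalent(0))   # 200 brain-equivalents today
print(brains_equivalent(10))  # ~1,860 after a decade at 25%/year
```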



Costs of human-level hardware

Computing hardware which is equivalent to the brain:

  • in terms of FLOPS probably costs between $1 x 10^5 and $3 x 10^16, or $2/hour-$700bn/hour.
  • in terms of TEPS probably costs $200M – $7B, or $4,700 – $170,000/hour (including energy costs in the hourly rate).
  • in terms of secondary memory probably costs $300-3,000, or $0.007-$0.07/hour.


Partial costs


Main articles: Brain performance in FLOPS, Current FLOPS prices, Trends in the costs of computing

Floating-point Operations Per Second (FLOPS) is a measure of computer performance that emphasizes computing capacity. The human brain is estimated to perform between 10^13.5 and 10^25 FLOPS. Hardware currently costs around $3 x 10^-9/FLOPS, or $7 x 10^-14/FLOPS-hour. This makes the current price of hardware which has equivalent computing capacity to the human brain between $1 x 10^5 and $3 x 10^16, or $2/hour-$700bn/hour if hardware is used for five years.
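The totals above follow directly from the brain estimates and the per-FLOPS price, amortized over five years:

```python
price_per_flops = 3e-9  # ~$3/GFLOPS = $3e-9 per FLOPS
hours = 5 * 365 * 24    # five years ~ 43,800 hours

for brain_flops in (10 ** 13.5, 1e25):
    total = brain_flops * price_per_flops    # ~$1e5 ... $3e16
    hourly = total / hours                   # ~$2/hour ... ~$7e11/hour
    print(f"{brain_flops:.1e} FLOPS: ${total:.1e} total, ${hourly:.1e}/hour")
```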

The price of FLOPS has probably decreased by a factor of ten roughly every four years in the last quarter of a century.
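A tenfold drop every four years corresponds to the following extrapolation (illustrative only; real prices move unevenly):

```python
def flops_price(years_after_2015, base=3e-9):
    """Extrapolated price per FLOPS, assuming a tenfold drop every four years
    from a ~$3e-9/FLOPS baseline."""
    return base * 10 ** (-years_after_2015 / 4)

print(flops_price(4))  # one order of magnitude cheaper after four years
```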


Main articles: Brain performance in TEPS, The cost of TEPS

Traversed Edges Per Second (TEPS) is a measure of computer performance that emphasizes communication capacity. The human brain is estimated to perform at 0.18 – 6.4 x 10^5 GTEPS. Communication capacity costs around $11,000/GTEPS or $0.26/GTEPS-hour in 2015, when amortized over five years and combined with energy costs. This makes the current price of hardware which has equivalent communication capacity to the human brain around $200M – $7B in total, or $4,700 – $170,000/hour including energy costs.
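The TEPS figures above, as arithmetic:

```python
gteps_low, gteps_high = 0.18e5, 6.4e5  # brain communication estimate, GTEPS

# Capital cost at ~$11,000/GTEPS: ~$2.0e8 to ~$7.0e9 ($200M - $7B)
capital = (gteps_low * 11_000, gteps_high * 11_000)

# Hourly cost at ~$0.26/GTEPS-hour: ~$4,700 to ~$166,000 per hour
hourly = (gteps_low * 0.26, gteps_high * 0.26)
```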

We estimate that the price of TEPS falls by a factor of ten every four years, based on the relationship between TEPS and FLOPS.

Information storage

Main articles: Information storage in the brain, Costs of information storage, Costs of human-level information storage

Computer memory comes in primary and secondary forms. Primary memory (e.g. RAM) is intended to be accessed frequently, while secondary memory is slower to access but has higher capacity. Here we estimate the secondary memory requirements of the brain. The human brain is estimated to store around 10-100TB of data. Secondary storage costs around $30/TB in 2015. This means it costs $300-3,000 for enough storage to store the contents of a human brain, or $0.007-$0.07/hour if hardware is used for five years.
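The storage figures, spelled out with the same five-year amortization used elsewhere on this page:

```python
tb_low, tb_high = 10, 100  # brain storage estimate, TB
price_per_tb = 30          # 2015 secondary storage price, $/TB
hours = 5 * 365 * 24       # five years ~ 43,800 hours

total = (tb_low * price_per_tb, tb_high * price_per_tb)  # $300 to $3,000
hourly = (total[0] / hours, total[1] / hours)            # ~$0.007 to ~$0.07/hour
```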

In the long run the price of secondary memory has declined by an order of magnitude roughly every 4.6 years. However the rate of decline has slowed so much that prices hadn’t substantially dropped between 2011 and 2015.

Interpreting partial costs

Calculating the total cost of hardware that is relevantly equivalent to the brain is not as simple as adding the partial costs as listed. FLOPS and TEPS are measures of different capabilities of the same hardware, so if you pay for TEPS at the aforementioned prices, you will also receive FLOPS.

The above list is also not exhaustive: there may be substantial hardware costs that we haven’t included.

Brain performance in FLOPS

Five credible estimates of brain performance in terms of FLOPS that we are aware of are spread across the range from 3 x 10^13 to 10^25. The median estimate is 10^18.



We have not investigated the brain’s performance in FLOPS in detail. This page summarizes others’ estimates that we are aware of. Text on this page was heavily borrowed from a blog post, Preliminary prices for human-level hardware.


Sandberg and Bostrom 2008

Sandberg and Bostrom project the processing required to emulate a human brain at different levels of detail.1 For the three levels that their workshop participants considered most plausible, their estimates are 10^18, 10^22, and 10^25 FLOPS. These would cost around $100K/hour, $1bn/hour and $1T/hour in 2015.

Moravec 2009

Moravec (2009) estimates that the brain performs around 100 million MIPS.2 MIPS are not directly comparable to MFLOPS (millions of FLOPS), and have deficiencies as a measure, but the empirical relationship in computers is something like MFLOPS = 2.3 x MIPS^0.89, according to Sandberg and Bostrom.3 This suggests Moravec’s estimate corresponds to around 3.0 x 10^13 FLOPS. Since an order of magnitude of computing power per dollar corresponds to only about four years of progress, knowing that MFLOPS and MIPS are roughly comparable is plenty of precision.
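Applying Sandberg and Bostrom's empirical fit to Moravec's figure:

```python
mips = 1e8                   # Moravec: brain ~ 100 million MIPS
mflops = 2.3 * mips ** 0.89  # empirical MIPS-to-MFLOPS relationship
flops = mflops * 1e6         # ~3.0e13 FLOPS
```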

Kurzweil 2005

In The Singularity is Near, Kurzweil claimed that a human brain requires 10^16 calculations per second, which appears to be roughly equivalent to 10^16 FLOPS.4


Index of articles about hardware

Hardware in terms of computing capacity (FLOPS and MIPS)

Brain performance in FLOPS

Current FLOPS prices

Trends in the cost of computing

Wikipedia history of GFLOPS costs

Hardware in terms of communication capacity (TEPS)

Brain performance in TEPS (includes the cost of brain-level TEPS performance on current hardware)

The cost of TEPS (includes current costs, trends and relationship to other measures of hardware price)

Information storage

Information storage in the brain

Costs of information storage

Costs of human-level information storage


Costs of human-level hardware

Research topic: hardware, software and AI

Index of articles about hardware

Related blog posts

Preliminary prices for human level hardware (4 April 2015)

A new approach to predicting brain-computer parity (7 May 2015)

Time flies when robots rule the earth (28 July 2015)
