This page contains a list of relatively well-specified AI tasks designed for forecasting. Currently, all entries were used in the 2016 Expert Survey on Progress in AI.


  1. Translate a text written in a newly discovered language into English as well as a team of human experts, using a single other document in both languages (like a Rosetta stone). Suppose all of the words in the text can be found in the translated document, and that the language is a difficult one.
  2. Translate speech in a new language given only unlimited films with subtitles in the new language. Suppose the system has access to training data for other languages, of the kind used now (e.g. same text in two languages for many languages and films with subtitles in many languages).
  3. Perform translation about as well as a human who is fluent in both languages but unskilled at translation, for most types of text, and for most popular languages (including languages that are known to be difficult, like Czech, Chinese and Arabic).
  4. Provide phone banking services as well as human operators can, without annoying customers more than humans. This includes many one-off tasks, such as helping to order a replacement bank card or clarifying how to use part of the bank website to a customer.
  5. Correctly group images of previously unseen objects into classes, after training on a similar labeled dataset containing completely different classes. The classes should be similar to the ImageNet classes.
  6. One-shot learning: see only one labeled image of a new object, and then be able to recognize the object in real world scenes, to the extent that a typical human can (i.e. including in a wide variety of settings). For example, see only one image of a platypus, and then be able to recognize platypuses in nature photos. The system may train on labeled images of other objects. Currently, deep networks often need hundreds of examples in classification tasks [1], but there has been work on one-shot learning for both classification [2] and generative tasks [3].
  7. See a short video of a scene, and then be able to construct a 3D model of the scene good enough to create a realistic video of the same scene from a substantially different angle.
    For example, constructing a short video of walking through a house from a video taken along a very different path through the house.
  8. Transcribe human speech with a variety of accents in a noisy environment as well as a typical human can.
  9. Take a written passage and output a recording that an expert listener can’t distinguish from a recording by a voice actor.
  10. Routinely and autonomously prove mathematical theorems that are publishable in top mathematics journals today, including generating the theorems to prove.
  11. Perform as well as the best human entrants in the Putnam competition—a math contest whose questions have known solutions, but which are difficult for the best young mathematicians.
  12. Defeat the best Go players, training only on as many games as the best Go players have played.
    For reference, DeepMind’s AlphaGo has probably played a hundred million games of self-play, while Lee Sedol has probably played 50,000 games in his life [1].
  13. Beat the best human Starcraft 2 players at least 50% of the time, given a video of the screen.
    Starcraft 2 is a real time strategy game characterized by:

    • Continuous time play
    • Huge action space
    • Partial observability of enemies
    • Long term strategic play, e.g. preparing for and then hiding surprise attacks.
  14. Play a randomly selected computer game, including difficult ones, about as well as a human novice, after playing the game for less than 10 minutes of game time. The system may train on other games.
  15. Play new levels of Angry Birds better than the best human players. Angry Birds is a game where players try to efficiently destroy 2D block towers with a catapult. For context, this is the goal of the IJCAI Angry Birds AI competition [1].
  16. Outperform professional game testers on all Atari games using no game-specific knowledge. This includes games like Frostbite, which require planning to achieve sub-goals and have posed problems for deep Q-networks [1, 2].
  17. Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game-specific knowledge.

    For context, the original Atari playing deep Q-network outperforms professional game testers on 47% of games [1], but used hundreds of hours of play to train [2].

  18. Fold laundry as well and as fast as the median human clothing store employee.
  19. Beat the fastest human runners in a 5 kilometer race through city streets using a bipedal robot body.
  20. Physically assemble any LEGO set given the pieces and instructions, using non-specialized robotics hardware.

    For context, Fu 2016 [1] successfully joins single large LEGO pieces using model-based reinforcement learning and online adaptation.
  21. Learn to efficiently sort lists of numbers much larger than in any training set used, the way Neural GPUs can do for addition [1], but without being given the form of the solution.

    For context, Neural Turing Machines have not been able to do this [2], but Neural Programmer-Interpreters [3] have been able to do this by training on stack traces (which contain a lot of information about the form of the solution).

  22. Write concise, efficient, human-readable Python code to implement simple algorithms like quicksort. That is, the system should write code that sorts a list, rather than just being able to sort lists.

    Suppose the system is given only:

    • A specification of what counts as a sorted list
    • Several examples of lists undergoing sorting by quicksort
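    For illustration only, the target output might resemble the following concise quicksort. This is a sketch of the kind of code the task asks the system to produce, not part of the original task specification:

    ```python
    def quicksort(lst):
        """Return a sorted copy of lst using quicksort."""
        if len(lst) <= 1:
            return lst
        pivot = lst[len(lst) // 2]
        # Partition into elements smaller than, equal to, and larger than the pivot,
        # then recursively sort the two outer partitions.
        smaller = [x for x in lst if x < pivot]
        equal = [x for x in lst if x == pivot]
        larger = [x for x in lst if x > pivot]
        return quicksort(smaller) + equal + quicksort(larger)

    # Example: quicksort([3, 1, 4, 1, 5]) returns [1, 1, 3, 4, 5]
    ```

    Note that the task is harder than it looks: the system must infer this algorithm from a specification of sortedness plus example traces, rather than being shown the code itself.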
  23. Answer any “easily Googleable” factoid questions posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

    Examples of factoid questions: “What is the poisonous substance in Oleander plants?” “How many species of lizard can be found in Great Britain?”

  24. Answer any “easily Googleable” factual but open ended question posed in natural language better than an expert on the relevant topic (with internet access), having found the answers on the internet.

    Examples of open ended questions: “What does it mean if my lights dim when I turn on the microwave?” “When does home insurance cover roof replacement?”

  25. Give good answers in natural language to factual questions posed in natural language for which there are no definite correct answers.

    For example: “What causes the demographic transition?”, “Is the thylacine extinct?”, “How safe is seeing a chiropractor?”

  26. Write an essay for a high-school history class that would receive high grades and pass plagiarism detectors.

    For example, answer a question like ‘How did the whaling industry affect the industrial revolution?’

  27. Compose a song that is good enough to reach the US Top 40. The system should output the complete song as an audio file.
  28. Produce a song that is indistinguishable from a new song by a particular artist, e.g. a song that experienced listeners can’t distinguish from a new song by Taylor Swift.
  29. Write a novel or short story good enough to make it to the New York Times best-seller list.
  30. For any computer game that can be played well by a machine, explain the machine’s choice of moves in a way that feels concise and complete to a layman.
  31. Play poker well enough to win the World Series of Poker.
  32. After spending time in a virtual world, output the differential equations governing that world in symbolic form.

    For example, the agent is placed in a game engine where Newtonian mechanics holds exactly and the agent is then able to conduct experiments with a ball and output Newton’s laws of motion.