Takeaways from safety by default interviews

By Asya Bergal, 3 April 2020

Last year, several researchers at AI Impacts (primarily Robert Long and I) interviewed prominent researchers inside and outside of the AI safety field who are relatively optimistic about advanced AI being developed safely. These interviews were originally intended to focus narrowly on reasons for optimism, but we ended up covering a variety of topics, including AGI timelines, the likelihood of current techniques leading to AGI, and what the right things to do in AI safety are right now. 

We talked to Ernest Davis, Paul Christiano, Rohin Shah, Adam Gleave, and Robin Hanson.

Here are some more general things I personally found noteworthy while conducting these interviews. For interview-specific summaries, check out our Interviews Page.

Relative optimism in AI often comes from the belief that AGI will be developed gradually, and problems will be fixed as they are found rather than neglected.

All of the researchers we talked to seemed to believe in non-discontinuous takeoff.1 Rohin gave ‘problems will likely be fixed as they come up’ as his primary reason for optimism,2 and Adam3 and Paul4 both mentioned it as a reason.

Relatedly, both Rohin5 and Paul6 said one thing that could update their views was gaining information about how institutions relevant to AI will handle AI safety problems, potentially by seeing them solve relevant problems, or by looking at historical examples.

I think this is a pretty big crux around the optimism view; my impression is that MIRI researchers generally think that 1) the development of human-level AI will likely be fast and potentially discontinuous and 2) people will be incentivized to hack around and redeploy AI when they encounter problems. See Likelihood of discontinuous progress around the development of AGI for more on 1). I think 2) could be a fruitful avenue for research; in particular, it might be interesting to look at recent examples of people in technology, particularly ML, correcting software issues, perhaps when doing so goes against their short-term profit incentives. Adam said he thought the AI research community wasn’t paying enough attention to building safe, reliable systems.7

Many of the arguments I heard around relative optimism weren’t based on inside-view technical arguments.

This isn’t that surprising in hindsight, but it seems interesting to me that though we interviewed largely technical researchers, a lot of their reasoning wasn’t based particularly on inside-view technical knowledge of the safety problems. See the interviews for more evidence of this, but here’s a small sample of the not-particularly-technical claims made by interviewees:

  • AI researchers are likely to stop and correct broken systems rather than hack around and redeploy them.8
  • AI has progressed and will continue to progress via an accumulation of many small things rather than via a sudden important insight.

My instinct when thinking about AGI is to defer largely to safety researchers, but these reasons felt noteworthy to me in that they seemed like questions that were perhaps better answered by economists or sociologists (or for the latter case, neuroscientists) than safety researchers. I really appreciated Robin’s efforts to operationalize and analyze the second claim above.

(Of course, many of the claims were also more specific to machine learning and AI safety.)

There are lots of calls for individuals with views around AI risk to engage with each other and understand the reasoning behind fundamental disagreements.

This is especially true around views that MIRI have, which many optimistic researchers reported not having a good understanding of.

This isn’t particularly surprising, but there was a strong, universal, and unprompted theme that there wasn’t enough engagement around AI safety arguments. Adam and Rohin both said they had a much worse understanding than they would like of others’ viewpoints.9 Robin10 and Paul11 both pointed to some existing but meaningfully unfinished debate in the space.


  1. Paul Christiano: https://sideways-view.com/2018/02/24/takeoff-speeds/

    Rohin Shah: “I don’t know, in a world where fast takeoff is true, lots of things are weird about the world, and I don’t really understand the world. So I’m like, “Shit, it’s quite likely something goes wrong.” I think the slow takeoff is definitely a crux. Also, we keep calling it slow takeoff and I want to emphasize that it’s not necessarily slow in calendar time. It’s more like gradual. … Yeah. And there’s no discontinuity between… you’re not like, “Here’s a 2X human AI,” and a couple of seconds later it’s now… Not a couple of seconds later, but like, “Yeah, we’ve got 2X AI,” for a few months and then suddenly someone deploys a 10,000X human AI. If that happened, I would also be pretty worried. It’s more like there’s a 2X human AI, then there’s like a 3X human AI and then a 4X human AI. Maybe this happens from the same AI getting better and learning more over time. Maybe it happens from it designing a new AI system that learns faster, but starts out lower and so then overtakes it sort of continuously, stuff like that.”
    — From Conversation with Rohin Shah

    Adam Gleave: “I don’t see much reason for AI progress to be discontinuous in particular. So there’s a lot of empirical records you could bring to bear on this, and it also seems like a lot of commercially valuable interesting research applications are going to require solving some of these problems. You’ve already seen this with value learning, that people are beginning to realize that there’s a limitation to what we can just write a reward function down for, and there’s been a lot more focus on imitation learning recently. Obviously people are solving much narrower versions of what the safety community cares about, but as AI progresses, they’re going to work on broader and broader versions of these problems.”
    — From Conversation with Adam Gleave

    Robin Hanson: “That argument was a very particular one, that this would appear under a certain trajectory, under a certain scenario. That was a scenario where it would happen really fast, would happen in a very concentrated place in time, and basically once it starts, it happens so fast, you can’t really do much about it after that point. So the only chance you have is before that point. … But I was doubting that scenario. I was saying that that wasn’t a zero probability scenario, but I was thinking it was overestimated by him and other people in that space. I still think many people overestimate the probability of that scenario. Over time, it seems like more people have distanced themselves from that scenario, yet I haven’t heard as many substitute rationales for why we should do any of this stuff early.”
    — From Conversation with Robin Hanson
  2. “The first one I had listed is that continual or gradual or slow takeoff, whatever you want to call it, allows you to correct the AI system online. And also it means that AI systems are likely to fail in not extinction-level ways before they fail in extinction-level ways, and presumably we will learn from that and not just hack around it and fix it and redeploy it.”
    — From Conversation with Rohin Shah
  3. “And then, I also have optimism that yes, the AI research community is going to try to solve these problems. It’s not like people are just completely disinterested in whether their systems cause harm, it’s just that right now, it seems to a lot of people very premature to work on this. There’s a sense of ‘how much good can we do now, where nearer to the time there’s going to just be naturally 100s of times more people working on the problem?’. I think there is still value you can do now, in laying the foundations of the field, but that maybe gives me a bit of a different perspective in terms of thinking, ‘What can we do that’s going to be useful to people in the future, who are going to be aware of this problem?’ versus ‘How can I solve all the problems now, and build a separate AI safety community?’.”
    — From Conversation with Adam Gleave

  4. “Before we get to resources or people, I think one of the basic questions is, there’s this perspective which is fairly common in ML, which is like, ‘We’re kind of just going to do a bunch of stuff, and it’ll probably work out’. That’s probably the basic thing to be getting at. How right is that?

    This is the bad view of safety conditioned on– I feel like prosaic AI is in some sense the worst– seems like about as bad as things would have gotten in terms of alignment. Where, I don’t know, you try a bunch of shit, just a ton of stuff, a ton of trial and error seems pretty bad. Anyway, this is a random aside maybe more related to the previous point. But yeah, this is just with alignment. There’s this view in ML that’s relatively common that’s like, we’ll try a bunch of stuff to get the AI to do what we want, it’ll probably work out. Some problems will come up. We’ll probably solve them. I think that’s probably the most important thing in the optimism vs pessimism side.”
    — From Conversation with Paul Christiano
  5. “I think I could imagine getting more information from either historical case studies of how people have dealt with new technologies, or analyses of how AI researchers currently think about things or deal with stuff, could change my mind about whether I think the AI community would by default handle problems that arise, which feels like an important crux between me and others.”
    — From Conversation with Rohin Shah
  6. “One can learn… I don’t know very much about any of the relevant institutions, I may know a little bit. So you can imagine easily learning a bunch about them by observing how well they solve analogous problems or learning about their structure, or just learning better about the views of people. That’s the second category.”
    — From Conversation with Paul Christiano
  7. “And then, if I look at the state of the art in AI, there’s a number of somewhat worrying trends. We seem to be quite good at getting very powerful superhuman systems in narrow domains when we can specify the objective that we want quite precisely. So AlphaStar, AlphaGo, OpenAI Five, these systems are very much lacking in robustness, so you have some quite surprising failure modes. Mostly we see adversarial examples in image classifiers, but some of these RL systems also have somewhat surprising failure modes. This seems to me like an area the AI research community isn’t paying much attention to, and I feel like it’s almost gotten obsessed with producing flashy results rather than necessarily doing good rigorous science and engineering. That seems like quite a worrying trend if you extrapolate it out, because some other engineering disciplines are much more focused on building reliable systems, so I more trust them to get that right by default.

    Even in something like aeronautical engineering where safety standards are very high, there are still accidents in initial systems. But because we don’t even have that focus, it doesn’t seem like the AI research community is going to put that much focus on building safe, reliable systems until they’re facing really strong external or commercial pressures to do so. Autonomous vehicles do have a reasonably good safety track record, but that’s somewhere where it’s very obvious what the risks are. So that’s kinda the sociological argument, I guess, for why I don’t think that the AI research community is going to solve all of the safety problems as far ahead of time as I would like.”
    — From Conversation with Adam Gleave

  8. See footnotes 2 – 4 above.
  9. “I guess, I don’t know if this is really useful, but I do wish I had a better sense of what other people in the safety community and outside of it actually thought and why they were working on it, so I really appreciate you guys doing these interviews because it’s useful to me as well. I am generally a bit concerned about lots of people coming to lots of different conclusions regarding how pessimistic we should be, regarding timelines, regarding the right research agenda. 

    I think disagreement can be healthy because it’s good to explore different areas. The ideal thing would be for us to all converge to some common probability distribution and we decide we’re going to work on different areas. But it’s very hard psychologically to do this, to say, ‘okay, I’m going to be the person working on this area that I think isn’t very promising because at the margin it’s good’– people don’t work like that. It’s better if people think, ‘oh, I am working on the best thing, under my beliefs’. So having some diversity of beliefs is good. But it bothers me that I don’t know why people have come to different conclusions to me. If I understood why they disagree, I’d be happier at least.”
    — From Conversation with Adam Gleave

    “Slow takeoff versus fast takeoff…. I feel like MIRI still apparently believes in fast takeoff. I don’t have a clear picture of these reasons, I expect those reasons would move me towards fast takeoff. … Yeah, there’s a lot of just like.. MIRI could say their reasons for believing things and that would probably cause me to update. Actually, I have enough disagreements with MIRI that they may not update me, but it could in theory update me.”
    — From Conversation with Rohin Shah

  10. “My experience is that I’ve just written on this periodically over the years, but I get very little engagement. Seems to me there’s just a lack of a conversation here. Early on, Eliezer Yudkowsky and I were debating, and then as soon as he and other people just got funding and recognition from other people to pursue, then they just stopped engaging critics and went off on pursuing their stuff.

    Which makes some sense, but these criticisms have just been sitting and waiting. Of course, what happens periodically is they are most eager to engage the highest status people who criticize them. So periodically over the years, some high-status person will make a quip, not very thought out, at some conference panel or whatever, and they’ll be all over responding to that, and sending this guy messages and recruiting people to talk to him saying, “Hey, you don’t understand. There’s all these complications.”

    Which is different from engaging the people who are the longest, most thoughtful critics. There’s not so much of that going on. You are perhaps serving as an intermediary here. But ideally, what you do would lead to an actual conversation. And maybe you should apply for funding to have an actual event where people come together and talk to each other. Your thing could be a preliminary to get them to explain how they’ve been misunderstood, or why your summary missed something; that’s fine. If it could just be the thing that started that actual conversation it could be well worth the trouble.”
    — From Conversation with Robin Hanson
  11. “And I don’t know, I mean this has been a project that like, it’s a hard project. I think the current state of affairs is like, the MIRI folk have strong intuitions about things being hard. Essentially no one in… very few people in ML agree with those, or even understand where they’re coming from. And even people in the EA community who have tried a bunch to understand where they’re coming from mostly don’t. Mostly people either end up understanding one side or the other and don’t really feel like they’re able to connect everything. So it’s an intimidating project in that sense. I think the MIRI people are the main proponents of the everything is doomed, the people to talk to on that side. And then in some sense there’s a lot of people on the other side who you can talk to, and the question is just, who can articulate the view most clearly? Or who has most engaged with the MIRI view such that they can speak to it?”
    — From Conversation with Paul Christiano
