AI Impacts talked to AI safety researcher Adam Gleave about his views on AI risk. With his permission, we have transcribed this interview.
- Adam Gleave – PhD student at the Center for Human-Compatible AI, UC Berkeley
- Asya Bergal – AI Impacts
- Robert Long – AI Impacts
We spoke with Adam Gleave on August 27, 2019. Here is a brief summary of that conversation:
- Gleave gives a number of reasons why it’s worth working on AI safety:
- It seems like the AI research community currently isn’t paying enough attention to building safe, reliable systems.
- There are several unsolved technical problems that could plausibly occur in AI systems without much advance notice.
- A few additional people working on safety may be extremely high leverage, especially if they can push the rest of the AI research community to pay more attention to important problems.
- Gleave thinks there’s a ~10% chance that AI safety is very hard in the way that MIRI would argue, a ~20-30% chance that AI safety will almost certainly be solved by default, and a remaining ~60-70% chance that what we’re working on actually has some impact.
- Here are the reasons for Gleave’s beliefs, weighted by how much they factor into his holistic viewpoint:
- 40%: The traditional arguments for risks from AI are unconvincing:
- Traditional arguments often make an unexplained leap from having superintelligent AIs to superintelligent AIs being catastrophically bad.
- It’s unlikely that AI systems not designed from mathematical principles are going to inherently be unsafe.
- They’re long chains of heuristic reasoning, with little empirical validation.
- Outside view: most fears about technology have been misplaced.
- 20%: The AI research community will solve the AI safety problem naturally.
- 20%: AI researchers will be more interested in AI safety when the problems are nearer.
- 10%: The hard, MIRI version of the AI safety problem is not very compelling.
- 10%: AI safety problems that seem hard now will be easier to solve once we have more sophisticated ML.
- Fast takeoff defined as “GDP will double in 6 months before it doubles in 24 months” is plausible, though Gleave still leans towards slow takeoff.
- Gleave thinks discontinuous progress in AI is extremely unlikely:
- There is unlikely to be a sudden important insight dropping into place, since AI has empirically progressed more through the accumulation of lots of bags of tricks and compute.
- There isn’t going to be a sudden influx of compute in the near future, since well-funded organizations are already spending billions of dollars on compute and on optimizing it.
- If we train impressive systems, we will likely train other systems beforehand that are almost as capable.
- If there is discontinuous progress, the most likely story is that we combine many narrow AI systems in a way where the integrated whole is much more capable than a subset of them would be.
- Gleave guesses a ~10-20% chance that AGI technology will only be a small difference away from current techniques, and a ~50% chance that AGI technology will be easily comprehensible to current AI researchers:
- There are fairly serious roadblocks in current techniques right now, e.g. memory, transfer learning, Sim2Real, sample inefficiency.
- Deep learning is slowing down compared to 2012 – 2013:
- Much of the new progress is going to different domains, e.g. deep RL instead of supervised deep learning.
- Computationally expensive algorithms will likely hit limits without new insights.
- Though it seems possible that in fact progress will come from more computationally efficient algorithms.
- Outside view: we’ve had lots of different techniques for AI over time, so it would be surprising if the current one is the right one for AGI.
- From an economic point of view, one consideration pushing towards current techniques getting to AGI is that a lot of money is going into companies whose current mission is to build AGI.
- Conditional on advanced AI technology being created, Gleave gives a 60-70% chance that it will pose a significant risk of harm without additional safety efforts.
- Gleave thinks that in the best case, we drive it down to 10-20%; in the median case, we drive it down to 30-40%. A lot of his uncertainty comes from how difficult the problem is.
- Gleave thinks he could see evidence that could push him in either direction in terms of how likely AI is to be safe:
- Evidence that would cause Gleave to think AI is less likely to be safe:
- Evidence that thorny but speculative technical problems, like inner optimizers, exist.
- Seeing more arms race dynamics, e.g. between U.S. and China.
- Seeing major catastrophes involving AI, though they would also cause people to pay more attention to risks from AI.
- Hearing more solid arguments for AI risk.
- Evidence that would cause Gleave to think AI is more likely to be safe:
- Seeing AI researchers spontaneously focus on relevant problems.
- Getting evidence that AGI was going to take longer to develop.
- Gleave is concerned that he doesn’t understand why members of the safety community come to widely different conclusions when it comes to AI safety.
- Gleave thinks a potentially important question is the extent to which we can successfully influence field building within AI safety.
This transcript has been lightly edited for concision and clarity.
Asya Bergal: We have a bunch of questions, sort of around the issue of– basically, we’ve been talking to people who are more optimistic than a lot of people in the community about AI. The proposition we’ve been asking people to explain their reasoning about is, ‘Is it valuable for people to be expending significant effort doing work that purports to reduce the risk from advanced artificial intelligence?’ To start with, I’d be curious for you to give a brief summary of what your take on that question is, and what your reasoning is.
Adam Gleave: Yeah, sure. The short answer is, yes, I think it’s worth people spending a lot of effort on this; at the margin, it’s still in absolute terms quite a small number of people. Obviously it depends a bit on whether you’re talking about diverting resources of people who are already really dedicated to having a high impact, versus having your median AI researcher work more on safety-related things. Maybe you think the median AI researcher isn’t trying to optimize for impact anyway, so the opportunity cost might be lower. The case I see for reducing the risk of AI is maybe weaker than some people in the community see, but I think it’s still overall very strong.
The goal of AI as a field is still to build artificial general intelligence, or human-level AI. If we’re successful in that, it does seem like it’s going to be an extremely transformative technology. There doesn’t seem to be any roadblock that would prevent us from eventually reaching that goal. The path to that, the timeline is quite murky, but that alone seems like a pretty strong signal for ‘oh, there should be some people looking at this and being aware of what’s going on.’
And then, if I look at the state of the art in AI, there’s a number of somewhat worrying trends. We seem to be quite good at getting very powerful superhuman systems in narrow domains when we can specify the objective that we want quite precisely. So AlphaStar, AlphaGo, OpenAI Five, these systems are very much lacking in robustness, so you have some quite surprising failure modes. Mostly we see adversarial examples in image classifiers, but some of these RL systems also have somewhat surprising failure modes. This seems to me like an area the AI research community isn’t paying much attention to, and I feel like it’s almost gotten obsessed with producing flashy results rather than necessarily doing good rigorous science and engineering. That seems like quite a worrying trend if you extrapolate it out, because some other engineering disciplines are much more focused on building reliable systems, so I more trust them to get that right by default.
Even in something like aeronautical engineering where safety standards are very high, there are still accidents in initial systems. But because we don’t even have that focus, it doesn’t seem like the AI research community is going to put that much focus on building safe, reliable systems until they’re facing really strong external or commercial pressures to do so. Autonomous vehicles do have a reasonably good safety track record, but that’s somewhere where it’s very obvious what the risks are. So that’s kinda the sociological argument, I guess, for why I don’t think that the AI research community is going to solve all of the safety problems as far ahead of time as I would like.
And then, there’s also a lot of very thorny technical problems that do seem like they’re going to need to be solved at some point before AGI. How do we get some information about what humans actually want? I’m a bit hesitant to use this phrase ‘value learning’ because you could plausibly do this just by imitation learning as well. But there needs to be some way of getting information from humans into the system, you can’t just derive it from first principles, we still don’t have a good way of doing that.
There’s lots of more speculative problems, e.g. inner optimizers. I’m not sure if these problems are necessarily going to be real or cause issues, but it’s not something that we– we’ve not ruled it in or out. So there’s enough plausible technical problems that could occur and we’re not necessarily going to get that much advance notice of, that it seems worrying to just charge ahead without looking into this.
And then to caveat all this, I do think the AI community does care about producing useful technology. We’ve already seen some backlashes against autonomous weapons. People do want to do good science. And when the issues are obvious, there’s going to be a huge amount of focus on them. And it also seems like some of the problems might not actually be that hard to solve. So I am reasonably optimistic that in the default case of there’s no safety community really, things will still work out okay, but it also seems like the risk is large enough that just having a few people working on it can be extremely high leverage, especially if you can push the rest of the AI research community to pay a bit more attention to these problems.
Does that answer that question?
Asya Bergal: Yeah, it totally does.
Robert Long: Could you say a little bit more about why you think you might be more optimistic than other people in the safety community?
Adam Gleave: Yeah, I guess one big reason is that I’m still not fully convinced by a lot of the arguments for risks from AI. I think they are compelling heuristic arguments, meaning it’s worth me working on this, but it’s not compelling enough for me to think ‘oh, this is definitely a watertight case’.
I think the common area where I just don’t really follow the arguments is when you say, ‘oh, you have this superintelligent AI’. Let’s suppose we get to that, that’s already kind of a big leap of faith. And then if it’s not aligned, humans will die. It seems like there’s just a bit of a jump here that no one’s really filled in.
In particular it seems like sure, if you have something sufficiently capable, both in terms of intelligence and also access to other resources, it could destroy humanity. But it doesn’t just have to be smarter than an individual human, it has to be smarter than all of humanity potentially trying to work to combat this. And humanity will have a lot of inside knowledge about how this AI system works. And it’s also starting from a potentially weakened position in that it doesn’t already have legal protection, property ownership, all these other things.
I can certainly imagine there being scenarios unfolding where this is a problem, so maybe you actually give an AI system a lot of power, or it just becomes so, so much more capable than humans that it really is able to outsmart all of us, or it might just be quite easy to kill everyone. Maybe civilization is just much more fragile than we think. Maybe there are some quite easy bio x-risks or nanotech that you could reason about from first principles. If it turned out that a malevolent but very smart human could kill all of humanity, then I would be more worried about the AI problem, but then maybe we should also be working on the human x-risk problem. So that’s one area that I’m a bit skeptical about, though maybe fleshing that argument out more is bad for info-hazard reasons.
Then the other thing is I guess I feel like there’s a distribution of how difficult the AI safety problem is going to be. So there’s one world where anything that is not designed from mathematical principles is just going to be unsafe– there are going to be failure modes we haven’t considered, these failure modes are only going to arise when the system is smart enough to hurt you, and the system is going to be actively trying to deceive you. So this is I think, maybe a bit of a caricature, but I think this is roughly MIRI’s viewpoint. I think this is a productive viewpoint to inhabit when you’re trying to identify problems, but I think it’s probably not the world we actually live in. If you can solve that version, great, but it seems like a lot of the failure modes that are going to occur with advanced AI systems you’re going to see signs of earlier, especially if you’re actually looking out for them.
I don’t see much reason for AI progress to be discontinuous in particular. So there’s a lot of empirical records you could bring to bear on this, and it also seems like a lot of commercially valuable interesting research applications are going to require solving some of these problems. You’ve already seen this with value learning, that people are beginning to realize that there’s a limitation to what we can just write a reward function down for, and there’s been a lot more focus on imitation learning recently. Obviously people are solving much narrower versions of what the safety community cares about, but as AI progresses, they’re going to work on broader and broader versions of these problems.
I guess the general skepticism I have with the arguments, is, a lot of them take the form of ‘oh, there’s this problem that we need to solve and we have no idea how to solve it,’ but forget that we only need to solve that problem once we have all this other treasure trove of AI techniques that we can bring to bear on the problem. It seems plausible that this very strong unsupervised learning is going to do a lot of heavy lifting for us, maybe it’s going to give us a human ontology, it’s going to give us quite a good inductive bias for learning values, and so on. So there’s just a lot of things that might seem a lot stickier than they actually are in practice.
And then, I also have optimism that yes, the AI research community is going to try to solve these problems. It’s not like people are just completely disinterested in whether their systems cause harm, it’s just that right now, it seems to a lot of people very premature to work on this. There’s a sense of ‘how much good can we do now, when nearer to the time there are naturally going to be hundreds of times more people working on the problem?’. I think there is still value you can add now, in laying the foundations of the field, but that maybe gives me a bit of a different perspective in terms of thinking, ‘What can we do that’s going to be useful to people in the future, who are going to be aware of this problem?’ versus ‘How can I solve all the problems now, and build a separate AI safety community?’.
I guess there’s also the outside view of just, people have been worried about a lot of new technology in the past, and most of the time it’s worked out fine. I’m not that compelled by this. I think there are real reasons to think that AI is going to be quite different. I guess there’s also just the outside view of, if you don’t know how hard a problem is, you should put a probability distribution over it and have quite a lot of uncertainty, and right now we don’t have that much information about how hard the AI safety problem is. Some problems seem to be pretty tractable, some problems seem to be intractable, but we don’t know if they actually need to be solved or not.
So, decent chance– I think I put a reasonable probability, like 10% probability, on the hard-mode MIRI version of the world being true. In which case, I think there’s probably nothing we can do. And I also put a significant probability, 20-30%, on AI safety basically not needing to be solved; we’ll just solve it by default unless we’re completely careless. And then there’s this big chunk of probability mass in the middle where maybe what we’re working on will actually have an impact, and obviously it’s hard to know whether at the margin, you’re going to be changing the outcome.
Asya Bergal: I’m curious– I think a lot of people we’ve talked to, some people have said somewhat similar things to what you said. And I think there’s two classic axes on which peoples’ opinions differ. One is this slow takeoff, fast takeoff proposition. The other is whether they think something that looks like current methods is likely to lead to AGI. I’m curious on your take on both those questions.
Adam Gleave: Yeah, sure. So, for slow vs. fast takeoff, I feel like I need to define the terms for people who use them in slightly different ways. I don’t expect there to be a discontinuity, in the sense of, we just see this sudden jump. But I wouldn’t be that surprised if there was exponential growth and quite a high growth rate. I think Paul defines fast takeoff as, GDP will double in six months before it doubles in 24 months. I’m probably mangling that but it was something like that. I think that scenario of fast takeoff seems plausible to me. I probably am still leaning slightly more towards the slow takeoff scenario, but it seems like fast takeoff will be plausible in terms of very fast exponential growth.
I think a lot of the case for the discontinuous progress argument falls on there being sudden insight that dropped into place, and it doesn’t seem to me like that’s what’s happening in AI, it’s more just a cumulation of lots of bags of tricks and a lot of compute. I also don’t see there being bags of compute falling out of the sky. Maybe if there was another AI winter, leading to a hardware overhang, then you might see sudden progress when AI gets funding again. But right now a lot of very well-funded organizations are spending billions of dollars on compute, including developing new application-specific integrated circuits for AI, so we’re going to be very close to the physical limits there anyway.
Probably the strongest case I see for discontinuities is the discontinuities you see when you’re training systems. But I just don’t think that’s going to be strong enough, because you’ll train other systems beforehand that will be almost as capable. I guess we do sometimes see cases where one technique lets you solve a new class of problems.
Maybe you could see something where you get increasingly capable narrow systems, and there’s not a discontinuity overall, you already had very strong narrow AI. But eventually you just have so many narrow AI systems that they can basically do everything, and maybe you get to a stage where the integrated whole of those is much stronger than if you just had half of them, let’s say. I guess this is sort of the comprehensive AI services model. But again that seems a bit unlikely to me, because most of the time you can probably outsource some other chunks to humans if you really needed to. But yeah, I think it’s a bit more plausible than some of the other stories.
And then, in terms of whether I think current techniques are likely to get us to human-level AI– I guess I put significant probability mass on that, depending on how narrowly you define it. One fuzzy definition is whether a PhD thesis describing AGI would be something that a typical AI researcher today could read and understand without too much work. Under this definition I’d assign 40-50%. And that could still include introducing quite a lot of new techniques, right, but just– I mean plausibly I think something based on deep learning, deep RL, you could describe to someone in the 1970s in a PhD thesis and they’d still understand it. But it’s just showing you, it’s not that much real theory that was developed, it was applying some pretty simple algorithms and a lot of compute in the right way. Which implies no huge new theoretical insights.
But if we’re defining it more narrowly, only allowing small variants of current techniques, I think that’s much less likely to lead to AGI: around 10-20%. I think that case is almost synonymous with the argument that you just need more compute, because it seems like there are so many things right now that we really cannot do: we still don’t have great solutions to memory, we still can’t really do transfer learning, Sim2Real just barely works sometimes. We’re still extremely sample inefficient. It just feels like all of those problems are going to require quite a lot of research in themselves. I can’t see there being one simple trick that would solve all of them. But maybe, current algorithms if you gave them 10000x compute would do a lot better on these, that is somewhat plausible.
And yeah, I do put fairly significant probability, 50%, on it being something that is kind of radically different. And I guess there’s a couple of reasons for that. One is, just trying to extrapolate progress forward, it does seem like there are some fairly serious roadblocks. Deep learning is slowing down in terms of, it’s not hitting as many big achievements as it was in the past. And also AI has just had many kinds of fads over time, right. We’ve had good old-fashioned AI, symbolic AI, we had expert systems, we had Bayesianism. It would be sort of surprising if the current method were the right one.
I don’t find the fact that people are focusing on these techniques to be particularly strong evidence that they are going to lead us to AGI. First, many researchers are not focused on AGI, and you can probably get useful applications out of current techniques. Second, AI research seems like it can be quite fashion-driven. Obviously, there are organizations whose mission is to build AGI who are working within the current paradigm. And I think it is probably still the best bet, of the things that we know, but I still think it’s a bet that’s reasonably unlikely to pay off.
Does that answer your question?
Asya Bergal: Yeah.
Robert Long: Just on that last bit, you said– I might just be mixing up the different definitions you had and your different credences in those– but in the end there you said that’s a bet that you think is reasonably unlikely to pay off, but you’d also said 50% that it’s something radically different, so how– I think I was just confusing which ones you were on.
Adam Gleave: Right. So, I guess these definitions are all quite fuzzy, but I was saying 10-20% that something that is only a small difference away from current techniques would build AGI, and 50% that AGI was going to be comprehensible to us. I guess the distinction I’m trying to draw is: the narrow definition, which I give 10-20% credence, is that we basically already have the right algorithms and we just need a few tricks and more compute. The more expansive definition, which I give 40-50% credence to, allows for completely different algorithms, but excludes any deep theoretical insight akin to a whole new field of mathematics. So we might not be using backpropagation any longer, we might not be using gradient descent, but it’ll be something similar – like the difference between gradient descent and evolutionary algorithms.
There’s a separate question of, if you’re trying to build AGI right now, where should you be investing your resources? Should you be trying to come up with a completely new novel theory, or should you be trying to scale up current techniques? And I think it’s plausible that you should just be trying to scale up techniques and figure out if we can push them forward, because trying to come up with a completely new way of doing AI is also very challenging, right. It’s not really a sort of insight you can force.
Asya Bergal: You kind of covered this earlier– and maybe you even said the exact number, so I’m sorry if this is a repeat. But one thing we’ve been asking people is the credence that without additional intervention– so imagining a world where EA wasn’t pushing for AI safety, and there wasn’t this separate AI safety movement outside of the AI research community, imagining that world. In that world, what is the chance that advanced artificial intelligence poses a significant risk of harm?
Adam Gleave: The chance it does cause a significant risk of harm?
Asya Bergal: Yeah, that’s right.
Adam Gleave: Conditional on advanced artificial intelligence being created, I think 60, 70%. I have a much harder time giving an unconditional probability, because there are other things that could cause humanity to stop developing AI. Is a conditional probability good enough, or do you want me to give an unconditional one?
Asya Bergal: No, I think the conditional one is what we’re looking for.
Robert Long: Do you have a hunch about how much we can expect dedicated efforts to drive down that probability? That is, the EA-focused AI safety efforts.
Adam Gleave: I think the best case is, you drive it down to 10-20%. I’m kind of picturing a lot of this uncertainty coming from just, how hard is the problem technically? And if we do inhabit this really hard version where you have to solve all of the problems perfectly and you have to have a formally verified AI system, I just don’t think we’re going to do that in time. You’d have to solve a very hard coordination problem to stop people developing AI without those safety checks. It seems like a very expensive process, developing safe AI.
I guess in the median case, where the AI safety community just sort of grows at its current pace, I think maybe that gets it down to 30-40%? But I have a lot of uncertainty in these numbers.
Asya Bergal: Another question, going back to original statements for why you believe this– do you think there’s plausible concrete evidence that we could get or are likely to get that would change your views on this one direction or the other?
Adam Gleave: Yeah, so, seeing evidence of some of the more thorny but currently quite speculative technical problems, like inner optimizers, would make me update towards, ‘oh, this is just a really hard technical problem, and unless we really work hard on this, the default outcome is definitely going to be bad’. Right now, no one’s demonstrated an inner optimizer existing, it’s just a sort of theoretical problem. This is a bit of an unfair thing to ask in some sense, in that the whole reason that people are worried about this is that it’s only a problem with very advanced AI systems. Maybe I’m asking for evidence that can’t be provided. But relative to many other people, I am unconvinced by heuristic arguments appealing just to mathematical intuitions. I’m much more convinced either by very solid theoretical arguments that are proof-based, or by empirical evidence.
Another thing that would update me in a positive direction, as in AI seems less risky, would be seeing more AI researchers spontaneously focus on some relevant problems. I guess this is a bit of a tangent, but people tend to conceive of the AI safety community as people who would identify as AI safety researchers. But I think the vast majority of AI safety research work is being done by people who have never heard of AI safety, but who have been working on related problems. This is useful to me all of the time. I think we could plausibly end up having a lot more of this work happening, without AI safety ever really becoming a thing, through people realizing ‘oh, I want my robot to do this thing and I have a really hard time making it do that, let’s come up with a new imitation learning technique’.
But yeah, other things that could update me positively… I guess, AI seeming like a harder problem, as in, it seems like AI, general artificial intelligence is further away, that would probably update me in a positive direction. It’s not obvious but I think generally all else being equal, longer timelines is going to generally have more time to diagnose problems. And also it seems like the current set of AI techniques — deep learning and very data-driven approaches — are particularly difficult to analyze or prove anything about, so some other paradigm is probably going to be better, if possible.
Other things that would make me scared would be more arms race dynamics. It’s been very sad to me what we’re seeing with China – U.S. arms race dynamics around AI, especially since it doesn’t even seem like there is much direct competition, but that meme is still being pushed for political reasons.
Any actual major catastrophes involving AI would make me think it’s more risky, although it would also make people pay more attention to AI risk, so I guess it’s not obvious what direction it would push overall. But it certainly would make me think that there’s a bit more technical risk.
I’m trying to think if there’s anything else that would make me more pessimistic. I guess just more solid arguments for AI risk, because a lot of my skepticism is coming from this being a very unlikely-sounding set of ideas, where there are just heuristic arguments that I’m convinced enough by to work on the problem, but not convinced enough by to say, this is definitely going to happen. And if there was a way to patch some of the holes in those arguments, then I probably would be more convinced as well.
Robert Long: Can I ask you a little bit more about evidence for or against AGI being a certain distance away? You mentioned that as evidence that would change your mind. What sort of evidence do you have in mind?
Adam Gleave: Sure, so I guess a lot of the short timelines scenarios basically are coming from current ML techniques scaling to AGI, with just a bit more compute. So, watching for if those milestones are being achieved at the rate I was expecting, or slower.
This is a little bit hard to crystallize, but I would say right now it seems like the rate of progress is slowing down compared to something like 2012, 2013. And interestingly, I think a lot of the more interesting progress has come from, I guess, from going to different domains. So we’ve seen maybe a little bit more progress happening in deep RL compared to supervised deep learning. And the optimistic thing is to say, well, that’s because we’ve solved supervised learning, but we haven’t really. We’ve got superhuman performance on ImageNet, but not on real images that you just take on your mobile phone. And it’s still very sample inefficient, we can’t do few-shot learning well. Sometimes it seems like there’s a lack of interest on the part of the research community in solving some of these problems. I think it’s partly because no one has a solid angle of attack on solving these problems.
Similarly, while some of the recent progress in deep RL has been very exciting, it seems to have some limits. For example, AlphaStar and OpenAI Five both involved scaling up self-play and population based training. These were hugely computationally expensive, and that was where a lot of the scaling was coming from. So while there have been algorithmic improvements, I don’t see how you get this working in much more complicated environments without either huge additional compute or some major insights. These are things that are pushing me towards thinking deep learning will not continue to scale, and therefore very short timelines are unlikely.
Something that would update me towards shorter timelines would be if something that I thought was impossible turns out to be very easy. So OpenAI Five did update me positively, because I just didn’t think PPO was going to work well in Dota and it turns out that it does if you have enough compute. I don’t think it updated me that strongly towards short timelines, because it did need a lot of compute, and if you scale it to a more complex game you’re going to have exponential scaling. But it did make me think, well, maybe there isn’t a deep insight required, maybe this is going to be much more about finding more computationally efficient algorithms rather than lots of novel insights.
Adam Gleave: I guess there are also sort of economic factors – I mention them mostly because I often see people neglecting them. One thing that makes me bullish on short timelines is that there are some very well-resourced companies whose mission is to build AGI. OpenAI just raised a billion dollars, and DeepMind is spending considerable resources. As long as this continues, it’s going to be a real accelerator. But that could go away: if AI doesn’t start making people money, I expect another AI winter.
Robert Long: One thing we’re asking people, and again I think you’ve actually already given us a pretty good sense of this, is just a relative weighting of different considerations. And as I say that, you actually have already been tagging this. But just to briefly review what I’ve scrawled down, the different considerations in your relative optimism are: cases for AI as an x-risk being not as watertight as you’d like them, not being sold on the arguments for failure modes being the default and really hard, the idea that these problems might become easier to solve the closer we get to AGI when we have more powerful techniques, and then the general hope that people will try to solve them as we get closer to AGI. Yeah, I think those were at least some of the main considerations I got. How strong relatively are those considerations in your reasoning?
Adam Gleave: I’m going to quote numbers that may not add up to 100, so we’ll have to normalize it at the end. I think the skepticism surrounding AI x-risk arguments is probably the strongest consideration, so I would put maybe 40% of my weight on that. This is because the outside view is quite strong to me, so if you talk about this very big problem that there’s not much concrete evidence for, then I’m going to be reasonably optimistic that actually we’re wrong and there isn’t a big problem.
The second most important thing to me is the AI research community solving this naturally. We’re already seeing signs of a set of people beginning to work on related problems, and I see this continuing. So I’m putting 20% of my weight on that.
And then, the hard version of AI safety not seeming very likely to me, I think that’s 10% of the weight. This seems reasonably important if I buy into the AI safety argument in general, because that makes a big difference in terms of how tractable these problems are. What were the other considerations you listed?
Robert Long: Two of them might be so related that you already covered them, but I had distinguished between the problems getting easier the closer we get, and people working more on them the closer we get.
Adam Gleave: Yeah, that makes sense. I think I don’t put that much weight on the problems getting easier. Or I don’t directly put weight on it, maybe it’s just rolled into my skepticism surrounding AI safety arguments, because I’m going to naturally find an argument a bit uncompelling if you say ‘we don’t know how to properly model human preferences’. I’m going to say, ‘Well, we don’t know how to properly do lots of things humans can do right now’. So everything needs to be relative to our capabilities. Whereas I find arguments of the form ‘we can solve problems that humans can’t solve, but only when we know how to specify what those problems are’, that seems more compelling, that’s talking about a relative strength between ability to optimize vs. ability to specify objectives. Obviously that’s not the only AI safety problem, but it’s a problem.
So yeah, I think I’m putting a lot of the weight on people paying more attention to these problems over time, so that’s probably actually 15 – 20% of my weight. And then I’ll put 5% on the problems getting easier and then some residual probability mass on things I haven’t thought about or haven’t mentioned in this conversation.
Robert Long: Is there anything you wish we had asked that you would like to talk about?
Adam Gleave: I guess, I don’t know if this is really useful, but I do wish I had a better sense of what other people in the safety community and outside of it actually thought and why they were working on it, so I really appreciate you guys doing these interviews because it’s useful to me as well. I am generally a bit concerned about lots of people coming to lots of different conclusions regarding how pessimistic we should be, regarding timelines, regarding the right research agenda.
I think disagreement can be healthy because it’s good to explore different areas. The ideal thing would be for us to all converge to some common probability distribution and we decide we’re going to work on different areas. But it’s very hard psychologically to do this, to say, ‘okay, I’m going to be the person working on this area that I think isn’t very promising because at the margin it’s good’– people don’t work like that. It’s better if people think, ‘oh, I am working on the best thing, under my beliefs’. So having some diversity of beliefs is good. But it bothers me that I don’t know why people have come to different conclusions to me. If I understood why they disagree, I’d be happier at least.
I’m trying to think if there’s anything else that’s relevant… yeah, so I guess another thing, this is really just a question for you guys to maybe think about: I’m still unsure about how valuable field-building is, and in particular, to what extent AI safety researchers should be working on it. It seems like a lot of the reasons why I was optimistic assume that the AI research community is going to solve some of these problems naturally. A natural follow-up to that is to ask whether we should be doing something to encourage this to happen, like writing more position papers, or just training up more grad students. Should we be trying to actively push for this rather than just relying on people to organically develop an interest in this research area? And I don’t know whether you can actually change research directions in this way, because it’s very far outside my area of expertise, but I’d love someone to study it.