AI Impacts is beginning a serious hiring round (see here for job postings), so I’d like to explain a bit why it has been my own best guess at the highest impact place for me to work. (As in, this is a personal blog post by Katja on the AI Impacts blog, not some kind of officialesque missive from the organization.)
What is AI Impacts?
AI Impacts is a few things:
- An online library of best-guess answers to questions about the future of AI. Including big questions, like ‘how likely is a sudden jump in AI progress at around human-level performance?’, and sub-questions informing those answers (‘are discontinuities common in technological trends?’), and sub-sub questions (‘did penicillin cause any discontinuous changes in syphilis trends?’), and so on. Each page ideally has a high-level conclusion at the top, and reasoning supporting it below, which will often call on the conclusions of other pages. These form something like a set of trees, with important, hard, decision-relevant questions at the root and low-level, tractable, harder-to-use-on-their-own questions at the leaves. This isn’t super obvious at the moment, because a lot of the trees are very incomplete, but that’s the basic idea.
- A research group focused on finding such answers, through a mixture of original research and gathering up that which has been researched by others.
- A blog on these topics, for more opinionated takes, conversational guides to the research, updates, and other things that don’t fit in the main library (like this!).
- A locus of events for people interested in this kind of research, e.g. dinners and workshops, a Slack with other researchers, online coffees.
Why think working on AI Impacts is among the best things to do?
1. AI risk looks like a top-notch cause area
It seems plausible to me that advanced AI poses a substantial risk to humanity’s survival. I don’t think this is clear, but I do think there’s enough evidence that it warrants a lot of attention. I hope to write more about this; see here for recent discussion. Furthermore, I don’t know of other similarly serious risks (see Ord’s The Precipice for a review), or of other intervention areas that look clearly more valuable than reducing existential risk to humanity.
I actually also think AI risk is a potentially high-impact area to work (for a little while at least) if AI isn’t a huge existential risk to humanity, because so many capable and well-intentioned people are dedicating themselves to it. Demonstrating that it wasn’t that bad could redirect mountains of valuable effort to real problems.
2. Understanding the situation beats intervening on the current margin
Within the area of mitigating AI risk, there are several broad classes of action being taken. Technical safety research focuses on building AI that won’t automatically cause catastrophe. AI Governance focuses on maneuvering the policy landscape to lower risk. These are both kinds of intervention: ‘intervening’ is a meta-category, and the other main meta-category in my mind is ‘understanding the situation’. My own best guess is that on the current margin, ‘understanding the situation’ is a better place for an additional person with general skills than any particular intervening that I know of. (Or maybe it’s only almost as good—I flip-flop, but it doesn’t really matter much: the important thing is that for some large part of the space of people and their skills and characteristics, it seems better.)
By ‘understanding the situation’, I mean for instance working toward better answers to questions like these:
- Fast or slow takeoff?
- What concrete kinds of problems might destroy humanity? E.g. single AI god intentionally murders everyone with nanotech vs. large economy gradually drifts away from human comprehension or control?
- Is there a single relevant ‘deployment’?
- If so, what does it look like?
- Would we be safe if AI systems weren’t ‘agentic’?
- Do not-intentionally-agentic things readily become agentic things? Under what circumstances?
- How fast would an intelligence explosion go?
- Is it possible to describe a plausible future where things go well? (Is it possible to describe a plausible future where things go badly?)
Carrying out any particular intervention also involves a lot of ‘understanding the situation’, but I think this is often at a different level. For instance, if you decide to intervene by trying to get AI labs to collaborate with each other, you might end up accruing better models of how people at AI projects interact socially, how decisions are made, how running events works, and so on, because these things are part of the landscape between you and your instrumental goal: improving collaboration between AI projects. You probably also learn about things around you, like what kinds of AI projects people are doing. But you don’t get to learn much at all about how the achievement of your goal affects the future of AI. (I fear that in general this situation means you can end up lumbering forward blindly while thinking you can see, because you are full of specific concrete information—the intricacies of the steering wheel distracting you from the dense fog on the road.) There are some exceptions to this. For instance, I expect some technical work to be pretty enlightening about the nature of AI systems, which is directly relevant to how the development of better AI systems will play out. For instance, mesa-optimization seems like a great contribution to ‘understanding the situation’ which came out of a broadly intervention-oriented organization.
It is that kind of understanding the situation—understanding what will happen with AI and its effects on society, under different interventions—that I think deserves way more attention.
Why do I think understanding the situation is better than intervening? Of course in general, both are great. Intervening is generally necessary for achieving anything, and understanding the situation is arguably necessary for intervening well. (The intense usefulness of understanding the situation for achieving your goals in most situations is exactly the reason one might be concerned about AI to begin with.) So in general, you want a combination of understanding the situation and intervening. The question is how valuable the two are on the current margin.
My guess: understanding the situation is better. Which is to say, I think a person with a subjectively similar level of skill at everything under consideration will add more value via improving everyone’s understanding of the situation by one person’s worth of effort than they would by adding one person’s worth of effort to pursuing the seemingly best intervention.
Here are a few things influencing this guess:
- A basic sense that our understanding of the situation is low: my impression when talking to people working on AI risk is that they often don’t feel that they understand the situation very well. There are major disagreements about what kind of basic scenario we are expecting. The going explanations for why there will be human extinction at all seem to vary across time and between people. Offers to try to clarify are generally met with enthusiasm. These things don’t seem great as signs about whether we understand the situation well enough to take useful action.
- It’s easy to think of specific questions for which updated answers would change the value of different interventions. Here are a few examples off the top of my head of questions, answers, and strategies that would seem to be relatively favored by those answers:
  - Does AI pose a substantial risk of human extinction?
    Yes: work on AI risk instead of other EA causes and other non-emergency professions. Show a good case for this to large numbers of people who aren’t thinking about it and try to change views within the AI community and public about the appropriate degree of caution for relevant AI work.
    No: work on something more valuable, support AI progress.
  - When will relevantly advanced AI be developed?
    5 years: plan for what specific actors should do in a situation much like our current one and talk to them about doing it; build relationships with likely actors; try to align systems much like our current AI systems.
    20 years: more basic time-consuming alignment research; movement building; relationship building with institutions rather than people.
    100 years: avert risks from narrow or weak AI and other nearer technologies, even more basic alignment research, improve society’s general institutions for responding to risks like this, movement building directed at broader issues that people won’t get disillusioned with over that long a period (e.g. ‘responding to technological risks’ vs. AI specifically).
  - How fast is the progression to superhumanly powerful AI likely to be?
    Before you know it: searching for technical solutions that can be proven to entirely solve the problem before it arises (even if you are unlikely to find any), social coordination to avoid setting off such an event.
    Weeks: Immediate-response contingency plans.
    Years: Fast-response contingency plans; alignment plans that would require some scope for iteration.
    Decades: Expect to improve safety through more normal methods of building systems, observing them, correcting, iterating. ‘Soft’ forces like regulations, broadscale understanding of the problems, cooperation initiatives. Systems that are incrementally safer but not infinitely safer.
- Broad heuristic value of seeing
When approaching a poorly understood danger down a dark corridor, I feel like even a small amount of light is really good. Good for judging whether you are facing a dragon or a cliff, good for knowing when you are getting close to it so you can ready your sword (or your ropes, as the case may be), good for telling how big it is. But even beyond those pre-askable questions, I expect the details of the fight (or climb) to go much better if you aren’t blind. You will be able to strike well, and jump out of the way well, and generally have good feedback about your micro-actions and local risks.
So I don’t actually put that much trust in tallying up possible decision changes, as in the last point. If you told me that we had reasoned through the correct course of action for dragons, and cliff faces, and tar pits, and alternate likely monsters, and decided they were basically the same, I’d persist in being willing to pay a lot to be able to see.
Applied to AI strategy: understanding the situation both lets you choose interventions that might help, and having chosen an intervention, probably helps you make smaller choices within that intervention well, such that the intervention hits its target.
I think another part of the value here is that very abstract reasoning about complicated situations seems untrustworthy (especially when it isn’t actually formal), and I expect getting more data and working out more details to generally engage people’s concrete thinking better, and for that to be helpful.
- Large multipliers available: it’s not that hard to imagine the work of one person’s year substantially redirecting way more than one person-year worth of time or money. Intuitively the chance of this seems high enough to make it a good prospect.
- We have a really long list of projects to do. About a hundred that we have bothered to write down, though they vary in tractability. It isn’t hard to find important topics which have received little thorough research. On the current margin, it looks to me like an additional competent person can expect to do useful research.
- If I were to work on a direct intervention in this space, I would feel fairly unsure about whether it would be helpful even if it succeeded in its goals.
- Understanding the situation has way fewer people than intervening: I haven’t measured this carefully, but my guess is that between ten and a hundred times as much labor goes into intervening as into understanding the situation. I’m not sure what the division should be, but intuitively this seems too lopsided.
- Assumptions do not seem solid: it’s arguably not very hard to find points that people are bringing to the table that, upon empirical investigation, seem contrary to the evidence. Einstein and idiots are probably not really right next to each other on natural objective measures of intelligence, as far as I can tell. Qualitatively cool technologies do not generally cause large discontinuities in any particular metrics. Not empirical, but many of the arguments I’ve heard for expecting discontinuous progress at around the time of human-level AI just don’t make much sense to me.
- The ‘understanding the situation’ project is at a pretty unsophisticated stage, compared with intervening projects, according to my assessment anyway. That suggests a mistake, in the same way that navigating an expensive car using divining rods because you don’t have a GPS or map suggests some kind of misallocation of investments.
- I think people overestimate the effort put into understanding the situation, because there is a decent amount of talking about it at parties and blogging about it.
- There are people positioned to make influential choices if they knew what to do asking for help in assessing the situation (e.g. Holden of Open Phil, people with policy influence, philanthropists).
People sometimes ask if we might be scraping the barrel on finding research to do in this space, I guess because quite a few people have prolifically opined on it over numerous years, and things seem pretty uncertain. I think that radically under-imagines what understanding, or an effort dedicated to understanding, could look like. Like, we haven’t gotten as far as making sure that the empirical claims being opined about are solid, whereas a suitable investment for a major international problem that you seriously need to solve should probably look more like the one we see for climate change. Climate change is a less bad and arguably easier to understand problem than AI risk, and the ‘understanding the situation’ effort there looks like an army of climate scientists working for decades. And they didn’t throw up their hands and say things were too uncertain and they had run out of things to think about after twenty climate hobbyists had thought about it for a bit. There is a big difference between a vibrant corner of the blogosphere and a serious research effort.
3. Different merits of different projects
Ok, so AI risk is the most impactful field to my knowledge, and within AI risk I claim that the highest impact work is on understanding the situation[1]. This is reason to work at AI Impacts, and also reason to work at Open Philanthropy, FHI, Metaculus, as an independent scholar, in academia, etc. Probably who should do which depends on the person and their situation. Here are some things AI Impacts is about, and axes on which we have locations:
- Openness, broad comprehensibility and reasoning transparency: our goal is to make an online repository of reasoning around these topics, so we prioritize publishing work (vs. distributing it privately to smaller networks of people), and legibility. There can be research that is better done privately, but such research is not our project. We hope to describe the basis for our conclusions well enough that a non-expert reader can verify the reasoning.
- Modularity and question decomposition: AI Impacts is intended to be something like a group of hierarchical trees of modular conclusions, that can be referred to and questioned in a relatively clean way. We try to roughly have a page for each important conclusion, though things get complicated sometimes, and it is easier to have a short list of them. I think this kind of structure for understanding a complex topic is a promising one, relative to for instance less structured piles of prose. I expect this to make research more re-purposeable, clear, updateable, navigable, and amenable to tight feedback loops. Echoing this structure, we try to answer big questions by breaking them into smaller questions, until we have tractable questions.
- Eye on the prize vs. exploratory wandering: there are many research questions that are interesting and broadly shed light on the future of AI, and following one’s curiosity can be a good strategy. Nonetheless we especially strive to answer the questions that more help with answering important high-level questions. While researchers have a decent amount of freedom, we expect people to be contributing to filling in the gaps in this shared structure of understanding that we are building.
- Back of the envelopes expanding into arbitrarily detailed investigation: in places like academia, it seems normal to work on a project for many months or years, and to finish with something polished. Part of the idea with AI Impacts is to look out for questions that can be substantially clarified by a day and a back of the envelope calculation, to not put in more research than needed, and to iterate at more depth when relevant. This is hard to get right, and we usually fail at this so far, with investigations often expanding to be large clusters of pages before any go up. But as far as I’m concerned, long projects are a failure mode, not a goal.
- Adding concrete reusable things to the conversation, which can be called on in other discussions. This means prioritizing things like empirical investigations that add new data, or cleanly stated considerations, rather than long vague or hard-to-disentangle discussions, or conclusions whose use requires trusting the author a lot.
- Generalist research and widely ranging projects vs. developed expertise. I’m not an expert on anything, as far as I know. Some things my work has involved: thinking about the origin of humans, examining records of 1700s cotton exports, designing incentives for survey participants, reasoning about computer hardware designs, corresponding incredulously with makers of computing benchmarks, skimming papers about the energy efficiency of albatrosses. We do have relative specializations (I do more philosophy, Rick does more empirical work), and would welcome more relevant expertise, but this work can be pretty wide ranging.
- Trustworthiness as an unbiased source vs. persuasion. We focus on questions where we are genuinely unsure of the answer (though we might expect that data will reveal our own current guess is correct), and try to write neutrally about the considerations that we think merit attention. We are unlikely to look for the best way to ‘convince people of AI risk’, but rather to set out to establish whether or not there is AI risk, and to document our reasoning clearly.
- Thriving emphasis vs. high-pressure productivity orientation. We sit toward the thriving end of this spectrum, and hope that pays off in terms of longer term productivity. We are relatively accommodating to idiosyncratic needs or preferences. Our work requires less temporal consistency or predictability than some jobs, so while we value seeing each other regularly and getting stuff done often, we are able to be flexible if someone has things to contribute, but difficulties with the standard office situation.
I’m focused here on the positives, but here are a few negatives too:
- Variable office situation: through a series of unfortunate and fortunate events which is getting ridiculous, we haven’t had a consistent shared office in years. At present, we have an office in SF but Rick works from the bigger Rationalist/EA offices in Berkeley.
- Small: currently two full-time people, plus various occasional people and socially around people. Working from Berkeley next to other AI risk orgs mitigates this some. We have had as many as seven people in a summer, which seemed better, and we hope to move back to at least four soon.
- Even the relatively easy work is hard in ways: everything is complicated and even if you set out to do the most basic analysis ever there seems to be a strong current pulling toward getting bogged down in details of details. This is not the kind of ‘hard’ where you need to be a genius, but rather where you can easily end up taking much longer than hoped, and also get discouraged, which doesn’t help with speed. We’re still figuring out how to navigate this while being epistemically careful enough to produce good information.
So, that was a hand-wavy account of why I think working at AI Impacts is particularly high impact, and some of what it’s like. If you might want to work for us, see our jobs page[2]. If you don’t, but like thinking about the future of AI and wish we invited you to dinners, coffees, parties or our Slack, drop me a DM or send us a message through the AI Impacts feedback box. Pitches that I’m wrong and should do something else are also welcome.
[1] You might ask, why do I call this thing ‘understanding the situation’, not ‘AI strategy’? Isn’t that both what other people call this, and a less ridiculous name? Strategy, to my mind, is about deciding what to do. Strategy is like engineering. But there is a whole earlier activity that is science: just figuring out what the world is like. I think it’s worth separating science as an activity rather than bundling with engineering because if you think of yourself as doing ‘AI strategy’, I think it’s easy to skip over a lot of science and make too many assumptions – for instance to jump into asking how much to disvalue the ‘other side’ winning an AI arms race and their values controlling the future before being super clear on whether it makes sense to think of your situation as an arms race, or on whether countries are the relevant units, or on how a nation developing certain AI technology earlier relates to the government of that nation implementing ‘their values’. Which is maybe fine if you are in enough of a hurry, but is not what I’m advocating for. I often call the AI futurism version of science ‘AI forecasting’, but my guess is that that brings to mind a wrongly narrow picture: making quantitative predictions about the date of AI milestones. So I’d like a better word, but don’t have one right now. I realize it would be strategic to have a better word before really advocating for the importance of a thing, but you win some you lose some. Curious if others agree with these connotations.
[2] If you already applied when we weren’t actively hiring and we didn’t get back to you: sorry, feel free to try again, we now have better ops capacity, and hope to soon make it better still.