Some Preliminary Notes on the Promise of a Wisdom Explosion

By Chris Leong

This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy.

  • Leading AI labs are aiming to trigger an intelligence explosion, but perhaps this is a grave mistake. Maybe they should be aiming to trigger a “wisdom explosion” instead:
    • Defining this as “pretty much the same thing as an intelligence explosion, but with wisdom instead” is rather vague[1], but I honestly think it is good enough for now. I think it’s fine for early-stage exploratory work to focus on opening up a new part of conversational space rather than trying to perfectly pin everything down[2].
    • Regarding my definition of wisdom, I’ll be exploring this in more detail in part six (“What kinds of wisdom are valuable?”) of my upcoming Less Wrong sequence, but for now, I’ll just say that I take an expansive definition of what wisdom is and that achieving a “wisdom explosion” would likely require us to train a system that is fairly strong on a number of different subtypes. As an example, though: if a coalition of groups focused on AI safety were able to wisely strategize, wisely co-ordinate, and wisely pursue methods of non-manipulative persuasion, I’d feel significantly better about humanity’s chances of surviving.
    • In any case, I don’t want to center my own understanding of wisdom too much. Instead, I’d encourage you to consider the types of wisdom that you think would be most valuable for achieving a positive future for humanity, and to ask whether the arguments below hold given how you conceive of wisdom rather than how I conceive of it[3].
    • In an intelligence explosion, the recursive self-improvement occurs within a single AI system. However, in terms of defining a wisdom explosion, I want to take a more expansive view. In particular, instead of requiring that it occur within a single AI, I want to allow the possibility that it may occur within a cybernetic system consisting of both humans and AIs, either within a single organisation or within a cluster of collaborating organisations. In fact, I think this is the best route for pursuing a wisdom explosion.
    • I find the version involving a cluster of collaborating organisations particularly compelling, both because it would enable the pooling of resources[4] for developing wisdom tech and because it would enable pursuing a pivotal process rather than a pivotal action.
  • For simplicity, I’ll talk about “responsible & wise” actors vs. “irresponsible & unwise” actors, even though responsibility and wisdom don’t always line up[5].
  • I will develop this argument more fully in my upcoming Less Wrong post “Artificial Intelligence/Capabilities[6] as Potentially Fatal Mistake. Artificial Wisdom as Antidote”, but an outline of the argument I plan to make is below:
  • Firstly, I will argue that the pursuit of an intelligence explosion would most likely result in catastrophe:
    • Capabilities inevitably proliferate: key factors include a strong open-source community, large career incentives for researchers to publish, and the difficulty of preventing espionage
    • The attack-defense balance strongly favors attackers: attackers only need to get lucky once; defenders need to get lucky every time
    • The proliferation of capabilities most likely leads to an AI arms race: the diffusion of capabilities levels the playing field, which forces actors to race to maintain their lead
    • Intelligence/Capability tech differentially benefits irresponsible & unwise actors: Recklessly racing ahead increases your access to resources, whilst responsible & wise actors need time to figure out how to act wisely
    • Society struggles to adapt: Government processes aren’t designed to handle a technology that moves as fast as AI. Reckless & unwise actors will use their political influence to push society to adopt unwise policies.
  • In contrast, I’ll argue that the pursuit of a wisdom explosion is likely to be much safer:
    • Pursuing wisdom tech likely produces fewer capability externalities
      • A wisdom explosion might be achievable with AIs built on top of relatively weak base models: think of the wisest people you know; they don’t all have massive amounts of cognitive “firepower”
    • Both malicious and reckless & unwise actors are less likely to pursue such technologies:
      • They are less likely to value wisdom, especially given the trade-off with pursuing shiny, shiny capabilities.
    • Reckless & unwise actors are disadvantaged in pursuing a wisdom explosion:
      • There is likely a minimum bar of wisdom required to trigger such an explosion. As they say, garbage in, garbage out.
      • Even if they were able to trigger such an explosion, it’d likely take them longer and/or require a higher capability level. Remember that I’m proposing a cybernetic system, so the human operators play a key role here.
    • Reckless & unwise actors are less likely to know what to do with any wisdom tech that they develop or acquire:
      • This is less true at higher capability levels, where the system can help them figure out what they should be asking, but they might just ignore it.
    • Even if reckless & unwise actors actually pursue and then manage to acquire wisdom tech, it may not be harmful:
      • Acquiring such technology may make them realise their foolishness.
      • They may then delete their model, hand it over to someone more responsible, or start working towards becoming a more responsible actor themselves
    • Responsible actors can use wisdom tech to help them non-manipulatively persuade irresponsible actors to become more responsible:
      • My intuition is that this is much harder for intelligence/capability tech, which will likely be superhuman at persuasion soon, but which is not a natural fit for non-manipulative persuasion[7]
  • I also think it may be viable. I’ll develop these arguments more fully in the seventh post of my upcoming Less Wrong sequence, “Is a ‘Wisdom Explosion’ a coherent concept?”, but my high-level thoughts are as follows:
    • Before we begin: What level of wisdom would we need to spiral up to in order to count as having achieved a “wisdom explosion”? We might not need to set the bar too high (insofar as superhuman systems go). Saving the world may require superhuman wisdom, but I don’t think it would have to be that superhuman.
    • Wisdom seems like the kind of thing where having a greater degree of it makes it easier to acquire even more. In particular, you are more likely to be able to discern who is providing wise or unwise advice, and which assumptions require questioning. (A toy sketch of this feedback loop follows this list.)
    • Insofar as we buy into the argument for an intelligence explosion being viable, one might naively assume that this also increases the chance that a wisdom explosion is viable:
      • One could push back against this by noting that intelligence is much easier to train than wisdom because, for intelligence, we can train our system on problems with known solutions or with a simulator. This is true, but it doesn’t mean that we can’t use these kinds of techniques for training wisdom. Instead, it just means that we have to be more careful about how we go about it.
    • While a certain level of wisdom would likely be required to trigger a wisdom explosion, the level might not be that high:
      • It’s less about being wise and more about not being so ideological that you are unable to break out of an attractor
    • As mentioned before, our base models might not need to be particularly large (by the crazy standards of frontier models). There’s a chance that a wisdom explosion could be triggered at a lower capability level than an intelligence explosion[8] if wisdom isn’t really about cognitive firepower:
      • If this is true, then we may be able to trigger a wisdom explosion earlier than an intelligence explosion
      • This may also address some concerns about inner alignment if we believe that smaller models tend to be more controllable[9].
    • Some people might think that wisdom is too fuzzy to make any progress at all. I’ll discuss this in “An Overview of ‘Obvious’ Approaches to Training Wise AI” and further in the third post of my upcoming Less Wrong sequence, “Against Learned Helplessness With Training Wise AI”.
  • “Wisdom explosion” as creative stimulus:
    • Even if the concept of a wisdom explosion turns out to be incoherent or triggering a wisdom explosion turns out to be impossible, I still think that investigating and debating these topics would be a valuable use of time. I can’t fully explain this, but certain questions feel like obvious or natural questions to ask. Noticing these questions and following the line of inquiry until you reach a natural conclusion is one of the best ways of developing your ability to think clearly about confusing matters.
    • The value of gaining a new frame isn’t just in the potential application of the frame itself, but in how it can reveal assumptions within your worldview that you may not even be aware of.
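  • To make the “spiral up” and “minimum bar” intuitions above slightly more concrete, here is a deliberately crude toy sketch. It is purely my own illustration: the threshold, growth rate, and functional form are arbitrary assumptions, not claims drawn from the argument above.

```python
# Toy model of the "wisdom spiral" intuition. All numbers are arbitrary
# illustrative assumptions; this is not a claim about real systems.

def simulate_wisdom_spiral(initial_wisdom: float,
                           minimum_bar: float = 1.0,
                           gain: float = 0.1,
                           rounds: int = 50) -> list[float]:
    """Return the wisdom level after each round of attempted improvement."""
    levels = [initial_wisdom]
    w = initial_wisdom
    for _ in range(rounds):
        if w >= minimum_bar:
            # Above the bar: the system is wise enough to discern wise advice
            # and question the right assumptions, so improvements compound.
            w *= 1 + gain
        else:
            # Below the bar: "garbage in, garbage out" -- progress stalls.
            w *= 1 - gain / 2
        levels.append(w)
    return levels


if __name__ == "__main__":
    print("Starting just above the bar:", round(simulate_wisdom_spiral(1.05)[-1], 2))
    print("Starting just below the bar:", round(simulate_wisdom_spiral(0.95)[-1], 2))
```

    • The only point of the sketch is the qualitative behaviour: start just above the bar and the level compounds; start just below it and it stalls. Nothing hangs on the particular numbers.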

Notes

  1. And likely even frustrating for some folk! Sorry if this is the case, but my focus here is really on starting a conversation, and I understand how this could be annoying if you prefer posts that are written in such a way as to make it as quick and easy as possible to determine whether what the post is saying is true.
  2. I plan to examine this in more detail in part seven (“Is a ‘Wisdom Explosion’ a coherent concept?”) of my upcoming Less Wrong sequence on Training Wise AI Advisers via Imitation Learning
  3. One of the risks of saying too much about how I conceive of wisdom too early on is that it may have the unintended effect of narrowing the conversation or encouraging people to anchor too much on my conceptions.
  4. Particularly important since wisdom is a cluster of different things and developing an entirely new paradigm would be a lot of work
  5. Arguments always involve some degree of simplification. The question is whether the additional clarity outweighs the reduction in accuracy.
  6. Intelligence and capabilities aren’t quite the same thing. I’ll explore the distinction in more detail in my upcoming sequence.
  7. I expect most techniques for training wisdom to be adaptable towards this end. Non-manipulative persuasion requires difficult subjective judgements, just like wisdom
  8. Admittedly, OpenAI’s o1 makes this less likely, as it indicates a greater role for inference-time scaling going forward.
  9. Plausible, but unclear
