When predicting and planning for the coming decades, we can classify futures in different ways based on what happens with artificial general intelligence. There could be a hard take-off, where soon after an AGI is created it self-improves to become extraordinarily powerful, or a soft take-off, where progress is more gradual. There could be a singleton – a single AGI, or a single group with an AGI, which uses it to become much more powerful than everyone else – or things could be decentralized, with lots of AGIs or lots of groups and individuals that have AGIs.
The soft- vs. hard-takeoff question is a matter of prediction; either there is a level of intelligence which enables rapid recursive self-improvement, or there isn't, and we can study this question but we can't do much to change the answer one way or the other. Whether AGI is decentralized or a singleton, however, can be a choice. If a team crosses the finish line and creates a working AGI, and they think decentralized control will lead to a better future, then they can share it with everyone. If multiple teams are close to finishing but they think a singleton will lead to a better future, then they can (we hope) join forces and cross the finish line together.
There are things to worry about and try to prepare for in singleton-AGI futures, and things to worry about and prepare for in decentralized-AGI futures, and these are quite different from each other. Which is better, and which will actually happen? I think a lot of people talking about AGI and AGI safety end up talking past each other, because they are imagining different answers to this question and envisioning different futures. So let’s consider two futures. Both will be good futures, where everything went right. One will be a singleton future, and the other will be a decentralized future.
Let's look at a singleton future, starting with a version of that future in which everything went right. There are some who want to make – or want others to make – a single, very powerful AGI. They want to design it in such a way that it will respect everyone's rights and preferences, be impossible for anyone to hijack, and be amazingly good at getting us what we want. In a world where this was executed perfectly, if I wanted something, then the AGI would help me get it. If two people wanted things that were incompatible, then somewhere in the AGI's programming would be a rule which decides who wins. Philosophers have a lot to say about what that rule would be, and about how to resolve situations where people's preferences are inconsistent or would change if they knew more. In the world where everything went right, all of those puzzles were solved conclusively, and the answers were programmed into the AGI. The theory of how intelligence works was built up and carefully verified, and all the AGI experts agreed that the AGI would do what the philosophers and AGI experts together had agreed was right. Then the AGI would take over the world, and everyone would be happy about it, at least in retrospect, when they saw what happened next.
On the other hand, there are a lot of ways for this to go wrong. If someone were to say they'd built an AGI and they wanted to make it a singleton, we'd all be justifiably skeptical. For one thing, they could be lying, and building a different AGI to benefit only themselves, rather than to benefit everyone. But even the very best intentions aren't necessarily enough. A major takeaway from MIRI and FHI's research on the subject is that there's a very real risk of trying to make something universally benevolent and getting it disastrously wrong. This is an immensely difficult problem. Hence their emphasis on using formal math: when something is mathematically proven, it's true, which reduces the number of places a mistake could be hiding by one. There's a social coordination problem, to make sure that whoever is first to create an AGI makes one that will benefit everyone; another social coordination problem, to make sure that people aren't racing to be first-to-finish in a way that causes them to cut corners; and a whole lot of technical problems. Any one of these things could easily fail.
So how about a world with decentralized AGI – that is, one where everyone (or every company) has an AGI of their own, which they've configured to serve their own values? Again, we'll start with the version in which everything goes right. First of all, in this world, there is no hard take-off, and especially no delayed hard take-off. If recursive self-improvement is a thing that can happen, then any balance of power is doomed to collapse and be replaced with a singleton as soon as one AGI manages to do it. And second, the set of other (non-AGI) technologies needs to work out in a particular way to make a stable power equilibrium possible. As an analogy, consider what would happen if every individual person had access to nuclear weapons. We would expect things to turn out very badly. Luckily, nuclear weapons require rare materials and difficult technologies, which makes it possible to restrict access to a small number of groups who have all more-or-less agreed to never use them. In a hypothetical alternate universe where anyone could make a nuclear weapon using only sand, controlling them would be impossible, and that hypothetical alternate universe would probably be doomed. Similarly, our decentralized-AGI world can't have any technologies like the sand-nuke world's, or it will collapse as soon as AGIs get smart enough to independently rediscover the secret. Alternatively, that world could build a coordination mechanism where everyone is monitored closely enough to make sure they aren't pursuing any known or suspected dangerous technologies.
The problems in singleton-AGI world were mostly technical: the creators of the AGI might screw it up. In decentralized-AGI world, the problems mostly come from the shape of the technology landscape. We don’t know whether recursive self-improvement is possible, but if it is, then decentralized-AGI worlds aren’t likely to work out. We don’t know if making-nukes-from-sand is a possible sort of thing, but if anything like that is possible, then the bar for how good the world’s institutions will have to be to prevent disaster will be very high. These things are especially worrying because they aren’t things we can influence; they’re just facts about physics and its implications which we don’t know the answers to yet.
Suppose we make optimistic assumptions. Recursive self-improvement turns out not to be possible, the balance of technologies favors defense over offense, and our AGI representatives get together, form institutions, and enforce laws and agreements that prevent anything truly horrible from happening. There is still a problem. It’s the same problem that happens when humans get together and try to make institutions, laws and agreements. The problem is local incentives.
Any human with above room temperature IQ can design a utopia. The reason our current system isn’t a utopia is that it wasn’t designed by humans. Just as you can look at an arid terrain and determine what shape a river will one day take by assuming water will obey gravity, so you can look at a civilization and determine what shape its institutions will one day take by assuming people will obey incentives.
But that means that just as the shapes of rivers are not designed for beauty or navigation, but rather an artifact of randomly determined terrain, so institutions will not be designed for prosperity or justice, but rather an artifact of randomly determined initial conditions.
— Meditations on Moloch by Scott Alexander
If we give everyone their own AGIs, then the way the future turns out depends on the landscape of incentives. That isn't an easy thing to change, although it isn't impossible. Nor is it an easy thing to predict, though some have certainly tried (for example, Robin Hanson's The Age of Em). We can imagine nudging things in such a way that, as civilization flows downhill, it goes this way instead of that and ends up in a good future.
The problem is that, at the bottom of the hill as best I understand it, there are bad futures.
This isn't something I can be confident in. Predicting the future is extremely hard, and where the far future is concerned, everything is uncertain. Maybe we could find a way to make having huge numbers of smarter-than-human AIs safe, and steer humanity from there to a good future. But for this sort of strategy, uncertainty is not our friend. Even if there were some reason to expect this sort of future to turn out well, or some strategy to make it turn out well, the same uncertainty that attaches to my belief that it will turn out badly would attach to our belief that it will turn out well.
So, how do these two scenarios compare? To make a good future with a singleton AGI in it, humanity has to solve immensely difficult technical and social coordination problems, without making any mistakes. To make a good future with decentralized AGI in it, humanity has to… find out that, luckily, physics does not allow for recursive self-improvement or certain other classes of dangerous technologies.
I find the idea of building an AGI singleton intuitively unappealing and unaesthetic. It goes against my egalitarian instinct. It creates a single point of failure for all of humanity. On the other hand, giving everyone their own, decentralized AGIs is terrifying. Reckless. I can't imagine any decentralized-AGI scenarios that aren't insanely risky gambles. So I favor humanity building a singleton, and AGI research being less than fully open.