The AI Containment (or Boxing) problem asks: given an artificial general intelligence, or a prospective one, how do you keep it from influencing the world? Containment is vital to safety, because without it, other safety strategies fail to work. If you have an AGI that you’re still testing, you need to keep it securely contained, or else by the time you find a dangerous bug, it will be too late. If you have a tripwire that’s supposed to warn you when the AGI is planning something dangerous, you need to keep the AGI securely contained, or else it will disable the tripwire or intercept the warning. If you have a multi-part system in which your two AGIs have to agree before they proceed with a plan, you have to keep them contained, or else one of the AGIs may decide to remove the other’s veto.
Others have observed that AGI containment isn’t a sufficient safety measure on its own, because an AGI could convince its gatekeepers to let it out. This is true, but it means that containment is necessary, not that it is pointless. And containment has one major advantage over most other subproblems of AGI safety: we know what needs to be done.
I recently applied for a grant to work on the AGI containment problem. If granted, I’ll be working on it full time starting in September, with two other collaborators. In the meantime, I’ll be spending as much of my spare time on it as I can, because despite the problem being massively important, to the best of my knowledge the three of us are the only people in the world working on it. In the process of writing that grant application, I spent some time exploring the space of possible containment strategies and technologies, and discovered a massive amount of important work to be done.
There should be more people working on this. If you have programming skill, you can get started right now. Here, I present some software projects that need doing.
- Write a Linux kernel module for a tripwired network interface: a device that looks like an inactive network interface from userspace, but rather than carrying traffic, raises a tripwire/alarm as soon as anything tries to activate it.
- Automated seccomp configuration: write a tool that parses strace output (a log of the syscalls a program made) and uses it to generate a BPF filter for Linux’s seccomp mode which allows those syscalls and no others.
- Fuzz-test the Torch7 library’s interfaces for buffer overflows.
- Write a library that uses the Intel CPU performance counters to detect suspicious instruction mixes (e.g. Rowhammer).
- GPU API filtering: look through a GPGPU library such as CUDA, identify a relatively safe subset which preserves most of the important functionality, and write a wrapper which runs in a separate process and provides only that subset via an IPC interface.
- Work on anti-steganography techniques for images, audio and video. That is, make a tool such that if an AGI draws a picture, the gatekeeper can run the picture through the tool, publish it, and be confident that no one will later find a hidden message inside it.
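The tripwired-interface project can be prototyped in userspace before any kernel code is written. Here is a minimal sketch of the alarm logic; the counter-reading function is injected so the logic is testable (on a real Linux system it would read `/sys/class/net/<iface>/statistics/rx_packets` and `tx_packets`), and the polling scheme and function names are my own assumptions:

```python
import time

def watch_interface(read_counters, alarm, poll_interval=0.5, rounds=None):
    """Fire `alarm` if a supposedly-dead interface ever shows traffic.

    `read_counters` returns the interface's (rx_packets, tx_packets)
    pair; on Linux that data lives under
    /sys/class/net/<iface>/statistics/. It is passed in as a callable
    so the tripwire logic can be exercised without a real interface.
    """
    baseline = read_counters()
    polls = 0
    while rounds is None or polls < rounds:
        if read_counters() != baseline:
            alarm()
            return True
        time.sleep(poll_interval)
        polls += 1
    return False
```

A real tripwire belongs in the kernel, where the contained process cannot race or unload it; this sketch only demonstrates the detection logic that module would implement.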
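The seccomp project might begin with something like the following: extract the set of syscall names from an strace log, then render a default-deny allowlist. The output format (libseccomp C calls) is illustrative only, and a real tool would also need to handle strace’s resumed/unfinished lines and multi-process traces:

```python
import re

# Matches the syscall name at the start of an strace line, e.g.
#   openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
# optionally preceded by a "[pid NNN]" prefix from strace -f.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?([a-z_]\w*)\(")

def syscalls_from_strace(trace_text):
    """Return the sorted set of syscall names seen in an strace log."""
    names = set()
    for line in trace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            names.add(match.group(1))
    return sorted(names)

def render_allowlist(names):
    """Render the allowlist as libseccomp C calls: anything not
    explicitly allowed kills the process."""
    out = ["ctx = seccomp_init(SCMP_ACT_KILL);"]
    out += [f"seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS({n}), 0);"
            for n in names]
    out.append("seccomp_load(ctx);")
    return "\n".join(out)
```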
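The Torch7 fuzzing project needs little more than a mutation loop to get off the ground. Here is a sketch of one; the target command is left abstract (for Torch7 serialization it might be a small script that calls `torch.load` on its argument, but that harness is an assumption, not something Torch7 ships):

```python
import os
import random
import subprocess
import tempfile

def mutate(seed, rate=0.01, rng=random):
    """XOR a few distinct random bytes of the seed input, guaranteeing
    the result differs from the original."""
    out = bytearray(seed)
    n = max(1, int(len(out) * rate))
    for i in rng.sample(range(len(out)), n):
        out[i] ^= rng.randrange(1, 256)
    return bytes(out)

def run_case(target_cmd, seed, rng=random):
    """Run the target on one mutated input. On Unix, a negative return
    code means the process died from a signal (e.g. SIGSEGV), which is
    the crash signature a fuzzer hunts for. Crashing inputs are kept
    on disk for triage; everything else is deleted."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(mutate(seed, rng=rng))
        path = f.name
    proc = subprocess.run(target_cmd + [path], capture_output=True)
    if proc.returncode >= 0:
        os.unlink(path)
        path = None
    return proc.returncode, path
```

Run it in a loop over a corpus of valid seed files, and triage whatever inputs it keeps.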
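For the performance-counter project, the hard part is the `perf_event_open(2)` plumbing, but the detection heuristic on top of it can be sketched independently. The field names, the threshold, and the consecutive-window rule below are all illustrative assumptions, not measured values:

```python
from collections import namedtuple

# One sampling interval of hardware counter readings. Obtaining these
# requires perf_event_open(2) (e.g. via a small C helper); the fields
# here are stand-ins for the counters that helper would expose.
Sample = namedtuple("Sample", ["instructions", "llc_misses", "clflushes"])

def rowhammer_suspicion(sample, threshold=0.3):
    """Flag intervals whose last-level-cache miss or flush rate per
    instruction is abnormally high -- the signature of a tight
    flush/reload loop hammering DRAM rows."""
    if sample.instructions == 0:
        return False
    miss_rate = sample.llc_misses / sample.instructions
    flush_rate = sample.clflushes / sample.instructions
    return miss_rate > threshold or flush_rate > threshold

def scan(samples, window=3):
    """Alarm only after several consecutive suspicious intervals, to
    cut false positives from ordinary cache-unfriendly code."""
    streak = 0
    for s in samples:
        streak = streak + 1 if rowhammer_suspicion(s) else 0
        if streak >= window:
            return True
    return False
```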
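The GPU-filtering project is essentially a broker: the contained workload talks to a small server over IPC, and only whitelisted operations are executed. This sketch uses stand-in operations (`alloc`/`free` on plain byte buffers) rather than real CUDA calls, which I am assuming purely for illustration:

```python
import multiprocessing as mp

def _alloc(state, size):
    """Stand-in for a device-memory allocation."""
    handle = state["next_handle"]
    state["next_handle"] += 1
    state["buffers"][handle] = bytearray(size)
    return handle

def _free(state, handle):
    del state["buffers"][handle]
    return True

# The whitelisted subset. Anything not listed -- kernel launches,
# driver queries, peer-to-peer copies -- is refused outright.
SAFE_OPS = {"alloc": _alloc, "free": _free}

def broker(conn):
    """Serve whitelisted ops over a multiprocessing connection until
    the client sends None. Intended to run in its own process so that
    compromising the caller does not grant direct access to the real
    GPU API."""
    state = {"buffers": {}, "next_handle": 0}
    while True:
        msg = conn.recv()
        if msg is None:
            break
        op, args = msg
        if op not in SAFE_OPS:
            conn.send(("denied", op))
            continue
        try:
            conn.send(("ok", SAFE_OPS[op](state, *args)))
        except Exception as exc:
            conn.send(("error", str(exc)))
    conn.close()

def call(conn, op, *args):
    """Client-side helper: send one request, wait for the reply."""
    conn.send((op, args))
    return conn.recv()
```

In a real deployment the broker would hold the only handle to the GPU driver, and the IPC channel would be the contained process’s sole route to it.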
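One starting point for the anti-steganography project: re-randomize an image’s low bit planes, destroying any message hidden there while perturbing each pixel only slightly. A sketch on a flat list of 8-bit values (a real tool would operate on decoded image buffers, and defeating more robust embedding schemes would also require recompression and resampling, not just this):

```python
import random

def scrub_low_bits(pixels, bits=2, rng=random):
    """Return a copy of an 8-bit image (flat sequence of 0-255 values)
    with its `bits` low-order bit planes replaced by fresh randomness.
    Any LSB-embedded message is obliterated; each pixel changes by at
    most 2**bits - 1, which is visually negligible."""
    keep_mask = ~((1 << bits) - 1) & 0xFF
    return [(p & keep_mask) | rng.randrange(1 << bits) for p in pixels]
```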
If you’re convinced that AGI safety is important but have had trouble finding an affordance to actually work on it, hopefully this will help you find a project. These are things that I am not planning to do myself, because I already have a long list of things I’m doing that are this good or better. If no one has jumped in here to say they’re doing it, it probably isn’t getting done.