Shovel-Ready AGI Safety Work for Programmers

Speed Read This
Posted on May 29, 2015

The AI Containment (or Boxing) problem is: given an artificial general intelligence, or a prospective one, how do you keep it from influencing the world? Containment is vital to safety, because without it, other safety strategies fail to work. If you have an AGI that you’re still testing, you need to keep it securely contained, or else by the time you find a dangerous bug, it will be too late. If you have a tripwire that’s supposed to warn you if the AGI is planning something dangerous, you need to keep it securely contained, or else it will disable the tripwire or intercept the warning. If you have a multi-part system where your two AGIs have to agree before they proceed with a plan, you have to keep them contained, or else one of the AGIs may decide to remove the other’s veto.

Others have observed that AGI containment isn’t a sufficient safety measure, because an AGI could convince its gatekeepers to let it out. While this is true, what it means is that containment is necessary, but not sufficient. And containment has one major advantage over most other subproblems of AGI safety: we know what needs to be done.

I recently applied for a grant to work on the AGI containment problem. If it’s granted, I’ll be working on the problem full time starting in September, with two collaborators. In the meantime, I’ll be spending as much of my spare time on it as I can, because despite its massive importance, to the best of my knowledge the three of us are the only people in the world working on it. In the process of writing that grant application, I spent some time exploring the space of possible containment strategies and technologies, and discovered a massive amount of important work to be done.

There should be more people working on this. If you have programming skill, you can get started right now. Here, I present some software projects that need doing.

  1. Write a Linux kernel module for a tripwired network interface: a device which looks like an inactive network interface from userspace, but which, rather than being a real interface, triggers a tripwire/alarm the moment anything tries to bring it up. (Sketch below.)
  2. Automated seccomp configuration: Write a tool that takes output from strace, which is a list of syscalls, and uses it to generate a BPF filter for Linux’s seccomp mode that allows those syscalls and no others. (Sketch below.)
  3. Fuzz test the Torch7 library interfaces for buffer overflows.
  4. Write a library that uses the Intel CPU performance counters to detect suspicious instruction mixes (e.g. Rowhammer-style memory access patterns). (Sketch below.)
  5. GPU API filtering. Look through a GPGPU library such as CUDA, identify a relatively safe subset which preserves most of the important functionality, and write a wrapper which runs in a separate process and provides only that subset via an IPC interface. (Sketch below.)
  6. Work on anti-steganography techniques for images, audio and video. That is, make a tool such that if an AGI draws a picture, the gatekeeper can run it through the tool, publish the result, and be confident that no one will later find a hidden message inside it. (Sketch below.)
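
To make project 1 concrete, here is a minimal sketch of the kind of module involved, written against the modern alloc_netdev()/net_device_ops kernel API. Everything here is illustrative: the interface name, the decision to refuse to come up, and especially the "alarm," which is just a pr_alert() line in the kernel log where a real tripwire would notify the gatekeepers out-of-band and probably halt the contained system.

    /* tripnet.c -- a network interface that exists only to be a tripwire.
     * It registers a plausible-looking Ethernet device; any attempt to bring
     * it up or transmit through it raises an alarm (here, a kernel log line). */
    #include <linux/module.h>
    #include <linux/netdevice.h>
    #include <linux/etherdevice.h>

    static struct net_device *trip_dev;

    static int trip_open(struct net_device *dev)
    {
        pr_alert("tripnet: %s was brought up -- possible containment breach\n",
                 dev->name);
        return -EPERM;  /* the attempt itself is the alarm; never come up */
    }

    static netdev_tx_t trip_xmit(struct sk_buff *skb, struct net_device *dev)
    {
        pr_alert("tripnet: transmit attempt on %s\n", dev->name);
        dev_kfree_skb(skb);
        return NETDEV_TX_OK;
    }

    static const struct net_device_ops trip_netdev_ops = {
        .ndo_open       = trip_open,
        .ndo_start_xmit = trip_xmit,
    };

    static void trip_setup(struct net_device *dev)
    {
        ether_setup(dev);               /* look like an ordinary Ethernet NIC */
        dev->netdev_ops = &trip_netdev_ops;
        eth_hw_addr_random(dev);        /* plausible-looking MAC address */
    }

    static int __init trip_init(void)
    {
        trip_dev = alloc_netdev(0, "eth%d", NET_NAME_UNKNOWN, trip_setup);
        if (!trip_dev)
            return -ENOMEM;
        return register_netdev(trip_dev);
    }

    static void __exit trip_exit(void)
    {
        unregister_netdev(trip_dev);
        free_netdev(trip_dev);
    }

    module_init(trip_init);
    module_exit(trip_exit);
    MODULE_LICENSE("GPL");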
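
For project 2, most of the heavy lifting on the filter side can be done by libseccomp. Below is a sketch of the back half of the tool: it takes a list of syscall names (hard-coded here; the real tool would parse them out of strace output, e.g. from strace -f -o trace.log) and installs a filter that kills the process on anything else. Compile with -lseccomp.

    /* allowlist.c -- install a seccomp allowlist from a list of syscall names.
     * In the finished tool the `allowed` array would come from parsed strace
     * output rather than being hard-coded. */
    #include <seccomp.h>
    #include <stdio.h>
    #include <unistd.h>

    static const char *allowed[] = {
        "read", "write", "exit", "exit_group", "brk", "mmap", "munmap"
    };

    int main(void)
    {
        /* Default action: kill the process on any syscall not explicitly allowed. */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
        if (!ctx)
            return 1;

        for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
            int nr = seccomp_syscall_resolve_name(allowed[i]);
            if (nr == __NR_SCMP_ERROR ||
                seccomp_rule_add(ctx, SCMP_ACT_ALLOW, nr, 0) < 0) {
                fprintf(stderr, "could not allow %s\n", allowed[i]);
                seccomp_release(ctx);
                return 1;
            }
        }

        if (seccomp_load(ctx) < 0) {    /* compile to BPF and install */
            seccomp_release(ctx);
            return 1;
        }
        seccomp_release(ctx);

        /* From here on, any syscall outside the allowlist kills the process. */
        write(1, "filter installed\n", 17);
        return 0;
    }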
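
For project 4, the relevant Linux primitive is the perf_event_open() syscall. The sketch below watches a single hardware counter (cache misses) for a target process and complains when the rate looks too high; the one-second sampling interval and the threshold are arbitrary placeholders, and a real detector would combine several counters with much better statistics.

    /* perfwatch.c -- poll a hardware cache-miss counter for a process and flag
     * suspiciously high rates, as a crude proxy for Rowhammer-style hammering.
     * The threshold below is an arbitrary example value, not a calibrated one. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(int argc, char **argv)
    {
        pid_t target = argc > 1 ? (pid_t)atoi(argv[1]) : 0;   /* 0 = this process */

        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_MISSES;
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        int fd = perf_event_open(&attr, target, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        for (;;) {
            sleep(1);
            uint64_t misses = 0;
            if (read(fd, &misses, sizeof(misses)) != (ssize_t)sizeof(misses))
                break;
            if (misses > 50ULL * 1000 * 1000)   /* arbitrary example threshold */
                fprintf(stderr, "suspicious: %llu cache misses in the last second\n",
                        (unsigned long long)misses);
            ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        }
        return 0;
    }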
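
For project 5, the basic shape is a small broker process that owns the GPU and executes only an enumerated set of operations on the contained side’s behalf. The request/reply structs, op codes, and the tiny two-operation subset below are all invented for illustration; a usable subset would at least need memory transfers and kernel launches, each with careful argument validation.

    /* gpubroker.c -- skeleton of a GPU-call broker: the untrusted side sends
     * fixed-size requests over a socketpair and the broker performs only the
     * allowlisted operations (here just cudaMalloc and cudaFree). Build with
     * something like `nvcc gpubroker.c -o gpubroker`. */
    #include <cuda_runtime.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    enum gpu_op { GPU_ALLOC = 1, GPU_FREE = 2 };

    struct gpu_request { uint32_t op; uint64_t size; uint64_t handle; };
    struct gpu_reply   { int32_t status; uint64_t handle; };

    /* Broker loop: everything not explicitly allowlisted is refused. */
    static void serve(int fd)
    {
        struct gpu_request req;
        while (read(fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
            struct gpu_reply rep;
            memset(&rep, 0, sizeof(rep));
            void *dev = NULL;
            switch (req.op) {
            case GPU_ALLOC:
                rep.status = (int32_t)cudaMalloc(&dev, (size_t)req.size);
                rep.handle = (uint64_t)(uintptr_t)dev;
                break;
            case GPU_FREE:
                rep.status = (int32_t)cudaFree((void *)(uintptr_t)req.handle);
                break;
            default:
                rep.status = -1;        /* not in the allowed subset */
            }
            if (write(fd, &rep, sizeof(rep)) != (ssize_t)sizeof(rep))
                break;
        }
    }

    int main(void)
    {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
            return 1;

        if (fork() == 0) {              /* stand-in for the contained side */
            close(sv[0]);
            struct gpu_request req = { GPU_ALLOC, 1 << 20, 0 };
            struct gpu_reply rep;
            if (write(sv[1], &req, sizeof(req)) == (ssize_t)sizeof(req) &&
                read(sv[1], &rep, sizeof(rep)) == (ssize_t)sizeof(rep))
                printf("alloc status=%d handle=%#llx\n", rep.status,
                       (unsigned long long)rep.handle);
            return 0;
        }
        close(sv[1]);
        serve(sv[0]);                   /* broker process owns the GPU */
        wait(NULL);
        return 0;
    }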
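
For project 6, here is one very small piece of the eventual toolbox: re-randomizing low-order bits, which destroys naive LSB embedding in raw pixel data. A serious tool would go much further (re-encode the file, add noise at several stages, and deal with format-specific channels such as JPEG coefficients, palette ordering, and metadata), so treat this only as a starting point.

    /* lsbscrub.c -- re-randomize the low bits of each byte in a raw pixel
     * buffer. This wipes out naive least-significant-bit steganography while
     * changing the image imperceptibly; it does nothing about more robust
     * embedding schemes. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void scrub_lsb(uint8_t *pixels, size_t nbytes, int bits)
    {
        uint8_t keep = (uint8_t)(0xFFu << bits);       /* high bits survive */
        for (size_t i = 0; i < nbytes; i++)
            pixels[i] = (pixels[i] & keep) | (uint8_t)(rand() & ~keep);
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        uint8_t pixels[12] = { 200, 201, 202, 17, 18, 19, 250, 251, 252, 3, 4, 5 };
        scrub_lsb(pixels, sizeof(pixels), 2);          /* scrub the 2 low bits */
        for (size_t i = 0; i < sizeof(pixels); i++)
            printf("%u ", pixels[i]);
        printf("\n");
        return 0;
    }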

If you’re convinced that AGI safety is important but have had trouble finding an affordance to actually work on it, hopefully this will help you find a project. These are things that I am not planning to do myself, because I already have a long list of things I’m doing that are this good or better. If no one has jumped in here to say they’re doing it, it probably isn’t getting done.

3 Comments on Shovel-Ready AGI Safety Work for Programmers

  1. Ben Walker says:

    FYI, you aren’t the only person in the world working on the contain-the-AI problem. Some of us just work from other angles. My associates and I have been working on trying to train a more social-engineering-resistant human, and an HRO capable of making them. It doesn’t matter how good the antivirus software is if some dipstick clicks allow. Would you like to share literature?

  2. Daniel Ziegler says:

    Honestly, I’m not sure how valuable 1-5 are because they’re all tied to standard hardware and software, which (IMO) would be extremely stupid to run an AGI on. Linux, Intel CPUs, GPU drivers, GPUs, and everything else have exploitable bugs, and they will continue to have exploitable bugs. They are all massively complex systems saddled with legacy compatibility, and getting them 100% correct and safe is basically hopeless. One CS research area I’m very interested in is building up simpler systems from the ground up, along with formal proofs of their correctness. I think that’s the kind of platform an AGI needs to be run on.

  3. jimrandomh says:

    While I agree in principle that a clean start would be better than working with the existing systems, I unfortunately don’t think it’s realistic to expect it to actually happen or for the cleanly-rebuilt systems to be used in practice. And even if we do have a successful clean remake of these systems, that just means (1-5) will need to be redone on the replacements, and the lessons from having done that sort of thing before will be helpful.

    Also, I’m a lot less pessimistic than you are about the prospects of getting the existing systems correct and safe, particularly in contexts where there is defense in depth. For example, CPU security bugs are extremely rare, and also very hard for an AGI to discover from the inside without segfaulting itself.
