Category Uncategorized

The Malthusian Trap Song

Speed Read This
Posted by on December 17, 2015

The Malthusian Trap is what happens to societies that reach the carrying capacity of their environment. For most of human history, humans have lived a Malthusian existence, constantly threatened by famine and disease and warfare. I wrote this for the 2015 MIT Winter Solstice. To the tune of Garden Song.

Dawn to dusk, row by row, better make this garden grow
Move your feet with a plow in tow, on a piece of fertile ground
Inch by inch, blow by blow, gonna take these crops you sow,
From the peasants down below, ‘till the wall comes tumbling down

Pulling weeds and picking stones, chasing dreams and breaking bones
Got no choice but to grow my own ‘cause the winter’s close at hand
Grain for grain, drought and pain, find my way through nature’s chain
Tune my body and my brain to competition’s demand.

Plant those rows straight and long, thicker than with prayer and song.
Emperors will make you strong if your skillset’s very rare.
Old crow watching hungrily, from his perch in yonder tree.
In my garden I’m as free as that feathered thief up there.

OpenAI Should Hold Off On Choosing Tactics

Speed Read This
Posted by on December 14, 2015

OpenAI is a non-profit artificial intelligence research company founded this lead by Ilya Sutskever as research director with $1B of funding from Sam Altman, Greg Brockman, Elon Musk, Reid Hoffman, Jessica Livingston, Peter Thiel, Amazon Web Services, Infosys, and YC Research. The launch was announced in this blog post on December 11, saying:.

Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.

Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as is possible safely.

This is a noble and worthy mission. Right now, most AI research is done inside a few large for-profit corporations, and there is a considerable risk that profit motive could lead them to pursue paths that are not good for humanity as a whole. For the sorts of AI research going on now, leading towards goals like medical advances and self-driving cars, the best way to do that is to be very open. Letting research accumulate in secret silos would delay these advances and cost lives. In the future, there are going to be decisions about which paths to follow, where some paths may be safer and some paths may be more profitable, and I personally am much more comfortable with a non-profit organization making those decisions than I would be with a for-profit corporation or a government doing the same.

There are predictions that in the medium- to long-term future, powerful AGIs could be very dangerous. There are two main dangers that have been tentatively identified, and there’s a lot of uncertainty around both. The first concern is that a powerful enough AI in the hands of bad actors would let them do much more damage than they could otherwise. For example, extremists could download an AI and ask it to help them design weapons or plan attacks. Others might use the same AI help them design defenses and to find the extremists, but there’s no guarantee that these would cancel out; it could end up like computer security, where the balance of power strongly favors offense over defense. To give one particularly extreme example, suppose someone created a general-purpose question-answering system that was smart enough that, if asked, it could invent a nuclear bomb made without exotic ingredients and provide simple instructions to make one. Letting everyone in the world download that AI and run it privately on their desktop computer would be predictably disastrous, and couldn’t be allowed. On the other hand, the balance could end up favoring defenders; in that case, widespread distribution would be less of a problem. The second concern is the possibility of an AGI undergoing recursive self-improvement; if someone developed and trained an AI all the way to a point where it could do further AI research by itself, then by repeatedly upgrading its ability to upgrade itself it could quickly become very very powerful. This scenario is frightening because if the seed AI was a little bit flawed, then theory suggests that the process of recursive self-improvement might greatly magnify the effects of the flaw, resulting in something that destroys humanity. Dealing with this is going to be really tricky, because on one hand we’ll want the entire research community to be able to hunt for those flaws, but on the other hand we don’t want anyone to take an AI and tell it or let it start a recursive self-improvement process before everyone’s sure it’s safe.

At this point, no one really knows whether recursive self-improvement is possible, nor what the interaction will be between AI-empowered bad actors and AI-empowered defenders. We’ll probably know more in a decade, and more research will certainly help. OpenAI’s founding statement seemed to strike a good balance: “as broadly and evenly distributed as is possible safely”, acknowledging both the importance of sharing the benefits of AI and also the possibility that safety concerns might force them to close up in the future. As OpenAI themselves put it:

AI systems today have impressive but narrow capabilities. It seems that we’ll keep whittling away at their constraints, and in the extreme case they will reach human performance on virtually every intellectual task. It’s hard to fathom how much human-level AI could benefit society, and it’s equally hard to imagine how much it could damage society if built or used incorrectly.

Yesterday, two days after that founding statement was published, it was edited. The new version reads:

OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.

Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as possible.

The word “safely” has been removed. There is no announcement or acknowledgement of the change, within that blog post or anywhere else, and no indication who made it or who knew about it.

I sort of understand why they’d do this. There’s a problem right now with ignorant press articles fearmongering over research that very clearly doesn’t pose any risk, and seizing on out-of-context discussions of long-term future issues to reinforce that. But those words where there for an important reason. When an organization states its mission in a founding statement, that has effects – both cultural and legal – lasting far into the future, and there’s going to come a time, probably in this century, when some OpenAI’s researcher is going to wonder whether their latest creation might be unsafe to publish. The modified statement says: publish no matter what. If there’s a really clear-cut danger, they might refuse to publish anyways, but this will be hard to defend in the face of ambiguity.

OpenAI is less than a week old, so it’s too early to criticize them or to praise them for anything they’ve done. Still, the direction they’ve indicated worries me – both because I doubt whether openness is going to be a safe strategy a decade from now, and because they don’t seem to have waited for the data to come in before baking it into their organizational identity. OpenAI should be working with the AI safety community to figure out what strategies to pursue in the short and long term. They have a lot of flexibility by virtue of being a non-profit, and they shouldn’t throw that flexibility away. They’re going to need it.

Rationality Cardinality: The Web-Based Version

Speed Read This
Posted by on October 3, 2015

Rationality Cardinality is a card game I wrote which takes memes and concepts from the rationality/Less Wrong sphere, and mixes them with jokes to make a game. The ultimate goal is to get the game to a broad audience, so that it exposes them to the ideas and concepts they’d be better off knowing. After nearly two years of card-creation, playtesting and development, today, I’m taking the “beta” label off the web-based version of Rationality Cardinality. Go to the website and, if at least two other people visit at the same time, you can play against them.

What this means is, I’m trying to drive traffic to the site and get people to play. People who make an account will get an email about it when there are print copies for sale; I’m trying to get enough interest to be confident that, when I launch a Kickstarter to sell print copies, it will reach its goal. You can help with this by inviting your friends to play, by sharing the link, and by upvoting and giving feedback on link sharing sites.

AI Containment and FLI

Speed Read This
Posted by on July 1, 2015

Back in May, Roman Yampolskiy, Janos Kramar and I applied for a grant from the Future of Life Institute (FLI) to work in the AI containment problem. The grant proposal itself is fairly long (15 pages) and has quite a few things in it; I’m summarizing a subset of it here.

The AI containment problem is: if you’re developing an AI that could potentially become superintelligent, how do you keep it from taking over the internet, and from disabling your safety measures? This requires a moderate amount of philosophically flavored work, a lot of software infrastructure, and a detailed understanding of what’s out there. While it’s likely to be awhile before there are AIs, this needs to be done sooner rather than later. First, AI could arrive sooner than expected. Even if it doesn’t, security-sensitive software takes time to mature, and given that we’re still struggling with the security consequences of technical decisions made in the 80s, we need to pay attention to the decisions being made now and optimize them for safety-amenability.

As part of writing the proposal, we did a search and literature review. There is shockingly little written about this subject. There are the AI Box experiments by Eliezer and replication, which shows that human gatekeepers are more likely to be tricked into releasing an AI from containment than they think. There’s Roman’s paper Leakproofing the Singularity, which points out that if you limit the AI to answering a small number of yes/no questions that it can’t manipulate you anymore. And… that’s pretty much it for AI-oriented containment research; the rest of the literature review was about malware-containment, which has some overlap but significantly different requirements.

So we started studying the problem directly. The first important observation was that many of the other safety measures people have proposed, have containment as a prerequisite. If you want to create a tripwire, for example, the tripwire needs a computational high ground – you want it to be able to read the AI’s memory, but not vise versa. If you want to do testing before you let it access the Internet or self-modify, you need mechanisms to detect and inhibit self-modification, and to protect the test-evaluation code. Some proposals involve AI designs with a “parliamentary model”, where sub-agents vote or negotiate, either as a way of reconciling utility functions or as a cross-check; this, too, requires containment measures to protect the sub-agents from each other. If you’re running a series of tests, you need to protect the hardware from unwanted persistence; if the AI managed to overwrite an unnoticed BIOS, then subsequent tests would be untrustworthy.

None of these is to suggest that testing, tripwires and whatnot would make an unsafe overall design tolerable, but they would provide a saving throw against errors in an otherwise-safe AI architecture, and some chance to abort development on a fundamentally-flawed one. These things are simple enough in retrospect, but no one had written about them before. Seeing that many mechanisms called for setups with computational high- and low-ground, I determined that systems should have a layered architecture, diagrammed an example, and classified the types of vulnerabilities that components could have and the consequences of partial escape up to each layer.

Proposals like “only allow a small number of yes-or-no questions as output” make sense when you’ve got a genuine, scary superintelligence. Proposals like “make a bunch of layers with monitoring systems and tripwires pointing inwards” make sense when you’ve got an AI that isn’t superintelligent, which you’re evaluating to decide whether to let it self-improve or not. The common case, however, is neither of these things. The common case is a research group that doesn’t have an AI at all, but hopes it might some day. To address the differences, we defined a taxonomy of heavy, medium, and light containment, respectively, put down definitions and a few thoughts about how to delineate the boundaries and what technical measures might be adequate, and proposed to study the question more thoroughly.

So, what did FLI’s review committee think of all this?

The proposal summarizes existing taxonomies of containment, but doesn’t make concrete proposals about how to make effective progress on these questions.

Well, crap. Those weren’t “existing taxonomies”! When considering possible failure modes for this proposal, one possibility I didn’t consider was that original research portions would look too much like summaries of existing work. So they thought the proposed scope was much smaller than it really was, and that the scope was too small for the requested budget. In retrospect, that wasn’t made nearly as clearly as it could have been. Still, I’m rather bothered by the lack of opportunity to clarify, or really any communication at all in between submitting a proposal and receiving a no.

So, we still need funding; and sadly, FLI is the only granting agency which has expressed any interest at all in AI safety.To the best of my knowledge, there is no one else at all working on or thinking about these issues. We couldn’t find any when we were looking for names to recommend for the review panel. Without funding, I myself will not be able to work on AI containment in more than an occasional part-time capacity.

This is too important an issue for humanity to drop the ball on, but it looks like that’s probably what will happen.

A Broad View of Software Verification

Speed Read This
Posted by on June 7, 2015

Software is terrible. This is mostly okay, except that there are a few specific types of software which we would much rather wasn’t terrible. For example, the software that controls your car’s throttle. Or the software that controls spaceship navigation. Or the first artificial general intelligence, if one is eventually made.

When correctness really matters, we break out the mathematical machinery and formally prove that our software is correct. Or rather, we feel guilty about how we aren’t formally proving our software correct, while academics prove the correctness of toy versions of our software and wonder why we aren’t doing it too. (It’s not like a line count and feature set 10^3 times bigger is a fundamental obstacle).

This is my attempt to outline the problem of making large-scale provably software. At the end of this blog post, you will be ready to apply formal methods to proving the correctness of a large system have an appreciation for why formally proving the correctness of software is hard.

Like most discussions of formal proof in software, we’re going to start with a sort function. This might seem silly, because sorting isn’t that hard, but applying formal proof methods revealed a bug in OpenJDK’s sort function which had gone undetected for years and survived extensive unit tests.

The proof-making process goes like this. First, we define what it means for our function to be correct. In the case of a sort function, that means it takes a list and returns a list, elements of the input and output lists match up one-to-one, and each element of the output list other than the first element is greater than or equal to the element before it. This is the extensional definition of sorting. Then we write the sort function itself, which is a constructive definition: it defines the same thing, but from a different angle. Then we informally prove that the two definitions are equivalent, and then we convert the informal proof into a machine-checkable formal representation.

Consider, for example, this inefficient C++ implementation of selection sort.

int leastElementIndex(vector<int> &vec)
int leastElement = vec[0];
int leastIndex = 0;
for(int ii=1; ii<vec.size(); ii++) {
if(vec[ii] < leastElement) {
leastElement = vec[ii];
leastIndex = ii;
return leastIndex;

vector<int> sort(vector<int> input)
vector<int> sortedItems;
vector<int> itemsLeft = input;

for(int ii=0; ii<input.size(); ii++) {
int leastIndex = leastElementIndex(itemsLeft);
int leastValue = itemsLeft[leastIndex];
return sortedItems;

Our initial, informal proof goes like this. Upon each entry into the loop, sortedItems is sorted, every element of sortedItems is less than the smallest element in itemsLeft, and each item in input corresponds one-to-one with an item either in sortedItems or in itemsLeft. These are all true on the first entry into the loop because sortedItems is empty and itemsLeft is a copy of the input. If this was true at the start of a loop iteration, then it is true at the end of a loop iteration: sortedItems is still sorted because the item added to the end is greater than or equal to all of the items that were previously there; the one-to-one correspondence is preserved because the item that was removed from itemsLeft was added to sortedItems; and all items in sortedItems are all still less than or equal to the smallest item in itemsLeft because the item that was moved was the smallest item. The loop terminates after input.size() steps, at which point every element has been moved from itemsLeft to sortedItems so that’s the sorted list and we can return it.

This proof is incorrect and the function is not correct either. Can you spot the bug?

This function is incorrect for inputs with size >= 2^31 because input.size() is larger than the largest possible int. When you compile this code, the compiler will give you a warning. This is an overflow problem: when we encounter numbers that don’t fit in a 32-bit int, the things we know about basic arithmetic go out the window. This class of error is pervasive in real programs. An experiment found that most programmers couldn’t write a correct binary search, and this was a major reason. In most cases, the compiler will not catch these errors; our election sort was a fortunate exception.

Unfortunate Primitives

The first obstacle to formal proof is that software is usually built out of primitives that are mathematically inconvenient, creating strange caveats and impedance mismatches. We want integers, but instead we get integers mod 2^32, so n+1 isn’t necessarily bigger than n. This is why we have one less Ariane 5 rocket. We want real numbers, but instead we get floating-point numbers, and suddenly addition isn’t an associative operation. This famously caused the Patriot missile defense system to fail. We either can’t prove that values won’t suddenly change in between accesses, or we have to invest significant effort into proving they won’t, because of mutability. Some languages force us to reason about whether objects have any references left, or else we’ll have memory leaks, or worse, try to use an object after it’s been freed (causing a use-after-free security vulnerability).

Some languages do a better job at avoiding these problems than others. Languages are commonly classified as functional or imperative; the “functional” side is generally seen as doing better at avoiding these problems. However, this distinction is very fuzzy, languages that have traditionally been thought of as imperative and object-oriented have tended to adopt traits that used to identify functional languages, such as lightweight function syntax, garbage collection and immutability.

Static Analysis

Inside every major compiler, there is an inference and proof engine. No, not the type system-the other one. Actually, there’s two proof systems. Type-checking is a sort of proof system, thanks to the Curry-Howard isomorphism; it proves that we’ll never try to do particular silly things like divide a number by a string. But the main inference engine is invisible. Its purpose is to find and prove statements of the form “program P is equivalent to program P’, which is faster”. In most cases, these are independent of programming language and shared between multiple programming languages, and independent of processor architecture and shared between processor architectures.

These proof engines are less ambitious than formal verification in general, but all formal program manipulation starts roughly the same way: by translating the program to a representation that’s easier to work with and easier to prove things about. (Actually, compilers tend to translate programs through a series of representations which highlight different aspects of it, performing different optimizations in each). Because the optimizers in a compiler are entirely automated, and the correctness of their output is important, they don’t get to gloss over any details; they have to thoroughly check their assumptions, including assumptions that would normally be swept under the rug.

When we look at them, we find that many of the techniques are about figuring out and proving things that make good lemmas for a larger proof, such as:

  • Which variable usages correspond to which assignments (SSA form)
  • Which variables/memory locations might or might not change in between accesses (escape analysis)
  • Which pointers/references definitely point to the same thing, or definitely don’t point to the same thing (alias analysis)
  • Upper and lower bounds on variables (value range analysis)

Several of these steps serve to eliminate the differences between functional and imperative programming languages. In particular, the conversion to static single assignment form eliminates mutability in local variables, and the escape analysis reduces (but doesn’t always eliminate) need to worry about mutability in heap variables.

The Dual Definitions Problem

When we wrote a sort function and formally defined what properties we wanted it to have, I phrased this in terms of writing two definitions from two different angles. In practice, this can be the hardest part. There are two ways this can go wrong. First, we might not be able to write a second definition. It was hard enough figuring out what we wanted the first time; if the second specification has to be sufficiently different, we might not be able to do that at all.

This is especially hard if we have components that aren’t well modularized or well factored. Solving it is mainly a matter of good software architecture: identifying the problems your software solves and carving it at the joints, choosing the right joints and carving it cleanly. This is a very hard problem. While some speak of programmers who don’t use formal proof being lazy, this problem is not one that’s usually solved by mere effort; it requires time, deep understanding, and cleverness.

The second problem is that we could write down a second definition, with the exact same error or the exact same blind spot. This is particularly likely for corner cases, like empty lists and extreme values.

Be careful about using the following code — I’ve only proven that it works, I haven’t tested it.
— Donald Knuth

The solution to this problem is unit testing. A unit test is a written example with an expected result. This makes fundamental confusions less likely. It’s also a way of verifying corner cases that would much up a proof. There’s only one empty list, for example, so a unit test on the empty list means not needing to worry about it in a proof. There’s an important caveat, however, which is that a unit test does not necessarily represent a considered belief about what should happen; a developer could cheat, by writing code first, observing what the results are, and then writing unit tests which check for those results, without knowing whether they’re actually correct. This is why Test-Driven Development advocates writing unit tests first, before writing code.

The Complex Interface Problem

CompCert is a formally verified C compiler. It has a proof, verified by Coq, that output assembly has the same semantics as input C programs. Unsure whether it was really for real, I checked its issue tracker. This is unusually good; there are few entries, and almost all of them are new-feature requests. But this one is a straight-up bug, and a fairly serious one. What happened?

What happened was that at the boundary where CompCert-generated code interacts with other compilers’ code in a library, it formats some argument lists differently, causing values to be misinterpreted. This is outside the realm of CompCert’s proof. Applying formal proof to this problem wouldn’t help, because the same problem would show up in the specification used by the proof.

In the case of CompCert, the problem was detected by a modest amount of testing and then fixed. In other cases, however, the interfaces are much larger, and more full of special cases. Proving the correctness of a PDF renderer or a web browser, for example, would be a hopeless endeavor, because what correctness means is defined by others’ expectations about what happens at an interface, and that interface has not been formally specified anywhere.


Formal verification of software is hard, but there is more than one angle to attack the problem from. The first, traditional angle is: write better automated proof-searchers and proof-checkers. But the best proof-checker in the world can’t help if you don’t know what you’re proving, or you’re proving your software does something you don’t want, or you’re building on architectural quicksand. The second angle is purely technical: we need to take the traps out of the core abstractions we’re building on. In 2015, there is no excuse for programming languages to silently wrap integers on overflow, and yet most of them do. And the third angle is: we need to get better at software architecture. This requires improving our cognitive strategies for thinking about it, and looks somewhat like rationality research.

CVEs Per Year

Speed Read This
Posted by on June 5, 2015

If you care about security vulnerabilities that haven’t been discovered yet, then assessing software is hard. It’s very hard to ever convincingly show that something is free of vulnerabilities. There are tools that will help: static analysis and fuzz testing are the main ones. These take a lot of work to use. For high-profile software, however, there’s an easy test to apply first. The Mitre Common Vulnerabilities and Exposures List tracks publicly-disclosed security vulnerabilities, what software they were in, when they were discovered, and how severe they were. If a piece of software has had vulnerabilities discovered recently, it probably has more that haven’t been discovered yet. You can search it here.

This suggests a simple metric: CVEs per year. This is a negative filter; it can show that software is unfit for high-security applications, but can’t show the reverse, because a lack of reported vulnerabilities could just be the result of a lack of attention. With that in mind, here are results for a few important pieces of software. Note that 2015 is only half a year so far (less when you account for publication delays), and the counting process I used is slightly error-prone.

VirtualBox: 3 in 2015, 20 in 2014
VMWare Fusion: 1 in 2015, 3 in 2014, 2 in 2013, 5 in 2012, 21 in 2011-2008
Linux seccomp: 1 each in 2009, 2014, 2015
Linux kernel: 15 in 2015 (at least a few of which would break seccomp but don’t mention it by name)
Firefox: 52 in 2015
Google Chrome: 59 in 2015
Windows 8: 56 in 2015

If things were going well, this would flip from “vulnerabilities per year” to “years per vulnerability”. We have a long way to go.

The Memory Sinkhole Vulnerability

Speed Read This
Posted by on June 4, 2015

On the schedule for the Blackhat conference is a presentation by Christopher Domas titled The Memory Sinkhole – Unleashing an x86 Design Flaw Allowing Universal Privilege Escalation. There’s a discussion going on Hacker News today.

I’m studying the AGI containment problem – that is, how to test and experiment with artificial general intelligences without letting them bypass your safeguards and take over the world. While the development of working AGI is far enough in the future that this particular bug will certainly be fixed by then, this is one the categories of things that keeps me up at night.

The containment system architectures I’ve focused on so far involve layering and defense in depth. You don’t just put your AGI in a VM and call it a day; nor do you airgap it and call it a day. No, you write it in a language with strict semantics, then you put it in an operating system sandbox, that you put that in a virtual machine, and you airgap that. This is necessary both because of occasional security vulnerabilities like this one, but also to enable tripwires and other safeguards; you need the safeguards to be on a sort of high-ground, relative to the AGI, so that they can inspect it and shut it down but the AGI can’t expect them or turn them off.

Only a few details on this exploit are available so far; I’ve only read the one-paragraph synopsis, and I don’t expect more details to come out until August, so the remainder of this post is speculation and prediction on my part.

If I’m understanding the synopsis correctly, this particular exploit is an escalation from VM guest kernel space to host kernel space or system management mode. In the architecture I described, there are two layers of defense (language semantics and process sandboxing) in between the AGI and the point where it could make use of this exploit, and one more layer of defense left (airgapping) after it does. However, the layer it breaks is important, because it’s the last layer protecting tripwires, the integrity of log files, and preventing the installation of BIOS modifications to subvert the hardware after a reset.

The Failed Economics of Devtools and Code Reuse

Speed Read This
Posted by on June 2, 2015

Why do so many people and so many projects use PHP? Why is there so little code reuse, so few high-quality libraries and development tools to build on? There are several reasons, but one that hasn’t received attention is a problem in the economics.

When starting a new project, the creator picks a programming language and a set of software libraries. Consider the case of a developer starting an open source project, and choosing between an inferior tool with low up-front cost, and an overall-superior superior tool with higher up-front cost. The better tool might cost money, or it might just have a learning curve. Which will they choose?

Developers of open source projects hope to attract collaborators in the future. Unfortunately, most of the costs of a development tool or library are borne not by the person who chooses it; they are borne by the future collaborators who they hope will materialize later. While an existing collaborator might agree to learn a new programming language or pay for a new tool, a hypothetical future contributor can’t do that. Attracting volunteers to work on an open source project is very difficult to begin with, so developers are strongly averse to imposing costs on them. The perceived impact of these externalized costs is magnified many times over, so people don’t accept them.

You would think a company building a team to work on a software project wouldn’t have this problem, because it can internalize the future costs: it pays for the software and it pays employees for the time they spend getting up to speed. But there’s another problem. The quality of a software library is hard to judge until you’ve used it in a project; it’s risky to try an unfamiliar library in a big company-funded project with a team. So developers try out new libraries in smaller, less-important projects where they can take risks — and these tend to be open-source projects, where trying new libraries is a low-ranked side-goal.

This results in an equilibrium where people reject tools, not because they themselves are deterred by the up-front cost, but because they expect other people to be deterred by that cost. This means that if you write a better IDE, a good library, a better compiler, or some other piece of superior software, you’ll have a much harder time charging money for it than the direct economic proposition would suggest, and a much harder time getting people to try it and learn how to use it than the direct economic proposition would suggest.