
Newcomb’s Mirror

Posted on July 19, 2015


Newcomb’s Problem is a classic problem in decision theory, which goes like this. First, you meet Omega. Omega presents you with two boxes, the first opaque and the second transparent. You can either take box 1, or take both box 1 and box 2. Box 2 contains $1000. Omega has simulated you and predicted what you will choose. If you were predicted to take one box, then box 1 contains $1M; if you were predicted to take both boxes, then box 1 is empty.

A decision theory is a procedure for getting from a description like the previous paragraph to a decision. A decision theory is good if following it would mean you get more money, and bad if following it would mean you get less money. By this criterion, decision theories which say “one box” are good and decision theories which say “two box” are bad. A paragraph is a real decision theory problem if you’re presumed to know everything about the problem setup before the first decision, it’s fully unambiguous what happens and what you prefer, and what happens can be determined from your actions alone.
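As a minimal sketch of this criterion, the payoff rules from the setup can be written down and scored directly. The function and constant names are illustrative only, and the sketch assumes Omega’s prediction always matches the actual choice, as the setup stipulates.

```python
# Toy payoff table for Newcomb's Problem (names are illustrative, not from any library).
BOX1_FULL = 1_000_000  # box 1 contents if one-boxing was predicted
BOX2 = 1_000           # box 2 always contains this


def payoff(choice: str, predicted: str) -> int:
    """Money received for a choice ("one" or "two") given Omega's prediction."""
    box1 = BOX1_FULL if predicted == "one" else 0
    return box1 if choice == "one" else box1 + BOX2


def payoff_under_accurate_prediction(choice: str) -> int:
    """Assumption: Omega's prediction always equals the actual choice."""
    return payoff(choice, predicted=choice)


if __name__ == "__main__":
    for choice in ("one", "two"):
        print(choice, payoff_under_accurate_prediction(choice))
```

Scored this way, a decision theory that outputs “one” walks away with the contents of a full box 1, while one that outputs “two” walks away with only box 2.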

Omega, in this case, is philosophical shorthand for “don’t argue with the premise of the setup”. You’re supposed to assume everything Omega tells you is simply true; any doubt you may have is shunted out of decision theory and is taken as an epistemologist’s problem instead. This prevents dodging the question with reasonable-but-irrelevant ideas like “put the opaque box on a scale before you decide” and silly answers like “mug Omega for the $1M that’s still in his pocket”. This is necessary because those sorts of answers can be used to dodge the problem forever, especially if the problem involves a trolley. On the other hand, using Omega to close off aspects of a problem can block interesting lines of thought and leave an answer that’s intuitively unsatisfying.

In Causal Decision Theory (CDT), you first draw a causal diagram to represent the problem. Then you pick out one node, and only one node, to represent your decision (in this case indicated by making the node a square instead of a circle). Then you perform “counterfactual surgery”: imagine all the edges pointing into the decision node were severed, and you could choose anything you wanted; then choose whatever would give the best result (in this case, biggest value for node $). Timeless Decision Theory (TDT) is exactly the same, except that in TDT you introduce a notion of “your algorithm”, which is separate from your decision, and make a node for it.

[Figure: causal diagrams for CDT (left) and TDT (right)]

This takes the complexity of deciding, and pushes it over to the complexity of arranging the right causal network. The first discussions of how to do this corresponded to CDT, and in this formulation, due to some mix of historical accident and underdeveloped mathematical machinery, we only get to choose one node. TDT removes this only-one-node restriction and changes nothing else. In both cases, the main difficulty is in deciding which nodes count as “your computation”, and neither can handle cases where this is fuzzy.
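The difference between the two surgeries can be sketched in a few lines. This is a toy rendering under simplifying assumptions, not a general causal-network implementation: the prediction is treated as a plain variable, and the function names are made up for illustration. In the CDT version the prediction is held fixed while the decision varies; in the TDT version a single “algorithm” value feeds both the prediction and the decision.

```python
CHOICES = ("one", "two")


def payoff(choice, predicted):
    """Newcomb payoffs: box 1 holds $1M only if one-boxing was predicted."""
    box1 = 1_000_000 if predicted == "one" else 0
    return box1 if choice == "one" else box1 + 1_000


def cdt_choice(predicted):
    """Surgery on the decision node: the prediction is held fixed."""
    return max(CHOICES, key=lambda c: payoff(c, predicted))


def tdt_choice():
    """Surgery on the algorithm node: its output sets both prediction and decision."""
    return max(CHOICES, key=lambda a: payoff(choice=a, predicted=a))


if __name__ == "__main__":
    print("CDT, prediction fixed at one-box:", cdt_choice("one"))
    print("CDT, prediction fixed at two-box:", cdt_choice("two"))
    print("TDT:", tdt_choice())
```

Under these assumptions, `cdt_choice` returns "two" whichever prediction is held fixed, while `tdt_choice` returns "one".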

By contrast, Updateless Decision Theory (UDT) throws away the causal networks formulation, and instead asks:

☐(𝔼[$|Decision=1] > 𝔼[$|Decision=2])?

Which reads as: is it provable that if I take one box, I’ll get more in expectation than if I two-box? This is progress, in that while it no longer says as much about the actual mathematical procedure for figuring this out, it at least is no longer committed to anything wrong. There are a few technical oddities; for example, you need to insert a hack to prevent it from creating self-fulfilling proofs, like “If I don’t choose X then I’ll be eaten by wolves”, which is technically true because, having adopted that statement, UDT proceeds to choose X.

Still, it feels as though there’s something key to Newcomb’s Problem which CDT, TDT, and UDT are all failing to engage with. Something important got smuggled into the problem as fait accompli.


The mirror test is a way to assess the intelligence of animals, which goes like this. First, you wait for the animal to go to sleep. You put a bright orange mark somewhere on its head, where it can’t see. When it wakes up, you show it a mirror. If it tries to remove the mark or gives some other sign of understanding that it is seeing itself in a mirror, then it passes the test. Chimps, dolphins, and humans pass the mirror test. Pandas, pigeons, and baboons do not.

Newcomb’s Problem is a sort of generalization of the mirror test – but one where the generalization from mirrors to simulations comes pre-explained, placed in the “Omega” part of the setup where you’re not supposed to engage with it. However, as soon as you try to generalize from Newcomb’s Problem to something more realistic, the mirror-test portion of the problem becomes the focus and the hard part. Here are some examples of problems people have called “Newcomblike”:

  • Parfit’s Hitchhiker: Someone reads your facial expressions to determine whether you will keep a promise to pay them later, and helps you if they predict you will. Is their prediction related enough to your decision that you should pay them?
  • Voting: Your vote individually has too small an effect to justify the cost, but your decision of whether or not to vote is somehow related to the decisions of others who would vote the same way.
  • Counterfactual mugging: Your decision is related to that of a hypothetical alternate version of you who doesn’t exist.

To help think about these sorts of problems, I’ve come up with two new variations on Newcomb’s problem.


Consider an alternative version of Newcomb’s problem, which we’ll call Newcomb’s Mirror Test. It goes like this. Box 1 is either empty or contains $3k (three times as much as box 2). Omega flips a coin. If the coin comes up heads, he simulates you, and puts $3k in box 1 if you one-box, or $0 if you two-box. If the coin comes up tails, Omega picks someone else in the world at random, and fills or doesn’t fill box 1 according to their choice. Then Omega shows you a brain scan of the simulation that was run. (All of the simulations see your brain scan; the other people Omega is choosing from are half one-boxers and half two-boxers.)

If you accept the one-box solution to Newcomb’s original problem, then the challenge in Newcomb’s Mirror Test is whether you can recognize your own brain, as seen from the outside. If you can, then you check whether the brain scan you’re shown is your own, and if so, one-box; otherwise, two-box. This breaks the usual template of decision-theory problems because it asks you to bring in outside knowledge.
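To see why recognizing your own scan matters, here is a rough expected-value sketch of the Mirror Test. It assumes a fair coin, perfect recognition of your own scan, and that the randomly chosen other person’s choice does not depend on yours; the function names and the "own"/"other" labels are invented for illustration.

```python
BOX1_FULL = 3_000
BOX2 = 1_000
P_HEADS = 0.5            # assumption: the coin is fair
P_OTHER_ONE_BOXES = 0.5  # half of the other people are one-boxers


def take(choice, box1):
    """Money received for taking one box or both, given box 1's (expected) contents."""
    return box1 if choice == "one" else box1 + BOX2


def expected_value(policy):
    """policy maps whose scan you are shown ("own" or "other") to "one" or "two"."""
    # Heads: Omega simulates you, and the simulation is shown your own scan.
    box1_heads = BOX1_FULL if policy("own") == "one" else 0
    ev_heads = take(policy("own"), box1_heads)
    # Tails: box 1 depends on a random other person, whose scan you are shown.
    expected_box1_tails = P_OTHER_ONE_BOXES * BOX1_FULL
    ev_tails = take(policy("other"), expected_box1_tails)
    return P_HEADS * ev_heads + (1 - P_HEADS) * ev_tails


def always_one(scan):
    return "one"


def always_two(scan):
    return "two"


def recognize_self(scan):
    return "one" if scan == "own" else "two"


if __name__ == "__main__":
    for policy in (always_one, always_two, recognize_self):
        print(policy.__name__, expected_value(policy))
```

Under these assumptions, the policy that checks the scan does better in expectation than either fixed policy, which is the sense in which the problem rewards outside knowledge.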

Realistic Newcomblike problems don’t usually involve brain scans and full-fidelity simulations. Instead, they involve similarities within groups, low-fidelity models, and similar ideas. To capture this, consider another scenario, which I’ll call Newcomb’s Blurry Mirror. Newcomb’s Blurry Mirror works like this. Omega starts with full-resolution models of you and everyone else on Earth. By some specified procedure, Omega removes a little bit of detail from each model, and checks whether there is any other model which, with that detail removed, is exactly identical to yours. If not, Omega removes a little more detail. This goes on until Omega has a low-resolution model which is sufficient to identify you and exactly one other person, but not to distinguish between the two of you.

Omega then simulates the other person, looking at the blurry model and then taking one or both boxes. If the other person is predicted to one-box, then box 1 will contain $3k; otherwise, it will contain $0.
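The blurring step can be illustrated with a toy procedure. Everything specific here is an assumption made for the sake of the example: models are equal-length tuples of features ordered from coarse to fine, “removing detail” means dropping the last feature, and the names and features are invented. The scenario’s “specified procedure” could be anything; this sketch also returns every match at the first matching level rather than guaranteeing exactly one.

```python
def blur(model, level):
    """Drop the last `level` features from a model (a tuple of features)."""
    return model[: len(model) - level]


def find_reference_class(me, others):
    """Blur step by step until at least one other model matches yours.

    Assumes every model has the same length as `me`.
    Returns (level, names_of_matches), or (None, []) if there is nobody to match.
    """
    for level in range(len(me) + 1):
        matches = [name for name, model in others.items()
                   if blur(model, level) == blur(me, level)]
        if matches:
            return level, matches
    return None, []


if __name__ == "__main__":
    # Hypothetical feature tuples, coarse to fine.
    me = ("human", "blogger", "tea", "owl")
    others = {
        "alice": ("human", "blogger", "coffee", "lark"),
        "bob": ("human", "athlete", "tea", "owl"),
    }
    print(find_reference_class(me, others))
```

In this example two levels of detail have to be removed before anyone else matches, and the person you end up paired with depends entirely on which details the procedure discards first.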

This is analogous to a scenario where someone predicts what you will do based on the fact that you fall in some reference class. This has a Prisoner’s Dilemma-like aspect to it; your decision impacts the other person, and vice versa. The challenge in Newcomb’s Blurry Mirror is to look at the blurring procedure and navigate yourself into a reference class with someone who will one-box/cooperate (ideally while two-boxing/defecting yourself).

Neither Newcomb’s Mirror Test nor Newcomb’s Blurry Mirror is a proper decision theory problem. Instead, they highlight boundaries between decision theory, the embodiment problem, and game theory. To the limited extent that they are decision theory problems, however, UDT handles them correctly, CDT handles them incorrectly, and TDT gets too vague to produce an answer. Newcomb’s Mirror Test asks you to bring in outside knowledge, to use it to distinguish a copy of yourself, and to be the sort of person that could be distinguished that way. Newcomb’s Blurry Mirror cares not just about what you do, but about details of how you do it and about what else you are. Nevertheless, these seem to strike pretty close to the core of what people end up calling “Newcomblike problems”.

Conservation of Virtue

Posted on June 16, 2015

In Dungeons and Dragons and many similar games, player characters are created with a point system, and have six attributes: strength, dexterity, constitution, intelligence, wisdom, and charisma. Each of these six attributes is represented by a number, and within an adventuring party, while one player’s character might be stronger or wiser or more charismatic, this will always be counterbalanced by a weakness somewhere else.

In the real world, people tend to sort themselves according to awesomeness. They try to hang out with people who are about as cool as they are. Your friends are about as cool as you are; their friends are about as cool as they are. As a result, if your friend introduces you to someone, that person is on average about as cool as you are, too. If you go to the best college you can get into and afford, you will mostly meet people for whom that was the best college they could get into and afford. If you go to the best party that will have you, you will on average tend to meet people for whom that was the best party that would have them.

This produces an odd effect. If you meet someone and find out that they have some significant weakness, this gives you evidence that they have some other strength, which you don’t know about; otherwise, they would’ve sorted into a different college or a different group of friends. Similarly, if you meet someone and find out that they have a particular strength, then this gives you evidence that they are weaker in some other way, for the same reason. I call this effect Conservation of Virtue.
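This is a selection effect, and a small simulation can make it concrete. The sketch below is only illustrative: it assumes six attributes rolled independently as 3d6, and it models “sorting” crudely as admitting only characters whose attribute total falls in a narrow band (73 to 77, an arbitrary choice). It then compares the correlation between two attributes in the whole population and within the filtered group.

```python
import random


def roll_character(rng):
    """Six attributes, each 3d6, rolled independently (an illustrative assumption)."""
    return [sum(rng.randint(1, 6) for _ in range(3)) for _ in range(6)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20_000, low=73, high=77, seed=0):
    """Return (population correlation, in-group correlation, group size)."""
    rng = random.Random(seed)
    population = [roll_character(rng) for _ in range(n)]
    # "Sorting": the group only contains people of roughly equal total virtue.
    group = [c for c in population if low <= sum(c) <= high]
    # Compare attribute 0 (say, strength) with attribute 3 (say, intelligence).
    corr_all = pearson([c[0] for c in population], [c[3] for c in population])
    corr_group = pearson([c[0] for c in group], [c[3] for c in group])
    return corr_all, corr_group, len(group)


if __name__ == "__main__":
    corr_all, corr_group, size = simulate()
    print(f"population: {corr_all:+.3f}, filtered group ({size} people): {corr_group:+.3f}")
```

With independent rolls, the population-wide correlation should hover near zero, while within the filtered group it should come out clearly negative: inside the group, learning that someone is weak in one attribute really is evidence of strength elsewhere, even though no tradeoff exists in how the attributes were generated.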

There are three issues with the Conservation of Virtue effect. The first issue is that real people have more than six attributes, and no social dynamic is nearly so precise as a point system with everyone having exactly 75 attribute-points total. Even in a group that carefully filtered its members, you will sometimes meet people who are much more or less virtuous than the average, and if you let the Conservation of Virtue effect inform your intuition, you might fail to notice. The second issue is that you will sometimes meet people in ways that aren’t related to any filtering process, in which case the Conservation of Virtue effect no longer applies.

The bigger issue, though, is that this can make you believe things are tradeoffs when they really aren’t. For example, when I was younger, I noticed the cultural cliche of stupid athletes and smart but weak nerds – and, without ever raising the question to conscious awareness, came to the belief that I could make myself smarter by neglecting fitness as hard as I could. Similarly, I recall a case of someone I know complaining about good reasoners neglecting empiricism, and good empiricists neglecting complex reasoning. Sometimes, false tradeoffs even get baked into our terminology, like “Fox/Hedgehog” (aka generalist/specialist). This is closer to a true tradeoff because building generalist knowledge and building specialist knowledge are at least competing for time, but it is in fact possible to have both generalist knowledge and specialist knowledge; I have heard this referred to as being “T-shaped”.

These confusions can only manifest when things are left implicit. A statement like “I can make myself smarter by neglecting fitness really hard” could never hold up to conscious scrutiny in the presence of real understanding. By giving this effect a name, hopefully it will be easier to notice and to tell when it does and doesn’t apply.