Pure and mixed strategies in prediction

Consider the following very simple game: a Bernoulli trial (a trial which results in one of two possible outcomes, labelled “success” and “failure”) is carried out with success probability p. Beforehand, you are told the value of p and asked to give a definite prediction of the trial’s outcome. That is, you have to predict either success or failure; just saying “the probability of success is p” is not enough. You win if and only if you predict the correct outcome.

Here are two reasonable-sounding strategies for this game:

  1. If p > 0.5, predict success. If p < 0.5, predict failure. If p = 0.5, predict success with probability 0.5 and failure with probability 0.5.
  2. Predict success with probability p and failure with probability 1 - p.

In game-theoretic language, the difference between strategies 1 and 2 is that strategy 1 involves the use of a pure strategy if possible, i.e. one in which the choice of what to predict is made deterministically, while strategy 2 is always mixed, i.e. the choice of what to predict is made randomly.

But which is better? Note that the answer may depend on the value of p. Try to think about it for a minute before moving on to the next paragraph.

If p = 0.5, then the strategies are identical and therefore both equally good.

If p \ne 0.5, let q be the probability of the more probable outcome (i.e. p if p > 0.5 and 1 - p if p < 0.5). If the more probable outcome happens, then you win for sure under strategy 1 but you only have probability q of winning under strategy 2. If the less probable outcome happens, then you lose for sure under strategy 1 but you still have probability 1 - q of winning under strategy 2. Therefore the probability of winning is q \cdot 1 + (1 - q) \cdot 0 = q under strategy 1 and q \cdot q + (1 - q) \cdot (1 - q) = 1 - 2q(1 - q) under strategy 2. So strategy 1 is better than strategy 2 if and only if

\displaystyle q > 1 - 2q(1 - q),


\displaystyle 3q - 2q^2 - 1 > 0.

This quadratic inequality holds if and only if 0.5 < q < 1. But q is the probability of the more probable outcome, and therefore q > 0.5 for sure. Therefore, strategy 1 is always better if p \ne 0.5.

I find this result weird and a little counterintuitive when it’s stated so abstractly. It seems to me like the most natural way of obtaining a definite value from the distribution—drawing randomly from the distribution—should be the best one.

But I guess it does makes sense, if you think about it as applying to a concrete situation. For example, if you were on a jury and you thought there was a 1/1024 probability that the defendant was guilty, it would be crazy to then flip 10 coins and precommit to arguing for the defendant’s guilt if every one of them came up heads. The other jurors would think you were mad (and probably be very angry with you, if they did all come up heads).

The result has interesting implications for how people should act on their beliefs. If you believe that degrees of belief can be usefully modelled as probabilities, and you try to apply this in everyday reasoning, you will often be faced with the problem of deciding whether to act in accordance with a belief’s truth even if you only place a certain probability p on that belief being true. Should you always act in accordance with the belief if p > 0.5, or should you have probability p of acting in accordance with it at any given time? Until I wrote this post it wasn’t obvious to me, but the result in this post suggests you should do the former.

I do wonder if there is anything strategy 2 is good for, though. Comment if you have an idea!


4 responses to “Pure and mixed strategies in prediction

  1. One answer: strategy 2 is better if two people make their own predictions, and the win is if at least one person makes the correct prediction. Because under strategy 1, both people will make the same prediction (as long as p \ne 0.5), so it’s effectively the same as if there’s only one person and the probability of winning is q. But under strategy 2, the probability of winning is

    \displaystyle q (q^2 + 2q(1 - q)) + (1 - q) (2q(1 - q) + (1 - q)^2)

    which simplifies to

    \displaystyle q^3 + 2q^2(1 - q) + 2q(1 - q)^2 + (1 - q)^3 = 1 - q^2(1 - q) - q(1 - q)^2.

    The inequality

    \displaystyle q < 1 - q^2(1 - q) - q(1 - q)^2

    is equivalent to

    \displaystyle 2q - q^2 - 1 < 0

    which is true for every q.

    But of course, strategy 2 isn't the optimal one in this situation. The optimal strategy is for one person to predict success and one person to predict failure, so that winning is certain. But perhaps it’s optimal if both players have to use the same strategy. I haven’t checked this yet.

  2. Another use of strategy 2 would be protection in the case your model is really wrong. If you think you’re playing (p, 1-p) with p>1/2 but really you’re playing (q,1-q) with q < 1/2 then the mixed strategy outperforms the pure strategy. Could be useful if you're making a guess under conditions of uncertainty where the probability is close to a half.

    • If you’re not told the exact success probability, but rather told that the success probability was randomly generated from a given probability distribution F over $[0, 1]$, then the probability that the trial succeeds is

      \displaystyle \int_0^1 p \mathrm d F(p),

      which is the same as the mean of F. The probability that the trial fails is

      \displaystyle \int_0^1 (1 - p) \mathrm d F(p),

      which is the same as 1 minus the mean of F. So I think there’s no effective difference here from playing the game with the success probability given and equal to the mean of F, and that means strategy 1 is still better (with “the mean of F” in place of “p”), once the whole uncertainty is accounted for.

  3. I was thinking about responding on tumblr or something, but I never got around to putting together something more comprehensive. Although the result in counterintuitive (at least, it was for me!), an easy way to see it without grinding through several equations is to just graph it. One of them is a parabola and the other is an absolute-value graph. They coincide at the minimum and at the extreme edges (p=0 and p=1). The parabola obviously is lower than the absolute-value graph (although it’s overtaken by the parabola outside of the p \in [0, 1] interval that we care about).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s