# Bonferroni’s inequalities and the inclusion-exclusion principle

A non-negative real- or ∞-valued mapping f on a field $\mathcal F$ of sets is said to be finitely additive if and only if for every pair of disjoint sets A and B in $\mathcal F$, we have

$\displaystyle f(A \cup B) = f(A) + f(B).$

The most important examples of finitely additive mappings are measures, including probability measures, although not every finitely additive mapping is a measure (measures are mappings on σ-algebras, which are a special sort of field of sets, that are countably additive, which is a stronger property than finite additivity).

From the definition it is immediately evident that finite additivity allows us to express the value under a mapping f on a field $\mathcal F$ of sets of any binary union of pairwise disjoint sets in $\mathcal F$ in terms of the values under f of the individual sets. In fact, the same can be said for unions of any arity, provided they are pairwise disjoint. For every field $\mathcal F$ of sets, every finitely additive mapping f on $\mathcal F$, every $n \in \mathbb Z_0$[1] and every n-tuple (A1, A2, …, An) of pairwise disjoint sets in $\mathcal F$, we have

$\displaystyle (1) \quad f \left( \bigcup_{k = 1}^n A_k \right) = \sum_{k = 1}^n A_k.$

This can be proven by induction. In the base case of n = 0, we have

\displaystyle \begin{aligned} f(\emptyset) &= f(\emptyset \cup \emptyset) \\ &= f(\emptyset) + f(\emptyset) \end{aligned}

which implies that 0 = f(∅). Now, suppose $m \in \mathbb Z_0$ and (1) holds for every (m + 1)-tuple (A1, A2, …, Am + 1) of pairwise disjoint sets in $\mathcal F$. Then

\displaystyle \begin{aligned} f \left( \bigcup_{k = 1}^{m + 1} A_k \right) &= f \left( \left( \bigcup_{k = 1}^m A_k \right) \cup A_{m + 1} \right) \\ &= f \left( \bigcup_{k = 1}^m A_k \right) + f(A_{m + 1}) \\ &= \sum_{k = 1}^m f(A_k) + f(A_{m + 1}) \\ &= \sum_{k = 1}^{m + 1} f(A_k) \end{aligned}

which completes the proof.

But what about unions of sets in $\mathcal F$ that are not necessarily pairwise disjoint? Can the values under f of such unions be expressed in terms of the values under f of the individual sets then? The answer is no. However, such unions’ values under f can be expressed in terms of the values under f of the individual sets and their intersections, by what is known as the inclusion-exclusion principle. For every $n \in \mathbb Z_0$ and every $(A_1, A_2, \dotsc, A_n) \in \mathcal F^{\times n}$, we have

$\displaystyle (2) \quad f \left( \bigcup_{k = 1}^n A_k \right) = \sum_{\substack{S \subseteq \mathbb Z_1^n \\ S \ne \emptyset}} (-1)^{\#S + 1} f \left( \bigcap_{k \in S} A_k \right).$

The sum on the right-hand side of (2) is a rather complicated one, cumbersome to write down as well as computationally expensive to compute by virtue of its large number of terms (one for every non-empty subset of $\mathbb Z_1^n$, and there are 2n − 1 of those). Therefore, it is also convenient to use Bonferroni’s inequalities, which say that for every $m \in \mathbb Z_0$, we have

$\displaystyle (3) \quad f \left( \bigcup_{k = 1}^n A_k \right) \gtreqless \sum_{\substack{S \subseteq \mathbb Z_1^n \\ 0 < \#S \le m}} (-1)^{\#S + 1} f \left( \bigcap_{k \in S} n A_k \right),$

with the sign $\gtreqless$ standing for “greater than or equal to, if m is even; less than or equal to, if m is odd”. The sum on the right-hand side of (3) has terms only for every non-empty subset of $\mathbb Z_1^n$ with no more than m members, and there are only 2m − 1 of those. In particular, if m = 1 the terms are just the values under f of the individual sets A1, A2, … and An and therefore we have

$\displaystyle f \left( \bigcup_{k = 1}^n A_k \right) \le \sum_{k = 1}^n f(A_k),$

which is Boole’s inequality.

Note that Bonferroni’s inequalities hold when mn as well as when m < n. When mn, the sum on the RHS of (3) is exactly the same as the sum on the RHs of (2). Because there are both even and odd integers m such that mn, and any quantity which is both less than or equal to and greater than or equal to another quantity has to be equal to it, it follows that Bonferroni’s inequalities imply that the equation (2) holds and thus generalize the inclusion-exclusion principle.

In order to prove that Bonferroni’s inequalities hold, and thus prove the inclusion-exclusion principle, we can use induction. First, consider the case where m = 0. The only subset of $\mathbb Z_1^n$ of cardinality 0 is the empty one, so in this case the sum on the right-hand side of (3) is empty and (3) therefore reduces to the statement that

$\displaystyle f \left( \bigcup_{k = 1}^n A_k \right) \ge 0,$

which is true because f is non-negative- or ∞-valued.

Now, suppose $M \in \mathbb Z_0$ and Bonferroni’s inequalities hold in the case where m = M, and consider the case where m = M + 1. We shall use another inductive proof, within the inductive proof we’re currently carrying out, to show that Bonferroni’s inequalities hold for every $n \in \mathbb Z_0$ in when m has this particular value. In the case where n = 0, the left-hand side of (3) reduces to f(∅) and the right-hand side is the empty sum once again, because there are no subsets of $\mathbb Z_1^0$ ($\mathbb Z_1^0$ is the set of the integers greater than or equal to 1 and less than or equal to 0, and there are of course no such integers). Because f(∅) = 0 it follows that (3) is true, regardless of the direction of the inequality required.

As for the successor case, suppose that $N \in \mathbb Z_0$ and Bonferroni’s inequalities hold in the case where n = N. Consider the case where n = N + 1. It is helpful at this point to write (3) for the given values of m and n as

$\displaystyle (-1)^{M + 1} (f(A) - a) \ge 0,$

where

\displaystyle \begin{aligned} A &= \bigcup_{k = 1}^{N + 1} A_k, \\ a &= \sum_{\substack{S \subseteq \mathbb Z_1^{N + 1} \\ 0 < \#S \le M + 1}} (-1)^{\#S + 1} f \left( \bigcap_{k \in S} A_k \right). \end{aligned}

If we also let

\displaystyle \begin{aligned} B &= \bigcup_{k = 1}^N A_k, \\ b &= \sum_{\substack{S \subseteq \mathbb Z_1^N \\ 0 < \#S \le M + 1}} (-1)^{\#S + 1} f \left( \bigcap_{k \in S} A_k \right), \end{aligned}

then we have (−1)M + 1(f(B) − b) ≥ 0 because Bonferroni’s inequalities hold in the case where n = N. Now, how are f(A) and a related to f(B) and b?

We obviously have A = BAN + 1, but B and AN + 1 are not necessarily disjoint so this doesn’t immediately tell us anything about the relationship of f(A) and f(B). However, AN + 1 is certainly disjoint from B \ AN + 1, so we have

$(4) \quad \displaystyle f(A) = f(B \setminus A_{N + 1}) + f(A_{N + 1}).$

That’s about all we can usefully say for the moment. As for a and b, well, the terms of b are a subset of those of a so it’s quite easy to write down the difference ab. If we manipulate that difference a little bit, we can start getting it to look like something that could occur on the right-hand side of (3).

\displaystyle \begin{aligned} a - b &= \sum_{\substack{S \subseteq \mathbb Z_1^{N + 1} \\ N + 1 \in S \\ \#S \le M + 1}} (-1)^{\#S + 1} f \left( \bigcap_{k \in S} A_k \right) \\ &= \sum_{\substack{S \subseteq \mathbb Z_1^N \\ \#S \le M}} (-1)^{\#S + 2} f \left( \left( \bigcap_{k \in S} A_k \right) \cap A_{N + 1} \right) \\ &= f(A_{N + 1}) - \sum_{\substack{S \subseteq \mathbb Z_1^N \\ 0 < \#S \le M}} (-1)^{\#S + 1} f \left( \bigcap_{k \in S} A_k \cap A_{N + 1} \right) \end{aligned}

Let

\displaystyle \begin{aligned} C &= \bigcup_{k = 1}^N A_k \cap A_{N + 1}, \\ c &= \sum_{\substack{S \subseteq \mathbb Z_1^N \\ 0 < \#S \le M}} (-1)^{\#S + 1} f \left( \bigcap_{k \in S} A_k \cap A_{N + 1} \right). \end{aligned}

Then we have (−1)M(f(C) − c) ≥ 0, because Bonferroni’s inequalities hold in the case where m = M, and we have ab = f(AN + 1) − c.

Now, if we use the distributivity of intersection over union we can rewrite C as BAN + 1. It follows that B is the disjoint union of C and the set B \ AN + 1 which turned up above when we were contemplating the relationship of f(A) and f(B), and therefore we have f(B) = f(C) + f(B \ AN + 1). Using this new equation we may rewrite (4) as

$(4) \quad \displaystyle f(A) = f(B) - f(C) + f(A_{N + 1}),$

from which it follows that f(A) − f(B) = f(AN + 1) − f(C)—a nicely analogous equation to ab = f(AN + 1) − c. Finally, let us add the two quantities (−1)M + 1(f(B) − b) and (−1)M(f(C) − c) ≥ 0 which we know to be non-negative. The sum of two non-negative quantities is non-negative also, so we have

\displaystyle \begin{aligned} 0 &\le (-1)^{M + 1} (f(B) - b) + (-1)^M (f(C) - c) \\ &= (-1)^{M + 1} (f(B) - b - (f(C) - c)) \\ &= (-1)^{M + 1} (f(B) - f(C) - (b - c)) \\ &= (-1)^{M + 1} (f(A) - f(A_{N + 1}) - (a - f(A_{N + 1}))) \\ &= (-1)^{M + 1} (f(A) - a). \end{aligned}

$\blacksquare$

[1] For every $(a, b) \in \mathbb Z^{\times 2}$, I denote $\{n \in \mathbb Z : a \le n \le b\}$, $\{n \in \mathbb Z : a \le n\}$ and $\{n \in \mathbb Z : n \le b\}$ by $\mathbb Z_a^b$, $\mathbb Z_a$ and $\mathbb Z^b$ respectively.