A non-negative real- or ∞-valued mapping *f* on a field of sets is said to be *finitely additive* if and only if for every pair of disjoint sets *A* and *B* in , we have

The most important examples of finitely additive mappings are measures, including probability measures, although not every finitely additive mapping is a measure (measures are mappings on *σ-algebras*, which are a special sort of field of sets, that are *countably additive*, which is a stronger property than finite additivity).

From the definition it is immediately evident that finite additivity allows us to express the value under a mapping *f* on a field of sets of any *binary* union of pairwise disjoint sets in in terms of the values under *f* of the individual sets. In fact, the same can be said for unions of any arity, provided they are pairwise disjoint. For every field of sets, every finitely additive mapping *f* on , every [1] and every *n*-tuple (*A*_{1}, *A*_{2}, …, *A*_{n}) of pairwise disjoint sets in , we have

This can be proven by induction. In the base case of *n* = 0, we have

which implies that 0 = *f*(∅). Now, suppose and (1) holds for every (*m* + 1)-tuple (*A*_{1}, *A*_{2}, …, *A*_{m + 1}) of pairwise disjoint sets in . Then

which completes the proof.

But what about unions of sets in that are not necessarily pairwise disjoint? Can the values under *f* of such unions be expressed in terms of the values under *f* of the individual sets then? The answer is no. However, such unions’ values under *f* can be expressed in terms of the values under *f* of the individual sets *and their intersections*, by what is known as the *inclusion-exclusion principle*. For every and every , we have

The sum on the right-hand side of (2) is a rather complicated one, cumbersome to write down as well as computationally expensive to compute by virtue of its large number of terms (one for every non-empty subset of , and there are 2^{n} − 1 of those). Therefore, it is also convenient to use *Bonferroni’s inequalities*, which say that for every , we have

with the sign standing for “greater than or equal to, if *m* is even; less than or equal to, if *m* is odd”. The sum on the right-hand side of (3) has terms only for every non-empty subset of with no more than *m* members, and there are only 2^{m} − 1 of those. In particular, if *m* = 1 the terms are just the values under *f* of the individual sets *A*_{1}, *A*_{2}, … and *A*_{n} and therefore we have

which is *Boole’s inequality*.

Note that Bonferroni’s inequalities hold when *m* ≥ *n* as well as when *m* < *n*. When *m* ≥ *n*, the sum on the RHS of (3) is exactly the same as the sum on the RHs of (2). Because there are both even and odd integers *m* such that *m* ≥ *n*, and any quantity which is both less than or equal to and greater than or equal to another quantity has to be equal to it, it follows that Bonferroni’s inequalities imply that the equation (2) holds and thus generalize the inclusion-exclusion principle.

In order to prove that Bonferroni’s inequalities hold, and thus prove the inclusion-exclusion principle, we can use induction. First, consider the case where *m* = 0. The only subset of of cardinality 0 is the empty one, so in this case the sum on the right-hand side of (3) is empty and (3) therefore reduces to the statement that

which is true because *f* is non-negative- or ∞-valued.

Now, suppose and Bonferroni’s inequalities hold in the case where *m* = *M*, and consider the case where *m* = *M* + 1. We shall use another inductive proof, within the inductive proof we’re currently carrying out, to show that Bonferroni’s inequalities hold for every in when *m* has this particular value. In the case where *n* = 0, the left-hand side of (3) reduces to *f*(∅) and the right-hand side is the empty sum once again, because there are no subsets of ( is the set of the integers greater than or equal to 1 and less than or equal to 0, and there are of course no such integers). Because *f*(∅) = 0 it follows that (3) is true, regardless of the direction of the inequality required.

As for the successor case, suppose that and Bonferroni’s inequalities hold in the case where *n* = *N*. Consider the case where *n* = *N* + 1. It is helpful at this point to write (3) for the given values of *m* and *n* as

where

If we also let

then we have (−1)^{M + 1}(*f*(*B*) − *b*) ≥ 0 because Bonferroni’s inequalities hold in the case where *n* = *N*. Now, how are *f*(*A*) and *a* related to *f*(*B*) and *b*?

We obviously have *A* = *B* ∪ *A*_{N + 1}, but *B* and *A*_{N + 1} are not necessarily disjoint so this doesn’t immediately tell us anything about the relationship of *f*(*A*) and *f*(*B*). However, *A*_{N + 1} is certainly disjoint from *B* \ *A*_{N + 1}, so we have

That’s about all we can usefully say for the moment. As for *a* and *b*, well, the terms of *b* are a subset of those of *a* so it’s quite easy to write down the difference *a* − *b*. If we manipulate that difference a little bit, we can start getting it to look like something that could occur on the right-hand side of (3).

Let

Then we have (−1)^{M}(*f*(*C*) − *c*) ≥ 0, because Bonferroni’s inequalities hold in the case where *m* = *M*, and we have *a* − *b* = *f*(*A*_{N + 1}) − *c*.

Now, if we use the distributivity of intersection over union we can rewrite *C* as *B* ∩ *A*_{N + 1}. It follows that *B* is the disjoint union of *C* and the set *B* \ *A*_{N + 1} which turned up above when we were contemplating the relationship of *f*(*A*) and *f*(*B*), and therefore we have *f*(*B*) = *f*(*C*) + *f*(*B* \ *A*_{N + 1}). Using this new equation we may rewrite (4) as

from which it follows that *f*(*A*) − *f*(*B*) = *f*(*A*_{N + 1}) − *f*(*C*)—a nicely analogous equation to *a* − *b* = *f*(*A*_{N + 1}) − *c*. Finally, let us add the two quantities (−1)^{M + 1}(*f*(*B*) − *b*) and (−1)^{M}(*f*(*C*) − *c*) ≥ 0 which we know to be non-negative. The sum of two non-negative quantities is non-negative also, so we have

[1] For every , I denote , and by , and respectively.