
Fermat’s Christmas theorem

Since it was Christmas two days ago, I thought I’d write about a result which is sometimes called Fermat’s Christmas theorem, since Fermat first claimed to have proven it in a letter dated the 25th of December, 1640 (although, typically for Fermat, he didn’t actually give the proof, so he may not have had a correct one). Recall that by Pythagoras’s theorem, the length {c} of the hypotenuse of a right-angled triangle is related to the lengths {a} and {b} of the other two sides by the equation

\displaystyle   c^2 = a^2 + b^2. \ \ \ \ \ (1)

Therefore, if {a} and {b} are integers, the square of {c} is also an integer. Interestingly, however, there are non-negative integers which cannot be the square of {c} if {a} and {b} are integers. For example, {c^2} cannot be 3, because if {a \ge 2} or {b \ge 2}, then {a^2 + b^2 \ge 2^2 = 4}, and if {a \le 1} and {b \le 1}, {a^2 + b^2 \le 1^2 + 1^2 = 2}. Let’s call the non-negative integers which {c^2} can be the ‘2-square’ integers, because, just as the square integers are the integers {n} such that if you have {n} blocks, you can arrange all the blocks into exactly one square, the 2-square integers are the integers {n} such that if you have {n} blocks, you can arrange all the blocks into exactly two squares. In other words, the 2-square integers are exactly those equal to {a^2 + b^2} for some pair of integers {a} and {b}.

There is an obvious method for finding out whether a given integer {n} is 2-square. If {n} is negative, it is definitely not 2-square, because square integers are never negative and sums of non-negative integers are never negative. Otherwise, whether {n} is 2-square can be determined by simply testing each of the pairs of integers {a} and {b} such that {a \le \sqrt n} and {b \le \sqrt n}, of which there are only finitely many, and seeing whether {a^2 + b^2 = n}. If there is at least one pair such that this is the case, then {n} is 2-square, and if there are no pairs such that this is the case, then {n} cannot be 2-square because for any pair of integers {a} and {b} where at least one of {a} and {b} is greater than {\sqrt n}, at least one of {a^2} and {b^2} is greater than {n}, so {a^2 + b^2 > n}. However, if {n} is non-negative this method involves testing lots of different pairs of integers. It would be nice if there was a quicker method we could use in the case where {n} is non-negative.
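By the way, this brute-force method is easy to carry out on a computer. Here is a minimal sketch in Python (my own illustration, not from the original post; the function name `is_2square` is just a hypothetical choice):

```python
def is_2square(n):
    """Brute-force test: can n be written as a^2 + b^2 for integers a, b?"""
    if n < 0:
        return False
    a = 0
    while a * a <= n:
        r = n - a * a  # if n is 2-square with this a, then r must be b^2
        b = int(r ** 0.5)
        # check both b and b + 1 to guard against floating-point rounding
        if b * b == r or (b + 1) * (b + 1) == r:
            return True
        a += 1
    return False

print([n for n in range(1, 21) if is_2square(n)])
# → [1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18, 20]
```

Only pairs with {a \le \sqrt n} and {b \le \sqrt n} are examined, exactly as described above, so the search always terminates.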

When you want to find out whether a non-negative integer {n} is square, you can use a similar method, testing all the integers {a} such that {a \le n} and seeing whether {a^2 = n}, but there is a quicker method which can be used when you know the prime factorisation of {n}: {n} is square if and only if the multiplicity of every prime in the prime factorisation of {n} is even [1] (this method isn’t applicable when {n = 0}, because only positive integers have prime factorisations, but it’s easy to see that 0 is square). Therefore, you might expect that if there is a quicker method for finding out whether {n} is 2-square, it will also involve the prime factorisation of {n}.
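For comparison, here is how the multiplicity test for squareness might look in code (a sketch using trial division; the helper names are my own, not from the post):

```python
def prime_multiplicities(n):
    """Return {prime: multiplicity} for the factorisation of n > 0."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_square(n):
    """n is square iff every prime multiplicity is even (0 is square too)."""
    if n < 0:
        return False
    if n == 0:
        return True
    return all(m % 2 == 0 for m in prime_multiplicities(n).values())

print([n for n in range(1, 50) if is_square(n)])
# → [1, 4, 9, 16, 25, 36, 49]
```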

Let’s try just writing out the first 50 positive integers and identifying which of these are 2-square; maybe we will be able to see the pattern. We don’t need to worry about the negative integers and 0, because we already established that no negative integer is 2-square, and it is easy to see that 0 is 2-square (because it is equal to {0 + 0}). Since we already have a hunch that the prime factorisations will be involved, let’s write them as their prime factorisations. The bolded numbers are the 2-square numbers.

\displaystyle \begin{matrix} \mathbf{1} & \mathbf{2} & 3 & \mathbf{2^2} & \mathbf{5} & 2 \cdot 3 & 7 & \mathbf{2^3} & \mathbf{3^2} & \mathbf{2 \cdot 5} \\ 11 & 2^2 \cdot 3 & \mathbf{13} & 2 \cdot 7 & 3 \cdot 5 & \mathbf{2^4} & \mathbf{17} & \mathbf{2 \cdot 3^2} & 19 & \mathbf{2^2 \cdot 5} \\ 3 \cdot 7 & 2 \cdot 11 & 23 & 2^3 \cdot 3 & \mathbf{5^2} & \mathbf{2 \cdot 13} & 3^3 & 2^2 \cdot 7 & \mathbf{29} & 2 \cdot 3 \cdot 5 \\ 31 & \mathbf{2^5} & 3 \cdot 11 & \mathbf{2 \cdot 17} & 5 \cdot 7 & \mathbf{2^2 \cdot 3^2} & \mathbf{37} & 2 \cdot 19 & 3 \cdot 13 & \mathbf{2^3 \cdot 5} \\ \mathbf{41} & 2 \cdot 3 \cdot 7 & 43 & 2^2 \cdot 11 & \mathbf{3^2 \cdot 5} & 2 \cdot 23 & 47 & 2^4 \cdot 3 & \mathbf{7^2} & \mathbf{2 \cdot 5^2} \end{matrix}

There isn’t a particularly obvious pattern here. We can see that every square integer is 2-square as well; this is because 0 is square and adding 0 to a square does not change it. Also, for every square integer {n}, {2 n} is 2-square, because {2 n = n + n}. Notice that 2 is 2-square, because it is equal to {1 + 1}. So this is actually a special case of a more general rule: for every 2-square number {m} and every square integer {n}, {m n} is 2-square as well, because there are integers {x}, {y} and {z} such that {m = x^2 + y^2} and {n = z^2}, so {m n = (x^2 + y^2) z^2 = x^2 z^2 + y^2 z^2 = (x z)^2 + (y z)^2}. You can see that this holds in the above list: {2^4} and {2^2 \cdot 3^2} are 2-square because {2^2} is 2-square, {2^2 \cdot 5} and {3^2 \cdot 5} are 2-square because 5 is 2-square, {2^5} is 2-square because {2^3} is 2-square, and {2^3 \cdot 5} is 2-square because {2 \cdot 5} is 2-square. The natural question to ask at this point is: does the converse hold? That is, if {m} is not 2-square and {n} is square, is {m n} not 2-square either? Well, obviously if {n} is 0, {m n} will always be 0, which is 2-square, but does it hold if {n} is positive? Let’s see if we can find a counterexample in the list above. 3 is not 2-square, and neither are {2^2 \cdot 3}, {3^3} or {2^4 \cdot 3}. {2 \cdot 3} is not 2-square, and neither is {2^3 \cdot 3}. 7 is not 2-square, and neither is {2^2 \cdot 7}. 11 is not 2-square, and neither is {2^2 \cdot 11}. {2^2 \cdot 3} is not 2-square, and neither is {2^4 \cdot 3}. It certainly looks like the converse does hold. But it’s not obvious how this can be proven, since there’s not much you can do with a negative condition like ‘{m} is not 2-square’. For now, let’s just state it as a conjecture.

Conjecture 1. For every pair of integers {m} and {n} such that {m} is not 2-square and {n} is positive and square, {m n} is not 2-square.
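Before trying to prove it, the conjecture is easy to spot-check by computer. A quick sketch (reusing a brute-force 2-square test; this is evidence, not a proof):

```python
def is_2square(n):
    """Brute-force test for membership in the 2-square integers."""
    if n < 0:
        return False
    a = 0
    while a * a <= n:
        r = n - a * a
        b = int(r ** 0.5)
        if b * b == r or (b + 1) * (b + 1) == r:
            return True
        a += 1
    return False

# Conjecture 1 predicts: m not 2-square and s > 0 imply m * s^2 not 2-square.
for m in range(1, 100):
    if not is_2square(m):
        for s in range(1, 10):
            assert not is_2square(m * s * s), (m, s)
print("no counterexample with m < 100 and s < 10")
```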

If conjecture 1 does hold, then we can determine whether an integer {n} is 2-square by dividing it by one of its positive square factors and seeing whether the result is 2-square (if it is 2-square, then so is {n}, and if it is not 2-square, then neither is {n}). Therefore, we only need to be able to determine whether the ‘square-free’ positive integers (those with no positive square factor other than 1) are 2-square. Let’s write out the list again, but omit all the non-square-free integers. Since the integers in the list are written as their prime factorisations, it is very easy to see which of them are square-free: the square-free integers are those whose prime factorisations do not contain multiple instances of any prime.

\displaystyle \begin{matrix} \mathbf{1} & \mathbf{2} & 3 & & \mathbf{5} & 2 \cdot 3 & 7 & & & \mathbf{2 \cdot 5} \\ 11 & & \mathbf{13} & 2 \cdot 7 & 3 \cdot 5 & & \mathbf{17} & & 19 & \\ 3 \cdot 7 & 2 \cdot 11 & 23 & & & \mathbf{2 \cdot 13} & & & \mathbf{29} & 2 \cdot 3 \cdot 5 \\ 31 & & 3 \cdot 11 & \mathbf{2 \cdot 17} & 5 \cdot 7 & & \mathbf{37} & 2 \cdot 19 & 3 \cdot 13 & \\ \mathbf{41} & 2 \cdot 3 \cdot 7 & 43 & & & 2 \cdot 23 & 47 & & & \end{matrix}

We’re getting closer to seeing the pattern, but it’s still not very obvious. However, notice that {2 \cdot 5} is 2-square, and so are its prime factors, 2 and 5. {2 \cdot 13} is 2-square, and 13 is also 2-square. {2 \cdot 17} is 2-square, and 17 is also 2-square. On the other hand, 3 is not 2-square, and all the square-free numbers which include it as a prime factor ({2 \cdot 3}, {3 \cdot 5}, {3 \cdot 7}, {2 \cdot 3 \cdot 5}, {3 \cdot 11} and {2 \cdot 3 \cdot 7}) are not 2-square. 7 is not 2-square, and neither are {2 \cdot 7} and {5 \cdot 7}. 11 is not 2-square, and neither is {2 \cdot 11}. 19 is not 2-square, and neither is {2 \cdot 19}. 23 is not 2-square, and neither is {2 \cdot 23}. It looks like a square-free integer is 2-square if and only if all of its prime factors are 2-square.

Again, the ‘only if’ part will be difficult to prove since we are working with a negative condition. But let’s try and prove the ‘if’ part here, because in that case we do have something to work with. Now, whenever you want to prove something it’s always useful to try and identify exactly what you’re going to need to use in order to prove it. What we want to prove is that every square-free positive integer {n} whose prime factors are all 2-square is itself 2-square. But do we really need to use the fact that {n} is square-free? We don’t, actually. The 2-square primes in the list are 2, 5, 13, 17, 29, 37 and 41, and sure enough, {2^2}, {2^3}, {2^4}, {2^2 \cdot 5}, {5^2}, {2^5}, {2 \cdot 17}, {2^3 \cdot 5} and {2 \cdot 5^2} are all 2-square, so it seems that even if {n} is not square-free, it is still the case that if all its prime factors are 2-square, {n} is 2-square as well. (On the other hand, {3^2 \cdot 5} is 2-square even though 3 is not, which is why the ‘only if’ part really does need {n} to be square-free.) The fact that {n} is square-free only needs to be used in the ‘only if’ part. In fact, we don’t even need to use the fact that the factors are prime, because the statement ‘every positive integer {n} whose prime factors are all 2-square is 2-square’ is equivalent to the statement ‘for every pair of 2-square positive integers {a} and {b}, {a b} is 2-square’, assuming the ‘only if’ part holds. It’s obvious that the second statement implies the first, and to see that the first implies the second, just notice that if {a} and {b} are 2-square, so are all their prime factors (by the ‘only if’ part), and hence so are all the prime factors of {a b} (since each such factor is either a factor of {a} or a factor of {b}). So, it should be possible to prove that {a b} is 2-square as long as {a} and {b} are both 2-square; no additional information should be needed.

If {a} is 2-square, it can be written as {x^2 + y^2} where {x} and {y} are integers, and if {b} is 2-square, it can be written as {u^2 + v^2} where {u} and {v} are integers. The product {a b} is then {(x^2 + y^2) (u^2 + v^2) = x^2 u^2 + x^2 v^2 + y^2 u^2 + y^2 v^2 = (x u)^2 + (x v)^2 + (y u)^2 + (y v)^2}. Well, this is definitely a sum of four square integers. At this point it’s not totally obvious what to do, which is why it is useful to be sure that the proof can be carried out even if {a b} is not square-free or {a} and {b} are not both prime. You might just stumble across the method by which the proof is completed by trial and error, but a more illuminating way of completing the proof is to think, again, about what 2-square numbers are in geometric terms. They are the squares of hypotenuses of right-angled triangles whose other sides are of integral length. Equivalently, they are the squares of distances from the origin of points on the coordinate plane whose coordinates are integers. Equivalently, they are squares of norms of complex numbers whose real and imaginary parts are integers. Now, the norm operation on complex numbers respects multiplication, just as it does on real numbers: for every pair of complex numbers {a} and {b}, {|a b| = |a| |b|}; moreover, the norm of the complex conjugate {\bar b = u - v i} of {b = u + v i} is the same as the norm of {b}. If {a = x + y i} and {b = u + v i}, then {a \bar b = (x u + y v) + (y u - x v) i}, and hence the equation {|a \bar b| = |a| |b|} is equivalent to the equation

\displaystyle   (x u + y v)^2 + (x v - y u)^2 = \left( x^2 + y^2 \right) \left( u^2 + v^2 \right), \ \ \ \ \ (2)

which shows how {(x^2 + y^2) (u^2 + v^2)} can be expressed as the sum of two square integers. And you can check, if you want, by expanding the brackets, that both sides of this equation are the same.
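You can also let a computer do the expansion check on random values. A small sketch verifying equation (2) numerically:

```python
import random

# Equation (2), the two-squares identity, checked on random integers:
# (x*u + y*v)^2 + (x*v - y*u)^2 == (x^2 + y^2) * (u^2 + v^2)
for _ in range(1000):
    x, y, u, v = (random.randint(-100, 100) for _ in range(4))
    lhs = (x * u + y * v) ** 2 + (x * v - y * u) ** 2
    rhs = (x * x + y * y) * (u * u + v * v)
    assert lhs == rhs, (x, y, u, v)
print("identity verified on 1000 random quadruples")
```

Of course, random testing is no substitute for the algebra, but it is a useful sanity check when manipulating identities like this.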

That takes care of the ‘if’ part; the ‘only if’ part is not yet proven, so let’s state it as our second conjecture.

Conjecture 2. For every square-free positive integer {n} which is 2-square, every prime factor of {n} is 2-square.

If conjecture 2 does hold, then we can determine whether a square-free positive integer {n} is 2-square by seeing whether its prime factors are 2-square (if they are all 2-square, then so is {n}, and if at least one is not 2-square, then neither is {n}). So let’s rewrite the list a final time, including only the prime numbers.

\displaystyle \begin{matrix} & \mathbf{2} & 3 & & \mathbf{5} & & 7 & & & \\ 11 & & \mathbf{13} & & & & \mathbf{17} & & 19 & \\ & & 23 & & & & & & \mathbf{29} & \\ 31 & & & & & & \mathbf{37} & & & \\ \mathbf{41} & & 43 & & & & 47 & & & \end{matrix}

Do you see the pattern? Probably not at the moment, but there is one here. Recall that we can divide integers into two kinds, even integers and odd integers, according to their positions in the sequence of all integers, with the terms alternating between being even and being odd (for example, 0 is even, 1 is odd, 2 is even, 3 is odd, and so on). Now, apart from 2, all prime numbers are odd (because even integers are, by definition, divisible by 2). But we can do a similar thing with the sequence of all odd integers and divide all the odd integers into two kinds, which we can call ‘odd-even’ and ‘odd-odd’, according to their position in the sequence (for example, 1 is odd-even, 3 is odd-odd, 5 is odd-even, 7 is odd-odd, and so on). The more standard terms for ‘odd-even integers’ and ‘odd-odd integers’ are ‘integers congruent to 1 modulo 4’ and ‘integers congruent to 3 modulo 4’, respectively, because, just as even integers are those which are divisible by 2 and odd integers are those whose remainder upon division by 2 is 1, odd-even integers are those whose remainder upon division by 4 is 1 and odd-odd integers are those whose remainder upon division by 4 is 3. In general, we say that an integer {a} is congruent to an integer {b} modulo a positive integer {m} (and write {a \equiv b} (mod {m})) if and only if {a} and {b} have the same remainder upon division by {m}, or, equivalently, if and only if {m} divides {a - b}. However, using this non-standard terminology makes it easy to see that this is quite a natural way to divide the prime numbers (other than 2) into two classes. Now, if you have a look at the list above, you’ll see that the 2-square integers in this list are exactly those which are odd-even (as well as 2), and the non-2-square integers in this list are exactly those which are odd-odd. It’s a lot easier to see this pattern if we make each line contain 8 numbers rather than 10.

\displaystyle \begin{matrix} & \mathbf{2} & 3 & & \mathbf{5} & & 7 & \\ & & 11 & & \mathbf{13} & & & \\ \mathbf{17} & & 19 & & & & 23 & \\ & & & & \mathbf{29} & & 31 & \\ & & & & \mathbf{37} & & & \\ \mathbf{41} & & 43 & & & & 47 & \\ \end{matrix}

It’s actually very easy to prove that the odd-odd integers in this list are not 2-square, if you are familiar with how congruences work. In fact, no odd-odd integer, whether prime or not, is 2-square. I’ll go through this proof in more detail than it really needs, just in case you aren’t very familiar with congruences. First note that there are only 4 possible remainders upon division by 4 (namely 0, 1, 2 and 3), so every integer is congruent to one of these integers modulo 4. Now, integers which are congruent can be regarded as the same up to a certain point. In particular, if {a \equiv b} (mod {m}) and {c \equiv d} (mod {m}) (where {a}, {b}, {c} and {d} are integers and {m} is a positive integer), then {a + c \equiv b + d} (mod {m}) and {a c \equiv b d} (mod {m}). Therefore, for every integer {n}, since one of {n \equiv 0}, {n \equiv 1}, {n \equiv 2} and {n \equiv 3} (mod 4) holds, one of {n^2 \equiv 0}, {n^2 \equiv 1}, {n^2 \equiv 4} and {n^2 \equiv 9} (mod 4) holds. But {4 \equiv 0} (mod 4) and {9 \equiv 1} (mod 4), and congruence is transitive (if {a \equiv b} (mod {m}) and {b \equiv c} (mod {m}), then {a \equiv c} (mod {m})), so in fact {n^2} is always congruent to one of 0 and 1 modulo 4. As a result, every 2-square integer, since it can be written in the form {x^2 + y^2}, can only be congruent to one of 0 ({0 + 0}), 1 ({0 + 1}) or 2 ({1 + 1}) modulo 4, never 3.
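Since squaring and adding respect congruence, checking the four possible residues exhaustively settles the whole claim; a two-line sketch:

```python
# Squares modulo 4 only ever take the values 0 and 1, so a sum of two
# squares is congruent to 0, 1 or 2 modulo 4, never 3.
squares_mod_4 = sorted({(n * n) % 4 for n in range(4)})
sums_mod_4 = sorted({(a * a + b * b) % 4 for a in range(4) for b in range(4)})
print(squares_mod_4)  # → [0, 1]
print(sums_mod_4)     # → [0, 1, 2]
```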

Proving that every odd-even prime number is 2-square is much more difficult. In fact, it was probably the chief obstacle Fermat faced in making his proof. However, Fermat did come up with a method, known as Fermat’s method of descent, which, once applied, makes the proof fairly straightforward (in fact, for this reason, we can be pretty sure Fermat really did prove the theorem, unlike some of his other claimed results). The idea of Fermat’s method of descent, as applied to this problem in particular, is to prove two things which, together, imply that every odd-even prime number is 2-square.

  1. For every odd-even prime number {p}, there is a multiple of {p} which is 2-square.
  2. For every pair of integers {n} and {k} such that {k n} is 2-square and {1 < k}, there is an integer {k'} such that {0 < k' < k} and {k' n} is 2-square.

This works because for each odd-even prime number {p}, we can find an integer {k} such that {k p} is 2-square, by (1), and then we can find a smaller {k} as many times as needed, using (2), until we get to {k = 1}. It is actually harder to prove (1) than (2), so we will begin by proving (2). Suppose {n} and {k} are integers, {k n} is 2-square and {1 < k}. Then {k n = x^2 + y^2}, where {x} and {y} are integers. We want to find integers {k'}, {x'} and {y'} such that {k' n = (x')^2 + (y')^2} and {0 < k' < k}. Due to equation 2, another 2-square integer which is a multiple of {n} can be found by multiplying {x^2 + y^2} by another 2-square integer. If {u} and {v} are integers, then

\displaystyle \left( x^2 + y^2 \right) \left( u^2 + v^2 \right) = (x u + y v)^2 + (x v - y u)^2

Of course, if {u} and {v} are not both 0, {k \le k (u^2 + v^2)}, and if {u} and {v} are both 0, {k (u^2 + v^2) = 0}, so either way we haven’t found a new integer {k'} such that {0 < k' < k}. But if we choose {u} and {v} so that {u^2 + v^2} is divisible by {k}, then {(x^2 + y^2) (u^2 + v^2)} will be divisible by {k^2}. Therefore, if we can also get {x u + y v} and {x v - y u} to be divisible by {k}, we can divide both sides of the above equation by {k^2} to get

\displaystyle n \frac {u^2 + v^2} k = \left( \frac {x u + y v} k \right)^2 + \left( \frac {x v - y u} k \right)^2,

so, if we let {k' = (u^2 + v^2)/k}, we have a sum of two squares which is equal to {k' n}. And hopefully, if we choose {u} and {v} carefully, we can make it so that {0 < k' < k}. That inequality can also be written as {0 < (u^2 + v^2)/k < k}, i.e. {0 < u^2 + v^2 < k^2} (since {1 < k}). We can therefore ensure that the upper bound holds by making sure that both {u^2} and {v^2} are less than {k^2/2}. But we can actually do better than that and make sure that both {u^2} and {v^2} are at most {k^2/4}, by choosing values of {u} and {v} such that {-k/2 < u \le k/2} and {-k/2 < v \le k/2}. Since there are exactly {k} values in this range, we can find values in this range which are congruent to every integer modulo {k}. In particular, we can choose {u} and {v} so that they are in this range and {u} and {v} are congruent to {x} and {y}, respectively, modulo {k}. Then {u^2 + v^2} and {x u + y v} are congruent to {x^2 + y^2} modulo {k}, so, since {x^2 + y^2} is divisible by {k}, both {u^2 + v^2} and {x u + y v} are divisible by {k}. And {x v - y u} is congruent to {x y - y x = 0} modulo {k}, so, since 0 is divisible by {k}, so is {x v - y u}. As for the lower bound, note that if {u} and {v} were both 0, then {k} would divide both {x} and {y}, so {k^2} would divide {x^2 + y^2 = k n} and hence {k} would divide {n}; in the application below, {n} is a prime number and {k} can be taken to be less than {n}, so this case cannot arise, and {0 < u^2 + v^2}. And that completes the proof of (2). Note that this process allows us to find an actual pair of integers such that {n} is the sum of their squares, if we have a sum of squares which is a multiple of {n} to start the process off.
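Here is how the descent step might look as code. This is my own sketch (the function names are hypothetical), applied to the example {2 \cdot 13 = 5^2 + 1^2}; as in the argument above, it assumes {k} never divides {n}:

```python
def centered_mod(a, k):
    """The residue of a modulo k lying in the interval (-k/2, k/2]."""
    r = a % k
    if 2 * r > k:
        r -= k
    return r

def descent_step(n, k, x, y):
    """Given x^2 + y^2 == k*n with 1 < k (and k not dividing n),
    return (k2, x2, y2) with x2^2 + y2^2 == k2*n and 0 < k2 < k."""
    u = centered_mod(x, k)
    v = centered_mod(y, k)
    k2 = (u * u + v * v) // k   # divisible by k, as shown above
    x2 = (x * u + y * v) // k   # likewise divisible by k
    y2 = (x * v - y * u) // k   # likewise divisible by k
    return k2, x2, y2

# Start from 2 * 13 = 5^2 + 1^2 and descend until k = 1.
n, k, x, y = 13, 2, 5, 1
while k > 1:
    k, x, y = descent_step(n, k, x, y)
print(k, x, y)  # → 1 3 2, since 13 = 3^2 + 2^2
```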

So how do we prove (1)? Well, it really is a lot harder to prove than (2). In fact, I’ve just done a course in number theory at the University of York, and we began the course by looking at the same problem I’m talking about in this post. All the work on the problem I’ve done so far in this post was covered in the first and last lectures. Pretty much the entire remainder of the course was about how to prove (1). That said, it could have been done more briefly; it’s just that (1) turns out to be a special case of a more general problem, which it is natural to investigate at the same time. Recall what (1) states: it states that for every odd-even prime number {p}, there are integers {x} and {y} such that {x^2 + y^2} is divisible by {p}. That means that {x^2 + y^2 \equiv 0} (mod {p}), so {x^2 \equiv -y^2} (mod {p}). Now, integers which are congruent to a square integer modulo {p} are called quadratic residues modulo {p}. So what we’re looking for is a pair of quadratic residues modulo {p} which are each other’s negations. The natural thing to do, then, is to investigate which integers are quadratic residues modulo {p}. This is a large topic, so that’s why it took most of the course to complete the proof. I’m not going to go into the full proof here, in order to prevent this post from becoming ridiculously long. But it turns out that for an odd prime {p}, {-1} is a quadratic residue modulo {p} if and only if {p \equiv 1} (mod 4). Therefore, if {p} is odd-even, there is an integer {x} such that {x^2 \equiv -1} (mod {p}), i.e. {x^2 + 1 \equiv 0} (mod {p}), i.e. {x^2 + 1} is a multiple of {p}. Moreover, we can choose {x} so that {-p/2 < x \le p/2} (since every integer is congruent to one in this range modulo {p}), in which case {x^2 + 1 \le p^2/4 + 1 < p^2}, so the resulting multiple {k p = x^2 + 1} has {k < p}, as required for the descent. And that completes the proof of (1).
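The fact quoted above also gives a concrete way to compute the multiple in (1): if {a} is a quadratic non-residue modulo {p}, Euler’s criterion says {a^{(p-1)/2} \equiv -1} (mod {p}), so {a^{(p-1)/4}} is a square root of {-1} when {p \equiv 1} (mod 4). A sketch (my own code, not from the course):

```python
def sqrt_minus_one(p):
    """For a prime p with p % 4 == 1, return x with x^2 ≡ -1 (mod p)."""
    assert p % 4 == 1
    for a in range(2, p):
        x = pow(a, (p - 1) // 4, p)  # x = a^((p-1)/4) mod p
        if (x * x) % p == p - 1:     # succeeded: a was a non-residue
            return x
    raise ValueError("no square root of -1 found; is p really prime?")

for p in [5, 13, 17, 29, 37, 41]:
    x = sqrt_minus_one(p)
    print(p, x, (x * x + 1) % p == 0)  # x^2 + 1 is a multiple of p
```

Feeding {x^2 + 1 = k p} into the descent step of (2) then produces an explicit representation of {p} as a sum of two squares.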

Now that we know which primes are 2-square, all that is left to do is to prove conjectures 1 and 2. Before we go ahead with that, we can state Fermat’s theorem on when a positive integer in general is 2-square.

Theorem 1. A positive integer {n} is 2-square if and only if every odd prime number whose multiplicity in the prime factorisation of {n} is odd is odd-even. (The prime 2 may occur with any multiplicity; recall that 2 itself is 2-square.)

Proof: Suppose {n} is a positive integer. For every prime number {p}, let {\alpha(p)} denote the multiplicity of {p} in the prime factorisation of {n}. Let {s} be the product of all the prime numbers {p} such that {\alpha(p)} is odd. Then {s} is square-free (because its prime factorisation does not contain multiple instances of any prime). Furthermore, {n / s} is square (because the multiplicity of every prime number {p} in its prime factorisation is {\alpha(p)} if {\alpha(p)} is even and {\alpha(p) - 1} if {\alpha(p)} is odd, which is in either case even). Therefore, if {n} is 2-square, {s} is 2-square by conjecture 1 (applied with the positive square {n / s}), and hence every prime factor of {s} is 2-square by conjecture 2; in particular, every odd prime factor of {s} is odd-even. On the other hand, if every odd prime factor of {s} is odd-even, then every prime factor of {s} is 2-square (2 itself being 2-square), so {s}, as the product of these prime numbers, is 2-square, and hence {n} is 2-square since {n / s} is square. \Box

We will prove both conjectures at once by showing that if {n} is a 2-square positive integer, then every odd prime number whose multiplicity in the prime factorisation of {n} is odd is odd-even. The proof uses, again, the fact that for an odd prime {p}, {-1} is a quadratic residue modulo {p} if and only if {p \equiv 1} (mod 4).

Proof: Suppose {n} is a 2-square positive integer, so that {n = x^2 + y^2} where {x} and {y} are integers. Let {d} be the greatest common divisor of {x} and {y}, so there are coprime integers {x'} and {y'} such that {x = d x'} and {y = d y'}. Then

\displaystyle n = (d x')^2 + (d y')^2 = d^2 \left( (x')^2 + (y')^2 \right).

Suppose {p} is an odd prime number and the multiplicity {\alpha} of {p} in the prime factorisation of {n} is odd. This multiplicity is the sum of its multiplicities in the prime factorisations of {d^2} and {(x')^2 + (y')^2}. Since {d^2} is square, the former multiplicity is even; therefore, since {\alpha} is odd, the latter multiplicity must be odd, from which we can conclude that it is not 0. In other words, {(x')^2 + (y')^2} is divisible by {p}. That means {(x')^2 + (y')^2 \equiv 0} (mod {p}), so {(x')^2 \equiv -(y')^2} (mod {p}). Now, we can show that neither {x'} nor {y'} is divisible by {p}. First of all, since {x'} and {y'} are coprime, at least one of them must not be divisible by {p}. If {p \mid x'}, then {p \mid (x')^2}, so {(x')^2 \equiv 0} (mod {p}), hence {(y')^2 \equiv 0} (mod {p}), which means {y' \equiv 0} (mod {p}), so {p \mid y'}. Likewise, if {p \mid y'}, then {p \mid x'}. So if one of them is divisible by {p}, the other one is as well, and it follows that neither can be divisible by {p}. What can we do with this information? Well, one of the basic results in number theory states that if an integer {a} is not divisible by a prime {p}, there is another integer {b} such that {a b \equiv 1} (mod {p}), just like how {a (1 / a) = 1}. So there are integers {u} and {v} such that {x' u \equiv 1} (mod {p}) and {y' v \equiv 1} (mod {p}). So we can take the congruence {(x')^2 \equiv -(y')^2} and multiply both sides by {v^2}, giving {(x')^2 v^2 \equiv -(y')^2 v^2} (mod {p}), i.e. {(x' v)^2 \equiv -(y' v)^2} (mod {p}), i.e. {(x' v)^2 \equiv -1} (mod {p}). Therefore, {-1} is a quadratic residue modulo {p}, and that means {p \equiv 1} (mod 4), i.e. {p} is odd-even. \Box

So, we have, at last, found the solution to the problem. The 2-square integers are those with prime factorisations in which every odd prime number with odd multiplicity is odd-even (i.e. congruent to 1 modulo 4); equivalently, those in which no odd-odd prime appears with odd multiplicity.
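As a final sanity check, we can compare the theorem against brute force for small {n} (a sketch; the function names are my own):

```python
def is_2square_bruteforce(n):
    """Directly search for a, b with a^2 + b^2 == n."""
    a = 0
    while a * a <= n:
        r = n - a * a
        b = int(r ** 0.5)
        if b * b == r or (b + 1) * (b + 1) == r:
            return True
        a += 1
    return False

def is_2square_by_factorisation(n):
    """Theorem 1: n > 0 is 2-square iff every odd-odd prime (one congruent
    to 3 modulo 4) occurs in its factorisation with even multiplicity."""
    d = 2
    while d * d <= n:
        m = 0
        while n % d == 0:
            n //= d
            m += 1
        if d % 4 == 3 and m % 2 == 1:
            return False
        d += 1
    # whatever remains is 1 or a single prime with multiplicity 1
    return not (n > 1 and n % 4 == 3)

assert all(is_2square_bruteforce(n) == is_2square_by_factorisation(n)
           for n in range(1, 500))
print("theorem agrees with brute force for 1 <= n < 500")
```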

I’ll leave you with this to think about. We can generalise the idea of a 2-square integer and define an integer to be ‘{k}-square’ (where {k} is a positive integer) if and only if it can be expressed as a sum of exactly {k} squares. Now that we know which integers are 1-square and 2-square, it’s natural to wonder which integers are 3-square, 4-square, 5-square and so on. If this post has interested you (and it probably has if you’ve got this far), I encourage you to try and investigate which integers are 4-square yourself using the same sort of method I used in this post. You’ll find that the pattern for 4-square integers is pretty easy to spot.
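If you do want to explore, here is a small recursive checker you could use (my own sketch; it counts 0 as a square, matching the convention used throughout this post):

```python
def is_k_square(n, k):
    """Can n be written as a sum of exactly k squares? (0 counts as a
    square, so e.g. 1 is 2-square via 1 = 1^2 + 0^2, as in the post.)"""
    if n < 0:
        return False
    if k == 1:
        r = int(n ** 0.5)
        return r * r == n or (r + 1) * (r + 1) == n
    a = 0
    while a * a <= n:
        if is_k_square(n - a * a, k - 1):
            return True
        a += 1
    return False

# Agreement with the 2-square list at the start of the post:
print([n for n in range(20) if is_k_square(n, 2)])
# → [0, 1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18]
```

Now try printing the non-negative integers which are not 4-square, and the pattern should jump out.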

Footnotes

  1. ^ The multiplicity of a prime number {p} in the prime factorisation of a positive integer {n} is defined as the greatest integer {\alpha} such that {n} is divisible by {p^{\alpha}}. If you think of the prime factorisation of {n} as a bag (a set which can contain multiple instances of its elements) of prime numbers, the multiplicity of each prime number {p} in the prime factorisation of {n} can be thought of as the number of instances of {p} which the bag contains. If we write {n} as its prime factorisation {p_1^{\alpha_1} p_2^{\alpha_2} \dotsm p_k^{\alpha_k}}, then for every positive integer {i} such that {i \le k}, {\alpha_i} is the multiplicity of {p_i} in the prime factorisation of {n}.

Set isomorphisms

There doesn’t seem to be any standard adjective to describe the property of a pair of sets having the same cardinality. Some authors use ‘equivalent’, but this adjective could be used for any equivalence relation; it might be useful to reserve it for whichever equivalence relation is most prominent in the current context. Others use ‘equinumerous’, and I guess that’s fine. There’s a natural counterpart of this adjective for the property of one set having cardinality less than or equal to that of another, too, viz. ‘subnumerous’, although I don’t know if anyone actually uses this term. Anyway, if I was going to write a textbook on set theory, I think I would go for the adjective ‘isomorphic’. Consider the definition of a structure in abstract algebra, which can be stated as follows.

Definition 1. Structures are ordered tuples of the form (A, R_0, R_1, \dotsc, R_{n - 1}), where A is a set called the underlying set of the structure, and for every natural number i such that i < n, there is a natural number k, called the arity of R_i, such that R_i is a subset of the Cartesian product A^k. R_i is called the ith relation of the structure.

A set can be seen as a degenerate structure with no relations. Now, consider the definition of a homomorphism:

Definition 2. For every pair of structures (A, R_0, R_1, \dotsc, R_{n - 1}) and (B, S_0, S_1, \dotsc, S_{n - 1}) with the same number of relations and with the property that for every natural number i such that i < n, R_i and S_i have the same arity, homomorphisms are mappings f from A to B such that for every natural number i such that i < n and every k-tuple of members a_0, a_1, … and a_{k - 1} of A (where k is the common arity of R_i and S_i), (a_0, a_1, \dotsc, a_{k - 1}) \in R_i if and only if (f(a_0), f(a_1), \dotsc, f(a_{k - 1})) \in S_i.

In the trivial case where the two structures have no relations, a homomorphism between the structures is simply a function between their underlying sets. Accordingly, an isomorphism between the structures is simply a bijection between their underlying sets. Since we say that two structures are isomorphic if and only if there is an isomorphism between them, two sets are isomorphic if and only if there is a bijection between them, i.e. they have the same cardinality. That’s the reasoning behind using the adjective ‘isomorphic’. Since injective homomorphisms are sometimes called monomorphisms, and surjective homomorphisms are sometimes called epimorphisms, for every pair of sets A and B, we can also say that A is monomorphic to B if and only if |A| \le |B| and A is epimorphic to B if and only if |B| \le |A|. The only problem I can see with this terminology is that structures are often referred to by their underlying sets when it is clear from context which relations should be taken as part of the structure; for example, people might speak of the ring \mathbb Z of integers and the ring \mathbb Q of rational numbers. So the statement that \mathbb Z and \mathbb Q are isomorphic might be taken to mean the rings \mathbb Z and \mathbb Q are isomorphic, which is false. Of course, this problem can be resolved just by saying ‘the sets \mathbb Z and \mathbb Q are isomorphic’ or ‘\mathbb Z and \mathbb Q are isomorphic as sets’.

0.999…, 0.000…1 and other numbers with infinitely many digits

The ‘number’ 0.000…1 sometimes comes up in the discussion of the infamous question of whether 0.999… = 1 (in case you didn’t know, 0.999… is equal to 1). For every natural number {n}, {1 - 0.99\dotso9} (where there are {n} 9s in the subtracted number) is 0.00…1 (in which there are {n} 0s, including the one before the decimal point), so naturally, people generalise the pattern to the case where there are infinitely many 9s (although you can’t always generalise in this way), and conclude that {1 - 0.999\dotso} (where there are infinitely many 9s in the second number) is 0.000…1 (in which there are infinitely many 0s), which seems to be a positive number, not 0 as it should be if {0.999\dotso = 1}.

Even though it’s wrong, this argument is quite an intelligent one. Its problem is that it relies on a lack of understanding of exactly how decimal notation works, which is understandable, given that the subtleties of decimal notation are rarely given a proper explanation in maths education. It’s easy to understand how decimal notation works when there are only finitely many digits, and I think most people understand it on some level (though they may not be able to articulate it in the way I’m about to articulate it). Given a string of digits with a decimal point in between one pair of consecutive digits, one assigns an index to each digit {d} by looking at its position relative to the decimal point. If {d} is in front of the decimal point, the index is the number of digits between {d} and the decimal point, with {d} not being counted among these digits, and if {d} is behind the decimal point, the index is the negation of the number of digits between {d} and the decimal point, with {d} being counted among these digits. In other words, the digit in front of the decimal point has index 0, the digit behind it has index {-1}, and the other indices continue the sequence. For example, in 17.25, the digit 1 has index 1, the digit 7 has index 0, the digit 2 has index {-1} and the digit 5 has index {-2}. Then, one calculates the place value associated with {d} as {d \times 10^n}, where {n} is the index of {d}. The number denoted by the string of digits is simply the sum of all the place values. For example, 17.25 denotes the number {1 \times 10^1 + 7 \times 10^0 + 2 \times 10^{-1} + 5 \times 10^{-2}}. The number denoted is always non-negative, but we can denote its negation simply by putting a minus sign in front.
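That recipe translates directly into code. Here is a sketch using exact fractions (the function name is my own, chosen for illustration):

```python
from fractions import Fraction

def decimal_value(digits, point):
    """Value of a finite digit string; `point` is how many digits sit in
    front of the decimal point. E.g. decimal_value("1725", 2) is 17.25.
    Fractions keep the arithmetic exact."""
    total = Fraction(0)
    for position, d in enumerate(digits):
        index = point - 1 - position      # the index, as defined above
        total += int(d) * Fraction(10) ** index
    return total

print(decimal_value("1725", 2))  # → 69/4, i.e. 17.25
```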

Decimal notation with finitely many digits is not very powerful. Every integer can be denoted in this way, but not every rational number (rational numbers are numbers that can be expressed as fractions whose numerators and denominators are both integers, in case you didn’t know). For example, {\frac 1 3} cannot be denoted in this way. In fact, the only rational numbers which can be denoted in this way are those whose denominator, when the fraction is written in lowest terms, is not divisible by any prime number other than 2 or 5 (the prime factors of 10). So, in order to denote every rational number, you need infinitely many digits.
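If you want to check this criterion mechanically, here’s a Python sketch (the function name `has_finite_decimal` is mine): it reduces the fraction to lowest terms, strips out all factors of 2 and 5 from the denominator, and sees whether anything is left.

```python
from fractions import Fraction

def has_finite_decimal(q):
    """A rational has a finite decimal expansion iff its lowest-terms
    denominator has no prime factor other than 2 or 5."""
    d = Fraction(q).denominator  # Fraction always reduces to lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

print(has_finite_decimal(Fraction(1, 4)))  # True: 1/4 = 0.25
print(has_finite_decimal(Fraction(1, 3)))  # False
```

Note that reducing to lowest terms first matters: {\frac 3 6} is denotable (it is {\frac 1 2}) even though 6 has the prime factor 3.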

It’s in this case that the subtleties arise. Going from finite to infinite always brings a lot of complications with it. First of all, in order to be able to assign indices to each digit in the same way, we’re going to have to require that, even though there are infinitely many digits, there are finitely many digits between each digit and the decimal point. After all, if there are infinitely many digits between some digit {d} and the decimal point, then the number of digits between {d} and the decimal point does not exist, since infinity isn’t a number. Therefore, we can see a problem with the argument above right away. There are infinitely many digits between the digit 1 in 0.000…1 and the decimal point, so it is impossible to assign an index to this digit, and for this reason, 0.000…1 is not an example of well-formed decimal notation. There are ways to define alternative forms of decimal notation in which 0.000…1 is well-formed (and usually denotes some kind of infinitesimal number, i.e. one which is positive but smaller than every positive rational number), but I’m not going to go into them here. Certainly, none of these ways can be considered standard: if you just start talking about 0.000…1 to a mathematician without defining it first, they will ask you to clarify.

If there are finitely many digits between each digit and the decimal point, we can associate each of the infinitely many digits with a place value in the same way as before. So then the number denoted by the string of digits is just the sum of all these place values, right? Well, not quite. Remember, we have infinitely many digits, hence infinitely many place values, so this will be a sum of infinitely many terms! The ordinary definition of a sum is that it is the value obtained by letting the value be 0 at first, and then adding each term in the sum to the value, one at a time, until every term has been added. But if there are infinitely many terms, then no matter how long this process continues, terms will remain which still have not been added. So it doesn’t make sense to speak of sums of infinitely many terms, if we use this definition.

But just because this definition doesn’t work, that doesn’t mean we can’t define sums of infinitely many terms in some other way. For example, it certainly seems sensible to say that the infinite sum {0 + 0 + 0 + \dotsb} is 0, because the sum of any finite number of 0s is 0, so we can generalise this to the case where there are infinitely many 0s. In general, if only finitely many terms in an infinite sum are not equal to 0, we can just disregard the 0s, since they should have no effect on the value of the sum, and say that the infinite sum is equal to the finite sum consisting of its terms which are not equal to 0. For example, {1 + 2 + 3 + 0 + 0 + 0 + \dotsb = 1 + 2 + 3 = 6}.

These aren’t very interesting examples of infinite sums, since they are basically finite sums in disguise. It is possible to assign values to more interesting infinite sums. To see how this is done, first let me note that one way of stating the reasoning why an infinite sum of the form {x_1 + x_2 + \dotsb + x_n + 0 + 0 + 0 + \dotsb} is equal to the finite sum {x_1 + x_2 + \dotsb + x_n} is this: if we start with the value of 0, and add each term of the infinite sum in order, then, after we add {x_n}, the remaining additions do not change the value at all (since we are just adding 0). Therefore, the value stays at {x_1 + x_2 + \dotsb + x_n}, even if infinitely many additional additions are made, because these additional additions have no effect.

Now, if there are infinitely many terms which are not 0 it is impossible for the value of the incomplete sum to stay equal to a particular number after a finite number of terms are added. But it may be possible for the value to stay within a certain distance of a particular number after a finite number of terms are added, and if it does, then we can conclude that the distance of the sum of all the terms from that particular number must be less than or equal to this distance (by generalising to the infinite case). Remember, the distance between two numbers {x} and {y} is either {x - y} or {y - x}, whichever one is positive (or, if {x - y} and {y - x} are both 0, so that neither is positive, the distance is 0).

An example will help to make this clear. Consider the number denoted by 0.999…: {\frac 9 {10} + \frac 9 {100} + \frac 9 {1000} + \dotsb}. For every positive integer {n}, the sum of the first {n} terms in this sum is {\frac 9 {10} + \frac 9 {100} + \dotsb + \frac 9 {10^n}}. What is the distance of this finite sum from 1? The sum is clearly less than 1, because it can be written (using the rule for addition of fractions) as {\frac {10^n - 1} {10^n}}, i.e. {1 - \frac 1 {10^n}}. So the distance is the difference {1 - (1 - \frac 1 {10^n})}, which is just {\frac 1 {10^n}}. Now, for every positive integer {n'} such that {n < n'}, it can be seen in the same way that the distance of the sum of the first {n'} terms from 1 is {\frac 1 {10^{n'}}}, which is less than {\frac 1 {10^n}} since {n < n'}. That is, no matter how many terms are added to the incomplete sum after the {n}th term is added, the distance stays less than {\frac 1 {10^n}}. Therefore, we can conclude that the distance of 0.999… from 1 is less than {\frac 1 {10^n}} for every positive integer {n}. In fact, it is less than every positive rational number, since every positive rational number can be expressed as a fraction {\frac a b}, where {a} and {b} are positive integers, and {\frac 1 {10^b} < \frac a b} (since {b < 10^b}).
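The claim that the distance of the {n}-term partial sum from 1 is exactly {\frac 1 {10^n}} can be checked directly with exact rational arithmetic; here’s a short Python sketch using the standard `fractions` module:

```python
from fractions import Fraction

# The partial sum 9/10 + 9/100 + ... + 9/10^n falls short of 1 by exactly 1/10^n.
for n in range(1, 6):
    partial = sum(Fraction(9, 10**k) for k in range(1, n + 1))
    assert 1 - partial == Fraction(1, 10**n)
    print(n, partial, 1 - partial)
```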

Another example is {\frac 1 2 + \frac 1 4 + \frac 1 8 + \dotsb}, the number denoted by 0.111… in binary. By similar reasoning as above, it can be shown that its distance from 1 is less than every positive rational number. Or, you can just look at this neat diagram.

Take a square, divide it in half, divide one of the halves in half (producing two quarters of the original square), divide one of these quarters into half too, and so on, and we see that a whole unit square can be filled by squares and rectangles whose areas are reciprocals of powers of 2 greater than 1.

In cases like this, where we can show that for some rational number {L}, the distance of the infinite sum from {L} is less than {\varepsilon} for every positive rational number {\varepsilon}, it seems natural to assign the infinite sum the value of {L}, because, if the value is a rational number, its distance from {L} must also be a rational number, and this distance cannot be positive, otherwise it would have to be less than itself. In other words, {L} is the only rational number which the sum could be, and therefore, {L} is a very natural value to assign to the sum. In the case of 0.999…, {L} is 1, and that’s why we assign 0.999… the value 1. Of course, we could assign values to infinite sums in a different way, but this is the standard way we do it in mathematics. I’m not familiar with any other way of assigning values to infinite sums that is more natural and convenient than this one.

There are a few more remarks I’d like to make about decimal notation. First of all, there are strings of digits such that the sum of the place values cannot be assigned a rational number as a value in this way, since there is provably no rational number {L} such that for every positive rational number {\varepsilon}, the distance of the incomplete sum from {L} stays less than {\varepsilon} once a certain finite number of terms have been added. In fact, the only strings which can be assigned a rational number as a value in this way are those such that the digit at every index greater than some integer {n} is 0 and the remaining digits eventually consist of a finite sequence repeated indefinitely (for example, 0.333… ({\frac 1 3}) or 0.142857142857142857… ({\frac 1 7})). However, as long as the digit at every index greater than some integer {n} is 0, a similar but slightly weaker condition is satisfied: for every positive rational number {\varepsilon}, once a certain finite number of terms have been added, if one picks any pair of incomplete sums which can be formed by adding on more terms, the distance between these two sums is less than {\varepsilon}. It’s as if the incomplete sums are approaching a number, but that number is not rational. We can therefore define a whole set of new numbers which are called the irrational numbers, with each irrational number being identified with one of these infinite sums which cannot be assigned a rational value. The numbers in the class that is obtained by adding these irrational numbers to the rational numbers are called real numbers. Although I won’t go into the details, it is possible to define how to add and multiply real numbers, by referring to the sequences that define them, in such a way that all the expected properties of addition and multiplication hold, so real numbers do deserve to be called numbers.
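The eventual repetition of the digits comes from the long division algorithm: at each step the remainder is one of finitely many values, so some remainder must recur, and from then on the digits cycle. Here’s a Python sketch of this (the function name `decimal_digits` is mine):

```python
def decimal_digits(a, b, count):
    """First `count` digits after the decimal point of a/b (with 0 <= a < b),
    computed by long division. The remainder r is always in {0, ..., b - 1},
    so some remainder must recur, and the digits repeat from there on."""
    digits, r = [], a
    for _ in range(count):
        r *= 10
        digits.append(r // b)
        r %= b
    return digits

print(decimal_digits(1, 7, 12))  # [1, 4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7]
print(decimal_digits(1, 3, 6))   # [3, 3, 3, 3, 3, 3]
```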

Footnotes

  1. ^ But some people just won’t accept it…

Definitions of finiteness

Perhaps the most intuitive definition of finiteness as a property of sets is that a set {A} is finite if and only if its cardinality is a natural number. Unfortunately, the most intuitive definition of the natural numbers, at least to me, is that the natural numbers are the cardinalities of finite sets. Therefore, in order to define both of these notions properly, one of these definitions must be altered. I think the most natural one to alter is the definition of finiteness, since, at least to me, finiteness seems like a more fundamental notion than the notion of the natural numbers.

The alternative definition which I’m most satisfied with is Kuratowski’s, given below. The essential idea behind it is that finite sets are sets that can be formed from the empty set by making a finite number of adjoinments (i.e. additions of a single object). The formal statement may seem a little peculiar if you haven’t seen this kind of definition before.

Definition 1 For every set {A}, {A} is Kuratowski finite if and only if it is a member of every family {\mathcal F} of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}.

The key thing to notice here is that it immediately follows from the definition that a variant of mathematical induction can be performed on Kuratowski finite sets. Suppose you wish to prove that every Kuratowski finite set has a certain property. Well, if you let {\mathcal F} be the family of all sets with the property, you can see from the definition that one way to do this is to show that {\emptyset} has the property (i.e. {\emptyset \in \mathcal F}) and that if you adjoin an object to a set with the property, it will still have the property (i.e. for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}). Definitions similar to the one above turn up in quite a few places in mathematics, and they are always associated with a principle of induction. For example, consider the definition of the subgroup generated by a subset {H} of a group {G}. The intuitive definition of this subgroup is that it is the set of all objects of the form {a_1 a_2 \dotsm a_n}, where {n} is a natural number and {a_1}, {a_2}, … and {a_n} are members of {H} or inverses of members of {H}. However, this definition can be awkward to work with and it is complex to state as a logical formula (it involves a quantification over all finite sets). Alternatively, one can define this subgroup as the intersection of all subgroups {H'} of {G} such that {H \subseteq H'}. The corresponding principle of induction is that if every member of {H} has a certain property, and the set of all members of the group with the property is a subgroup of {G}, then every member of the subgroup generated by {H} has the property too.

I should note that this definition is not possible in ZF: {\mathcal F} must be allowed to range over proper classes as well as sets, because for every possible value of {\mathcal F}, we have that for every object {x}, {\emptyset \cup \{x\} = \{x\} \in \mathcal F}, so {\mathcal F} is a superclass of the class of all singletons, which is proper by the axiom of replacement, since {x \in V \mapsto \{x\}} is an injection. Quantifiers can only range over sets in ZF. However, a minor amendment makes the definition possible in ZF: we can just require {x} to be a member of {A}. It can be proven that this amended definition (definition 2) is equivalent to the definition as stated above (definition 1).

Proof: Suppose {A} is a finite set by definition 2 and {\mathcal F} is a family of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}. Every member {x} of {A} is an object, so for every member {B} of {\mathcal F}, {B \cup \{x\} \in \mathcal F}, which means {A} is a member of {\mathcal F} (by definition 2). Therefore, {A} is finite by definition 1 as well.

As for the converse, well, {\emptyset} is clearly finite by definition 2. Now, suppose {A} is a finite set by definition 2, {x} is an object and {\mathcal F} is a family of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every member {y} of {A \cup \{x\}}, {B \cup \{y\} \in \mathcal F}. Since {A} is finite by definition 2, {A \in \mathcal F}, and since {x} is a member of {A \cup \{x\}}, that means {A \cup \{x\} \in \mathcal F}. Therefore, {A \cup \{x\}} is finite by definition 2 as well. It follows by the principle of induction on finite sets by definition 1 that every set which is finite by definition 1 is finite by definition 2 as well. \Box

The reason I’ve chosen definition 1 is that it is easier to see the idea behind the definition.

Of course, we must ensure that Kuratowski finiteness is actually equivalent to ordinary finiteness. Since we can induct over natural numbers and we can induct over Kuratowski finite sets, the proof is straightforward.

Proof: {|\emptyset| = 0 \in \mathbb N}. Suppose {A} is a set, {|A| = n} for some natural number {n}, and {x} is an object. What is {|A \cup \{x\}|}? If {x \in A}, then {A \cup \{x\} = A}, so it is just {n}. Otherwise, there is a bijection {f} from {A} to {\{0, 1, \dotsc, n - 1\}}, and it is easy to see that the extension {g} of {f} to {A \cup \{x\}} such that {g(x) = n} is a bijection onto {\{0, 1, \dotsc, n\}}, so {|A \cup \{x\}|} is {n + 1}. Either way, {|A \cup \{x\}|} is a natural number. This completes the proof that the cardinality of every Kuratowski finite set is a natural number, i.e., every Kuratowski finite set is finite.

The only set whose cardinality is 0 is {\emptyset}, which is Kuratowski finite. Suppose {n} is a natural number, every set whose cardinality is {n} is Kuratowski finite, {A} is a set and {|A| = n + 1}. Then there is a bijection {f} from {A} to {\{0, 1, \dotsc, n\}}. Let {x} be {f^{-1}(n)}. It is easy to see that the restriction {g} of {f} to {A \setminus \{x\}} is a bijection onto {\{0, 1, \dotsc, n - 1\}}, so {|A \setminus \{x\}| = n}. Therefore, {A \setminus \{x\}} is Kuratowski finite. It follows that {(A \setminus \{x\}) \cup \{x\} = A} is Kuratowski finite as well. This completes the proof that every finite set is Kuratowski finite. \Box

The other commonly-seen definition of finiteness is due to Dedekind. The main advantage of this definition is that it is somewhat simpler to state than Kuratowski’s.

Definition 3 For every set {A}, {A} is Dedekind finite if and only if for every proper subset {B} of {A}, {|B| < |A|}.

It is probably possible to prove that every finite set is Dedekind finite by induction, but an easier way to prove this is to prove the contrapositive, that every Dedekind infinite set is infinite. This proof uses the fact that for every set {A}, {A} is infinite if {\aleph_0 \le |A|} ({\aleph_0} is the cardinality of {\mathbb N}), since {n < \aleph_0} for every natural number {n}, so {|A| \ne n}.

Proof: Suppose {A} is a Dedekind infinite set, so there is a proper subset {B} of {A} such that {|A| = |B|}. Since {B \subset A}, there is a member {a} of {A \setminus B}; since {|A| = |B|}, there is a bijection {f} from {A} to {B}. Consider the infinite sequence {a}, {f(a)}, {f^2(a)}, … We can show by induction that this sequence is an injection, so {\aleph_0 \le |A|}, and therefore, since {\mathbb N} is infinite, {A} is infinite.

  • For every positive integer {n}, {f^n(a) = f(f^{n - 1}(a))}, so {f^n(a) \in B}, which means {a \ne f^n(a)} since {a \not \in B}.
  • Suppose {m} is a natural number and for every natural number {n} such that {m < n}, {f^m(a) \ne f^n(a)}. Then for every natural number {n} such that {m + 1 < n}, {m < n - 1}, so {f^m(a) \ne f^{n - 1}(a)}, which means {f^{m + 1}(a) \ne f^n(a)}, since {f} is injective.

\Box

Now, if the converse of the fact we used holds, i.e. for every infinite set {A}, {\aleph_0 \le |A|}, then we can prove that, conversely, every infinite set is Dedekind infinite. The proof relies on the fact that {\mathbb N} has the same cardinality as its proper subset {\{1, 2, 3, \dotsc\}}.

Proof: Suppose {A} is an infinite set. Then {\aleph_0 \le |A|}, so there is an injection {f} from {\mathbb N} to {A}. Let {B} be {A \setminus \{f(0)\}}, which is a proper subset of {A}. Just as {\mathbb N} has the same cardinality as {\{1, 2, 3, \dotsc\}}, {A} has the same cardinality as {B}: the function from {A} to {B} which maps {f(n)} to {f(n + 1)} for every natural number {n}, and maps every member of {A} not in the image of {f} to itself, is a bijection. Therefore {A} is Dedekind infinite. \Box

The problem with Dedekind’s definition is that one runs into trouble when one tries to prove that for every infinite set {A}, {\aleph_0 \le |A|}, in ZF. The proof can be done easily using the axiom of choice.

Proof: Suppose {A} is an infinite set. If the axiom of choice holds, there is a function {f} on {\mathcal P(A) \setminus \{\emptyset\}} such that for every non-empty subset {B} of {A}, {f(B) \in B}. Let {g} be the sequence such that for every natural number {n}, {g(n) = f(A \setminus \{g(0), g(1), \dotsc, g(n - 1)\})} (this is always defined, because if {A \setminus \{g(0), g(1), \dotsc, g(n - 1)\} = \emptyset}, {A \subseteq \{g(0), g(1), \dotsc, g(n - 1)\}}, so {|A| \le n}, which means {A} is finite). It is clear that {g} is an injection into {A}, so {\aleph_0 \le |A|}. \Box

This doesn’t necessarily mean that the statement is independent of ZF, but certainly it suggests the possibility, since there is no obvious way to amend this proof so that it does not require the axiom of choice.

Perhaps there is another way we could prove that every infinite set is Dedekind infinite? Unfortunately there isn’t, because if every infinite set is Dedekind infinite, then for every infinite set {A}, there is a proper subset {B} of {A} such that {|A| = |B|}, so there is a member of {A \setminus B}, and {B} is infinite too, so there is a proper subset {C} of {B} such that {|B| = |C|}, so there is a member of {B \setminus C}, and so on. This allows us to construct an injective sequence of members of {A}, so {\aleph_0 \le |A|}.

Ideally, I would finish off here with a proof that the statement is independent. Unfortunately, I don’t know how to do this. It may be possible to derive the axiom of choice from the fact that every infinite set is Dedekind infinite, in which case this fact cannot be a theorem of ZF (otherwise the axiom of choice would be a theorem of ZF, and the axiom of choice is known to be independent of ZF). On the other hand, this may not be true, in which case I’m not sure what the best way to prove independence would be. The most straightforward way would be to construct a model of ZF in which there are Dedekind finite infinite sets, but I don’t know how you might go about doing that.

Some thoughts on generating random members of infinite sets

Here’s a challenge for you. Think of a random natural number (i.e. one of the numbers in the sequence 0, 1, 2, …). If you think about it, this is a rather difficult task, because there are infinitely many natural numbers, but if the choice is random, then every natural number should have a non-zero chance of being chosen. In fact, I can give you an argument that it is impossible. Assuming there is a positive lower bound on the length of time it takes to think of a natural number, then, since there are only finitely many minds in the universe, only finitely many natural numbers have ever been thought of. And the random natural number you thought of must have been one of these natural numbers. So there are infinitely many natural numbers—the ones which have never been thought of—which you were never going to choose. Their chance of being chosen was zero, so the choice was not truly random. In fact, it was non-random in a pretty serious way: not just one or two natural numbers had zero chance of being chosen, but infinitely many natural numbers had zero chance of being chosen.

It is possible to generate a random natural number using an algorithm, provided you have some way of generating random natural numbers up to some upper bound. For example, here’s one algorithm you could use.

  1. Let n be 0.
  2. Generate a random natural number which is either 0 or 1, and let x be this number.
  3. If x is 0, stop and return n. Otherwise, add 1 to n and repeat from step 2.

For every natural number n, this algorithm will generate n if and only if step 2 generates 1 exactly n times before it generates a 0, so the probability that it will generate n is {2^{-(n + 1)}} (since the probability that 1 will be generated in step 2 is {\frac 1 2}).
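Here’s a Python sketch of the algorithm above (the names are mine); the bit source is passed in as a function, so the behaviour can also be checked against a fixed, deterministic sequence of bits:

```python
import random

def random_natural(next_bit=lambda: random.randint(0, 1)):
    """Steps 1-3 above: count how many 1s are generated before the first 0."""
    n = 0
    while next_bit() == 1:
        n += 1
    return n

# With a deterministic bit source the outcome is fixed: three 1s followed
# by a 0 yield 3, an outcome the algorithm produces with probability 2^-(3+1).
bits = iter([1, 1, 1, 0])
print(random_natural(lambda: next(bits)))  # 3
```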

Of course, you could ‘think of’ a natural number by performing this algorithm in your head, which kind of invalidates the argument above. We can rescue the argument by noting that, if you perform the algorithm in your head, you will need time to perform each step. Therefore, if you are required to think of a natural number within a given length of time, you will only have time to repeat from step 2 a finite number of times. Let’s call this number of times n. Then there will not be enough time for you to generate n + 1, or any greater natural number, using the algorithm.

Now, although this algorithm can generate every natural number, it doesn’t generate them all with the same probability. 0 is twice as likely to be generated as 1, which in turn is twice as likely to be generated as 2, and so on. In fact, it is not actually possible to devise an algorithm that generates every natural number with the same probability. Suppose you have an algorithm and for every natural number n, the probability that the algorithm will generate n upon execution is {P_n}. Then the infinite sum {\sum_{n \in \mathbb N} P_n} has to be 1, because the algorithm always generates some natural number. But if the value of {P_n} does not depend on the value of n, this is impossible: the sum evaluates as 0 if the value of {P_n} is always 0, and the sum diverges to infinity if the value of {P_n} is always a real number greater than 0. On the other hand, we do have that {\sum_{n \in \mathbb N} 2^{-(n + 1)} = 1}, so that’s why the algorithm above does work.
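The last claim, that the probabilities of the algorithm above sum to 1, can be checked exactly: after N terms, the partial sum of {\sum_{n \in \mathbb N} 2^{-(n + 1)}} is {1 - 2^{-N}}, which stays within any desired distance of 1 from some point on. A Python sketch using exact fractions:

```python
from fractions import Fraction

# The partial sums of 1/2 + 1/4 + 1/8 + ... fall short of 1 by exactly 2^-N,
# so in the sense of infinite sums discussed earlier, the whole sum is 1.
for N in (1, 5, 20):
    s = sum(Fraction(1, 2**(n + 1)) for n in range(N))
    assert s == 1 - Fraction(1, 2**N)
    print(N, s)
```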