Tag Archives: maths

A controversial maths question

The following question which appeared on an Edexcel GCSE maths paper (for readers outside of the UK, this is a test that would be taken at the end of high school, at the age of 15 or 16) has gone viral, with many students complaining about its difficulty.

There are n sweets in a bag. 6 of the sweets are orange. The rest of the sweets are yellow.

Hannah takes at random a sweet from the bag. She eats the sweet.

Hannah then takes at random another sweet from the bag. She eats the sweet.

The probability that Hannah eats two orange sweets is 1/3.

Show that n² − n − 90 = 0.

I sympathise with the students complaining about this question’s difficulty. I don’t think it is an easy question for GCSE students. I found it easy to answer, but I’m studying for a Maths degree.

However, there are different kinds of difficulty. Sometimes a question is difficult because it involves the application of knowledge which is complex and/or less prominently featured in the syllabus, which makes acquiring this knowledge and keeping it in your head more difficult, and also makes it harder to realise when this knowledge needs to be applied. This was not one of those questions. The knowledge required to complete this question was very basic stuff which I expect most of the students complaining about it already knew. As far as I can tell, the following mathematical knowledge is required to answer this question:

  • If m of some n objects have a certain property then the probability that one of these n objects, picked at random, will have that certain property is m/n.
  • The probability that two events with probabilities x and y both occur is xy (as long as the events are independent of each other, although you could get away with not understanding that part for this question).
  • Basic algebraic manipulation techniques, so that you can see that the equation (6/n)(5/(n − 1)) = 1/3 can be rearranged into n² − n − 90 = 0. All this takes is knowing how to multiply the two fractions together on the left-hand side (new numerator is the product of the old numerators, new denominator is the product of the old denominators), then taking the reciprocals of both sides, bringing the 30 over to the right-hand side, expanding n(n − 1) and subtracting 90 from both sides (there are different ways you could describe this process; the steps are written out in full just below this list).
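
For concreteness, here is that rearrangement written out as a chain of equations (this is just the manipulation described in the last bullet point, with nothing extra added):

\displaystyle \frac 6 n \cdot \frac 5 {n - 1} = \frac 1 3 \;\Rightarrow\; \frac {30} {n (n - 1)} = \frac 1 3 \;\Rightarrow\; n (n - 1) = 90 \;\Rightarrow\; n^2 - n - 90 = 0.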

Certainly for some of the students, the knowledge was what was at issue here, but I think the main challenge in this question was the successful application of this knowledge. Most of the students knew the three things listed above, but they failed to see that this knowledge could be used to answer the question.

Unfortunately for the students, failure to apply knowledge successfully is in many ways a much more serious failure than failure to possess the appropriate knowledge. If you fail due to lack of knowledge, there is an obvious step you can take to prevent further failure in the same situation: acquire the knowledge that you lack, by having somebody or something teach you it. Crucially, when you successfully learn something you not only end up knowing it, but you also know that you know it.

If, on the other hand, your problem is that you failed to apply your knowledge successfully, then it is much less clear what the next step you should take is. And, also, you never know whether you are capable of applying your knowledge in every situation where it might be useful, because there are usually a whole lot of different situations where it might be useful and it is impossible for you to be familiar with them all. This is why cramming for tests is not a good idea. Cramming may be the most efficient way to obtain the knowledge you need for a test (I don’t know whether this is actually true, but I don’t think it’s impossible) but it certainly won’t help you with applying your knowledge.

There is one relatively straightforward way to become better at applying your knowledge. If you remember how you applied a certain piece of knowledge in a previous situation, then, if the exact same situation occurs again, you won’t have any trouble, because you will just apply the knowledge in the same way again. In a similar way, the more similar the new situation is to the old one which you remember how to deal with, the more likely you are to be able to successfully apply your knowledge.

But, in a way, this is a trick. I am about to get a bit esoteric here, but bear with me. Let’s say your mind has two “departments” which work together to solve a problem. One of them is the Department of Knowledge-Recall, or the DKR for short, and the other is the Department of Knowledge-Application, or the DKA for short. I think that when you carry out the strategy above, what is happening in your mind is that the DKA is pre-emptively passing on the tough part of the work to the DKR for them to deal with. If you remember how you used a certain piece of knowledge (let’s call it X) to deal with a previous situation, then that memory, in itself, has become knowledge (let’s call it Y). The DKR has worked so that Y is easily recalled. When the situation comes up again, the DKA’s task is really easy. It’s looking at the most prominent pieces of knowledge that the DKR has made you recall, and it notices that Y has a kind of direct link to this situation. If the DKR hadn’t worked to make Y a piece of easily-recalled knowledge for you, the DKA would have to do more work, sifting through the knowledge that the DKR has made you recall, possibly asking the DKR for more, inspecting them more closely for any connection to the situation at hand.

There’s a good chance the above paragraph made no sense. But basically, I’m trying to say that familiarising yourself with the situations where you need to apply your knowledge works because it is a process of acquiring specialised knowledge about where you need to apply your knowledge—it is not truly applying your knowledge. But probably the more important point to make is that this process is inefficient. It would be much simpler if you could simply apply your knowledge to unfamiliar situations straight away. And it’s often impossible to familiarise yourself with every conceivable situation, because the range of conceivable situations is so vast.

Students taking tests try to carry out the familiarisation process by doing past papers. This is often effective because the range of questions that can be on a paper is often quite limited, so it really is possible to familiarise yourself with every situation where you might need to apply your knowledge. But this isn’t a good thing!

As I argued (somewhat incoherently) above, the skill of being able to apply your knowledge is only separate from the skill of being able to recall your knowledge if it includes the sub-skill of being able to apply your knowledge to unfamiliar situations. That particular sub-skill is one which is hard to improve. Arguably, this sub-skill is what we mean by “intelligence” (the word “intelligence” is used to refer to a lot of different things, but this might be the most prototypical thing referred to as “intelligence”). It’s certainly possible to get better at this sub-skill, but I don’t know if it is possible to get better through conscious effort.

But intelligence (by which I mean, the ability to apply your knowledge to unfamiliar situations) is often the main thing the tests that students take are trying to assess. After all, there are few vocations, if any, that a person cannot do better at by being more intelligent. You don’t always need intelligence to get to an acceptable level, sure, but intelligence always helps. The purpose of the GCSEs is not just to find out which students are more or less intelligent—they are also supposed to increase the number of people who have useful knowledge—but it is one of their main purposes.

That’s why I don’t think this question was unfair, as some students have been saying. Yes, it was quite different from anything that was on the past papers. But that was the whole point, to test people’s ability to apply their knowledge in unfamiliar situations. It is natural to be disappointed if you couldn’t answer the question and to complain about your disappointment, but saying that it was unfair is a step too far. It did what it was meant to do.

I think there is possibly a genuine problem here, though. If questions like this which strongly tested intelligence (as defined above) were usual on the GCSE papers, you wouldn’t expect this one to become such a topic of interest. Perhaps the GCSEs have suffered in the past from a lack of questions like this, which has affected students’ expectations of what these tests should be like. I should be clear that I have no idea whether this is true. I took my GCSEs in 2011, but I don’t remember what the questions were like in this respect.

PS: I’ve seen some people saying “this is easy, the answer is 10”. These people are making fools of themselves, because the answer is not 10, in fact that doesn’t even make any sense. The question is asking for a proof. (“Show that…”) It seems these people have just seen the quadratic equation at the end and assumed that the question was “Solve this equation” without actually reading it. Perhaps this is evidence that the question really isn’t easy. Or maybe these people just aren’t thinking about it very much.


Large and small sets

For some infinite sets {S = \{x_1, x_2, x_3, \dotsc\}} of positive integers, the sum of the reciprocals of the members of {S}, i.e.

\displaystyle \frac 1 {x_1} + \frac 1 {x_2} + \frac 1 {x_3} + \dotsb,

diverges. Such sets are said to be large. For example, {\mathbb N}, the set of all the positive integers, is large, because the harmonic series diverges. On the other hand, for some infinite sets {S} the sum of the reciprocals of the members of {S} converges instead; such sets are said to be small. For example, for every integer {p} greater than 1, the set of all the positive integral {p}th powers is small, because the series

\displaystyle 1 + \frac 1 {2^p} + \frac 1 {3^p} + \dotsb

converges. There is thus an interesting distinction between two different sizes of infinite sets of positive integers. Let’s have a look at some other infinite sets of positive integers and see whether they are large or small.

For every positive integer {m}, the set of all the positive integral multiples of {m} is easily seen to be large, due to the fact that the harmonic series diverges, and dividing the harmonic series by {m} yields the series

\displaystyle \frac 1 m + \frac 1 {2 m} + \frac 1 {3 m} + \dotsb.

In fact, this proof can easily be generalised to show that for every set {S} of positive integers and every positive integer {m}, {\{m n : n \in S\}} is large if and only if {S} is large.

What about the set {S} of all the positive integers which are not multiples of {m}? The sum of the reciprocals of the members of {S} can be written as the difference

\displaystyle \left( 1 + \frac 1 2 + \frac 1 3 + \dotsb \right) - \left( \frac 1 m + \frac 1 {2 m} + \frac 1 {3 m} + \dotsb \right).

The minuend and subtrahend of this difference are both equal to {\infty}, so the sum of the reciprocals of the members of {S} cannot be immediately seen from writing it like this. However, if we take out the factor of {1 / m} from the subtrahend, we get

\displaystyle \left( 1 + \frac 1 2 + \frac 1 3 + \dotsb \right) - \frac 1 m \left( 1 + \frac 1 2 + \frac 1 3 + \dotsb \right),

and then we can remove the common factor of {1 + 1 / 2 + 1 / 3 + \dotsb} to get

\displaystyle \left( 1 + \frac 1 2 + \frac 1 3 + \dotsb \right) \left( 1 - \frac 1 m \right).

It is easy to see that the value of this expression is also {\infty}. So {S} is large, as well.

In fact, for every finite set of pairwise coprime positive integers {p_1}, {p_2}, … and {p_{\ell}}, the set {S} of all the positive integers which are not multiples of any of these integers is large, too. This is because, if we let {H} be the value of the harmonic series, then, using the inclusion-exclusion principle, the sum of the reciprocals of the members of {S} can be written as

\displaystyle H - \sum_{k} \frac H {p_k} + \sum_{k_1 < k_2} \frac H {p_{k_1} p_{k_2}} - \dotsb + \frac {(-1)^{\ell} H} {p_1 p_2 \dotsm p_{\ell}},

in which, for every integer j such that 1 \le j \le \ell, the jth sum is taken over all the sets of j indices between 1 and \ell (so the first sum is over all single indices k, the next over all pairs of indices k_1 < k_2, and so on). By factorising this expression, we can rewrite it as

\displaystyle H \left( 1 - \frac 1 {p_1} \right) \left( 1 - \frac 1 {p_2} \right) \dotsm \left( 1 - \frac 1 {p_{\ell}} \right),

and it is easy to see that the value of this expression is {\infty}.

Generalising from this, one might expect that if we let {P = \{p_1, p_2, p_3, \dotsc\}} be the set of all the prime numbers, so that the members of {P} are pairwise coprime and every integer greater than 1 is a multiple of a member of {P}, then

\displaystyle   H \left( 1 - \frac 1 {p_1} \right) \left( 1 - \frac 1 {p_2} \right) \left( 1 - \frac 1 {p_3} \right) \dotsm = 1. \ \ \ \ \ (1)

To justify this, all that is needed is to note that for every positive integer {n},

\displaystyle H \left( 1 - \frac 1 {p_1} \right) \left( 1 - \frac 1 {p_2} \right) \dotsm \left( 1 - \frac 1 {p_n} \right) = 1 + K,

where {K} is the sum of the reciprocals of the integers greater than 1 which are not multiples of any of the first {n} prime numbers. The least such integer is {p_{n + 1}}, so {K} is smaller than the sum of the reciprocals of the integers greater than or equal to {p_{n + 1}}. Therefore,

\displaystyle H \left( 1 - \frac 1 {p_1} \right) \left( 1 - \frac 1 {p_2} \right) \dotsm \left( 1 - \frac 1 {p_n} \right) < 1 + \frac 1 {p_{n + 1}} + \frac 1 {p_{n + 1} + 1} + \frac 1 {p_{n + 1} + 2} + \dotsb.

As {n} tends towards positive infinity, the sum of the reciprocals of the integers greater than or equal to {p_{n + 1}} tends towards 0, because it can be written as {H - H_{p_{n + 1} - 1}}, which is less than {H - H_n}, and by definition {H_n} tends towards {H} as {n} tends towards positive infinity. So the right-hand side of the inequality above tends towards 1, and an application of the squeeze theorem completes the proof.

Rearranging (1) to solve for {H} yields

\displaystyle H = \left( \frac 1 {1 - 1 / p_1} \right) \left( \frac 1 {1 - 1 / p_2} \right) \left( \frac 1 {1 - 1 / p_3} \right) \dotsm,

so we now have a way of expressing the harmonic series as an infinite product. This is quite interesting in its own right, but it can also be used to prove a rather surprising result. By taking the natural logarithm of both sides, we see that

\displaystyle \ln H = \ln \left( \frac 1 {1 - 1 / p_1} \right) + \ln \left( \frac 1 {1 - 1 / p_2} \right) + \ln \left( \frac 1 {1 - 1 / p_3} \right) + \dotsb.

The natural logarithm of {\infty} is {\infty}, so this means the series on the right-hand side diverges. For every positive integer {n}, the {n}th term of the series on the right-hand side, {\ln (1 / (1 - 1 / p_n))}, can also be written as {\ln (1 + 1 / (p_n - 1))}, and this number is less than or equal to {1 / (p_n - 1)} (in general, for every real number {x} greater than {-1}, {\ln (1 + x) \le x}). If {n} is greater than 1, then this number, in turn, is less than or equal to {1 / p_{n - 1}}, because {p_n - 1} is greater than or equal to {p_{n - 1}}. And {1 / (2 - 1)}, i.e. 1, is less than or equal to 1. So, by the comparison test, the series

\displaystyle 1 + \frac 1 {p_1} + \frac 1 {p_2} + \frac 1 {p_3} + \dotsb

diverges, too. This shows that the set consisting of 1 as well as all the prime numbers is large, but it is then an immediate consequence that the set of all the prime numbers, not including 1, is large (since we can subtract 1 from the above series and the value of the result will still have to be {\infty}).

This is a rather interesting and surprising result. The set of all the prime numbers is large, but for every integer {p} greater than 1, the set of all the positive integral {p}th powers is small, which means that, in a certain sense, there are more prime numbers than positive integral {p}th powers.
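
If you would like to see this difference numerically, here is a small Python sketch (not part of the argument above, purely an illustration) that computes partial sums of the two kinds of reciprocal series for terms below a million. The partial sum over the squares has essentially settled down near its limit, while the partial sum over the primes is still creeping upwards; it grows roughly like the logarithm of the logarithm of the cut-off, which is why the divergence is so hard to see numerically.

def sieve(limit):
    # Sieve of Eratosthenes: is_prime[n] is True exactly when n is prime.
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = [False] * len(range(p * p, limit, p))
    return is_prime

N = 10 ** 6
is_prime = sieve(N)

sum_over_squares = sum(1 / (k * k) for k in range(1, int(N ** 0.5) + 1))
sum_over_primes = sum(1 / n for n in range(2, N) if is_prime[n])

print(sum_over_squares)   # roughly 1.644, already very close to its limit: a small set
print(sum_over_primes)    # roughly 2.9 and still growing without bound: a large set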

How to calculate cube roots in your head

Here’s a way to impress people. Ask somebody to think of a number with no more than 3 digits and tell you its cube (they can use a calculator to do this). Using the method I’m about to tell you about, you can calculate the original number, the cube root of the number you’re given, without needing to carry out any intermediate calculations other than addition and subtraction of numbers with no more than 2 digits. If you can carry out these intermediate calculations at a normal speed, you will probably be able to calculate the cube root within a minute, with some practice.

Suppose the number you’re given is {n^3} (so that {n} is the original number, the cube root you’re trying to find). You will know that {n} has at most 3 digits. If you use this method, you will calculate the three digits of {n} one by one. The initial and final digits can be calculated almost immediately; the middle digit is the one you will spend most of your time calculating, because this is the part of the calculation where you have to carry out some additions and subtractions.

1. The final digit

In order to find the final digit, you can use the fact that for every pair of numbers {x} and {y}, {x} and {y} have the same final digit if and only if {x^3} and {y^3} have the same final digit. In other words, the final digit of {n} is predictable from the final digit of {n^3}. The following table can be used.

Final digit of {n^3} 0 1 2 3 4 5 6 7 8 9
Final digit of {n} 0 1 8 7 4 5 6 3 2 9

The table is not difficult to memorise. Just remember it this way: if the final digit of {n^3} is 0, 1, 4, 5, 6, or 9, then the final digit of {n} is the same, and otherwise (if the final digit of {n^3} is 2, 3, 7 or 8), the final digit of {n} is 10 minus the final digit of {n^3}.
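
If you want to check this correspondence without writing out all the cubes, here is a one-line Python check (just an illustration; the table above is all you actually need to memorise):

print({d: d ** 3 % 10 for d in range(10)})
# prints {0: 0, 1: 1, 2: 8, 3: 7, 4: 4, 5: 5, 6: 6, 7: 3, 8: 2, 9: 9}

Every digit from 0 to 9 appears exactly once among the values, so the map from the final digit of {n} to the final digit of {n^3} can be inverted, and the table above is that inversion.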

If you’re not interested in why this works, you can skip ahead a few paragraphs. But in case you are, I will explain why this works. First, note that for every number {x}, the final digit of {x} is its remainder on division by 10, i.e. the unique integer {d} such that {0 \le d < 10} and {x = 10 q + d} for some integer {q}. For example, {143 = 140 + 3} and {2764 = 2760 + 4}. Cubing both sides of the equation {x = 10 q + d} yields

\displaystyle  \begin{array}{rcl}  x^3 &=& (10 q + d)^3 \\ &=& 10^3 q^3 + 3 \cdot 10^2 q^2 d + 3 \cdot 10 q d^2 + d^3 \\ &=& 10 (10^2 q^3 + 3 \cdot 10 q^2 d + 3 \cdot q d^2) + d^3. \end{array}

Now, if we let {r} be the final digit of {d^3}, i.e. the remainder of {d^3} on division by 10, i.e. the unique integer {r} such that {0 \le r < 10} and {d^3 = 10 p + r} for some integer {p}, we then have

\displaystyle  \begin{array}{rcl}  x^3 &=& 10 (10^2 q^3 + 3 \cdot 10 q^2 d + 3 \cdot q d^2) + 10 p + r \\ &=& 10 (10^2 q^3 + 3 \cdot 10 q^2 d + 3 \cdot q d^2 + p) + r, \end{array}

from which we can see that {r} is also the remainder of {x^3} on division by 10, i.e. the final digit of {x^3}. This shows that for every number {x}, the final digit of {x^3} is the final digit of {d^3}, where {d} is the final digit of {x}. Let’s make a table showing what the final digit of {d^3} is for each of the 10 possible values of {d}; it is the final digit of each number in the second row that matters here.

{d} 0 1 2 3 4 5 6 7 8 9
{d^3} 0 1 8 27 64 125 216 343 512 729

Now, returning to the task at hand, we know the final digit of {n^3}, which is also the final digit of {d^3}, where {d} is the final digit of {n}. Let’s call this number {d'}. From the table above, we can see that if {d} is 1, {d'} is 1; if {d} is 3, {d'} is 7; if {d} is 6, {d'} is 6; if {d} is 8, {d'} is 2, and so on. In fact, for every possible value of {d'} there is only one possible value of {d}, and therefore we can read off the value of {d} by finding the unique cell in the bottom row containing a number whose final digit is {d'}, and seeing which number is in the cell above. This is where the original table comes from.

This proof crucially relies on the fact that for every possible value of {d'} there is only one possible value of {d}. If you try to do the same thing with, say, square roots, you find that there are some possible values of {d'} for which there are multiple possible values of {d}, and therefore, while this method can narrow down the possible values of {d} it cannot give you the exact value.

2. The initial digit

In order to find the initial digit, have a look at the table above which shows the cubes of each of the non-negative integers less than 10. You will need to memorise this table, unfortunately, and there is no simple summary for it as there is with the other table. However, knowing the first 10 non-negative perfect cubes is useful in a lot of situations where you need to make mental calculations, not just in this situation.

First of all, you need to write {n^3} as a 9-digit number. That means, if it has less than 9 digits, put 0s in front of it until it has 9 digits. Now, look at the first 3 digits of {n^3}, when it is written like this; this gives you a 3-digit number. Using your knowledge of the first 10 non-negative perfect cubes, identify the greatest perfect cube which is no greater than this 3-digit number. The first digit of {n} is the cube root of this perfect cube. You can also use the following table.

First 3 digits of {n^3} 000 001–007 008–026 027–063 064–124
First digit of {n} 0 1 2 3 4

First 3 digits of {n^3} 125–215 216–342 343–511 512–728 729–999
First digit of {n} 5 6 7 8 9

To see why this works, note that for every number {x} with no more than 3 digits, the first digit of {x}, when {x} is written as a 3-digit number (padding with 0s if necessary), is the unique integer {d} such that {10^2 d \le x < 10^2 (d + 1)}. For example, {300 \le 341 < 400} and {100 \le 192 < 200}. Cubing each part of this inequality yields {10^6 d^3 \le x^3 < 10^6 (d + 1)^3}. For example, {27000000 \le 341^3 < 64000000} and {1000000 \le 192^3 < 8000000}. And if an inequality of the form of the latter one is observed to hold, for some value of {d}, then the cube roots of each part can be taken to obtain an inequality of the form of the former one, thus showing that the first digit of {x} is {d}.

By the way, this part of the method does not rely on the fact that {n^3} is a perfect cube; it will also work for finding the first digit of irrational cube roots. It can also be generalised without complication to find the first digit of roots of any order: square roots, 4th roots, etc.

3. The middle digit

The cleverest part of the method is the calculation of the middle digit. Unlike the initial and final digits, there is no simple characterisation of how the middle digit relates to the whole number. Instead, we take advantage of a rather obscure fact that relates the three digits of {n}, which means the middle digit can only be calculated once the other two digits are known. The obscure fact is this: for every number {x}, the remainder of {x} on division by 11 is the same as the remainder on division by 11 of the alternating sum of the digits of {x}. By the alternating sum of the digits of {x}, I mean the sum of the digits of {x} at places corresponding to even powers of 10 (the units place, the hundreds place, the ten thousands place and so on) minus the sum of the digits of {x} at places corresponding to odd powers of 10 (the tens place, the thousands place, the hundred thousands place and so on). For example, the alternating sum of the digits of 121281 is {1 - 8 + 2 - 1 + 2 - 1 = -5}, and, sure enough, we have {121281 = 11 \cdot 11025 + 6}, while {-5 = -11 + 6}, so the remainders of 121281 and -5 on division by 11 are the same.

In practice, to calculate the alternating sum of the digits of {x}, you look at each digit of {x} in turn, going from right to left (not the order the digits are written in, which is left to right!) and alternate between adding and subtracting the digit from the total; that’s why it’s called an alternating sum. It can be helpful to take the digits in groups of two and find the difference between the two digits in the group first, then add it to the total; that way, you are less likely to get confused, since you do the same thing at each step. For example, to calculate the alternating sum of the digits of 121281, you see that {1 - 8 = -7}, and set the total at {-7}, then you see that {2 - 1 = 1}, and set the total at {-6}, and finally you see that {2 - 1 = 1} and set the total at {-5}. By the way, if the alternating sum is non-negative and less than 11 then it is equal to its remainder on division by 11; otherwise, although it will often be small enough that you can immediately see its remainder on division by 11, you can find the alternating sum of the digits of the alternating sum, repeating the process, to get an even smaller number.
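
As a quick sanity check of the example above, here is a tiny Python sketch (purely illustrative) that computes the alternating digit sum and compares remainders on division by 11:

def alternating_digit_sum(x):
    # Add the digits of x from right to left, alternating between + and -.
    return sum((-1) ** i * int(c) for i, c in enumerate(reversed(str(x))))

print(alternating_digit_sum(121281))   # prints -5
print(121281 % 11, (-5) % 11)          # prints 6 6: the two remainders agree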

If you just want to see how to calculate the middle digit, skip ahead, but you might be wondering why these two remainders are the same, so I will give you a proof. Suppose {x} is an integer with {n + 1} digits. Let {d_0} be its units digit, let {d_1} be its tens digit, … and let {d_n} be its {10^n}s digit, so that

\displaystyle x = d_0 + 10 d_1 + 10^2 d_2 + \dotsb + 10^n d_n.

We can also write this as

\displaystyle x = d_0 + (11 - 1) d_1 + (11 - 1)^2 d_2 + \dotsb + (11 - 1)^n d_n.

For every integer {k} such that {0 \le k \le n}, {(11 - 1)^k} expands into a sum that can be written as {11 q_k + (-1)^k} for some integer {q_k} (because every term in the sum other than {(-1)^k} has at least one factor of 11). Therefore, we have

\displaystyle  \begin{array}{rcl}  x &=& d_0 + (11 - 1) d_1 + (11 \cdot 9 + 1) d_2 + \dotsb + (11 q_n + (-1)^n) d_n \\ &=& 11 (d_1 + 9 d_2 + \dotsb + q_n d_n) + d_0 - d_1 + d_2 - \dotsb + (-1)^n d_n, \end{array}

from which we can see that the remainder of {d_0 - d_1 + d_2 - \dotsb + (-1)^n d_n} on division by 11 will be the same as the remainder of {x} on division by 11 (by the same logic we used above to show that {x^3} and {d^3} had the same remainders on division by 10). This completes the proof.

So, how do you use this fact to calculate the middle digit? Well, first note that for every number {x}, if we let {r} be the remainder of {x} on division by 11, then the remainder of {r^3} on division by 11 is the same as the remainder of {x^3} on division by 11. We proved this above for division by 10 rather than division by 11, and the proof for division by 11 is pretty much exactly the same. Just replace references to 10 by references to 11. So I won’t go through the proof in detail here. Instead, we can get right away to seeing what the remainder of {r^3} on division by 11 is for each of the 11 possible values of {r}. In the following table, the terms ‘quotient of {r^3}’ and ‘remainder of {r^3}’ should be taken to refer to the quotient and remainder on division by 11; there just isn’t room to give a full description inside the table.

{r} 0 1 2 3 4 5 6 7 8 9 10
{r^3} 0 1 8 27 64 125 216 343 512 729 1000
Quotient of {r^3} 0 0 0 2 5 11 19 31 46 66 90
Remainder of {r^3} 0 1 8 5 9 4 7 2 6 3 10

Again, it happens that this table has the convenient property that for every possible value of the remainder of {r^3} there is only one possible value of {r}, and therefore we can use it to find the value of {r} given that remainder. Or, more conveniently, we can use the following inverted form of the table.

Remainder of {r^3} 0 1 2 3 4 5 6 7 8 9 10
{r} 0 1 7 9 5 3 8 6 2 4 10

Unfortunately, there is no simple way to summarise the contents of this table as there was with the earlier one. It’s just a more or less random correspondence which you have to memorise. This is why you need a bit of practice to be able to do the calculation quickly. But if you put your mind to it and test yourself at regular intervals over a couple of days, you’ll probably be able to successfully memorise the correspondence.

So, in order to find the middle digit, you find the remainder of {n^3} on division by 11 by calculating the alternating sum of the digits of {n^3} (and taking the remainder on division by 11 of this alternating sum, although often it will be non-negative and less than 11, in which case it is the same as its remainder). Then you use the table above to find {r}, the remainder of {n} upon division by 11. Bearing in mind the fact we stated above about remainders upon division by 11, perhaps you can now see what to do. The alternating sum of the digits of {n} is {D - m + d}, where {D} is the initial digit of {n}, {m} is the middle digit of {n} and {d} is the final digit of {n}. You can work out the values of {D} and {d}, and you also know that the remainder on division by 11 of this alternating sum is {r}. At this point you might be able to immediately see what {m} has to be, but its value can be calculated systematically: we have {D - m + d = 11 q + r} for some integer {q}, so {D + d - r = 11 q + m}, which shows that {m} is the remainder on division by 11 of {D + d - r}.

This is a little complicated, so an example will be helpful. Suppose {n^3 = 257259456}. The final digit of {n} must also be 6 (using the table), and the initial digit of {n} must be 6 since {6^3 = 216 \le 257 < 343 = 7^3}. The next step is to calculate the alternating sum of the digits of {n^3}: {(6 - 5) + (4 - 9) + (5 - 2) + (7 - 5) + 2 = 1 + (-5) + 3 + 2 + 2 = 3}. From the table it follows that the remainder of {n} upon division by 11 is 9. Therefore, the remainder on division by 11 of {6 - m + 6}, i.e. {12 - m}, where {m} is the middle digit of {n}, must be 9. It is then easy to see that {m} must be 3, so {n = 636}. The last step could also be done by noting that {6 + 6 - 9 = 3}. Try using a calculator to confirm for yourself that {636^3 = 257259456}.

4. Summary

I’ve explained everything at great length here, so it might seem like this method is very complicated, but it isn’t really. Here’s a step-by-step presentation of the method, with rough estimates of how long it will take for you to do each step.

  1. Look at the final digit of {n^3} and use your knowledge of the correspondence to find the final digit {d} of {n}. (If {n^3} ends in 2, 3, 7 or 8, {d} is 10 minus the final digit of {n^3}, and otherwise {n} ends in the same digit as {n^3}.)
  2. Write {n^3} as a 9-digit number and take its first 3 digits. Use your knowledge of the first ten non-negative perfect cubes to find the greatest integer {D} such that {D^3} is no greater than the resulting 3-digit number. The first digit is {D}.
  3. Find the alternating sum of the digits of {n^3}, and if needed repeat the process to get the remainder of {n^3} on division by 11.
  4. Use your knowledge of the correspondence to find the remainder {r} on division by 11 of {n}.
  5. Calculate {D + d - r} and find its remainder {m} on division by 11. The middle digit is {m}.

If you have everything memorised, I think step 1 will take at most 2 seconds, step 2 will take perhaps a little longer, but still at most 5 seconds, step 3 will take the majority of your time, at most 30 seconds, with the potential to last much longer if you get confused, step 4 will take at most 2 seconds, and step 5 will take at most 10 seconds. That’s why I can confidently say that you will be able to calculate the cube root within a minute.
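
To tie the steps together, here is the whole procedure written out as a short Python sketch. This is purely an illustration of the method described above, not something you would use when actually performing the trick, and it assumes its argument really is the cube of an integer with at most 3 digits.

# Step 1's correspondence: final digit of n^3 -> final digit of n.
FINAL_DIGIT = {0: 0, 1: 1, 2: 8, 3: 7, 4: 4, 5: 5, 6: 6, 7: 3, 8: 2, 9: 9}

# Step 4's correspondence: remainder of n^3 mod 11 -> remainder of n mod 11.
MOD_11 = {0: 0, 1: 1, 2: 7, 3: 9, 4: 5, 5: 3, 6: 8, 7: 6, 8: 2, 9: 4, 10: 10}

def cube_root(cube):
    digits = str(cube).zfill(9)                # write n^3 as a 9-digit number

    d = FINAL_DIGIT[cube % 10]                 # step 1: final digit of n

    first_three = int(digits[:3])              # step 2: initial digit of n is the
    D = max(k for k in range(10)               # largest k with k^3 at most the
            if k ** 3 <= first_three)          # first 3 digits of n^3

    alternating = sum((-1) ** i * int(c)       # step 3: alternating digit sum of
                      for i, c in enumerate(reversed(digits)))   # n^3, right to left
    r = MOD_11[alternating % 11]               # step 4: remainder of n mod 11

    m = (D + d - r) % 11                       # step 5: middle digit of n
    return 100 * D + 10 * m + d

print(cube_root(257259456))   # prints 636, as in the worked example above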

I learnt about this from this post at Dead Reckonings. If you are interested in this, have a look at that post—it covers a lot of other mental calculation methods in a more concise manner than this post. I don’t know about you, but until I came across that post I always used to associate the topic of mental calculation with tedious execution of algorithms and had therefore considered it uninteresting and not really ‘real maths’. But as you acquire more mathematical knowledge, you find that calculations end up becoming a sort of creative process, where you often don’t stick to a single method but improvise a way of tackling the problem, drawing on what you know, so that it can be solved as quickly as possible. It’s also interesting to see why these algorithms work and think of how one might come up with them. If you’re interested you could try coming up with similar algorithms for other kinds of roots. Or, perhaps, see if there’s a way of finding cube roots of numbers with more than 9 digits (though I have no idea how you might do that).

An introduction to Turing machines

The story of how Alan Turing made important contributions to the allied war effort in World War II as part of a team cracking German codes, only to later end up being prosecuted for homosexual activity and chemically castrated as punishment, is well-known. If you didn’t already know about it, you might have seen the film about it, The Imitation Game, which came out recently. But this story, while worth knowing about, is not the whole story of Turing’s life: he was also a rather important mathematician. In fact, he is considered one of the founders of computer science. His principal contribution was the concept of the Turing machine, and in this post, I’ll try and explain what the Turing machine is and its significance, so that you can learn something about what will, in the end, probably be his most enduring claim to fame.

Essentially, a Turing machine is an abstract model of a computer, which we can use to make mathematical proofs about what computers can do and what they cannot—or, in other words, what is computable and what is not computable. It’s important to bear in mind that when Turing came up with the concept, computers as we know them today, these complex machines with CPUs and memory, did not exist. In Turing’s time, the word ‘computer’ was mainly used to refer to people. It was an agent noun, formed from the verb ‘compute’. The prototypical kind of computation, to somebody in Turing’s time, would have been arithmetical calculation: adding two numbers, multiplying two numbers, that sort of thing. This sort of computation, of course, can be done in your head, or, failing that, with the aid of a pencil and some paper. No machines need be involved, although they might be able to do it a lot faster. The point is that computation is a pretty fundamental concept that exists independently of the machines that we call computers. In fact, it is closely related to the concept of algorithms: computation can be thought of as the carrying out of algorithms. And, of course, algorithms have been an important concept in mathematics since it began—the ancient Babylonians knew and used algorithms to solve linear and quadratic equations.

The problem Turing was trying to solve was this: how can computation be defined precisely? It is, of course, fairly easy to give a non-precise definition of the concept. To compute is to carry out an algorithm. And algorithms can be characterised by the following properties, paraphrased from the Stanford Encyclopedia of Philosophy. An algorithm is a procedure for obtaining a result such that

  • the algorithm consists of a finite number of instructions, which are precise and unambiguous,
  • following the instructions results in the production of the result after a finite number of steps (the algorithm does not run forever),
  • the instructions can be followed successfully by a human without any special equipment, except paper and a pencil to aid the memory,
  • and, as long as the instructions are understood and followed correctly, they are guaranteed to produce the result; no ingenuity is required of the human.

(The term that tends to be used, rather than ‘algorithm’, in this area is ‘effective procedure’, but it means the same thing. I’ll stick with ‘algorithm’ in this post, since it’s probably more familiar.)

There is nothing wrong with non-precise definitions; most concepts do not have precise definitions, and indeed most concepts probably cannot be given precise definitions. Before Turing, mathematicians had mostly managed just fine without a precise definition of computation. But Turing had a pretty good reason for wanting a more precise definition. In 1928, David Hilbert posed a problem known as the Entscheidungsproblem, or, in English, the ‘decision problem’. This problem asked whether there was an algorithm that could, basically, solve mathematics. More specifically, it asked whether there was an algorithm that could show whether an arbitrary sentence of first-order logic was true or false. First-order logic is a precisely defined language in which virtually every mathematical statement can be expressed, including conjectures like Goldbach’s conjecture (that every even integer greater than 2 is the sum of two prime numbers) that mathematicians have been trying to prove for centuries. If there was an algorithm that could show whether an arbitrary sentence of first-order logic was true or false, we could simply run the algorithm on Goldbach’s conjecture and this centuries-old problem would be solved. (It might be a bit of an exaggeration to say that mathematics would be completely solved, though, because running the algorithm might be practically infeasible, if it required too much computing power. Also, there might be interesting questions that are not expressible in first-order logic. Still, it can’t be denied that the solution of the Entscheidungsproblem would have an enormous impact on our conception of mathematics.) But in order to give a proof that such an algorithm did or did not exist and thus solve the Entscheidungsproblem, it was necessary first to give a mathematical definition of computation. Otherwise, anyone who claimed to have found such an algorithm could be countered with the accusation that their algorithm wasn’t a real algorithm, or anyone who claimed to have proven that such an algorithm was impossible could be countered with the accusation that they hadn’t considered every possible algorithm.

But now you might be wondering: wouldn’t any precise definition of computation suffer from the same problem? Any proof based on the definition could be countered with the accusation that you weren’t using the right definition. If, for a given definition, there was an algorithm that could decide whether an arbitrary mathematical statement was true or false, people could say that the definition was too inclusive. And if it could be proven that there was no such algorithm, people could say that the definition was too exclusive. Admittedly, the first possibility was less serious, because, presumably, if there was such an algorithm, we could just see how it works, and it would be fairly easy to see, on an intuitive basis, whether it was an algorithm by checking whether it didn’t require ingenuity, et cetera. As of today, however, one thing is certain: nobody has managed to find such an algorithm. So we have to consider the second possibility, too. But, since our intuitive concept of algorithms is somewhat vague, how can we ever be sure that our definition doesn’t exclude some algorithms that we’d intuitively recognise as algorithms? This problem seems especially insurmountable given that algorithms can be enormously complicated, indeed, they can be complicated to an arbitrary degree; we don’t have a lot of intuition as to the upper limits of what algorithms are capable of.

It is certainly impossible to ever be sure in the mathematical sense that a given definition of computation is the correct one. But there is some pretty convincing evidence that Turing’s definition is the correct one. There were actually quite a few definitions of computation proposed during the 1930s, along with Turing’s definition via Turing machines. Principal among these were Alonzo Church’s definition via lambda calculus and Kurt Gödel and Jacques Herbrand’s definition via recursive functions. I won’t go into these alternative definitions of computation here, but if you look them up yourself, you’ll see that they were quite different in nature from each other and from Turing’s definition (which I’m going to get to shortly). Yet it turned out, conveniently, that all of these definitions were equivalent: the procedures defined as algorithms by Turing’s definition were exactly those defined as algorithms by Church’s definition and exactly those defined as algorithms by Gödel and Herbrand’s definition. And this is quite a coincidence. Consider the collection of all algorithms, A. Each definition picks out a certain sub-collection A′ of A (i.e. a collection which consists of algorithms, but does not necessarily contain every algorithm), and only the algorithms in this sub-collection are considered algorithms by that definition. The fact that the definitions are all equivalent means that each one picks out exactly the same sub-collection. Why would this be the case? One explanation would be that each definition is, in fact, picking out the biggest sub-collection possible—the collection A itself. That’s why the vast majority of mathematicians today believe that Turing’s definition (and Church’s, and Gödel and Herbrand’s, since they coincide with Turing’s definition) is the correct one. This belief has a name: it’s known as Church’s thesis (or the Church-Turing thesis; I suppose calling it the Church-Turing thesis would be more in accordance with the theme of this post, but ‘Church’s thesis’ is less verbose). It is not, by its nature, definitely provable to the standards mathematicians generally prefer, but I don’t think anybody seriously doubts it to be true. After all, people have come up with lots of other definitions of computation since the 1930s, and they have all ended up coinciding with the original definitions of Gödel, Herbrand, Church and Turing. And nobody has ever managed to think of an algorithm that does not satisfy these definitions. It is worth noting that while Church’s thesis cannot be definitely proven, it could be definitely disproven, if there did turn out to be an algorithm that was not considered an algorithm by the definitions. This is part of the reason why we believe it: it’s a falsifiable belief, and yet nobody has managed to falsify it.

Let’s see what Turing’s definition was, then. The idea Turing had was to describe a simple kind of machine for carrying out algorithms. He didn’t call these machines ‘Turing machines’ himself (that would be kind of egotistical), but that’s what they have become known as. An algorithm could then be defined as any procedure that could be carried out by such a machine. Again, it’s interesting to note that for Turing, it was the idea of someone working out a calculation with pencil and paper that was foremost in his mind when he thought about computation. Paper is two-dimensional, so, to simplify matters, he said that the machine would make its calculations on a one-dimensional strip of paper, a ‘tape’, which would be divided into squares. There would be infinitely many squares; the machine would not be limited by constraints of space. (After all, a person carrying out a calculation can always find more paper.) Each square would contain space for a single symbol to be written, chosen from a finite collection of available symbols. The square could of course also be blank, but we can consider that to be just another symbol. The machine would ‘look’ at a single square at a time and its action would be determined by the symbol on that square. However, the machine could be placed in a finite number of different states, so it would not always do the same thing upon seeing a given symbol; its action would also depend on its state. The possible actions would be the following:

  1. Write a new symbol on the square currently being looked at (this may be the blank symbol, so that the old symbol is erased).
  2. Move the tape one square to the left, so that the machine ends up looking at the square directly to the right of the one that was originally being looked at. This is called ‘shifting right’.
  3. Move the tape one square to the right, so that the machine ends up looking at the square directly to the left of the one that was originally being looked at. This is called ‘shifting left’.
  4. Stop the computation. This is called ‘halting’.

After performing one of these actions, the machine would also possibly change to a different state (if it did not halt). In more abstract terms, the machine would be completely specified by its possible states, its possible symbols, and an association of each state q and each symbol S with an action and a state, corresponding to the action performed, and the new state entered (which may be the same state) when the machine is looking at S while in state q. If the action is action 4 (the halting action), it is pointless to specify a new state to enter, so we can adopt the convention that if there is no association of one of the state-symbol pairs with an action and a new state, then the machine halts if it looks at that symbol while in that state. The machine would have to halt at some point in order to be useful. If the machine did halt, the sequence of symbols remaining on the tape afterwards would be considered the result of the computation. This result may depend on the initial sequence of symbols that was on the tape before the computation was started. So the machine could be thought of as computing a function, which takes as input the initial sequence of symbols, and returns as output the final sequence of symbols. (If the machine does not halt for some input, the function is undefined for that input.) The sequences of symbols could be interpreted any way you like. For example, if the symbols are the blank symbol and 1, you could interpret a sequence of symbols as a sequence of numbers by interpreting each sub-sequence of consecutive 1s as the number of 1s in the sub-sequence, with the blank symbols functioning as separators. So, for example, 11 111 would represent (2, 3), 1111 would represent (4), 111 11 1 would represent (3, 2, 1).

It might not seem obvious that such a simple machine really can carry out any computation. So, I’ll show you a few examples of Turing machines that can carry out calculations with numbers, using the interpretation of sequences of blank symbols and 1s as sequences of numbers given above. Here’s a very simple example: a Turing machine that adds 1 to the input (which should be a single number). It’s represented as a flow chart. To find out what the machine will do in the state q while looking at the symbol S, look at the node which is labelled q, and then see if there is an arrow with a label of the form S : A coming out of this node. If there is such an arrow, this means the machine will carry out action A, and then change state to state p, where p is the label of the node the arrow points to. If there is no such arrow, this means the machine will halt. By the way, I’m using 0 here to represent the blank symbol.

A flowchart with two nodes, labelled ‘1’ and ‘2’. Node 1 has two arrows coming out of it; one of them is labelled ‘1 : shift left’, and points back to node 1, and the other is labelled ‘0 : write 1’, and points to node 2. Node 2 has no arrows coming out of it.

(sorry about the crappy hand-drawn diagram, but I couldn’t find any good software for drawing these)

In order for this machine to work, the input must only contain one string of consecutive 1s, and the machine must start by looking at one of these 1s. Then the machine will shift left until it ends up looking at the first 0 to the left of this string of consecutive ones. It will then change this 0 to a 1, and then halt. So the string of consecutive 1s has an extra 1 added to it.
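
If it helps to see this machine in a more familiar form, here is a minimal Python sketch of a Turing machine simulator (just an illustration, not anything from Turing’s paper), with the ‘add 1’ machine above as its transition table. The tape is stored as a dictionary from positions to symbols, so it is unbounded in both directions, and ‘0’ plays the role of the blank symbol, as in the diagram.

def run(transitions, tape, position, state):
    # Run the machine until it reaches a state-symbol pair with no rule, then halt.
    while (state, tape.get(position, '0')) in transitions:
        action, next_state = transitions[(state, tape.get(position, '0'))]
        if action == 'left':       # shift left: look at the square to the left
            position -= 1
        elif action == 'right':    # shift right: look at the square to the right
            position += 1
        else:                      # otherwise the action is a symbol to write
            tape[position] = action
        state = next_state
    return tape

# The 'add 1' machine: in node 1, keep shifting left over 1s; on reaching a 0,
# write a 1 and go to node 2, which has no rules, so the machine halts.
add_one = {
    ('1', '1'): ('left', '1'),
    ('1', '0'): ('1', '2'),
}

# Tape ...0 1 1 1 0..., with the machine initially looking at the rightmost 1.
tape = run(add_one, {0: '1', 1: '1', 2: '1'}, position=2, state='1')
print(''.join(tape.get(i, '0') for i in range(-1, 3)))   # prints 1111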

Here’s a more complicated example, building on this one: a Turing machine that adds the two numbers in the input.

A flowchart with eight nodes, labelled ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’ and ‘8’. Node 1 has two arrows coming out of it; one of them is labelled ‘1 : shift right’, and points back to node 1, and the other is labelled ‘0 : shift right’, and points to node 2. Node 2 has two arrows coming out of it; one of them is labelled ‘0 : shift right’, and points back to node 2, and the other is labelled ‘1 : write 0’, and points to node 3. Node 3 has one arrow coming out of it labelled ‘0 : shift right’ which points to node 4. Node 4 has two arrows coming out of it; one of them is labelled ‘1 : shift left’ and points to node 5, and the other is labelled ‘0 : shift left’ and points to node 7. Node 5 has two arrows coming out of it; one of them is labelled ‘0 : shift left’ and points back to node 5, and the other is labelled ‘1 : shift left’, and points to node 6. Node 6 has two arrows coming out of it; one of them is labelled ‘1 : shift left’ and points back to node 6, and the other is labelled ‘0 : write 1’ and points to node 1. Node 7 has two arrows coming out of it; one of them is labelled ‘0 : shift left’ and points back to node 7, and the other is labelled ‘1 : shift left’, and points to node 8. Node 8 has two arrows coming out of it; one of them is labelled ‘1 : shift left’ and points back to node 8, and the other is labelled ‘0 : write 1’ and points to node 3.

(again, sorry about the crappy lighting, but I don’t have a scanner)

This one is probably too complicated to understand just by looking at the diagram. The idea here is that the machine will start by looking at one of the 1s in the first number in the input (i.e. the leftmost sequence of consecutive 1s). Then it will shift right, past this sequence of consecutive 1s, and further past the sequence of 0s that separates the two numbers (this requires a change of state), until it reaches the leftmost 1 in the second sequence of consecutive 1s. Then it changes this 1 to a 0, and—here’s the clever part—it shifts to look at the square immediately to the right. If this square is a 1, the machine shifts left until it reaches the first 0 to the left of the first sequence of consecutive 1s. Then it changes this 0 to a 1. Effectively, the 1 that was in the second sequence gets transferred to the first sequence. Then the machine goes back to node 1, and the process repeats. But at some point, it will change the final 1 in the second sequence to a 0, and then, when it shifts right, it will encounter a 0 rather than a 1. This tells the machine that it has reached the end of the second sequence and, therefore, once it’s transferred this final 1 to the first sequence, it can halt (it does this by transferring to node 3, since node 3 has no action specified for the symbol 1).

A nice way to see how this works is to use this online Turing machine simulator. Copy and paste the following code into the text box:

0 1 1 r 0
0 _ _ r 1
1 _ _ r 1
1 1 _ * 2
2 _ _ r 3
3 1 1 l 4
4 _ _ l 4
4 1 1 l 5
5 1 1 l 5
5 _ 1 * 0
3 _ _ l 6
6 _ _ l 6
6 1 1 l 7
7 1 1 l 7
7 _ 1 * halt

and type in an input, such as 1111 111, into the textbox labelled ‘Initial input’. Then click the button labelled ‘Run’. You’ll see the machine going back and forth as it transfers the 1s from right to left, one by one. If you want, you can play around with that Turing machine simulator and try to do other things, such as multiplying two numbers, to help convince you further that Turing machines can carry out any computation. I can understand if you don’t want to, though, because, as you have probably gathered by now, programming a Turing machine is a very tedious and rather tricky task. Programming a computer can be thought of as a communication problem: you have to tell the machine what to do, but the machine only understands a very peculiar language which humans are not accustomed to thinking in. In practice, people program computers using programming languages such as Java or C++, which are closer to natural language (although still notoriously tricky to communicate to the computer with). Programs written in programming languages are translated into machine language, using programs written in machine language called compilers, so that the computer can understand them.

Now, let’s use this definition of computation to prove what we were trying to prove; that the Entscheidungsproblem is unsolvable. The proof was given by Turing in a 1936 paper, which is freely available here. For a more in-depth proof, you can look at this paper. At several points here, I’m going to have to appeal to Church’s thesis to assure you that certain things can be done by a Turing machine. But it is possible, at every point where this is done, to give the precise details of the Turing machine that could carry out the computation. It would just be a bit too tedious for me to go into. But Turing did go into the details in his original paper.

First, we will show a certain limitation on the power of Turing machines. Note that the input of a Turing machine can be taken to represent a Turing machine itself. This should be fairly clear: a Turing machine is just a set of states and a set of symbols, together with associations of some of the state-symbol pairs with action-state pairs, and clearly these things can all be specified in a finite list of symbols. So we can use Turing machines to carry out computations that tell us stuff about Turing machines. In fact, we can have something called a universal Turing machine. This is a Turing machine U such that for every Turing machine T and every sequence of symbols x, if U is given a sequence of symbols as its input that represents T together with x, then the output of U will be the output of T when it is given x as input, and if T does not halt when it is given x as input, U will not halt either. Basically, U is a ‘Turing machine simulator’; it can simulate any Turing machine. How can we be sure that a universal Turing machine exists? Well, Church’s thesis means that a universal Turing machine exists if and only if there is an algorithm for finding the output of an arbitrary Turing machine given an arbitrary input. And it is intuitively clear that there is such an algorithm. The fact that universal Turing machines exist is what enables computers to run arbitrary programs; computers are not just Turing machines, but in fact universal Turing machines.

Here’s an interesting question. Is there a Turing machine H that can tell us whether an arbitrary Turing machine T will halt when it is given an arbitrary sequence x of symbols as its input? Exactly how it tells us is not particularly important, but, for example, it might be that the output of H is the number 1 if T halts given x as its input and otherwise the output of H is the number 0. It is not obvious that this Turing machine exists. After all, a Turing machine can run for an arbitrarily long time before it halts. No matter how long you run a Turing machine for, there is still a possibility that it might halt in the future that you can’t rule out, unless there is some way to predict the machine’s future behaviour from its behaviour up to this point. And, intuitively, it does not seem like such a prediction is possible; the behaviour of the Turing machine can be arbitrarily complex; it might do one thing for the first million steps, then switch to doing something completely different.

That’s the intuitive argument that H cannot exist. Now, let’s prove it. This is the only non-straightforward proof in this post; it’s not particularly complicated, but it is a little mind-bending. It uses an argument known as a diagonal argument, which also turns up in Russell’s paradox and Cantor’s proof of the uncountability of the real numbers. If you have never heard of either of those two things, you might have heard of the liar paradox. This is the paradox that arises from the statement ‘this statement is false’. The truth of this statement is equivalent to its own falsehood; it is self-contradictory. Diagonal arguments are a kind of generalisation of the liar paradox. But let’s see how the argument goes in this particular case.

Every Turing machine T can be represented by a sequence of symbols; call this the representation of T. So, we can give the representation of T as the input to T itself. Using H, we can find out whether T halts when it is given this input. We have just described an algorithm for finding out whether an arbitrary Turing machine halts when it is given its own representation as its input. So there is a Turing machine G that can carry out this algorithm. In other words, for every Turing machine T, when G is given the representation of T as its input, G does one thing if T halts when given its own representation as input, and it does another thing if T does not. The useful thing to do would be something like returning 1 in the first case, and 0 in the second case. But let’s suppose G does something slightly different instead: it does not halt in the first case, and it halts giving an output (it doesn’t really matter what output) in the second case. So, then, when G is given the representation of T, G halts if and only if T does not halt when it is given its own representation. Can you see the problem here? G, like any other Turing machine, has its own representation, and we can give that representation to G as its input. So what happens then? By definition, G halts on this input if and only if G does not halt when given its own representation as input—that is, if and only if G does not halt on this very input. So, whether G halts or not, there is an absurd situation where it both halts and does not halt. We have shown that assuming the existence of H leads to a logical absurdity; therefore, H cannot exist.
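
The self-reference at the heart of this argument can also be written out as a few lines of Python. This is only a sketch of the assumption being refuted: the function halts below stands for the hypothetical decider H, and, as the argument shows, no such function can actually be written.

def halts(program, program_input):
    # Assumed, for the sake of contradiction, to return True if running
    # program(program_input) would eventually halt and False otherwise.
    raise NotImplementedError

def g(program):
    # g plays the role of G: it halts exactly when `program`, run on its own
    # source, does not halt.
    if halts(program, program):
        while True:    # loop forever, i.e. do not halt
            pass
    else:
        return         # halt immediately

# Feeding g to itself gives the contradiction:
# g(g) halts  <=>  halts(g, g) is False  <=>  g(g) does not halt.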

And this is a big problem if we are trying to solve the Entscheidungsproblem. Because statements about Turing machines, including statements of the form ‘the Turing machine T halts when it is given the input x’, can be expressed in first-order logic. So if we had an algorithm that could decide whether an arbitrary sentence of first-order logic is valid, we could, among other things, decide whether an arbitrary Turing machine halts when it is given an arbitrary input. Since this is impossible, there can be no such algorithm. Thus we have proven that the Entscheidungsproblem is unsolvable. Actually, though, I haven’t been totally rigorous here. I haven’t properly justified my claim that all statements of the form ‘the Turing machine T halts when it is given the input x’ can be expressed in first-order logic. For a start, in order to justify this claim, I would have to specify precisely what I mean by first-order logic, because there are actually lots of variants of first-order logic, and for some of these variants the Entscheidungsproblem can be solved (such variants are said to be decidable). However, every variant of first-order logic with a sufficient amount of expressive power is undecidable; it is only the simplest variants which are decidable. I can’t give proper proofs of these results without going into technical details about how first-order logic works. For now, I will leave you, hopefully, somewhat convinced that the Entscheidungsproblem is unsolvable, but fully convinced that the halting problem (the problem of finding out whether an arbitrary Turing machine halts given an arbitrary input) is unsolvable.

Further reading

This post is mainly based on parts of the following two books:

  • Boolos, G., Burgess, J. P., & Jeffrey, R. C. (2002). Computability and logic. Cambridge University Press.
  • Wolf, R. S. (2005). A tour through mathematical logic. Mathematical Association of America.

Also, although I’ve already linked to it above, I’ll mention again that Turing’s original paper is available freely here.

An introduction to the cardinal numbers

There are lots of different kinds of numbers. In school, you learn about some of them: integers, rational numbers, irrational numbers, real numbers, imaginary numbers, complex numbers. So educated non-mathematicians are familiar with these kinds of numbers. But there are other kinds of numbers which pretty much nobody who doesn’t have an interest in maths has heard of, and that’s a pity, because some of them are really cool. Two of the most interesting examples are the cardinal numbers and the ordinal numbers. Both of these number systems are ways of extending the ordinary system of positive integers to allow for numbers representing infinite quantities. I’ll talk about the cardinal numbers in this post, and I’ll possibly talk about the ordinal numbers in another one.

In order to understand what cardinal numbers are, you need to understand what sets are, in the mathematical sense. In its ordinary sense, the word ‘set’ can refer to a collection of objects, and that’s basically what it refers to in the mathematical sense, as well. However, the first examples of sets, in the ordinary sense, that come to mind are probably all collections of objects which are physically bundled together in some way, e.g. the set of socks in your wardrobe, or the set of coins in a bag. But we can also use the word ‘set’ to refer to collections of objects which are not physically close, but are distinguished by sharing some other common property, e.g. the set of all the people you’re related to. In mathematics, the sets we talk about tend to be more like this: after all, they usually contain abstract things, like numbers, which do not have any physical location. If we wanted to, we could talk about rather silly sets, like the set consisting of John F. Kennedy, the North Pole, and the philosophical school of Stoicism. The property that distinguishes the objects in this set is the property of being either John F. Kennedy, the North Pole or Stoicism. It’s not the kind of property you’d usually want to talk about, but, still, it’s a property.

The number of objects a set {S} contains is called its cardinality and denoted {|S|}. For example, the cardinality of the set consisting of John F. Kennedy, the North Pole and Stoicism is 3. However, there is nothing in the definition of the word ‘set’ that implies sets can only contain a finite number of objects. For example, the set of integers contains infinitely many objects. Sure, if we restrict our attention to sets which contain actual, physical objects, sets may all be finite in size (depending on what you mean by ‘actual, physical objects’, and also depending on physics). But, certainly, if we want to talk about sets containing abstract mathematical objects like numbers, we’re going to have to accept that some of them are infinite in size. Sets like this are called infinite sets, as opposed to finite sets. The question which motivates the development of the concept of cardinal numbers is this: what is the cardinality of an infinite set? Clearly, it is not any integer. Of course we can introduce a new number called infinity and say that the cardinality of every infinite set is infinity, but this doesn’t really give any more information than just saying that the cardinality is not any non-negative integer. For a long time, mathematicians assumed that this was the only way to assign cardinalities to infinite sets, and, therefore, they did not consider this question very interesting. It was only in 1874 that things changed. Georg Cantor published a paper explaining how there was a quite sensible way of assigning cardinalities to infinite sets which did not assign them all the same cardinality. In particular, it assigned the set of the real numbers a larger cardinality than the set of the integers. He called the new numbers introduced in this way (together with the non-negative integers, which are the cardinalities of finite sets) the cardinal numbers.

It was the proof that there were different infinite cardinalities that was Cantor’s innovation; his definition of cardinality was not particularly original. In fact, it is probably the most natural way to define cardinality in precise terms. The idea is, basically, that you find a set’s cardinality by counting the objects in it. To see what I mean, think about what the process of counting involves. You start off knowing that the set contains at least 0 objects. So you start a counter at 0. Then you pick an object out of the set. It may not be possible to do so, in which case you know the set contains exactly 0 objects (i.e. it is empty). But if it is possible to do so, then you know that the set contains at least 1 object, so you add one to the counter, making it 1. And the process repeats like that: at each step, you see whether you can pick out another object from the set; if you can, you add one to your counter, otherwise, you stop and take the current value of the counter as the cardinality. The object picked out must always be a new object, not one of the ones you already picked out. Does this work for infinite sets, though? No, because the process will never stop: after a finite number of steps, there will always be another object to pick out.

The problem, then, is that, as we are conceptualising it, counting is a process; it has to be carried out in a finite number of steps. But there is a different way to conceptualise counting. If you count a set which contains exactly {n} objects, the sequence of values the counter takes is 0, 1, 2, …, {n}. Each of these values, except for 0, can be associated with the object which caused the counter to reach that value when it was picked out, and in this way a one-to-one correspondence is set up between the objects in the set and the integers 1, 2, … and {n}: each of the objects corresponds to exactly one of the integers and each of the integers corresponds to exactly one of the objects. So one way of describing what you do when you count a finite set {S} is that you find a non-negative integer {n} such that you can set up a one-to-one correspondence between the objects in {S} and the integers 1, 2, … and {n}. In general, two sets {S} and {T} have the same cardinality if and only if there is a one-to-one correspondence between the objects in {S} and the objects in {T}. The crucial point here is that you can set up one-to-one correspondences between the objects in infinite sets, too. So, even though there are infinitely many non-negative integers, we can define a new number, {\omega}, by saying that the cardinality of a set {S} is {\omega} if you can set up a one-to-one correspondence between the objects in {S} and the non-negative integers. This new number is our first example of a cardinal number which is not a non-negative integer.

In his book Two New Sciences, Galileo Galilei was already talking about this way back in 1638. But Galileo noticed that this definition gives some rather odd results. For example, it seems obvious that there are more non-negative integers than square integers. After all, there are infinitely many non-negative integers which are not square integers, and, furthermore, as you go further into the sequence of non-negative integers, the gaps between successive square integers get wider and wider: so it seems like the number of square integers, if it makes sense to speak of this number, should be just a fraction of the number of non-negative integers. But every non-negative integer has exactly one square, and every square integer has exactly one non-negative square root. So there is a one-to-one correspondence between the square integers and the non-negative integers, and, therefore, if we use this definition of cardinality, we have to say that the cardinality of the set of square integers is {\omega}, which is exactly the same as the cardinality of the set of the non-negative integers.

Galileo considered this result absurd enough that the only sensible conclusion to reach was that it was meaningless to talk of infinite quantities being greater than, less than or equal to other infinite quantities. Indeed, it seems likely that, if we can set up one-to-one correspondences between two sets which seem very different in size, like these two, then we will be able to set up one-to-one correspondences between any two infinite sets. But, as Cantor showed, this is not the case. Let’s see how he showed it.

Cantor gave his original proof by showing that the cardinality of the set of the real numbers was not {\omega}, but this proof was rather complicated, so I won’t go into it here. Instead, I’ll give a different example of a set whose cardinality is not {\omega}; in fact, I’ll show how an infinite sequence of new cardinal numbers which are all different from one another can be constructed. First, I need to tell you some more things about sets. For every set {S}, sets which only contain objects in {S} are called subsets of {S}. For example, the set of the positive integers is a subset of the set of the integers. The set of all the subsets of {S} is called the power set of {S} and denoted {\mathcal P(S)}. For example, the power set of {\{1, 2, 3\}} consists of {\emptyset}, {\{1\}}, {\{2\}}, {\{3\}}, {\{1, 2\}}, {\{1, 3\}}, {\{2, 3\}} and {\{1, 2, 3\}}. (I’ve used some notation here you may not be familiar with. {\emptyset} is the empty set, the one which doesn’t contain any objects. A list of objects surrounded by curly brackets denotes the finite set consisting of those objects.) What we can prove is that for every set {S}, the cardinality of {\mathcal P(S)} is not the same as that of {S}. If you’ve heard of Russell’s paradox, this proof will remind you of it. Suppose there is a one-to-one correspondence from the members of {S} to the subsets of {S}. For every object {x} in {S}, let {f(x)} be the subset of {S} corresponding to {x}. Consider the set {T} of the members {x} of {S} such that {f(x)} does not contain {x}. This is a subset of {S}. Since the correspondence reaches every subset of {S}, {T} is equal to {f(x)} for some member {x} of {S}, and then there are two possibilities: either {T} contains {x}, or {T} does not contain {x}. But if {T} contains {x}, then {f(x)} does not contain {x}, by the definition of {T}, but this is absurd, because {f(x)} is {T}! And, if {T} does not contain {x}, then {f(x)} does not contain {x}, because {T} is {f(x)}, but this is, again, absurd, because then {x} is, by the definition of {T}, in {T}. So, if there is a one-to-one correspondence from the members of {S} to the subsets of {S}, this inevitably leads to an absurdity, and hence no such correspondence can exist. (This isn’t just a counter-intuitive absurdity, like the absurdity of there being as many squares as non-negative integers; we have shown that if such a correspondence exists, there is a statement which is both true and false at the same time. That’s a much more serious kind of absurdity.)
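
To see the mechanics of this diagonal construction in a concrete (if unsurprising) finite case, here is a small Python sketch; the particular set and mapping are arbitrary choices of mine, purely for illustration.

```python
S = {1, 2, 3}

# An attempted correspondence f from members of S to subsets of S (with only
# three members it obviously cannot reach all 2**3 = 8 subsets, but the
# diagonal construction pinpoints a specific subset that it misses).
f = {1: {1, 2}, 2: set(), 3: {1, 3}}

# The diagonal set T = {x in S : x is not in f(x)} ...
T = {x for x in S if x not in f[x]}

print(T)                            # {2} for this particular f
assert all(T != f[x] for x in S)    # ... differs from f(x) for every x in S
```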

The notion of a subset also allows us to define what it means for one infinite cardinality to be greater than another infinite cardinality. A set {S} has greater cardinality than another set {T} if and only if {T} has the same cardinality as a subset of {S}, but does not have the same cardinality as {S} itself. It is easy to see that the cardinality of {\mathcal P(S)} is, in fact, greater than the cardinality of {S}: we have just shown that the two cardinalities are not the same, and {S} has the same cardinality as a subset of {\mathcal P(S)}, because the set of all sets of the form {\{x\}}, where {x} is a member of {S}, is a subset of {\mathcal P(S)}, and there is an obvious one-to-one correspondence between these sets and the members of {S}. This suggests that we can define an infinite sequence of cardinal numbers which increase in size,

\displaystyle \beth_0, \beth_1, \beth_2, \dotsc,

where {\beth_0} is {\omega} and for every non-negative integer {n}, {\beth_{n + 1}} is the cardinality of the power set of a set of cardinality {\beth_n}. In other words, {\beth_0} is the cardinality of {\mathbb N} (the set of the non-negative integers), {\beth_1} is the cardinality of {\mathcal P(\mathbb N)}, {\beth_2} is the cardinality of {\mathcal P(\mathcal P(\mathbb N))}, and so on. (In order to make sure that these cardinal numbers actually do increase in size, we have to check that our way of ordering the cardinal numbers has the property that if {S}, {T} and {U} are sets, {|S| < |T|} and {|T| < |U|}, then {|S| < |U|} as well. However, the proof that this is the case is quite complicated, so I will omit it here.) Note that not every cardinal number is a member of this sequence. The cardinality of the set of the real numbers is in this sequence, however: it’s {\beth_1}. You could try and prove this yourself: all you need to show is that there is a one-to-one correspondence between the real numbers and the subsets of {\mathbb N}. Here’s a hint: a subset of {\mathbb N} can be thought of as a function associating each natural number with 0 (if this natural number is not in the subset) or 1 (if this natural number is in the subset), i.e. a sequence of 0s and 1s. Is there also a way to think of real numbers as sequences of 0s and 1s?
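
As a nudge towards the hint, here is a small Python sketch (mine, and only an approximation: it ignores the wrinkle that some real numbers have two binary expansions, which a complete proof has to deal with) of how a subset of {\mathbb N} can be read as the binary digits of a real number between 0 and 1.

```python
def real_from_subset(subset, precision_bits=40):
    """Read membership of n in the subset as the (n + 1)th binary digit
    after the point, so e.g. {0, 2} becomes 0.101000... in binary."""
    return sum(2.0 ** -(n + 1) for n in subset if n < precision_bits)

print(real_from_subset({0, 2}))           # 0.625
print(real_from_subset(set()))            # 0
print(real_from_subset(set(range(40))))   # just under 1.0
```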

There is a lot more that I could say about cardinal numbers to help you get comfortable with thinking of them as numbers. There is no strict definition of what a ‘number’ is in mathematics: the decision of whether to call things numbers is a subjective one, influenced by the number of properties those things have which are similar to properties of archetypal number systems like {\mathbb N}. To use Vi Hart’s phrase, numbers are things you can do numbery things with. We have shown that there are infinitely many cardinal numbers and they can be arranged into an increasing sequence, and that’s pretty numbery, but there are lots of other numbery things you can do with cardinal numbers: for example, you can add and multiply them, and you can take the power of one cardinal number to another (although, that said, the operations on the cardinal numbers behave rather weirdly; I won’t go into it fully here, but, for example, for every pair of infinite cardinal numbers {\kappa} and {\mu}, {\kappa + \mu} and {\kappa \mu} are both the same and just equal to the larger of {\kappa} and {\mu}). So, now you know that it’s not quite true to say that infinity isn’t a number. OK, it isn’t one number, but every infinite quantity can be described by a cardinal number, and these cardinal numbers aren’t too different from ordinary numbers.

If you were wondering whether the theory of cardinal numbers has any practical application, well, the answer is ‘no’. The same can be said about a lot of mathematics, but applications of the cardinal numbers are actually quite scarce even within mathematics. Outside of set theory, mathematicians hardly ever need to concern themselves with sets whose cardinalities are greater than {\beth_1}, the cardinality of the set of the real numbers. So the only distinction between infinite cardinal numbers which is often relevant is that between {\beth_0} and {\beth_1}. Infinite sets of cardinality {\beth_0}, as well as finite sets, are called countable sets, while infinite sets of cardinality {\beth_1}, and all other infinite sets of cardinality greater than {\beth_0}, are called uncountable sets. There’s a famous analogy, which David Hilbert came up with, which sheds some light on why this is the case.

Any hotel in our universe only has finitely many rooms. So if the hotel is full, and another person comes along, they have to be turned down. Because, if the new person goes into Room 1, the person already in Room 1 has to move into Room 2, and then the person already in Room 2 has to move into Room 3, and so on, and since there are only finitely many rooms, eventually some positive integer {n} is reached such that there is no Room {n + 1}, and hence nowhere for the person in Room {n} to move to. So the hotel either has to refuse the new person, or kick someone out. In mathematical terms, what we’re saying here is that if you take a set of a finite cardinality {n} and add another member to it, the new set has greater cardinality.

But let’s suppose the hotel has {\beth_0} rooms. In that case the new person can be let in, even though the hotel is full, because there is no positive integer {n} at which there stops being a Room {n + 1}! The person in Room 1 can move to Room 2, the person in Room 2 can move to Room 3, and so on, leaving Room 1 free for the new person. In mathematical terms, if you add another member to a set of cardinality {\beth_0} (or any other infinite cardinality), the new set still has the same cardinality. And, by repeating this process, any finite number of new people can be let in. That’s why, if you add a finite cardinal number to an infinite cardinal number, the result is the original infinite cardinal number; it is not changed by the addition.

OK, what if there’s a countably infinite number of new people? Still not a problem, we just have to take a slightly different approach: move the person in Room 1 to Room 2, move the person in Room 2 to Room 4, move the person in Room 3 to Room 6, and so on, so that all the odd-numbered rooms are left free. Again, there’s nothing stopping us from doing this: for every positive integer {n}, there is a Room {2 n}. Then, we just move the first new person into Room 1, the second new person into Room 3, the third new person into Room 5, and so on; for every positive integer {n}, there is a Room {2 n + 1} free for the {n}th new person to move into. In mathematical terms, this reflects the fact that {\beth_0} times 2 is still just {\beth_0}. And, of course, we can repeat this any finite number of times, showing that {\beth_0} times every positive integer is still {\beth_0}, or, if you take any finite number of countably infinite sets and put all their members together into one set (this set is called the union of the original sets), this set is still countably infinite.

Well, how about if there are countably infinitely many lines of new people waiting outside the hotel, each of them containing a countably infinite number of people? This isn’t a problem, although some slightly more advanced mathematics is needed. We can use the fact that every positive integer is uniquely represented as a product of prime numbers, which means that in particular, for every pair of distinct prime numbers {p} and {q}, no positive power of {p} is equal to a positive power of {q} (for otherwise the common value of the two powers would be a number which can be represented as a product of prime numbers in two different ways). So for every positive integer {n}, we can move the person in Room {n} to Room {p_n} where {p_n} is the {n}th prime number. Then, for every positive integer {n}, we move the {n}th person in the first line into Room {p_n^2}, we move the {n}th person in the second line into Room {p_n^3}, and so on. This allows all the people to get a room, and, in fact, it even leaves all the rooms whose numbers are not powers of prime numbers (e.g. Room 6) free. This indicates that even unions of countably infinitely many countably infinite sets are still countably infinite. For example, the set of the rational numbers, i.e. fractions {a / b} where {a} and {b} are integers and {b} is not 0, is countably infinite, because there are countably infinitely many possible values of {b}, and hence the set can be split into countably infinitely many subsets, with the members of each subset sharing a common denominator. And each subset is countably infinite, because the only way two of its members can be different is if they have different numerators, and there are only countably infinitely many possible values of {a}.
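
If you want to play with this room-assignment trick, here is a small Python sketch of it (my own illustration); it checks, for a modest number of queues and people, that nobody ever gets sent to the same room as anybody else.

```python
def nth_prime(n):
    """Return the nth prime number (1-indexed), by simple trial division."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d != 0 for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

def room(line, position):
    """Room for the person at the given 1-indexed position: line 0 is the
    hotel's existing guests (Room n goes to Room p_n), and line m >= 1 is
    the mth queue of new people outside (its nth person goes to p_n^(m+1))."""
    return nth_prime(position) ** (line + 1)

assignments = {(line, pos): room(line, pos)
               for line in range(6) for pos in range(1, 21)}
assert len(set(assignments.values())) == len(assignments)   # no clashes
print(room(0, 3), room(1, 3), room(2, 3))                    # 5 25 125
```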

The point of all this is that it’s a lot harder to make a set whose cardinality is infinite into a bigger set than it is if the cardinality is finite. So, in most areas of mathematics, you end up only needing to work with countably infinite sets, such as {\mathbb N}. It’s rare that you come across a set so large that it is uncountable, and even if you do, it is probably only of cardinality {\beth_1}. It’s not surprising, then, that mathematicians didn’t feel the need to develop a theory of cardinal numbers for a long time. But the cardinal numbers are really neat, which in mathematics is a completely acceptable reason to study something (indeed, in Hardy’s view, it is the principal reason why anyone studies mathematics); they are also interesting from a philosophical point of view as they shed some light on the meaning of infinity, or at least one particular conception of the meaning of infinity. And note that I have only given a very cursory introduction to them in this post; there are very interesting things about cardinal numbers which I have not even mentioned. For example, there is the question of whether there are any cardinal numbers between {\beth_0} and {\beth_1}, which turns out to be, in a certain sense, unanswerable.

Notes on perfect numbers

This is cross-posted from Tumblr. If you view it on Tumblr, the equations will look nicer.

[The first few paragraphs of this post provide some background, but, if you want, you can skip ahead to the discussion of perfect numbers; the only thing you need to know for that discussion is that for every positive integer n, σ(n) is the sum of the positive factors of n, and σ is multiplicative: for every pair of integers a and b which are coprime, i.e. the only positive factor they share is 1, σ(ab) = σ(a)σ(b). In this section, I use the concept of a multiset, which is like a set, but can contain multiple instances of objects, because it makes it easier to reason about prime factorisations. For every multiset S and every object x, the number of instances of x in S is called the multiplicity of x in S and denoted [x ∈ S]. A submultiset of S is a multiset T such that for every object x, [x ∈ T] ≤ [x ∈ S]; if T is a submultiset of S we write T ⊆ S. For every pair of multisets S and T, the sum of S and T, denoted S + T, is the multiset U such that for every object x, [x ∈ U] = [x ∈ S] + [x ∈ T].]

From the prime factorisation of a positive integer, it is possible to determine its factors in general. Suppose n and d are positive integers and N and D are their respective prime factorisations. Then d ∣ n if and only if D ⊆ N. This is because for every positive integer q, if we let Q be the prime factorisation of q, then n = q d if and only if N = D + Q. Therefore, the number of positive factors that n possesses, which is denoted τ(n), is the number of submultisets that N possesses, i.e.

\tau(n) = \prod_{p \text{ prime}} \left( [p \in N] + 1 \right).

It is easy to see from this formula that the function τ is multiplicative. Another consequence of the determination of the positive factors of n by the prime factorisation of n is that the sum of the positive factors of n, which is denoted σ(n), is given by the formula

\sigma(n) = \sum_{D \subseteq N} \prod_{p \text{ prime}} p^{[p \in D]}

which can be rewritten as

\sigma(n) = \prod_{p \text{ prime}} \sum_{\alpha = 0}^{[p \in N]} p^{\alpha},

which in turn can be rewritten, using the formula for the sum of a geometric series, as

\sigma(n) = \prod_{p \text{ prime}} \frac {p^{[p \in N] + 1} - 1} {p - 1}.

It is easy to see from this formula that σ is multiplicative. It is also easy to see, just from the definition of σ, that if n ≠ 1, then n + 1 ≤ σ(n), because 1 and n are both factors of n, and, furthermore, σ(n) = n + 1 if and only if n is prime. In investigations of the values that σ takes it is therefore useful to consider instead the sum of the proper positive factors of n, which is denoted s(n) and is equal to σ(n) − n, because, unlike the lower bound n + 1 for σ(n), the lower bound 1 for s(n) does not depend on n. Some numbers are invariant under s, e.g. 6 (because s(6) = 1 + 2 + 3 = 6) and 28 (because s(28) = 1 + 2 + 4 + 7 + 14 = 28). The invariants under s are called the perfect numbers. Equivalently, perfect numbers are positive integers n such that σ(n) = 2n.
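
As a quick sanity check on these formulas, here is a small Python sketch (the function names are just for illustration) that computes σ(n) both directly from the definition and via the product over the prime factorisation, and then picks out the perfect numbers below ten thousand.

```python
from math import isqrt

def sigma(n):
    """Sum of the positive factors of n, straight from the definition
    (pairing each factor d <= sqrt(n) with the cofactor n // d)."""
    total = 0
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total

def sigma_from_factorisation(n):
    """The same value via the product formula: each prime p with
    multiplicity a contributes a factor (p**(a + 1) - 1) // (p - 1)."""
    result, p = 1, 2
    while n > 1:
        if n % p == 0:
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            result *= (p ** (a + 1) - 1) // (p - 1)
        p += 1
    return result

assert all(sigma(n) == sigma_from_factorisation(n) for n in range(2, 2000))

# Perfect numbers satisfy s(n) = sigma(n) - n = n, i.e. sigma(n) = 2n.
print([n for n in range(2, 10000) if sigma(n) == 2 * n])   # [6, 28, 496, 8128]
```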

Perfect numbers have been a topic of great interest since ancient times, despite their almost total lack of practical applications or theoretical significance. Yet despite this long history of study, there are still very basic questions about perfect numbers which remain unanswered. For example, every known perfect number is even, but it is not known whether there are any odd perfect numbers. Nor is it known whether there are infinitely many even perfect numbers.

There is, however, a significant and non-obvious result which we can prove concerning the even perfect numbers. Every even positive integer n is of the form 2^a m where 1 ≤ a and m is odd, and, therefore, by the multiplicativity of σ, σ(n) = (2^{a + 1} − 1) σ(m), while 2n = 2^{a + 1} m. Hence, n is perfect if and only if

(2^{a + 1} - 1) \sigma(m) = 2^{a + 1} m,

in which case, since 2^{a + 1} − 1 is odd, 2^{a + 1} − 1 is a factor of m. Therefore, there is an integer k such that m = k(2^{a + 1} − 1). Substituting this expression for m and then dividing both sides by 2^{a + 1} − 1 yields

\sigma(m) = 2^{a + 1} k.

Note also that, expanding the equation for m, we have m = 2^{a + 1}k − k and hence m + k = 2^{a + 1}k, i.e. m + k = σ(m). Now, m and k are both positive factors of m, and they are distinct, because, since 1 ≤ a, 1 < 2^{a + 1} − 1, and hence k < m. Since σ(m) is the sum of all the positive factors of m it follows that k and m are the only two positive factors of m. But 1 is also a positive factor of m, so one of these must be equal to 1. k is the smaller one, so it is k which is equal to 1. Hence m is prime, m = 2^{a + 1} − 1 and n = 2^a(2^{a + 1} − 1). Therefore, the even perfect numbers are those of the form 2^a(2^{a + 1} − 1), where 2^{a + 1} − 1 is prime. Indeed, for every number n of this form we have

\begin{array}{rcl}  \sigma(n) &=& \sigma \left( 2^a \left( 2^{a + 1} - 1 \right) \right) \\  &=& \sigma \left( 2^a \right) \sigma \left( 2^{a + 1} - 1 \right) \\  &=& \left( 2^{a + 1} - 1 \right) \left( 1 + \left( 2^{a + 1} - 1 \right) \right) \\  &=& \left( 2^{a + 1} - 1 \right) 2^{a + 1} \\  &=& 2 n,  \end{array}

so n is perfect.

Prime numbers of the form 2^{a + 1} − 1, where a is a non-negative integer, are called Mersenne prime numbers. Note that 2^{a + 1} − 1 = 1 + 2 + 4 + ⋯ + 2^a, so another nice characterisation of the Mersenne prime numbers is that they are the prime numbers whose binary expansions contain only the digit 1. For every Mersenne prime number 2^{a + 1} − 1, if we let p = 2^{a + 1} − 1, rearranging yields 2^{a + 1} = p + 1, and hence another way of stating this characterisation of the even perfect numbers is that the even perfect numbers are those of the form p(p + 1)/2 where p is a Mersenne prime number. p(p + 1)/2 is the pth triangular number, so, equivalently, the even perfect numbers are those numbers n such that n points can be arranged into an equilateral triangle with a Mersenne prime number of points along each side.

Unfortunately, the Mersenne prime numbers are not much less mysterious than the perfect numbers. For example, it is not known whether there are infinitely many Mersenne prime numbers, and hence it is not known whether there are infinitely many perfect numbers. But there is one thing we can say about Mersenne prime numbers. For every Mersenne prime number 2^{a + 1} − 1, a + 1 is prime, because for every positive factor d of a + 1, there is a positive integer k such that a + 1 = kd, hence

\begin{array}{rcl}  2^{a + 1} - 1 &=& 2^{k d} - 1 \\  &=& \left( 2^d \right)^k - 1 \\  &=& \left( 2^d - 1 \right) \left( 1 + 2^d + \left( 2^d \right)^2 + \dotsb + \left( 2^d \right)^{k - 1} \right),  \end{array}

so 2^d − 1 is a positive factor of 2^{a + 1} − 1. Since 2^{a + 1} − 1 is prime, it follows that 2^d − 1 is either 1 or 2^{a + 1} − 1. In the former case, we have 2^d = 2, hence d = 1. In the latter case, we have 2^d = 2^{a + 1}, hence d = a + 1.

This result means that there are fewer computations we need to make when searching for the even perfect numbers. We only need to see which of the numbers of the form 2^p − 1, where p is a prime number, are Mersenne prime numbers. If 2^p − 1 is a Mersenne prime, then 2^{p − 1}(2^p − 1) is perfect. Let’s use this strategy to find the first few even perfect numbers.

p     2^p − 1              Is 2^p − 1 prime?      If so, what is 2^{p − 1}(2^p − 1)?
2     4 − 1 = 3            Yes                    2 ⋅ 3 = 6
3     8 − 1 = 7            Yes                    4 ⋅ 7 = 28
5     32 − 1 = 31          Yes                    16 ⋅ 31 = 496
7     128 − 1 = 127        Yes                    64 ⋅ 127 = 8128
11    2048 − 1 = 2047      No (2047 = 23 ⋅ 89)
13    8192 − 1 = 8191      Yes                    4096 ⋅ 8191 = 33550336

The first 5 even perfect numbers are, therefore, 6, 28, 496, 8128 and 33550336. Note that 2^{11} − 1 is not prime, so checking that each of these numbers is, in fact, prime is essential. The start of this list gives a misleading impression that for most prime numbers p, 2^p − 1 is a Mersenne prime; in fact, this becomes increasingly unlikely as p gets larger. So far, people have checked the primality of every number of the form 2^p − 1 where 1 ≤ p ≤ 43112609, and found that only 47 of these 43112609 numbers are Mersenne primes. (A 48th Mersenne prime number is also known, 2^{57885161} − 1, which is the largest known Mersenne prime number and in fact the largest known prime number in general, but not all of the numbers 2^p − 1, where 43112609 < p < 57885161, have been checked for primality.)
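
If you would like to reproduce the table above (and extend it a little), here is a small Python sketch of the search; the trial-division primality test is nothing clever, but it is fine for numbers of this size.

```python
def is_prime(n):
    """Primality by trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

for p in [2, 3, 5, 7, 11, 13, 17, 19]:
    m = 2 ** p - 1
    if is_prime(m):
        print(f"p = {p}: 2^{p} - 1 = {m} is prime, giving the perfect number {2 ** (p - 1) * m}")
    else:
        print(f"p = {p}: 2^{p} - 1 = {m} is not prime")
```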

As for the odd perfect numbers, lots of conditions have been found that every odd perfect number must satisfy; for example, every odd perfect number must have at least 1500 digits in its decimal expansion. J. J. Sylvester opined that it seemed unlikely that a number could exist satisfying all of these conditions; however, the people at OddPerfect.org, at least, aren’t convinced. I might do a post in future about the question of whether there are odd perfect numbers; at the moment, however, I don’t have any idea whether it is likely or not.

Fermat’s Christmas theorem

Since it was Christmas two days ago, I thought I’d write about a result which is sometimes called Fermat’s Christmas theorem, because Fermat first claimed to have proven it in a letter dated the 25th of December, 1640 (although, typically for Fermat, he didn’t actually give the proof, so he may not have had a correct proof). Recall that by Pythagoras’s theorem, the length {c} of the hypotenuse of a right-angled triangle is related to the lengths {a} and {b} of the other two sides by the equation

\displaystyle   c^2 = a^2 + b^2. \ \ \ \ \ (1)

Therefore, if {a} and {b} are integers, the square of {c} is also an integer. Interestingly, however, there are non-negative integers which cannot be the square of {c} if {a} and {b} are integers. For example, {c^2} cannot be 3, because if {a \ge 2} or {b \ge 2}, then {a^2 + b^2 \ge 2^2 = 4}, and if {a \le 1} and {b \le 1}, {a^2 + b^2 \le 1^2 + 1^2 = 2}. Let’s call the non-negative integers which {c^2} can be the ‘2-square’ integers, because, just as the square integers are the integers {n} such that if you have {n} blocks, you can arrange all the blocks into exactly one square, the 2-square integers are the integers {n} such that if you have {n} blocks, you can arrange all the blocks into exactly two squares. In other words, the 2-square integers are exactly those equal to {a^2 + b^2} for some pair of integers {a} and {b}.

There is an obvious method for finding out whether a given integer {n} is 2-square. If {n} is negative, it is definitely not 2-square, because square integers are never negative and sums of non-negative integers are never negative. Otherwise, whether {n} is 2-square can be determined by simply testing each of the pairs of integers {a} and {b} such that {0 \le a \le \sqrt n} and {0 \le b \le \sqrt n}, of which there are only finitely many, and seeing whether {a^2 + b^2 = n} (we lose nothing by ignoring negative values of {a} and {b}, since {(-a)^2 = a^2}). If there is at least one pair such that this is the case, then {n} is 2-square, and if there are no pairs such that this is the case, then {n} cannot be 2-square because for any pair of non-negative integers {a} and {b} where at least one of {a} and {b} is greater than {\sqrt n}, at least one of {a^2} and {b^2} is greater than {n}, so {a^2 + b^2 > n}. However, if {n} is non-negative this method involves testing lots of different pairs of integers. It would be nice if there was a quicker method we could use in the case where {n} is non-negative.
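
The obvious method is easy to put into code; here is a small Python sketch of it, together with the list it produces for the first fifty positive integers, which matches the bold numbers in the list a couple of paragraphs further down.

```python
from math import isqrt

def is_two_square(n):
    """Test whether n is a sum of two square integers by trying every pair
    a, b with 0 <= a <= b <= sqrt(n) (the 'obvious method' above)."""
    if n < 0:
        return False
    return any(a * a + b * b == n
               for a in range(isqrt(n) + 1)
               for b in range(a, isqrt(n) + 1))

print([n for n in range(1, 51) if is_two_square(n)])
# [1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18, 20, 25, 26, 29, 32, 34, 36, 37,
#  40, 41, 45, 49, 50]
```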

When you want to find out whether a non-negative integer {n} is square, you can use a similar method, testing all the integers {a} such that {0 \le a \le n} and seeing whether {a^2 = n}, but there is a quicker method which can be used when you know the prime factorisation of {n}: {n} is square if and only if the multiplicity of every prime in the prime factorisation of {n} is even [1] (this method isn’t applicable when {n = 0} because only positive integers have prime factorisations, but it’s easy to see that 0 is square). Therefore, you might expect that if there is a quicker method for finding out whether {n} is 2-square, it will also involve the prime factorisation of {n}.

Let’s try just writing out the first 50 positive integers and identifying which of these are 2-square; maybe we will be able to see the pattern. We don’t need to worry about the negative integers and 0, because we already established that no negative integer is 2-square, and it is easy to see that 0 is 2-square (because it is equal to {0 + 0}). Since we already have a hunch that the prime factorisations will be involved, let’s write them as their prime factorisations. The bolded numbers are the 2-square numbers.

\displaystyle \begin{matrix} \mathbf{1} & \mathbf{2} & 3 & \mathbf{2^2} & \mathbf{5} & 2 \cdot 3 & 7 & \mathbf{2^3} & \mathbf{3^2} & \mathbf{2 \cdot 5} \\ 11 & 2^2 \cdot 3 & \mathbf{13} & 2 \cdot 7 & 3 \cdot 5 & \mathbf{2^4} & \mathbf{17} & \mathbf{2 \cdot 3^2} & 19 & \mathbf{2^2 \cdot 5} \\ 3 \cdot 7 & 2 \cdot 11 & 23 & 2^3 \cdot 3 & \mathbf{5^2} & \mathbf{2 \cdot 13} & 3^3 & 2^2 \cdot 7 & \mathbf{29} & 2 \cdot 3 \cdot 5 \\ 31 & \mathbf{2^5} & 3 \cdot 11 & \mathbf{2 \cdot 17} & 5 \cdot 7 & \mathbf{2^2 \cdot 3^2} & \mathbf{37} & 2 \cdot 19 & 3 \cdot 13 & \mathbf{2^3 \cdot 5} \\ \mathbf{41} & 2 \cdot 3 \cdot 7 & 43 & 2^2 \cdot 11 & \mathbf{3^2 \cdot 5} & 2 \cdot 23 & 47 & 2^4 \cdot 3 & \mathbf{7^2} & \mathbf{2 \cdot 5^2} \end{matrix}

There isn’t a particularly obvious pattern here. We can see that every square integer is 2-square as well; this is because 0 is square and adding 0 to a square does not change it. Also, for every square integer {n}, {2 n} is 2-square, because {2 n = n + n}. Notice that 2 is 2-square, because it is equal to {1 + 1}. So this is actually a special case of a more general rule: for every 2-square number {m} and every square integer {n}, {m n} is 2-square as well, because there are integers {x}, {y} and {z} such that {m = x^2 + y^2} and {n = z^2}, so {m n = (x^2 + y^2) z^2 = x^2 z^2 + y^2 z^2 = (x z)^2 + (y z)^2}. You can see that this holds in the above list: {2^4} and {2^2 \cdot 3^2} are 2-square because {2^2} is 2-square, {2^2 \cdot 5} and {3^2 \cdot 5} are 2-square because 5 is 2-square, {2^5} is 2-square because {2^3} is 2-square, {2^2 \cdot 3^2} is 2-square because {3^2} is 2-square, and {2^3 \cdot 5} is 2-square because {2 \cdot 5} is 2-square. The natural question to ask at this point is: does the converse hold? That is, if {m} is not 2-square and {n} is square, is {m n} not 2-square either? Well, obviously if {n} is 0, {m n} will always be 0 which is 2-square, but does it hold if {n} is positive? Let’s see if we can find a counterexample in the list above. 3 is not 2-square, and neither are {2^2 \cdot 3}, {3^3} or {2^4 \cdot 3}. {2 \cdot 3} is not 2-square, and neither is {2^3 \cdot 3}. 7 is not 2-square, and neither is {2^2 \cdot 7}. 11 is not 2-square, and neither is {2^2 \cdot 11}. {2^2 \cdot 3} is not 2-square, and neither is {2^4 \cdot 3}. It certainly looks like the converse does hold. But it’s not obvious how this can be proven, since there’s not much you can do with a negative condition like ‘{m} is not 2-square’. For now, let’s just state it as a conjecture.

Conjecture 1. For every pair of integers {m} and {n} such that {m} is not 2-square and {n} is positive and square, {m n} is not 2-square.

If conjecture 1 does hold, then we can determine whether an integer {n} is 2-square by dividing it by one of its positive square factors and seeing whether the result is 2-square (if it is 2-square, then so is {n}, and if it is not 2-square, then neither is {n}). Therefore, we only need to be able to determine whether the ‘square-free’ positive integers (those with no positive square factor other than 1) are 2-square. Let’s write out the list again, but omit all the non-square-free integers. Since the integers in the list are written as their prime factorisations, it is very easy to see which of them are square-free: the square-free integers are those whose prime factorisations do not contain multiple instances of any prime.

\displaystyle \begin{matrix} \mathbf{1} & \mathbf{2} & 3 & & \mathbf{5} & 2 \cdot 3 & 7 & & & \mathbf{2 \cdot 5} \\ 11 & & \mathbf{13} & 2 \cdot 7 & 3 \cdot 5 & & \mathbf{17} & & 19 & \\ 3 \cdot 7 & 2 \cdot 11 & 23 & & & \mathbf{2 \cdot 13} & & & \mathbf{29} & 2 \cdot 3 \cdot 5 \\ 31 & & 3 \cdot 11 & \mathbf{2 \cdot 17} & 5 \cdot 7 & & \mathbf{37} & 2 \cdot 19 & 3 \cdot 13 & \\ \mathbf{41} & 2 \cdot 3 \cdot 7 & 43 & 2 \cdot 23 & & & 47 & & & \end{matrix}

We’re getting closer to seeing the pattern, but it’s still not very obvious. However, notice that {2 \cdot 5} is 2-square, and so are its prime factors, 2 and 5. {2 \cdot 13} is 2-square, and 13 is also 2-square. {2 \cdot 17} is 2-square, and 17 is also 2-square. On the other hand, 3 is not 2-square, and all the square-free numbers which include it as a prime factor ({2 \cdot 3}, {3 \cdot 5}, {3 \cdot 7}, {2 \cdot 3 \cdot 5}, {3 \cdot 11}, {3 \cdot 13} and {2 \cdot 3 \cdot 7}) are not 2-square. 7 is not 2-square, and neither are {2 \cdot 7} and {5 \cdot 7}. 11 is not 2-square, and neither is {2 \cdot 11}. 19 is not 2-square, and neither is {2 \cdot 19}. 23 is not 2-square, and neither is {2 \cdot 23}. It looks like a square-free integer is 2-square if and only if all of its prime factors are 2-square.

Again, the ‘only if’ part will be difficult to prove since we are working with a negative condition. But let’s try and prove the ‘if’ part here, because in that case we do have something to work with. Now, whenever you want to prove something it’s always useful to try and identify exactly what you’re going to need to use in order to prove it. What we want to prove is that every square-free positive integer {n} whose prime factors are all 2-square is itself 2-square. But do we really need to use the fact that {n} is square-free? We don’t, actually. The 2-square primes in the list are 2, 5, 13, 17, 29, 37 and 41, and sure enough, {2^2}, {2^3}, {2^4}, {2^2 \cdot 5}, {5^2}, {2^5}, {2^3 \cdot 5} and {2 \cdot 5^2} are all 2-square, so it seems that even if {n} is not square-free, it is still the case that if all its prime factors are 2-square, {n} is 2-square as well. The fact that {n} is square-free only needs to be used in the ‘only if’ part. In fact, we don’t even need to use the fact that the factors are prime, because the statement ‘every positive integer {n} whose prime factors are all 2-square is 2-square’ is equivalent to the statement ‘for every pair of 2-square positive integers {a} and {b}, {a b} is 2-square’, assuming the ‘only if’ part holds. It’s easy to see that the second statement implies the first (just apply it repeatedly to the prime factors of {n}), and to see that the first implies the second, just notice that if {a} and {b} are 2-square, so are all their prime factors (by the ‘only if’ part), and hence so are all the prime factors of {a b} (since each such factor is either a factor of {a} or a factor of {b}). So, it should be possible to prove that {a b} is 2-square as long as {a} and {b} are both 2-square; no additional information should be needed.

If {a} is 2-square, it can be written as {x^2 + y^2} where {x} and {y} are integers, and if {b} is 2-square, it can be written as {u^2 + v^2} where {u} and {v} are integers. The product {a b} is then {(x^2 + y^2) (u^2 + v^2) = x^2 u^2 + x^2 v^2 + y^2 u^2 + y^2 v^2 = (x u)^2 + (x v)^2 + (y u)^2 + (y v)^2}. Well, this is definitely a sum of four square integers. At this point it’s not totally obvious what to do, which is why it is useful to be sure that the proof can be carried out even if {a b} is not square-free or {a} and {b} are not both prime. You might just stumble across the method by which the proof is completed by trial and error, but a more illuminating way of completing the proof is to think, again, about what 2-square numbers are in geometric terms. They are the squares of hypotenuses of triangles whose other sides are of integral length. Equivalently, they are the squares of distances from the origin of points on the coordinate plane whose coordinates are integers. Equivalently, they are squares of norms of complex numbers whose real and imaginary parts are integers. Now, the norm operation on complex numbers respects multiplication, just as it does on real numbers: for every pair of complex numbers {a} and {b}, {|a b| = |a| |b|}, and taking the complex conjugate of a complex number does not change its norm. If {a = x + y i} and {b = u + v i}, then {a \bar b = (x u + y v) + (y u - x v) i} (where {\bar b = u - v i} is the complex conjugate of {b}), and hence the equation {|a \bar b| = |a| |b|} is equivalent to the equation

\displaystyle   (x u + y v)^2 + (x v - y u)^2 = \left( x^2 + y^2 \right) \left( u^2 + v^2 \right), \ \ \ \ \ (2)

which shows how {(x^2 + y^2) (u^2 + v^2)} can be expressed as the sum of two square integers. And you can check, if you want, by expanding the brackets, that both sides of this equation are the same.
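
Equation (2) is also easy to check numerically; here is a tiny Python sanity check over a batch of random integers (not part of the argument, just reassurance that nothing has gone wrong with the algebra).

```python
from random import randint

# Check the identity (2) for a sample of random integer values.
for _ in range(10000):
    x, y, u, v = (randint(-100, 100) for _ in range(4))
    assert (x*u + y*v)**2 + (x*v - y*u)**2 == (x*x + y*y) * (u*u + v*v)
print("identity (2) held for every sampled quadruple")
```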

That takes care of the ‘if’ part; the ‘only if’ part is not yet proven, so let’s state it as our second conjecture.

Conjecture 2. For every square-free positive integer {n} which is 2-square, every prime factor of {n} is 2-square.

If conjecture 2 does hold, then we can determine whether a square-free positive integer {n} is 2-square by seeing whether its prime factors are 2-square (if they are all 2-square, then so is {n}, and if at least one is not 2-square, then neither is {n}). So let’s rewrite the list a final time, including only the prime numbers.

\displaystyle \begin{matrix} & \mathbf{2} & 3 & & \mathbf{5} & & 7 & & & \\ 11 & & \mathbf{13} & & & & \mathbf{17} & & 19 & \\ & & 23 & & & & & & \mathbf{29} & \\ 31 & & & & & & \mathbf{37} & & & \\ \mathbf{41} & & 43 & & & & 47 & & & \end{matrix}

Do you see the pattern? Probably not at the moment, but there is one here. Recall that we can divide integers into two kinds, even integers and odd integers, according to their positions in the sequence of all integers, with the terms alternating between being even and being odd (for example, 0 is even, 1 is odd, 2 is even, 3 is odd, and so on). Now, apart from 2, all prime numbers are odd (because even integers are, by definition, divisible by 2). But we can do a similar thing with the sequence of all odd integers and divide all the odd integers into two kinds, which we can call ‘odd-even’ and ‘odd-odd’, according to their position in the sequence (for example, 1 is odd-even, 3 is odd-odd, 5 is odd-even, 7 is odd-odd, and so on). The more standard terms for ‘odd-even integers’ and ‘odd-odd integers’ are ‘integers congruent to 1 modulo 4’ and ‘integers congruent to 3 modulo 4’, respectively, because, just as even integers are those which are divisible by 2 and odd integers are those whose remainder upon division by 2 is 1, odd-even integers are those whose remainder upon division by 4 is 1 and odd-odd integers are those whose remainder upon division by 4 is 3, and in general we say that an integer {a} is congruent to an integer {b} modulo a positive integer {m} (and write {a \equiv b} (mod {m})) if and only if {a} and {b} leave the same remainder upon division by {m}. However, using this non-standard terminology makes it easy to see that this is quite a natural way to divide the prime numbers (other than 2) into two classes. Now, if you have a look at the list above, you’ll see that the 2-square integers in this list are exactly those which are odd-even (as well as 2), and the non-2-square integers in this list are exactly those which are odd-odd. It’s a lot easier to see this pattern if we make each line contain 8 numbers rather than 10.

\displaystyle \begin{matrix} & \mathbf{2} & 3 & & \mathbf{5} & & 7 & \\ & & 11 & & \mathbf{13} & & & \\ \mathbf{17} & & 19 & & & & 23 & \\ & & & & \mathbf{29} & & 31 & \\ & & & & \mathbf{37} & & & \\ \mathbf{41} & & 43 & & & & 47 & \\ \end{matrix}

It’s actually very easy to prove that the odd-odd integers in this list are not 2-square, if you are familiar with how congruences work. In fact, no odd-odd integer, whether prime or not, is 2-square. I’ll go through this proof in more detail than it really needs just in case you aren’t very familiar with congruences. First note that there are only 4 possible remainders upon division by 4, 0, 1, 2 and 3, so every integer is congruent to one of these integers modulo 4. Now, integers which are congruent can be regarded as the same up to a certain point. In particular, if {a \equiv b} (mod {m}) and {c \equiv d} (mod {m}) (where {a}, {b}, {c} and {d} are integers and {m} is a positive integer), then {a + c \equiv b + d} (mod {m}) and {a c \equiv b d} (mod {m}). Therefore, for every integer {n}, since one of {n \equiv 0}, {n \equiv 1}, {n \equiv 2} and {n \equiv 3} (mod 4) holds, one of {n^2 \equiv 0}, {n^2 \equiv 1}, {n^2 \equiv 4} and {n^2 \equiv 9} (mod 4) holds. But {4 \equiv 0} (mod 4) and {9 \equiv 1} (mod 4), and congruence is transitive (if {a \equiv b} (mod {m}) and {b \equiv c} (mod {m}), then {a \equiv c} (mod {m})), so in fact {n^2} is always congruent to one of 0 and 1 modulo 4. As a result, every 2-square integer, since it can be written in the form {x^2 + y^2}, can only be congruent to one of 0 ({0 + 0}), 1 ({0 + 1}) or 2 ({1 + 1}) modulo 4, never 3.
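
If you like, this fact is easy to spot-check with a couple of lines of Python (a sanity check only, not a substitute for the congruence argument above).

```python
# Squares leave remainder 0 or 1 upon division by 4 ...
assert {n * n % 4 for n in range(1000)} == {0, 1}

# ... so a sum of two squares leaves remainder 0, 1 or 2, never 3.
assert 3 not in {(x * x + y * y) % 4 for x in range(200) for y in range(200)}
print("no sampled sum of two squares was congruent to 3 modulo 4")
```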

Proving that every odd-even prime number is 2-square is much more difficult. In fact, it was probably the chief obstacle Fermat faced in finding his proof. However, Fermat did come up with a method, known as Fermat’s method of descent, which, once applied, makes the proof fairly straightforward (in fact, for this reason, we can be pretty sure Fermat really did prove this theorem, unlike some of the other results he claimed). The idea of Fermat’s method of descent, as applied to this problem in particular, is to prove two things which together imply that every odd-even prime number is 2-square.

  1. For every odd-even prime number {p}, there is a multiple of {p} which is 2-square.
  2. For every pair of integers {n} and {k} such that {k n} is 2-square and {1 < k}, there is an integer {k'} such that {0 < k' < k} and {k' n} is 2-square.

This works because for each odd-even prime number {p}, we can find an integer {k} such that {k p} is 2-square, by (1), and then we can find a smaller {k} as many times as needed, using (2), until we get to {k = 1}. It is actually harder to prove (1) than (2), so we will begin by proving (2). Suppose {n} and {k} are integers, {k n} is 2-square and {1 < k}. Then {k n = x^2 + y^2}, where {x} and {y} are integers. We want to find integers {k'}, {x'} and {y'} such that {k' n = (x')^2 + (y')^2} and {0 < k' < k}. Due to equation (2), another 2-square integer which is a multiple of {n} can be found by multiplying {x^2 + y^2} by another 2-square integer. If {u} and {v} are integers, then

\displaystyle \left( x^2 + y^2 \right) \left( u^2 + v^2 \right) = (x u + y v)^2 + (x v - y u)^2

Of course, if {u^2} and {v^2} are not both 0, {k \le k (u^2 + v^2)}, and if {u^2} and {v^2} are both 0, {k (u^2 + v^2) = 0}, so either way we haven’t found a new integer {k'} such that {0 < k' < k}. But if we choose {u} and {v} so that {u^2 + v^2} is divisible by {k}, then {(x^2 + y^2) (u^2 + v^2)} will be divisible by {k^2}. Therefore, if we can also get {x u + y v} and {x v - y u} to be divisible by {k}, we can divide both sides of the above equation by {k^2} to get

\displaystyle n \frac {u^2 + v^2} k = \left( \frac {x u + y v} k \right)^2 + \left( \frac {x v - y u} k \right)^2,

so, if we let {k' = (u^2 + v^2)/k}, we have a sum of two squares which is equal to {k' n}. And hopefully, if we choose {u} and {v} carefully, we can make it so that {0 < k' < k}. That inequality can also be written as {0 < (u^2 + v^2)/k < k}, i.e. {0 < u^2 + v^2 < k^2} (since {1 < k}). We can therefore ensure that the inequality holds by making sure that both {u^2} and {v^2} are less than {k^2/2}. But we can actually do better than that and make sure that both {u^2} and {v^2} are less than {k^2/4}, by choosing values of {u} and {v} such that {-k/2 < u \le k/2} and {-k/2 < v \le k/2}. Since there are exactly {k} values in this range, we can find values in this range which are congruent to every integer modulo {k}. In particular, we can choose {u} and {v} so that they are in this range and {u} and {v} are congruent to {x} and {y}, respectively, modulo {k}. Then {u^2 + v^2} and {x u + y v} are congruent to {x^2 + y^2} modulo {k}, so, since {x^2 + y^2} is divisible by {k}, both {u^2 + v^2} and {x u + y v} are divisible by {k}. And {x v - y u} is congruent to {x y - y x = 0} modulo {k}, so, since 0 is divisible by {k}, so is {x v - y u}. And that completes the proof of (2). Note that this process allows us to find an actual pair of integers such that {n} is the sum of their squares, if we have a sum of squares which is a multiple of {n} to start the process with.

So how do we prove (1)? Well, it really is a lot harder to prove than (2). In fact, I’ve just done a course in number theory at the University of York, and we began the course by looking at the same problem I’m talking about in this post. All the work on the problem I’ve done so far in this post was covered in the first and last lectures. Pretty much the entire remainder of the course was about how to prove (1). That said, it could have been done more briefly, it’s just that (1) turns out to be a special case of a more general problem, which it is natural to investigate at the same time. Recall what (1) states: it states that for every odd-even prime number {p}, there are integers {x} and {y} such that {x^2 + y^2} is divisible by {p}. That means that {x^2 + y^2 \equiv 0} (mod {p}), so {x^2 \equiv -y^2} (mod {p}). Now, integers which are congruent to a square integer modulo {p} are called quadratic residues modulo {p}. So what we’re looking for is a pair of quadratic residues modulo {p} which are each other’s negations. The natural thing to do, then, is to investigate which integers are quadratic residues modulo {p}. This is a large topic, so that’s why it took most of the course to complete the proof. I’m not going to go into the full proof here, in order to prevent this post from becoming ridiculously long. But it turns out that {-1} is a quadratic residue modulo {p} if and only if {p \equiv 1} (mod 4). Therefore, if {p} is odd-even, there is an integer {x} such that {x^2 \equiv -1} (mod {p}), i.e. {x^2 + 1 \equiv 0} (mod {p}), i.e. {x^2 + 1} is a multiple of {p}. And that completes the proof of (1).
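
Putting (1) and (2) together gives an actual procedure for writing an odd-even prime as a sum of two squares, not just an existence proof. Here is a Python sketch of that procedure (my own arrangement of the steps described above; the brute-force search for a square root of {-1} stands in for the quadratic-residue theory that I am not going into).

```python
def sum_of_two_squares(p):
    """Write a prime p with p % 4 == 1 as a sum of two squares, following
    Fermat's method of descent as described above."""
    # Step (1): find x with x*x + 1 divisible by p, by brute force.
    x = next(x for x in range(1, p) if (x * x + 1) % p == 0)
    if x > p // 2:
        x -= p                     # least absolute residue, so that x*x + 1 < p*p
    a, b = x, 1                    # invariant: a*a + b*b == k * p with 1 <= k < p
    k = (a * a + b * b) // p
    # Step (2): repeatedly replace k by a smaller value until it reaches 1.
    while k > 1:
        # choose u and v congruent to a and b (mod k), with -k/2 < u, v <= k/2
        u = a % k if a % k <= k // 2 else a % k - k
        v = b % k if b % k <= k // 2 else b % k - k
        a, b, k = (a * u + b * v) // k, (a * v - b * u) // k, (u * u + v * v) // k
    return abs(a), abs(b)

for p in [5, 13, 17, 29, 37, 41, 97, 101, 997]:
    a, b = sum_of_two_squares(p)
    assert a * a + b * b == p
    print(f"{p} = {a}^2 + {b}^2")
```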

Now that we know which primes are 2-square, all that is left to do is to prove conjectures 1 and 2. Before we go ahead with that, we can state Fermat’s theorem on when a positive integer in general is 2-square.

Theorem 1. A positive integer {n} is 2-square if and only if every prime number whose multiplicity in the prime factorisation of {n} is odd is 2-square, i.e. is either 2 or odd-even.

Proof: Suppose {n} is a positive integer. For every prime number {p}, let {\alpha(p)} denote the multiplicity of {p} in the prime factorisation of {n}. Let {s} be the product of all the prime numbers {p} such that {\alpha(p)} is odd. Then {s} is square-free (because its prime factorisation does not contain multiple instances of any prime). Furthermore, {n / s} is square (because the multiplicity of every prime number {p} in its prime factorisation is {\alpha(p)} if {\alpha(p)} is even and {\alpha(p) - 1} if {\alpha(p)} is odd, which is in either case even). Therefore, if {n} is 2-square, {s} is 2-square by conjecture 1 (since {n / s} is square), and hence every prime factor of {s} is 2-square by conjecture 2 (since {s} is square-free). On the other hand, if every prime factor of {s} is 2-square, then {s}, as the product of all these prime numbers, is 2-square, and hence {n} is 2-square since {n / s} is square. \Box

We will prove both conjectures at once by showing that if {n} is a 2-square positive integer, then every prime number whose multiplicity in the prime factorisation of {n} is odd is 2-square. The proof uses, again, the fact that, for every odd prime number {p}, {-1} is a quadratic residue modulo {p} if and only if {p \equiv 1} (mod 4).

Proof: Suppose {n} is a 2-square positive integer, so that {n = x^2 + y^2} where {x} and {y} are integers. Let {d} be the greatest common divisor of {x} and {y}, so there are coprime integers {x'} and {y'} such that {x = d x'} and {y = d y'}. Then

\displaystyle n = (d x')^2 + (d y')^2 = d^2 \left( (x')^2 + (y')^2 \right).

Suppose {p} is a prime number and the multiplicity {\alpha} of {p} in the prime factorisation of {n} is odd. This multiplicity is the sum of its multiplicities in the prime factorisations of {d^2} and {(x')^2 + (y')^2}, respectively. Since {d^2} is square, the former multiplicity is even; therefore, since {\alpha} is odd, the latter multiplicity must be odd, from which we can conclude that it is not 0. In other words, {(x')^2 + (y')^2} is divisible by {p}. That means {(x')^2 + (y')^2 \equiv 0} (mod {p}), so {(x')^2 \equiv -(y')^2} (mod {p}). Now, we can show that neither {x'} nor {y'} is divisible by {p}. First of all, since {x'} and {y'} are coprime, at least one of them must not be divisible by {p}. If {p \mid x'}, then {p \mid (x')^2}, so {(x')^2 \equiv 0} (mod {p}), hence {(y')^2 \equiv 0} (mod {p}) which means {y' \equiv 0} (mod {p}), so {p \mid y'}. If {p \mid y'}, then {p \mid (y')^2}, so {(y')^2 \equiv 0} (mod {p}), hence {(x')^2 \equiv 0} (mod {p}) which means {x' \equiv 0} (mod {p}), so {p \mid x'}. So if one of them is divisible by {p}, the other one is as well, and it follows that neither can be divisible by {p}. What can we do with this information? Well, one of the basic results in number theory states that if an integer {a} is not divisible by a prime {p}, there is another integer {b} such that {a b \equiv 1} (mod {p}), just like how {a (1 / a) = 1}. So there is an integer {v} such that {y' v \equiv 1} (mod {p}). So we can take the congruence {(x')^2 \equiv -(y')^2} and multiply both sides by {v^2}, giving {(x')^2 v^2 \equiv -(y')^2 v^2} (mod {p}), i.e. {(x' v)^2 \equiv -(y' v)^2} (mod {p}), i.e. {(x' v)^2 \equiv -1} (mod {p}). Therefore, {-1} is a quadratic residue modulo {p}. If {p = 2}, then {p} is 2-square anyway (it is {1 + 1}); and if {p} is odd, the fact that {-1} is a quadratic residue modulo {p} means that {p \equiv 1} (mod 4), i.e. {p} is odd-even, and hence 2-square. \Box

So, we have, at last, found the solution to the problem. The 2-square positive integers are those with prime factorisations in which every prime number with odd multiplicity is either 2 or odd-even (i.e. congruent to 1 modulo 4).

I’ll leave you with this to think about. We can generalise the idea of a 2-square integer and define an integer to be ‘{k}-square’ (where {k} is a positive integer) if and only if it can be expressed as a sum of exactly {k} squares. Now that we know which integers are 1-square and 2-square, it’s natural to wonder which integers are 3-square, 4-square, 5-square and so on. If this post has interested you (and it probably has if you’ve got this far), I encourage you to try and investigate which integers are 4-square yourself using the same sort of method I used in this post. You’ll find that the pattern for 4-square integers is pretty easy to spot.
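As a starting point for that investigation, here's a small brute-force Python sketch (my own, and certainly not the most efficient way to do it) which lists the integers up to 100 that are not {k}-square, for {k = 3} and {k = 4}; staring at its output should suggest the patterns.

    from math import isqrt

    def is_k_square(n, k):
        # brute force: can n be written as a sum of exactly k squares (0 allowed)?
        if k == 1:
            return isqrt(n) ** 2 == n
        return any(is_k_square(n - a * a, k - 1) for a in range(isqrt(n) + 1))

    for k in (3, 4):
        print(k, [n for n in range(1, 101) if not is_k_square(n, k)])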

Footnotes

  1. ^ The multiplicity of a prime number {p} in the prime factorisation of a positive integer {n} is defined as the greatest integer {\alpha} such that {n} is divisible by {p^{\alpha}}. If you think of the prime factorisation of {n} as a bag (a collection which, unlike a set, can contain multiple instances of the same element) of prime numbers, the multiplicity of each prime number {p} in the prime factorisation of {n} can be thought of as the number of instances of {p} which the bag contains. If we write {n} as its prime factorisation {p_1^{\alpha_1} p_2^{\alpha_2} \dotsm p_k^{\alpha_k}}, then for every positive integer {i} such that {i \le k}, {\alpha_i} is the multiplicity of {p_i} in the prime factorisation of {n}.

Set isomorphisms

There doesn’t seem to be any standard adjective to describe the property of a pair of sets having the same cardinality. Some authors use ‘equivalent’, but this adjective could be used for any equivalence relation; it might be useful to reserve it for the equivalence relation which is currently most prominent in the context. Others use ‘equinumerous’, and I guess that’s fine. There’s a natural counterpart of this adjective for the property of one set having cardinality less than or equal to the other, too, viz. ‘subnumerous’, although I don’t know if anyone actually uses this term. Anyway, if I were going to write a textbook on set theory, I think I would go for the adjective ‘isomorphic’. Consider the definition of a structure in abstract algebra, which can be stated as follows.

Definition 1. Structures are ordered tuples of the form (A, R_0, R_1, \dotsc, R_{n - 1}), where A is a set called the underlying set of the structure, and for every natural number i such that i < n, there is a natural number k, called the arity of R_i, such that R_i is a subset of the Cartesian product A^k. R_i is called the ith relation of the structure.

A set can be seen as a degenerate structure with no relations. Now, consider the definition of a homomorphism:

Definition 2. For every pair of structures (A, R_0, R_1, \dotsc, R_{n - 1}) and (B, S_0, S_1, \dotsc, S_{n - 1}) with the same number of relations and with the property that for every natural number i such that i < n, R_i and S_i have the same arity, homomorphisms are mappings f from A to B such that for every natural number i such that i < n and every k-tuple of members a_0, a_1, … and a_{k - 1} of A (where k is the common arity of R_i and S_i), (a_0, a_1, \dotsc, a_{k - 1}) \in R_i if and only if (f(a_0), f(a_1), \dotsc, f(a_{k - 1})) \in S_i.

In the trivial case where the two structures have no relations, a homomorphism between the structures is simply a function between their underlying sets. Accordingly, an isomorphism between the structures is simply a bijection between their underlying sets. Since we say that two structures are isomorphic if and only if there is an isomorphism between them, two sets are isomorphic if and only if there is a bijection between them, i.e. they have the same cardinality. That’s the reasoning behind using the adjective ‘isomorphic’. Since injective homomorphisms are sometimes called monomorphisms, and surjective homomorphisms are sometimes called epimorphisms, for every pair of sets A and B, we can also say that A is monomorphic to B if and only if |A| \le |B| and A is epimorphic to B if and only if |B| \le |A|. The only problem I can see with this terminology is that structures are often referred to by their underlying set when it is clear from context which relations should be taken as part of the structure; for example, people might speak of the ring \mathbb Z of integers and the ring \mathbb Q of rational numbers. So the statement that \mathbb Z and \mathbb Q are isomorphic might be taken to mean that the rings \mathbb Z and \mathbb Q are isomorphic, which is false. Of course, this problem can be resolved just by saying ‘the sets \mathbb Z and \mathbb Q are isomorphic’ or ‘\mathbb Z and \mathbb Q are isomorphic as sets’.
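For what it's worth, here's a rough Python sketch of definition 2 for finite structures, with relations represented as sets of tuples; the function name is_homomorphism and this representation are my own choices, not anything standard. With no relations at all the check degenerates to 'f is a function from A to B', as described above.

    from itertools import product

    def is_homomorphism(f, A, rels_A, B, rels_B):
        # f: dict mapping A into B; relations are given as sets of tuples, and the
        # arity of each relation is inferred from its tuples (in the definition it
        # would be supplied along with the structure).
        if set(f) != set(A) or not set(f.values()) <= set(B):
            return False
        for R, S in zip(rels_A, rels_B):
            k = len(next(iter(R | S), ()))
            for args in product(A, repeat=k):
                if (args in R) != (tuple(f[a] for a in args) in S):
                    return False
        return True

    # With no relations at all, a bijective f is an isomorphism of sets in the
    # sense above.
    A, B = {0, 1, 2}, {'a', 'b', 'c'}
    f = {0: 'a', 1: 'b', 2: 'c'}
    print(is_homomorphism(f, A, [], B, []))  # True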

0.999…, 0.000…1 and other numbers with infinitely many digits

The `number’ 0.000…1 sometimes comes up in the discussion of the infamous question of whether 0.999… = 1 (in case you didn’t know, 0.999… is equal to 11). For every natural number {n}, {1 - 0.99\dotso9} (where there are {n} 9s in the subtracted number) is 0.00…1 (in which there are {n} 0s, including the one before the decimal point), so naturally, people generalise the pattern to the case where there are infinitely many {n}s (although you can’t always generalise in this way), and conclude that {1 - 0.999\dotso} (where there are infinitely many 9s in the second number) is 0.000…1 (in which there are infinitely many 0s), which seems to be a positive number, not 0 as it should be if {0.999\dotso = 1}.

Even though it’s wrong, this argument is quite an intelligent one. Its problem is that it relies on a lack of understanding of exactly how decimal notation works, which is understandable, given that the subtleties of decimal notation are rarely given a proper explanation in maths education. It’s easy to understand how decimal notation works when there are only finitely many digits, and I think most people understand it on some level (though they may not be able to articulate it in the way I’m about to articulate it). Given a string of digits with a decimal point in between one pair of consecutive digits, one assigns an index to each digit {d} by looking at its position relative to the decimal point. If {d} is in front of the decimal point, the index is the number of digits between {d} and the decimal point, with {d} not being counted among these digits, and if {d} is behind the decimal point, the index is the negation of the number of digits between {d} and the decimal point, with {d} being counted among these digits. In other words, the digit in front of the decimal point has index 0, the digit behind it has index {-1}, and the other indices continue the sequence. For example, in 17.25, the digit 1 has index 1, the digit 7 has index 0, the digit 2 has index {-1} and the digit 5 has index {-2}. Then, one calculates the place value associated with {d} as {d \times 10^n}, where {n} is the index of {d}. The number denoted by the string of digits is simply the sum of all the place values. For example, 17.25 denotes the number {1 \times 10^1 + 7 \times 10^0 + 2 \times 10^{-1} + 5 \times 10^{-2}}. The number denoted is always non-negative, but we can denote its negation simply by putting a minus sign in front.
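Here's a small Python sketch of this procedure for a finite string of digits; the function decimal_value is my own illustration. It assigns each digit its index, forms the place values, and sums them exactly (using fractions to avoid floating-point rounding).

    from fractions import Fraction

    def decimal_value(s):
        # sum of the place values d * 10^index, with indices assigned as described
        whole, _, frac = s.partition('.')
        total = Fraction(0)
        for i, d in enumerate(reversed(whole)):   # indices 0, 1, 2, ... to the left
            total += int(d) * Fraction(10) ** i
        for i, d in enumerate(frac, start=1):     # indices -1, -2, ... to the right
            total += int(d) * Fraction(10) ** -i
        return total

    print(decimal_value('17.25'))  # 69/4, i.e. 1*10^1 + 7*10^0 + 2*10^(-1) + 5*10^(-2)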

Decimal notation with finitely many digits is not very powerful. Every integer can be denoted in this way, but not every rational number (rational numbers are numbers that can be expressed as fractions whose numerators and denominators are both integers, in case you didn’t know). For example, {\frac 1 3} cannot be denoted in this way. In fact, the only rational numbers which can be denoted in this way are those whose denominator, when the fraction is written in lowest terms, is not divisible by any prime number other than 2 or 5 (the prime factors of 10). So, in order to denote every rational number, you need infinitely many digits.

It’s in this case that the subtleties arise. Going from finite to infinite always brings a lot of complications with it. First of all, in order to be able to assign indices to each digit in the same way, we’re going to have to require that, even though there are infinitely many digits, there are finitely many digits between each digit and the decimal point. After all, if there are infinitely many digits between some digit {d} and the decimal point, then the number of digits between {d} and the decimal point does not exist, since infinity isn’t a number. Therefore, we can see a problem with the argument above right away. There are infinitely many digits between the digit 1 in 0.000…1 and the decimal point, so it is impossible to assign an index to this digit, and for this reason, 0.000…1 is not an example of well-formed decimal notation. There are ways to define alternative forms of decimal notation in which 0.000…1 is well-formed (and usually denotes some kind of infinitesimal number, i.e. one which is positive but smaller than every positive rational number), but I’m not going to go into them here. Certainly, none of these ways can be considered standard: if you just start talking about 0.000…1 to a mathematician without defining it first, they will ask you to clarify.

If there are finitely many digits between each digit and the decimal point, we can associate each of the infinitely many digits with a place value in the same way as before. So then the number denoted by the string of digits is just the sum of all these place values, right? Well, not quite. Remember, we have infinitely many digits, hence infinitely many place values, so this will be a sum of infinitely many terms! The ordinary definition of a sum is that it is the value obtained by letting the value be 0 at first, and then adding each term in the sum to the value, one at a time, until every term has been added. But if there are infinitely many terms, then no matter how long this process continues, terms will remain which still have not been added. So it doesn’t make sense to speak of sums of infinitely many terms, if we use this definition.

But just because this definition doesn’t work, that doesn’t mean we can’t define sums of infinitely many terms in some other way. For example, it certainly seems sensible to say that the infinite sum {0 + 0 + 0 + \dotsb} is 0, because the sum of any finite number of 0s is 0, so we can generalise this to the case where there are infinitely many 0s. In general, if only finitely many terms in an infinite sum are not equal to 0, we can just disregard the 0s, since they should have no effect on the value of the sum, and say that the infinite sum is equal to the finite sum consisting of its terms which are not equal to 0. For example, {1 + 2 + 3 + 0 + 0 + 0 + \dotsb = 1 + 2 + 3 = 6}.

These aren’t very interesting examples of infinite sums, since they are basically finite sums in disguise. It is possible to assign values to more interesting infinite sums. To see how this is done, first let me note that one way of stating the reasoning why an infinite sum of the form {x_1 + x_2 + \dotsb + x_n + 0 + 0 + 0 + \dotsb} is equal to the finite sum {x_1 + x_2 + \dotsb + x_n} is this: if we start with the value of 0, and add each term of the infinite sum in order, then, after we add {x_n}, the remaining additions do not change the value at all (since we are just adding 0). Therefore, the value stays at {x_1 + x_2 + \dotsb + x_n}, even if infinitely many additional additions are made, because these additional additions have no effect.

Now, if there are infinitely many terms which are not 0, it is impossible for the value of the incomplete sum to stay equal to a particular number after a finite number of terms are added. But it may be possible for the value to stay within a certain distance of a particular number after a finite number of terms are added, and if it does, then we can conclude that the distance of the sum of all the terms from that particular number must be less than or equal to this distance (by generalising to the infinite case). Remember, the distance between two numbers {x} and {y} is either {x - y} or {y - x}, whichever one is positive (or, if {x - y} and {y - x} are both 0, so that neither is positive, the distance is 0).

An example will help to make this clear. Consider the number denoted by 0.999…: {\frac 9 {10} + \frac 9 {100} + \frac 9 {1000} + \dotsb}. For every positive integer {n}, the sum of the first {n} terms in this sum is {\frac 9 {10} + \frac 9 {100} + \dotsb + \frac 9 {10^n}}. What is the distance of this finite sum from 1? The sum is clearly less than 1, because it can be written (using the rule for addition of fractions) as {\frac {10^n - 1} {10^n}}, i.e. {1 - \frac 1 {10^n}}. So the distance is the difference {1 - (1 - \frac 1 {10^n})}, which is just {\frac 1 {10^n}}. Now, for every positive integer {n'} such that {n < n'}, it can be seen in the same way that the distance of the sum of the first {n'} terms from 1 is {\frac 1 {10^{n'}}}, which is less than {\frac 1 {10^n}} since {n < n'}. That is, no matter how many terms are added to the incomplete sum after the {n}th term is added, the distance stays less than {\frac 1 {10^n}}. Therefore, we can conclude that the distance of 0.999… from 1 is less than {\frac 1 {10^n}} for every positive integer {n}. In fact, it is less than every positive rational number, since every positive rational number can be expressed as a fraction {\frac a b}, where {a} and {b} are positive integers, and {\frac 1 {10^b} < \frac a b} (since {b < 10^b}).
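If you want to see this concretely, here's a tiny Python sketch (my own) which builds the incomplete sums of 0.999… with exact fractions and checks that the distance of the sum of the first {n} terms from 1 is exactly {\frac 1 {10^n}}.

    from fractions import Fraction

    partial = Fraction(0)
    for n in range(1, 11):
        partial += Fraction(9, 10 ** n)                       # add the nth term, 9/10^n
        assert Fraction(1) - partial == Fraction(1, 10 ** n)  # distance from 1 is exactly 1/10^n
        print(n, partial, Fraction(1) - partial)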

Another example is {\frac 1 2 + \frac 1 4 + \frac 1 8 + \dotsb}, the number denoted by 0.111… in binary. By similar reasoning as above, it can be shown that its distance from 1 is less than every positive rational number. Or, you can just look at this neat diagram.

Take a square, divide it in half, divide one of the halves in half (producing two quarters of the original square), divide one of these quarters in half too, and so on, and we see that a whole unit square can be filled by squares and rectangles whose areas are reciprocals of powers of 2 greater than 1.
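The same calculation can be done exactly in a few lines of Python (again just an illustration of mine): after the first {n} pieces are placed, the uncovered part of the square has area exactly {\frac 1 {2^n}}.

    from fractions import Fraction

    partial = Fraction(0)
    for n in range(1, 11):
        partial += Fraction(1, 2 ** n)            # add the area of the next piece
        print(n, partial, Fraction(1) - partial)  # the uncovered area is exactly 1/2^n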

In cases like this, where we can show that for some rational number {L}, the distance of the infinite sum from {L} is less than {\varepsilon} for every positive rational number {\varepsilon}, it seems natural to assign the infinite sum the value of {L}, because, if the value is a rational number, its distance from {L} must also be a rational number, and this distance cannot be positive, otherwise it would have to be less than itself. In other words, {L} is the only rational number which the sum could be, and therefore, {L} is a very natural value to assign to the sum. In the case of 0.999…, {L} is 1, and that’s why we assign 0.999… the value 1. Of course, we could assign values to infinite sums in a different way, but this is the standard way we do it in mathematics. I’m not familiar with any other way of assigning values to infinite sums that is more natural and convenient than this one.

There are a few more remarks I’d like to make about decimal notation. First of all, there are strings of digits such that the sum of the place values cannot be assigned a rational number as a value in this way, since there is provably no rational number {L} such that for every positive rational number {\varepsilon}, the distance of the incomplete sum from {L} stays less than {\varepsilon} once a certain finite number of terms have been added. In fact, the only strings which can be assigned a rational number as a value in this way are those such that the digit at every index greater than some integer {n} is 0 and the digits are eventually periodic, i.e. from some index onwards they consist of a finite sequence repeated indefinitely (for example, 0.333… (\frac 1 3) or 0.142857142857142857… (\frac 1 7)). However, as long as the digit at every index greater than some integer {n} is 0, a similar but slightly weaker condition is satisfied: for every positive rational number {\varepsilon}, once a certain finite number of terms have been added, if one picks any pair of incomplete sums which can be formed by adding on more terms, the distance between these two sums is less than {\varepsilon}. It’s as if the incomplete sums are approaching a number, but that number is not rational. We can therefore define a whole set of new numbers, called the irrational numbers, with each irrational number being identified with one of these infinite sums which cannot be assigned a rational number as a value. The numbers in the class obtained by adding these irrational numbers to the rational numbers are called the real numbers. Although I won’t go into the details, it is possible to define how to add and multiply real numbers, by referring to the sequences that define them, in such a way that all the expected properties of addition and multiplication hold, so real numbers do deserve to be called numbers.
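To illustrate the first of these claims, here's a small Python sketch of mine which turns a repeated finite sequence of digits after the decimal point into the exact rational number it denotes, using the fact that a block of {k} digits, repeated forever, contributes that block divided by {10^k - 1}; the function repeating_decimal and its two-argument interface are just my own conveniences, and I've ignored the digits in front of the decimal point for simplicity.

    from fractions import Fraction

    def repeating_decimal(prefix, block):
        # value of 0.<prefix><block><block>... as an exact fraction;
        # e.g. repeating_decimal('1', '6') is 0.1666... = 1/6
        a, b = len(prefix), len(block)
        return (Fraction(int(prefix or '0'), 10 ** a)
                + Fraction(int(block), (10 ** b - 1) * 10 ** a))

    print(repeating_decimal('', '3'))        # 1/3
    print(repeating_decimal('', '142857'))   # 1/7
    print(repeating_decimal('1', '6'))       # 1/6
    print(repeating_decimal('', '9'))        # 1  (0.999... again)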

Footnotes

  1. ^ But some people just won’t accept it…

Definitions of finiteness

Perhaps the most intuitive definition of finiteness as a property of sets is that a set {A} is finite if and only if its cardinality is a natural number. Unfortunately, the most intuitive definition of the natural numbers, at least to me, is that the natural numbers are the cardinalities of finite sets. Therefore, in order to define both of these notions properly, one of these definitions must be altered. I think the most natural one to alter is the definition of finiteness, since, at least to me, finiteness seems like a more fundamental notion than the notion of the natural numbers.

The alternative definition which I’m most satisfied with is Kuratowski’s, given below. The essential idea behind it is that finite sets are sets that can be formed from the empty set by making a finite number of adjoinments (i.e. additions of a single object). The formal statement may seem a little peculiar if you haven’t seen this kind of definition before.

Definition 1 For every set {A}, {A} is Kuratowski finite if and only if it is a member of every family {\mathcal F} of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}.

The key thing to notice here is that it immediately follows from the definition that a variant of mathematical induction can be performed on Kuratowski finite sets. Suppose you wish to prove that every Kuratowski finite set has a certain property. Well, if you let {\mathcal F} be the family of all sets with the property, you can see from the definition that one way to do this is to show that {\emptyset} has the property (i.e. {\emptyset \in \mathcal F}) and that if you adjoin an object to a set with the property, it will still have the property (i.e. for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}). Definitions similar to the one above turn up in quite a few places in mathematics, and they are always associated with a principle of induction. For example, consider the definition of the subgroup generated by a subset {H} of a group {G}. The intuitive definition of this subgroup is that it is the set of all objects of the form {a_1 a_2 \dotsm a_n}, where {n} is a natural number and {a_1}, {a_2}, … and {a_n} are members of {H} or inverses of members of {H}. However, this definition can be awkward to work with and it is complex to state as a logical formula (it involves a quantification over all finite sets). Alternatively, one can define this subgroup as the intersection of all subgroups {H'} of {G} such that {H \subseteq H'}. The corresponding principle of induction is that if every member of {H} has a certain property, and the set of all members of the group with the property is a subgroup of {G}, then every member of the subgroup generated by {H} has the property too.
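To make the subgroup example concrete, here's a small Python sketch of my own (for groups small enough to enumerate) which computes the subgroup generated by a subset by repeatedly adjoining products and inverses; for a finite group this produces exactly the intersection of all subgroups containing the subset.

    def generated_subgroup(gens, op, inv, identity):
        # Close gens under the operation and inverses; for a finite group this
        # is the same set as the intersection of all subgroups containing gens.
        subgroup = {identity}
        frontier = set(gens)
        while frontier:
            subgroup |= frontier
            frontier = ({op(a, b) for a in subgroup for b in subgroup}
                        | {inv(a) for a in subgroup}) - subgroup
        return subgroup

    # Example: the subgroup of the integers mod 12 (under addition) generated by {8, 6}.
    print(sorted(generated_subgroup({8, 6},
                                    lambda a, b: (a + b) % 12,
                                    lambda a: (-a) % 12,
                                    0)))   # [0, 2, 4, 6, 8, 10]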

I should note that this definition is not possible in ZF: {\mathcal F} must be allowed to range over proper classes as well as sets, because for every possible value of {\mathcal F}, we have that for every object {x}, {\emptyset \cup \{x\} = \{x\} \in \mathcal F}, so {\mathcal F} is a superclass of the class of all singletons, which is proper by the axiom of replacement, since {x \in V \mapsto \{x\}} is an injection. Quantifiers can only range over sets in ZF. However, a minor amendment makes the definition possible in ZF: we can just require {x} to be a member of {A}. It can be proven that this amended definition (definition 2) is equivalent to the definition as stated above (definition 1).

Proof: Suppose {A} is a finite set by definition 2 and {\mathcal F} is a family of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}. Every member {x} of {A} is an object, so for every member {B} of {\mathcal F}, {B \cup \{x\} \in \mathcal F}, which means {A} is a member of {\mathcal F} (by definition 2). Therefore, {A} is finite by definition 1 as well.

As for the converse, well, {\emptyset} is clearly finite by definition 2. Now, suppose {A} is a finite set by definition 2, {x} is an object and {\mathcal F} is a family of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every member {y} of {A \cup \{x\}}, {B \cup \{y\} \in \mathcal F}. In particular, {\mathcal F} is closed under adjoining members of {A} (every member of {A} is a member of {A \cup \{x\}}), so, since {A} is finite by definition 2, {A \in \mathcal F}; and since {x} is a member of {A \cup \{x\}}, that means {A \cup \{x\} \in \mathcal F}. Therefore, {A \cup \{x\}} is finite by definition 2 as well. It follows, by the principle of induction for sets which are finite by definition 1, that every set which is finite by definition 1 is finite by definition 2 as well. \Box

The reason I’ve chosen definition 1 is that it is easier to see the idea behind the definition.
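As a toy illustration of the amended definition (again, just a sketch of mine, for a concrete finite set): the smallest family containing {\emptyset} and closed under adjoining members of {A} turns out to be the whole power set of {A}, and in particular it contains {A}, so {A} is finite by definition 2, as expected.

    def least_adjoinment_family(A):
        # Smallest family of sets containing the empty set and closed under
        # adjoining a member x of A to a member B of the family.
        family = {frozenset()}
        while True:
            new = {B | {x} for B in family for x in A} - family
            if not new:
                return family
            family |= new

    A = frozenset({1, 2, 3})
    F = least_adjoinment_family(A)
    print(len(F), A in F)   # 8 True: F is the power set of A, and A belongs to it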

Of course, we must ensure that Kuratowski finiteness is actually equivalent to ordinary finiteness. Since we can induct over natural numbers and we can induct over Kuratowski finite sets, the proof is straightforward.

Proof: {|\emptyset| = 0 \in \mathbb N}. Suppose {A} is a finite set, {|A| = n} and {x} is an object. What is {|A \cup \{x\}|}? If {x \in A}, then {A \cup \{x\} = A}, whose cardinality is still the natural number {n}. Otherwise, there is a bijection {f} from {A} to {\{0, 1, \dotsc, n - 1\}}, and it is easy to see that the extension {g} of {f} to {A \cup \{x\}} such that {g(x) = n} is a bijection onto {\{0, 1, \dotsc, n\}}, so {|A \cup \{x\}|} is {n + 1}. Either way, the cardinality of {A \cup \{x\}} is a natural number. This completes the proof that the cardinality of every Kuratowski finite set is a natural number, i.e., every Kuratowski finite set is finite.

The only set whose cardinality is 0 is {\emptyset}, which is Kuratowski finite. Suppose {n} is a natural number, every set whose cardinality is {n} is Kuratowski finite, {A} is a set and {|A| = n + 1}. Then there is a bijection {f} from {A} to {\{0, 1, \dotsc, n\}}. Let {x} be {f^{-1}(n)}. It is easy to see that the restriction {g} of {f} to {A \setminus \{x\}} is a bijection onto {\{0, 1, \dotsc, n - 1\}}, so {|A \setminus \{x\}| = n}. Therefore, {A \setminus \{x\}} is Kuratowski finite. It follows that {(A \setminus \{x\}) \cup \{x\} = A} is Kuratowski finite as well. This completes the proof that every finite set is Kuratowski finite. \Box

The other commonly-seen definition of finiteness is due to Dedekind. The main advantage of this definition is that it is somewhat simpler to state than Kuratowski’s.

Definition 3 For every set {A}, {A} is Dedekind finite if and only if for every proper subset {B} of {A}, {|B| < |A|}.

It is probably possible to prove that every finite set is Dedekind finite by induction, but an easier way to prove this is to prove the contrapositive, that every Dedekind infinite set is infinite. This proof uses the fact that for every set {A}, {A} is infinite if {\aleph_0 \le |A|} ({\aleph_0} is the cardinality of {\mathbb N}), since {n < \aleph_0} for every natural number {n}, so {|A| \ne n}.

Proof: Suppose {A} is a Dedekind infinite set, so there is a proper subset {B} of {A} such that {|A| = |B|}. Since {B \subset A}, there is a member {a} of {A \setminus B}; since {|A| = |B|}, there is a bijection {f} from {A} to {B}. Consider the infinite sequence {a}, {f(a)}, {f^2(a)}, … We can show by induction that this sequence is an injection, so {\aleph_0 \le |A|}, and therefore, by the fact mentioned above, {A} is infinite.

  • For every positive integer {n}, {f^n(a) = f(f^{n - 1}(a))}, so {f^n(a) \in B}, which means {a \ne f^n(a)} since {a \not \in B}.
  • Suppose {m} is a natural number and for every natural number {n} such that {m < n}, {f^m(a) \ne f^n(a)}. Then for every natural number {n} such that {m + 1 < n}, we have {m < n - 1}, so {f^m(a) \ne f^{n - 1}(a)}, which means {f^{m + 1}(a) \ne f^n(a)}, since {f} is injective.

\Box

Now, if the converse of the fact we used holds, i.e. if for every infinite set {A} we have {\aleph_0 \le |A|}, then we can prove that, conversely, every infinite set is Dedekind infinite. The proof relies on the fact that {\mathbb N} has the same cardinality as its proper subset {\{1, 2, 3, \dotsc\}}.

Proof: Suppose {A} is an infinite set. Then {\aleph_0 \le |A|}, so there is an injection {f} from {\mathbb N} to {A}. Let {B} be {A \setminus \{f(0)\}}, which is a proper subset of {A}. The mapping which sends {f(n)} to {f(n + 1)} for every natural number {n} and fixes every member of {A} not in the image of {f} is a bijection from {A} to {B} (this is where the fact that {\mathbb N} has the same cardinality as {\{1, 2, 3, \dotsc\}} is used). Therefore {|B| = |A|}, so {A} is Dedekind infinite. \Box

The problem with Dedekind’s definition is that one runs into trouble when one tries to prove that for every infinite set {A}, {\aleph_0 \le |A|}, in ZF. The proof can be done easily using the axiom of choice.

Proof: Suppose {A} is an infinite set. If the axiom of choice holds, there is a function {f} on {\mathcal P(A) \setminus \{\emptyset\}} such that for every non-empty subset {B} of {A}, {f(B) \in B}. Let {g} be the sequence such that for every natural number {n}, {g(n) = f(A \setminus \{g(0), g(1), \dotsc, g(n - 1)\})} (this is always defined, because if {A \setminus \{g(0), g(1), \dotsc, g(n - 1)\}} were {\emptyset}, we would have {A \subseteq \{g(0), g(1), \dotsc, g(n - 1)\}}, so {|A| \le n}, which would mean {A} is finite, contrary to assumption). It is clear that {g} is an injection into {A}, so {\aleph_0 \le |A|}. \Box

This doesn’t necessarily mean that the statement is independent of ZF, but certainly it suggests the possibility, since there is no obvious way to amend this proof so that it does not require the axiom of choice.

Perhaps there is another way we could prove that every infinite set is Dedekind infinite? Unfortunately there isn’t, because if every infinite set is Dedekind infinite, then for every infinite set {A}, there is a proper subset {B} of {A} such that {|A| = |B|}, so there is a member of {A \setminus B}, and {B} is infinite too, so there is a proper subset {C} of {B} such that {|B| = |C|}, so there is a member of {B \setminus C}, and so on. This allows us to construct an injective sequence of members of {A}, so {\aleph_0 \le |A|}.

Ideally, I would finish off here with a proof that the statement is independent. Unfortunately, I don’t know how to do this. It may be possible to derive the axiom of choice from the fact that every infinite set is Dedekind infinite, in which case this fact cannot be a theorem of ZF (otherwise the axiom of choice would be a theorem of ZF, and the axiom of choice is known to be independent of ZF). On the other hand, this may not be true, in which case I’m not sure what the best way to prove independence would be. The most straightforward way would be to construct a model of ZF in which there are Dedekind finite infinite sets, but I don’t know how you might go about doing that.