# An introduction to Turing machines

The story of how Alan Turing made important contributions to the allied war effort in World War II as part of a team cracking German codes, only to later end up being prosecuted for homosexual activity and chemically castrated as punishment, is well-known. If you didn’t already know about it, you might have seen the film about it, The Imitation Game, which came out recently. But this story, while worth knowing about, is not the whole story of Turing’s life: he was also a rather important mathematician. In fact, he is considered one of the founders of computer science. His principal contribution was the concept of the Turing machine, and in this post, I’ll try and explain what the Turing machine is and its significance, so that you can learn something about what will, in the end, probably be his most enduring claim to fame.

Essentially, a Turing machine is an abstract model of a computer, which we can use to make mathematical proofs about what computers can do and what they cannot—or, in other words, what is computable and what is not computable. It’s important to bear in mind that when Turing came up with the concept, computers as we know them today, these complex machines with CPUs and memory, did not exist. In Turing’s time, the word ‘computer’ was mainly used to refer to people. It was an agent noun, formed from the verb ‘compute’. The prototypical kind of computation, to somebody in Turing’s time, would have been arithmetical calculation: adding two numbers, multiplying two numbers, that sort of thing. This sort of computation, of course, can be done in your head, or, failing that, with the aid of a pencil and some paper. No machines need be involved, although they might be able to do it a lot faster. The point is that computation is a pretty fundamental concept that exists independently of the machines that we call computers. In fact, it is closely related to the concept of algorithms: computation can be thought of as the carrying out of algorithms. And, of course, algorithms have been an important concept in mathematics since it began—the ancient Babylonians knew and used algorithms to solve linear and quadratic equations.

The problem Turing was trying to solve was this: how can computation be defined precisely? It is, of course, fairly easy to give a non-precise definition of the concept. To compute is to carry out an algorithm. And algorithms can be characterised by the following properties, paraphrased from the Stanford Encyclopedia of Philosophy. An algorithm is a procedure for obtaining a result such that

• the algorithm consists of a finite number of instructions, which are precise and unambiguous,
• following the instructions results in the production of the result after a finite number of steps (the algorithm does not run forever),
• the instructions can be followed successfully by a human without any special equipment, except paper and a pencil to aid the memory,
• and, as long as the instructions are understood and followed correctly, they can be successfully followed; no ingenuity is required of the human.

(The term that tends to be used, rather than ‘algorithm’, in this area is ‘effective procedure’, but it means the same thing. I’ll stick with ‘algorithm’ in this post, since it’s probably more familiar.)

There is nothing wrong with non-precise definitions; most concepts do not have precise definitions, and indeed most concepts probably cannot be given precise definitions. Before Turing mathematicians had mostly managed just fine without a precise definition of computation. But Turing had a pretty good reason for wanting a more precise definition. In 1928, David Hilbert posed a problem known as the Entscheidungsproblem, or, in English, the ‘decision problem’. This problem asked whether there was an algorithm that could, basically, solve mathematics. More specifically, it asked whether there was an algorithm that could show whether an arbitrary sentence of first-order logic was true or false. First-order logic is a precisely defined language in which virtually every mathematical statement can be expressed, including conjectures like Goldbach’s conjecture (that every even integer greater than 2 is the sum of two prime numbers) that mathematicians have been trying to prove for centuries. If there was an algorithm that could show whether an arbitrary sentence of first-order logic was true or false, we could simply run the algorithm on Goldbach’s conjecture and this centuries-old problem would be solved. (It might be a bit of an exaggeration to say that mathematics would be completely solved, though, because running the algorithm might be practically infeasible, if it required too much computing power. Also, there might be interesting questions that are not expressible in first-order logic. Still, it can’t be denied that the solution of Entscheidungsproblem would have an enormous impact on our conception of mathematics.) But in order to give a proof that such an algorithm could exist or could not exist and thus solve the Entscheidungsproblem, it was necessary first to give a mathematical definition of computation. Otherwise, anyone who claimed to have found such an algorithm could be countered with the accusation that their algorithm wasn’t a real algorithm, or anyone who claimed to have proven that such an algorithm was impossible could be countered with the accusation that they hadn’t considered every possible algorithm.

But now you might be wondering: wouldn’t any precise definition of computation suffer from the same problem? Any proof based on the definition could be countered with the accusation that you weren’t using the right definition. If, for a given definition, there was an algorithm that could decide whether an arbitrary mathematical statement was true or false, people could say that the definition was too inclusive. And if it could be proven that there was no such algorithm, people could say that the definition was too exclusive. Admittedly, the first possibility was less serious, because, presumably, if there was such an algorithm, we could just see how it works, and it would be fairly easy to see, on an intuitive basis, whether it was an algorithm by checking whether it didn’t require ingenuity, et cetera. As of today, however, one thing is certain: nobody has managed to find such an algorithm. So we have to consider the second possibility, too. But, since our intuitive concept of algorithms is somewhat vague, how can we ever be sure that our definition doesn’t exclude some algorithms that we’d intuitively recognise as algorithms? This problem seems especially insurmountable given that algorithms can be enormously complicated, indeed, they can be complicated to an arbitrary degree; we don’t have a lot of intuition as to the upper limits of what algorithms are capable of.

It is certainly impossible to ever be sure in the mathematical sense that a given definition of computation is the correct one. But there is some pretty convincing evidence that Turing’s definition is the correct one. There were actually quite a few definitions of computation proposed during the 1930s, along with Turing’s definition via Turing machines. Principal among these were Alonzo Church’s definition via lambda calculus and Kurt Gödel and Jacques Herbrand’s definition via recursive functions. I won’t go into these alternative definitions of computation here, but if you look them up yourself, you’ll see that they were quite different in nature from each other and from Turing’s definition (which I’m going to get to shortly). Yet it turned out, conveniently, that all of these definitions were equivalent: the procedures defined as algorithms by Turing’s definition were exactly those defined as algorithms by Church’s definition and exactly those defined as algorithms by Gödel and Herbrand’s definition. And this is quite a coincidence. Consider the collection of all algorithms, A. Each definition picks out a certain sub-collection A′ of A (i.e. a collection which consists of algorithms, but does not necessarily contain every algorithm), and only the algorithms in this sub-collection are considered algorithms by that definition. The fact that the definitions are all equivalent means that each one picks out exactly the same sub-collection. Why would this be the case? One explanation would be that each definition is, in fact, picking out the biggest sub-collection possible—the collection A itself. That’s why the vast majority of mathematicians today believe that Turing’s definition (and Church’s, and Gödel and Herbrand’s, since they coincide with Turing’s definition) is the correct one. This belief has a name: it’s known as Church’s thesis (or the Church-Turing thesis; I suppose calling it the Church-Turing thesis would be more in accordance with the theme of this post, but ‘Church’s thesis’ is less verbose). It is not, by its nature, definitely provable to the standards mathematicians generally prefer, but I don’t think anybody seriously doubts it to be true. After all, people have come up with lots of other definitions of computation since the 1930s, and they have all ended up coinciding with the original definitions of Gödel, Herbrand, Church and Turing. And nobody has ever managed to think of an algorithm that does not satisfy these definitions. It is worth noting that while Church’s thesis cannot be definitely proven, it could be definitely disproven, if there did turn out to be an algorithm that was not considered an algorithm by the definitions. This is part of the reason why we believe it: it’s a falsifiable belief, and yet nobody has managed to falsify it.

Let’s see what Turing’s definition was, then. The idea Turing had was to describe a simple kind of machine for carrying out algorithms. He didn’t call these machines ‘Turing machines’ himself (that would be kind of egotistical), but that’s what they have become known as. An algorithm could then be defined as any procedure that could be carried out by such a machine. Again, it’s interesting to note that for Turing, it was the idea of someone working out a calculation with pencil and paper that was foremost in his mind when he thought about computation. Paper is two-dimensional, so, to simplify matters, he said that the machine would make its calculations on a one-dimensional strip of paper, a ‘tape’, which would be divided into squares. There would be infinitely many squares; the machine would not be limited by constraints of space. (After all, a person carrying out a calculation can always find more paper.) Each square would contain space for a single symbol to be written, chosen from a finite collection of available symbols. The square could of course also be blank, but we can consider that to be just another symbol. The machine would ‘look’ at a single square at a time and its action would be determined by the symbol on that square. However, the machine could be placed in a finite number of different states, so it would not always do the same thing upon seeing a given symbol; its action would also depend on its state. The possible actions would be the following:

1. Write a new symbol on the square currently being looked at (this may be the blank symbol, so that the old symbol is erased).
2. Move the tape one square to the left, so that the machine ends up looking at the square directly to the right of the one that was originally being looked at. This is called ‘shifting right’.
3. Move the tape one square to the right, so that the machine ends up looking at the square directly to the left of the one that was originally being looked at. This called ‘shifting left’.
4. Stop the computation. This is called ‘halting’.

After performing one of these actions, the machine would also possibly change to a different state (if it did not halt). In more abstract terms, the machine would be completely specified by its possible states, its possible symbols, and an association of each state q and each symbol S with an action and a state, corresponding to the action performed, and the new state entered (which may be the same state) when the machine is looking at S while in state q. If the action is action 4 (the halting action), it is pointless to specify a new state to enter, so we can adopt the convention that if there is no association of one of the state-symbol pairs with an action and a new state, then the machine halts if it looks at that symbol while in that state. The machine would have to halt at some point in order to be useful. If the machine did halt, the sequence of symbols remaining on the tape afterwards would be considered the result of the computation. This result may depend on the initial sequence of symbols that was on the tape before the computation was started. So the machine could be thought of as computing a function, which takes as input the initial sequence of symbols, and returns as output the final sequence of symbols. (If the machine does not halt for some input, the function is undefined for that input.) The sequences of symbols could be interpreted any way you like. For example, if the symbols are the blank symbol and 1, you could interpret a sequence of symbols as a sequence of numbers by interpreting each sub-sequence of consecutive 1s as the number of 1s in the sub-sequence, with the blank symbols functioning as separators. So, for example, 11 111 would represent (2, 3), 1111 would represent (4), 111 11 1 would represent (3, 2, 1).

It might not seem obvious that such a simple machine really can carry out any computation. So, I’ll show you a few examples of Turing machines that can carry out calculations with numbers, using the interpretation of sequences of blank symbols and 1s as sequences of numbers given above. Here’s a very simple example: a Turing machine that adds 1 to the input (which should be a single number). It’s represented as a flow chart. To find out what the machine will do in the state q while looking at the symbol S, look at the node which is labelled q, and then see if there is an arrow with a label of the form S : A coming out of this node. If there is such an arrow, this means the machine will carry out action A, and then change state to state p, where p is the label of the node the arrow points to. If there is no such arrow, this means the machine will halt. By the way, I’m using 0 here to represent the blank symbol.

(sorry about the crappy hand-drawn diagram, but I couldn’t find any good software for drawing these)

In order for this machine to work, the input must only contain one string of consecutive 1s, and the machine must start by looking at one of these 1s. Then the machine will shift left until it ends up looking at the first 0 to the left of this string of consecutive ones. It will then change this 0 to a 1, and then halt. So the string of consecutive 1s has an extra 1 added to it.

Here’s a more complicated example, building on this one: a Turing machine that adds the two numbers in the input.

(again, sorry about the crappy lighting, but I don’t have a scanner)

This one is probably too complicated to understand just by looking at the diagram. The idea here is that the machine will start by looking at one of the 1s in the first number in the input (i.e. the leftmost sequence of consecutive 1s). Then it will shift left, past this sequence of consecutive 1s, and further past the sequence of 0s that separates the two numbers (this requires a change of state), until it reaches the leftmost 1 in the second sequence of consecutive 1s. Then it changes this 1 to a 0, and—here’s the clever part—it shifts to look at the square immediately to the right. If this square is a 1, the machine shifts left until it reaches the first 0 to the left of the first sequence of consecutive 1s. Then it changes this 0 to a 1. Effectively, the 1 that was in the second sequence gets transferred to the first sequence. Then the machine goes back to state 1, and the process repeats. But at some point, it will change the final 1 in the second sequence to a 0, and then, when it shifts right, it will encounter a 0 rather than 1. This tells the machine that it has reached the end of the second sequence and, therefore, once it’s transferred this final 1 to the first sequence, it can halt (it does this by transferring to state 3, since state 3 has no action specified for the symbol 1).

A nice way to see how this works is to use this online Turing machine simulator. Copy and paste the following code into the text box:

```0 1 1 r 0
0 _ _ r 1
1 _ _ r 1
1 1 _ * 2
2 _ _ r 3
3 1 1 l 4
4 _ _ l 4
4 1 1 l 5
5 1 1 l 5
5 _ 1 * 0
3 _ _ l 6
6 _ _ l 6
6 1 1 l 7
7 1 1 l 7
7 _ 1 * halt
```

and type in an input, such as 1111 111, into the textbox labelled ‘Initial input’. Then click the button labelled ‘Run’. You’ll see the machine going back and forth as it transfers the 1s from right to left, one by one. If you want, you can play around with that Turing machine simulator and try to do other things, such as multiplying two numbers, to help convince you further that Turing machines can carry out any computation. I can understand if you don’t want to, though, because, as you have probably gathered by now, programming a Turing machine is a very tedious and rather tricky task. Programming a computer can be thought of as a communication problem: you have to tell the machine what to do, but the machine only understands a very peculiar language which humans are not accustomed to thinking in. In practice, people program computers using programming languages such as Java or C++, which are closer to natural language (although still notoriously tricky to communicate to the computer with). Programs written in programming languages are translated into machine language, using programs written in machine language called compilers, so that the computer can understand them.

Now, let’s use this definition of computation to prove what we were trying to prove; that the Entscheidungsproblem is unsolvable. The proof was given by Turing in a 1936 paper, which is freely available here. For a more in-depth proof, you can look at this paper. At several points here, I’m going to have to appeal to Church’s thesis to assure you that certain things can be done by a Turing machine. But it is possible, at every point where this is done, to give the precise details of the Turing machine that could carry out the computation. It would just be a bit too tedious for me to go into. But Turing did go into the details in his original paper.

First, we will show a certain limitation on the power of Turing machines. Note that the input of a Turing machine can be taken to represent a Turing machine itself. This should be fairly clear: a Turing machine is just a set of states, a set of symbols, together with associations of some of the state-symbol pairs with action-state pairs, and clearly these things can all be specified in a finite list of symbols. So we can use Turing machines to carry out computations that tell us stuff about Turing machines. In fact, we can have something called a universal Turing machine. This is a Turing machine U such that for every Turing machine T and every sequence of symbols x, if U is given a sequence of symbols as its input that represents T together with x, then the output of U will be the output of T when it is given x as input. Or, if T halts when it is given x as input, U will halt. Basically, U is a ‘Turing machine simulator’; it can simulate any Turing machine. How can we be sure that a universal Turing machine exists? Well, Church’s thesis means that a universal Turing machine exists if and only if there is an algorithm for finding the output of an arbitrary Turing machine given an arbitrary input. And it is intuitively clear that there is such an algorithm. The fact that universal Turing machines exist is what enables computers to run arbitrary programs; computers are not just Turing machines, but in fact universal Turing machines.

Here’s an interesting question. Is there a Turing machine H that can tell us whether an arbitrary Turing machine T will halt when it is given an arbitrary sequence x of symbols as its input? Exactly how it tells us is not particularly important, but, for example, it might be that the output of H is the number 1 if T halts given x as its input and otherwise the output of H is the number 0. It is not obvious that this Turing machine exists. After all, a Turing machine can run for an arbitrarily long time before it halts. No matter how long you run a Turing machine for, there is still a possibility that it might halt in the future that you can’t rule out, unless there is some way to predict the machine’s future behaviour from its behaviour up to this point. And, intuitively, it does not seem like such a prediction is possible; the behaviour of the Turing machine can be arbitrarily complex; it might do one thing for the first million steps, then switch to doing something completely different.

That’s the intuitive argument that H cannot exist. Now, let’s prove it. This is the only non-straightforward proof in this post; it’s not particularly complicated, but it is a little mind-bending. It uses an argument known as a diagonal argument, which also turns up in Russell’s paradox and Cantor’s proof of the uncountability of the real numbers. If you have never heard of either of those two things, you might have heard of the liar paradox. This is the paradox that arises from the statement ‘this statement is false’. The truth of this statement is equivalent to its own falsehood; it is self-contradictory. Diagonal arguments are a kind of generalisation of the liar paradox. But let’s see how the argument goes in this particular case.

Every Turing machine T can be represented by a sequence x of symbols. So, we can give this representation of T as the input to T. Using H, we can find out whether T halts when it is given this input. We have just described an algorithm for finding out whether an arbitrary Turing machine halts when it is given its own representation as its input. So there is a Turing machine G that can carry out this algorithm. In other words, for every Turing machine T and every sequence of symbols x, G does one thing if T halts when it is given the input x, and it does another thing if T does not halt when it is given the input x. The useful thing to do would be to do something like returning 1 in the first case, and 0 in the second case. But let’s suppose G does something slightly different instead: it does not halt in the first case, and it halts giving an output (it doesn’t really matter what output) in the second case. So, then, G halts if and only if T does not halt when it is given the input x. Can you see the problem here? G, like any other Turing machine, has its own representation x, and we can give x to G as its input. So what happens then? By definition, G halts if and only if G does not halt when it is given the input x. So, whether G halts or not, there is an absurd situation where it both halts and does not halt. We have shown that assuming the existence of H leads to a logical absurdity; therefore, H cannot exist.

And this is a big problem if we are trying to solve the Entscheidungsproblem. Because statements about Turing machines, including statements of the form ‘does the Turing machine T halt when it is given the input x’, can be expressed in first-order logic. So if we had an algorithm that could decide whether an arbitrary sentence of first-order logic, we could, among other things, decide whether an arbitrary Turing machine halts when it is given an arbitrary input. Since this is impossible, there can be no such algorithm. Thus we have proven that the Entscheidungsproblem is unsolvable. Actually, though, I haven’t been totally rigorous here. I haven’t properly justified my claim that all statements of the form ‘does the Turing machine T halt when it is given the input x’ can be expressed in first-order logic. For a start, in order to justify this claim, I would have to specify precisely what I mean by first-order logic, because there are actually lots of variants of first-order logic, and for some of these variants the Entscheidungsproblem can be solved (such variants are said to be decidable). However, every variant of first-order logic with a sufficient amount of expressive power is undecidable; it is only the simplest variants which are decidable. I can’t give proper proofs of these results without going into technical details about how first-order logic works. For now, I will leave you, hopefully, somewhat convinced that the Entscheidungsproblem is unsolvable, but fully convinced that the halting problem (the problem of finding out whether an arbitrary Turing machine halts given an arbitrary input) is unsolvable.