A simple motivation for the notion of a natural transformation

A natural transformation is an operation on a category, or more precisely a family of operations, one for each object in the category, which is preserved by morphisms in the category.

Each operation in the family is associated with a specific object A in the category, which it is said to be on. The operation itself is an endomorphism on A, modulo application of functors in a regular way to the domain and codomain. That is, it is a morphism from F(A) to G(A) for some functors F and G which are the same for every object. We call F and G the domain and codomain of the operation, respectively.

For example, multiplication in a group G is a function from U(G)^2 to U, where U is the forgetful functor from groups to sets. So the domain and codomain of the natural transformation corresponding to multiplication in groups are FU and U respectively, where F is the endofunctor on the category of sets such that for every set A, we have F(A) = A^2, and for every function f : A \to B and any two elements x and y of A^2, we have F(f)(x, y) = (f(x), f(y)).

If, for every object A, we write the operation on A as \alpha_A, then the naturality condition requires that for every morphism f : A \to B, we have G(f) \circ \alpha_A = \alpha_B \circ F(f). This is just the requirement that f “preserves” the operation: applying f to the output is the same as using the value under f as the input (modulo the functors we need to get the types right).

Consider the example of groups: a group homomorphism f : G \to H needs to have the property that for any two elements x and y of U(G), we have U(f)(xy) = U(f)(x) U(f)(y). (Of course we don’t usually write the forgetful functor explicitly, but we’re trying to be precise here.) If we write multiplication in G and H using prefix symbols \alpha_G and \alpha_H respectively, then this equation becomes U(f)(\alpha_G(x, y)) = \alpha_H(U(f)(x), U(f)(y)). Looking at the way F acts on morphisms, we see that this can be further rewritten as

\displaystyle U(f)(\alpha_G(x, y)) = \alpha_H(F(U(f))(x, y)),


\displaystyle U(f) \circ \alpha_G = \alpha_H \circ F(U(f)).

I like this way of motivating natural transformations because—well, first of all because it’s a way of motivating natural transformations at all, which is more than pretty much every category theory introduction I’ve seen does, at least if you don’t count saying that there ought to be some notion of morphism between functors, and then declaring by fiat that natural transformations are the appropriate notion of morphism between functors, as motivation. But also because it allows us to reconcile the formal category-theoretic understanding of a category as having its morphisms as part of its data with the perhaps more natural pre-formal understanding of a category as having operations as part of its data, and morphisms being characterized by their preservation of those operations.

In general, we can say that an operation on a category X is a family of morphisms \alpha_A : F(A) \to G(A) on objects A of X, where F and G are functors from X to some other category Y. In other words, it is like a natural transformation but it doesn’t need to satisfy the naturality condition. Now, if we have an operation like this we can consider the set of the morphisms f : A \to B in X which do satisfy the natural condition, i.e. have G(f) \alpha_A = \alpha_B F(f). It turns out that this set of morphisms is always closed under composition and contains every identity morphism, and therefore induces a subcategory of X:

Proof that it’s closed under composition. Suppose f : A \to B and g : B \to C are morphisms in X and G(f) \alpha_A = \alpha_B F(f) and G(g) \alpha_B = \alpha_C F(g). Then

\displaystyle G(gf) \alpha_A = G(g) G(f) \alpha_A = G(g) \alpha_B F(f) = \alpha_C F(g) F(f) = \alpha_C F(gf).

Proof that it contains every identity morphism. Suppose A is an object in X. Then

\displaystyle G(1_A) \alpha_A = 1_{G(A)} \alpha_A = \alpha_A = \alpha_A 1_{F(A)} = \alpha_A F(1_A).

For example, if we consider the category of groups and functions between groups, the category of groups and group homomorphisms is induced in this way as a subcategory by the operation of multiplication in groups.

It’s interesting to consider the converse: given a category X and a subcategory Y of X, under what circumstances is there is a category Z, functors F and G from X to Z, and an operation \alpha : F \to G such that the morphisms in X are precisely the morphisms f : A \to B in X with G(f) \circ \alpha_A = \alpha_B \circ F(f)? Obviously Z will have to still contain every object in X (since the construction described here only limits the morphisms in the subcategory, not the objects), but I will leave any more detailed consideration of this question for readers.

Notes on periodic functions

A period of a function f : ℝ → X is a T ∈ ℝ such that for every t ∈ ℝ, we have f(t + T) = f(t). Note that X may be any set; the definition does not refer to any structure on X other than its equality relation.

For every function f : ℝ → X, the set of periods of f is an additive subgroup of :

  • For any two periods T1 and T2 of f and every t ∈ ℝ, we have
    f(t + T1 + T2) = f(t + T1) = f(t),
    so T1 + T2 is a period of f.
  • For every period T of f and every t ∈ ℝ, we have
    f(t − T) = f((t − T) + T) = f(t),
    so  − T is a period of f.

Therefore, we call it the group of periods of f.

The converse also holds: every additive subgroup G of is the group of periods of a function, namely the canonical projection f : ℝ → ℝ/G (which takes every t ∈ ℝ to its equivalence class wrt the equivalence relation {(t1, t2) : t2 − t1 ∈ G}):

  • For every T ∈ G and every t ∈ ℝ, we have (t + T) − t = T ∈ G and hence f(t + T) = f(t), so T is a period of f.
  • Conversely, for every period T of f, we have f(T) = f(0) and hence T − 0 = T ∈ G.

Henceforth, we shall always refer to additive subgroups of as groups of periods.

It is worth noting explicitly that, in addition to the two closure properties proven above, the fact that the groups of periods are groups implies that they are closed under integer multiples, and in particular contain 0.

A function is aperiodic iff its group of periods is trivial, i.e. 0 is its only period. Otherwise, it is periodic.

If a group G of periods is nontrivial and cyclic, then it has a nonzero generator a. Since the set of the generators of G is closed under negation, we can choose a to be positive. This a is then the least positive element of G, because for every positive x ∈ G, there is an n ∈ ℤ such that x = na where n ∈ ℤ, and this n has to be positive (since x and a are positive), so that x ≥ a. The logic here works for any ordered group, not just groups of periods. It also works for any positive generator of G—so the least positive element of G is the only positive generator of G.

Conversely, if a group G of periods has a least positive element a, then a generates G. To see this, suppose x ∈ G and let n be the floor of x/a, so that n ≤ x/a < n + 1, i.e. na ≤ x < (n + 1)a, i.e. 0 ≤ x − na < a. Then x − na is an element of G less than a, so it cannot be positive; it must be 0, so that x = na. (The logic here doesn’t work for any ordered group, because it relies on the Archimedean property of the real numbers.)

So a nontrivial group G of periods has a least positive element a iff it is cyclic, in which case a is the unique positive generator of G. In this situation, we call a the fundamental period of G, or more loosely just the period of G.

The “canonical” examples of periodic functions have fundamental periods. For example, the sine and cosine functions have a fundamental period of 2π. However, constant functions are also technically periodic, and they do not have fundamental periods: a function f : ℝ → X is constant iff its group of periods is . There are more obscure examples. Consider the indicator function 1 of . For every T ∈ ℚ and every t ∈ ℝ, we have t + T ∈ ℚ iff t ∈ ℚ and hence f(t + T) = f(t); so T is a period of f. On the other hand, for every irrational T ∈ ℝ, we have ( − T) + T = 0 ∈ ℚ but  − T ∉ ℚ, and hence f(( − T) + T) = 1 but f( − T) = 0; so there is a t ∈ ℝ such that f(t + T) ≠ f(t), and hence T is not a period of f. It follows that the group of periods of 1 is .

There is a topological criterion which can be used to determine whether a periodic function has a fundamental period: a group G of periods is cyclic iff it is not dense in . To see this, observe first that if G is dense in , then it has elements arbitrarily close to 0, symmetrically on both the positive and negative sides of the number line (since G is closed under negation). Therefore G does not have a least positive element and hence is acyclic.

Conversely, suppose G is acyclic and hence has no least positive element. Then for every positive x ∈ G, there is a positive y ∈ G less than x. The difference x − y is also a positive element of G less than x, and at least one of y and x − y is less than or equal to x/2 (otherwise their sum would be greater than x). So in fact there is a positive y ∈ G less than or equal to x/2. By repeatedly applying this we can see that for every n ∈ ℕ, there is a positive y ∈ G less than or equal to x/2n. So G has positive elements arbitrarily close to 0. It follows that for every a ∈ ℝ and every positive ε ∈ ℝ, there is a positive x ∈ G less than ε. The interval (a − ε, a + ε), being of length 2ε > x, must contain nx for some n ∈ ℤ. And since G is closed under integer multiples, we have nx ∈ G. So G has elements arbitrarily close to a. This holds for every a ∈ ℝ, so G is dense in .

This criterion is simplified for continuous periodic functions f : ℝ → X, where X is an accessible topological space (a topological space X is accessible iff any two distinct points in X have neighbourhoods not containing the other point). This is because the group G of periods of such a function f is closed. To see this, suppose T0 is a limit point of G; it will suffice to prove that T0 ∈ G. To see that, suppose t ∈ ℝ; it will suffice to prove that f(t + T0) = f(t). To see that, suppose N is a neighbourhood of f(t + T0); it will suffice to prove that f(t) ∈ N. Since f is continuous, so is its right-composition with adding t. Therefore there is a neighbourhood M of T0 such that f(t + M) ⊆ N. And since T0 is a limit point of G, there is a T ∈ M ∩ G. Since T ∈ M, we have f(t + T) ∈ N. Since T ∈ G, we have f(t + T) = f(t) and hence f(t) ∈ N.

Since G is closed, it is equal to its closure. So it can only be dense in (i.e. have its closure equal to ) iff it is equal to . Therefore, given that f is a continuous periodic function with an accessible codomain, it has a fundamental period iff it is nonconstant.


A proof of the boundedness theorem by induction

As usual, this post can be viewed as a PDF.

Theorem (Boundedness Theorem). A continuous real-valued function on a closed interval is bounded.

Proof. Suppose f is such a function and [a, b] is its domain. First, observe that for every c ∈ [a, b], since f is continuous at c, there is a positive \delta \in {\mathbb R} such that for every x ∈ [a, b]∩(c − δ, c + δ), we have f(x)∈(f(c)−1, f(c)+1). So there is a neighbourhood of c, namely (c − δ, c +  δ), on which f is bounded. Thus f is, in a sense, locally bounded at every point in its domain; the problem is to prove that this local boundedness implies global boundedness.

In textbook proofs of the boundedness theorem, this is generally done using what I would regard as a trick, such as supposing f isn’t bounded and using the Bolzano-Weierstrass theorem to obtain a contradiction. More advanced texts may appeal to the compactness of [a, b], but the proof that [a, b] is compact (the Heine-Borel theorem) amounts to basically the same logic, and is usually no less trickful1.

However, if we think about how to construct an algorithm to find a global bound (often a helpful move to make—the un-situated-in-time-ness of ordinary mathematical language can be quite thought-limiting) then there is a procedure which, in my opinion, is quite obvious and immediately suggests itself. Just start on the left, with the singleton interval {a}, on which f is certainly bounded, and repeatedly apply local boundedness to the right endpoint, gradually expanding the subinterval of [a, b] on which we know f to be bounded. At each step the existing subinterval has a bound, and the neighbourhood of the right endpoint has a bound; taking the maximum of these two bounds gives us a bound of the whole expanded subinterval formed by unioning2 the neighbourhood with the existing subinterval. Eventually the right endpoint will reach b and we will no longer be able to expand it further, at which point we stop and observe that we have bounded the whole of [a, b]. (Sanity check: why would this fail for (a, b)? Because then we wouldn’t be able to take the right endpoint to b; there would be nowhere to stop the procedure, without leaving a part of the right of (a, b) unbounded.)

Of course, this algorithm will not in general terminate in a finite number of steps, but this is no issue; don’t let your thinking be limited by the arbitrary limitations of physical machines. Obviously, it is still necessary to prove that this algorithm will terminate, even if it takes infinitely many steps. Since we’re now dealing with the distinctions among the infinite quantities, we’re going to need to use some set-theoretic machinery (which is why you won’t see this proof in introductory real analysis textbooks). I’ll assume that you are familiar with the ordinals, the techniques of transfinite induction and recursion, and Burali-Forti’s paradox (which says that there is no set of all ordinals).

Here’s the plan. Using transfinite recursion, we shall construct an ordinal-indexed sequence xα of members of [a, b] such that every ordinal α has the following properties:

  1. The function f is bounded on [a, xα].

  2. We have xα ≤ xα + 1, and if xα = xα + 1, then xα + 1 = b.

Then, since there are a proper class’s worth of ordinals and [a, b] is just a set, the sequence xα will have to repeat3, so there will be ordinals α and β such that α < β and xα = xβ. By (2), the sequence xα will be nondecreasing, so we will have xα ≤ xα + 1 ≤ xβ and hence xα = xα + 1 = xβ = b. Then by (1), it will follow that f is bounded on [a, b].

All we need to do to complete the proof is describe the recursion and make it clear that it is a valid recursion (i.e. all of the assumed properties used to construct the next term of the sequence are preserved by the constructed sequence with the next term added).

Base case
Let x0 = a and observe that f is bounded on {a} by |f(a)|.
Successor case
For every ordinal α, if we assume that xα ∈ [a, b], then since f is continuous, there is a positive \delta \in {\mathbb R} such that f is bounded on [xα, xα + δ] by some M. Let xα + 1 = min(b, xα + δ), and observe that:

  • If b ≤ xα + δ, then xα + 1 = b and hence a ≤ xα ≤ xα + 1 ≤ b. Otherwise, we have xα + 1 = xα + δ and hence a ≤ xα < xα + 1 ≤ b, with the middle inequality strict, so that we can only have xα = xα + 1 when xα + 1 = b.
  • We have xα + 1 ≤ xα + δ, so f is bounded by M on [xα, xα + 1] as well as [xα, xα + δ]. Assuming f is bounded on [a, xα] by some L, it follows that f is bounded on [a, xα + 1] by max(L, M).

Limit case

For every limit ordinal λ, let A = {xα : α < λ}. Then A contains x0 and hence is nonempty, and is bounded above by b, so it has a supremum. Let xλ = supA, and observe that:

  • If we assume that A ⊆ [a, b], then since A is nonempty and xλ is ge every member of A, we have xλ ≥ a; and since b bounds A above and xλ is the smallest upper bound of A, we also have xλ ≤ b.
  • Since xλ ∈ [a, b] and f is continuous, there is a positive \delta \in {\mathbb R} such that f is bounded on [xλ − δ, xλ] by some M. And since xλ = supA, there is an ordinal α < λ such that xλ − δ < xα and hence f is bounded on [xα, xλ] by M. Assuming f is bounded on [a, xα] by some L, it follows that f is bounded on [a, xλ] by max(L, M).

And that’s it.

Of course, this proof is a bit tedious. There might be ways to make it shorter (maybe we can use Zorn’s lemma instead of transfinite induction). But the advantage it has over other proofs of the boundedness theorem I’ve seen is that it falls out more or less automatically, without requiring any flashes of insight.

  1. This word doesn’t appear to exist currently, but it needs inventing.

  2. Another word that needs inventing. OK, I could use “uniting”, but I suspect people wouldn’t immediately make the connection with the union operation on sets if I used that word.

  3. This is a sort of infinitary pigeonhole principle.

Notes on proving the completeness theorem for propositional logic

NB: I’ve opted to just get straight to the point with this post rather than attempting to introduce the subject first, so it may be of little interest to readers who aren’t already interested in proving the completeness theorem for propositional logic.

A PDF version of this document is available here.

The key thing I had to realize for the proof of the completeness theorem for propositional logic to “make sense” to me was that interpretations of a language of propositional logic can be thought of as theories.

Normally, an interpretation of a language L of propositional logic is thought of as a function υ from the set of the sentence variables in L to {0, 1}, which is extended to the set of the formulas in L by letting υ(⊥)=0 and recursively letting
\upsilon(\phi \rightarrow \psi) = \upsilon(\psi)^{\upsilon(\phi)} = \begin{cases}         0 &\quad \text{if } \upsilon(\phi) = 0 \text{ or } \upsilon(\psi) = 1, \\         1 &\quad \text{if } \upsilon(\phi) = 1 \text{ and } \upsilon(\psi) = 0     \end{cases}
for every pair of sentences ϕ and ψ in L (assuming and are the primitive connectives). The value under υ of a sentence ϕ is thought of as the truth value of ϕ under υ, with 0 standing for falsity and 1 standing for truth.

A theory in L, on the other hand, is thought of as a set of sentences in L, these sentences being the nonlogical axioms of the theory. It’s a completely different type of object from an interpretation.

But if you think about what these formal concepts are trying to get at, they’re quite similar. Both of them are essentially requirements that certain sentences be true. A theory requires its nonlogical axioms to be true. An interpretation requires the sentences true under it to be true, and the sentences false under it to be false, which might appear to be a slightly more elaborate concept, but since a sentence is false iff its negation is true, you know what sentences an interpretation makes false if you know what sentences it makes true. The only substantial difference is that an interpretation is subject to certain restrictions compared to a general theory:

  1. An interpretation of L must require every sentence variable in L to be either true or false (and not both).

  2. An interpretation of L must require to be false.

  3. For every pair of sentences ϕ and ψ in L, an interpretation of L must require ϕ → ψ to be true iff it requires ϕ to be false or requires ψ to be true (or both).

Formally, we can define a function f on the set of the interpretations of L by the rule that for every interpretation υ of L, we have
f(\upsilon) = \{\phi : \upsilon(\phi) = 1\} \cup \{\neg \phi : \upsilon(\phi) = 0\}.
This function f is an injection into the set of the theories in L. But it is not surjective—for every theory M in L, the value f−1(M) exists only if the statements below hold:

  1. For every sentence variable A in L, exactly one of A and ¬A is a member of M.
  2. ⊥ ∉ M.

  3. For every pair of sentences ϕ and ψ in L, we have ϕ → ψ ∈ M iff ϕ ∉ M or ψ ∈ M.

An interpretation could reasonably be defined as a theory M in L with properties (1)-(3) above.

Now, the completeness theorem says that every syntactically consistent theory in L is semantically consistent, i.e. has a model. A model of a theory T in L is an interpretation of L under which every member of T is true, so if we adopt the definition of interpretations as theories, we can say that a model of T is an extension of T which is an interpretation of L. This leads us to the idea that in order to construct a model of an arbitrary syntactically consistent theory T in L, we will have to extend T by adding new members.

But how exactly do we extend T? Here, it helps to recall the soundness theorem, which is the converse of completeness: it says that every semantically consistent theory in L is syntactically consistent. (This is generally straightforward to prove from the definition of syntactic consequence. I’m not going to prove it here because I’m trying to be agnostic about exactly how syntactic consequences are defined, and the proof is different depending on how syntactic consequences are defined.) Every interpretation of L, thought of as a theory in L, is certainly semantically consistent (since it has itself as an extension) and hence, by soundness, syntactically consistent. Therefore one thing we definitely need to do as we add new members to T, in order for it to eventually become an interpretation of L, is preserve the syntactic consistency of T.

In fact, it’s not too difficult to see that every theory M in L which can be thought of as an interpretation of L is not only syntactically consistent, but maximally syntactically consistent: every proper extension of M is syntactically inconsistent.

To see this, first, observe that for every sentence ϕ in L, since ¬ϕ abbreviates ϕ → ⊥, property (3) tells us that ¬ϕ ∈ M iff ϕ ∉ M or ⊥ ∈ M; and M never contains . So we have ¬ϕ ∈ M iff ϕ ∉ M. In other words, M contains exactly one of ϕ and ¬ϕ.

Now, suppose M is a proper extension of M. Then it has a member ϕ which is not a member of M. The negation of ϕ must then be a member of M. Since M extends M, it follows that ¬ϕ ∈ M. But we also have ϕ ∈ M. Since every member of M is a syntactic consequence of M, it follows that M is syntactically inconsistent.

Now that we know every interpretation of L is maximally syntactically consistent, the natural next question to ask is whether the converse holds, i.e. every maximally syntactically consistent theory T in L is an interpretation of L. If the converse does hold, then to prove the completeness theorem, all we need to do is extend a syntactically consistent theory to a maximal syntactically consistent theory. As it happens, the converse does hold.

Lemma 1. For every maximally syntactically consistent theory M in L and every sentence ϕ in L, exactly one of ϕ and ¬ϕ is a member of M.

Proof. Suppose M is a maximally syntactically consistent theory in L and ϕ is a sentence in L.

If M contains both ϕ and ¬ϕ, then M is syntactically inconsistent. So M contains at most one of ϕ and ¬ϕ.

If M contains neither ϕ nor ¬ϕ, then M ∪ {ϕ} and M ∪ {¬ϕ} are proper extensions of M and hence are syntactically inconsistent, from which it follows by negation introduction and elimination that M syntactically implies both ϕ and ¬ϕ, and hence is syntactically inconsistent. So M contains at least one of ϕ and ¬ϕ. ■

Theorem 1. Every maximally syntactically consistent theory M in L is an interpretation of L.

Proof. Suppose M is a maximally syntactically consistent theory in L.

For every sentence variable A in L, exactly one of A and ¬A is a member of M by the lemma above.

If ⊥ ∈ M, then M syntactically implies and hence is syntactically inconsistent; so we have ⊥ ∉ M.

Suppose ϕ and ψ are sentences in L. We shall prove that ϕ → ψ ∈ M iff ϕ ∉ M or ψ ∈ M.

For the forward implication, suppose ϕ → ψ ∈ M. If ϕ ∈ M and ψ ∉ M, i.e. ¬ψ ∈ M, then M syntactically implies both ϕ → ψ and ¬ψ, so by modus tollens it follows that M ⊢ ¬ϕ. But we also have M ⊢ ϕ, so M is syntactically inconsistent. This is a contradiction, so one of ϕ ∈ M and ψ ∉ M must not hold.

For the backward implication:

  1. First, suppose ϕ ∉ M, i.e. ¬ϕ ∈ M. Then M ⊢ ¬ϕ, so M ⊢ ϕ → ψ.
  2. Second, suppose ψ ∈ M. Then M ⊢ ψ and hence M ⊢ ϕ → ψ.

Either way, we have M ⊢ ϕ → ψ. Therefore, if ϕ → ψ ∉ M, i.e. ¬(ϕ → ψ ∈ M), so that M ⊢ ¬(ϕ → ψ), we have that M is syntactically inconsistent, which is a contradiction. So ϕ → ψ ∈ M. ■

Now, to complete the proof, we just need to prove that an arbitrary syntactically consistent theory T in L can be extended until it is maximally syntactically consistent. The standard tool for carrying out such proofs is Zorn’s lemma. (There are other techniques that can be used, like transfinite induction; if you’re not too familiar with proofs like this, using transfinite induction generally makes things clearer. But Zorn’s lemma makes the proof more concise, so that’s what I’ll use here.)

Note that we make use of the “syntactic compactness theorem” here, which says that a theory is syntactically consistent iff each of its finite subsets is syntactically consistent. Unlike the semantic compactness theorem, which is most straightforwardly proven as a consequence of completeness, the syntactic compactness theorem is trivial; it essentially follows from the fact that proofs are finite and hence any proof of only makes use of finitely many axioms.

Theorem 2. Every syntactically consistent theory has a maximal syntactically consistent extension.

Proof. Suppose T is a syntactically consistent theory. To prove that T has a maximal syntactically consistent extension, we shall use Zorn’s lemma; so suppose 𝒯 is a chain of syntactically consistent extensions of T and let T be the union of the theories in 𝒯. To prove that T is consistent, we shall use the syntactic compactness theorem; so suppose U = {ϕ1, ϕ2, …, ϕn} is a finite subset of T. In the case where U is empty, it is certainly syntactically consistent. Otherwise, let T1, T2, …and Tn be theories in 𝒯 containing ϕ1, ϕ2, …and ϕn respectively. Then {T1, T2, …, Tn} is a finite, nonempty chain, so it has a maximum U, which is syntactically consistent and extends U. Therefore U, having a syntactically consistent extension, must be syntactically consistent itself. ■

Programming languages as theorem verifiers

(This post is also available as a PDF).

One of the most interesting ideas in modern logic is the Curry-Howard correspondence. This is the informal observation that, from a certain perspective, mathematical proofs and computer programs are the same thing.

To be more exact: there is a correspondence between mathematical formulas and data types in computer programs. In particular, the correspondence is not with data values. This is something that puzzled me a little when learning about this—why couldn’t it be with data values? But I think I can now express why it has to be with types: a mathematical formula expresses that something is the case, and a data type, when attached to a data value, can likewise be thought of as expressing that something is the case for that data value. A data value, on the other hand, is just a thing; it makes no sense to say that it expresses something. (Perhaps one could say that data values correspond to the truth values of formulas, as opposed to the formulas themselves.)

In addition, there is a correspondence between rules of inference and operations which combine programs. Just as mathematical proofs can be thought of as being built up from smaller proofs whose combination is justified by rules of inference, so computer programs can be thought of as being built up from smaller programs using various operations (different words may be used depending on the programming language—“subroutine”, “function”, etc.—but the most general way to describe how programs are built up is to say that they are built up from smaller programs).

As an illustration, let’s have a look at how the Hilbert system which I talked about in the last post can be described in terms of programs, types and operations rather than proofs, formulas and rules of inference. A Hilbert system described in such a way becomes a system of combinatory logic.

The defining characteristic of Hilbert system is that they have few rules of inference and rely on axioms for their expressive power. Accordingly, combinatory logic systems have few ways of combining programs and rely on atomic programs known as combinators for their expressive power. (Note that the programs in combinatory logic are simple ones, which just transform inputs into outputs without altering any state. As a result, they can just as easily be thought of as mathematical functions—thinking of combinatory logic as a programming language is just one way of thinking of it, albeit one which comes quite naturally.)

Although combinatory logic can be presented formally, it exists embedded in any programming language with a reasonably powerful type system, such as Haskell. Doing combinatory logic in a practical programming language helps it sink in that there’s a real correspondence there. So, here is a complete proof of the reflexivity of the conditional connective in a restricted fragment of Haskell:

s x y z = x z (y z)
k x y = x

i :: a -> a
i = s k k

main :: IO ()
main = return ()

The first two lines in this program define the k and s combinators, which correspond to the two axiom schemes in a minimal Hilbert system. These are implemented in Haskell as polymorphic functions, with instantiations of the functions with specific types corresponding to the individual axioms permitted by the schemes. Note the type signatures (I didn’t write them out explicitly in the program, since Haskell can infer them automatically):

k :: a -> b -> a
s :: (a -> b -> c) -> (a -> b) -> a -> c

The signatures are just the same as the axioms of the minimal Hilbert system, except that the function type-forming operator -> is used in place of the conditional connective, and the operands are thought of as type variables, not formula variables.

The third line, the type signature of i, is where we state the theorem that we’re trying to prove—in this particular case, it’s p → p for an arbitrary formula p. Replacing the formula variable p with a type variable a and the conditional connective with the -> operator, we get the type signature a -> a. So in combinatory logic, the theorem amounts to the assertion that there is a function that returns values of the same type as its argument.

The proof itself is the definition of the function. This definition, by itself, determines the type of the function; the type signature declared in the third line only comes into play because the Haskell compiler will check that it is compatible with the actual type determined, and if it isn’t, it will refuse to compile the program. Thus, the theorem is proven if and only if the program successfully compiles.

In the proof of p → p in a Hilbert system, the first step is to assert that
(p → (p → p)→p)→(p → p → p)→p → p
is an axiom, because it is an instance of the axiom scheme (p → q → r)→(p → q)→p → r. In Haskell, we could make this assertion explicit by defining a new function with the type signature (a -> (a -> a) -> a) -> (a -> a -> a) -> a -> a, but making the definition of this function just delegate to s:

s1 :: (a -> (a -> a) -> a) -> (a -> a -> a) -> a -> a
s1 = s

This would make the compiler check that the polymorphic function s can indeed be instantiated with this type. It isn’t absolutely necessary, though; we can just use s itself in place of s1, without requiring the compiler to verify the intermediate step. In fact, in the program I wrote above I have not asked the compiler to verify any of the intermediate steps, only the final type signature of a -> a, because the program is nice and concise that way.

The next step in the proof of p → p, in its most detailed version, would be to assert that p → (p → p)→p is a valid instance of the axiom scheme p → q → p. This can be done in Haskell in much the same way, by defining a function k1 which is the same as k but has a more specific type signature.

The next step after that is to apply the modus ponens rule of inference to these two axioms, proving the formula (p → p → p)→p → p. What is the counterpart of modus ponens in combinatory logic? It’s simply function application, because if x is a function of type a -> b and y is a value of type a, then type of the value x y returned by x, given y as its argument, is b. So the Haskell way of expressing this step is as follows:

i0 :: (a -> a -> a) -> a -> a
i0 = s1 k1

(Note that s1 and k1 can be replaced with s and k here, if they were not defined earlier.)

The final step is to apply modus ponens to the formula p → p → p)→p → p (the one we just proved) and the axiom p → p → p. This amounts to applying the i0 function to k (of course we could also define a k2 function with the specific type signature a -> a -> a, and use it in place of k, as with the other two axioms used). If we compress the whole proof into one function, we get the i function in the program above:

i :: a -> a
i = s k k

And if you’re wondering about the last two lines in the original code snippet, they’re just required in order to make the program a valid Haskell program that will compile.

There’s a lot more to say about this subject, such as the role of intuitionistic vs. classical logic, and the relationship between quantifiers and dependent types …however, I still don’t understand these deeper aspects of the subject very well, so I won’t say any more in this post.

Motivating Hilbert systems

(This post is also available as a PDF.)

Hilbert systems are formal systems that encode the notion of a mathematical proof, which are used in the branch of mathematics known as proof theory. There are other formal systems that proof theorists use, with their own advantages and disadvantages. The advantage of Hilbert systems is that they are simpler than the alternatives in terms of the number of primitive notions they involve.

Formal systems for proof theory work by deriving theorems from axioms using rules of inference. The distinctive characteristic of Hilbert systems is that they have very few primitive rules of inference—in fact a Hilbert system with just one primitive rule of inference, modus ponens, suffices to formalize proofs using first-order logic, and first-order logic is sufficient for all mathematical reasoning. Modus ponens is the rule of inference that says that if we have theorems of the forms p → q and p, where p and q are formulas, then χ is also a theorem. This makes sense given the interpretation of p → q as meaning “if p is true, then so is q”.

The simplicity of inference in Hilbert systems is compensated for by a somewhat more complicated set of axioms. For minimal propositional logic, the two axiom schemes below suffice:

  1. For every pair of formulas p and q, the formula p → (q → p) is an axiom.
  2. For every triple of formulas p, q and r, the formula (p → (q → r)) → ((p → q)→(p → r)) is an axiom.

One thing that had always bothered me when reading about Hilbert systems was that I couldn’t see how people could come up with these axioms other than by a stroke of luck or genius. They are rather complicated, and even more so, the proofs that one generates using them directly are rather complicated. To illustrate, here’s an example of a proof in a Hilbert system using these two axioms:

Theorem 1. For every formula p, the formula p → p is a theorem.


  1. From axiom scheme 1 we have that p → (p → p) is an axiom.
  2. From axiom scheme 1 we have that p → ((p → p)→p) is an axiom.
  3. From axiom scheme 2 we have that (p → ((p → p)→p)) → ((p → (p → p)) → (p → p)) is an axiom.
  4. Applying modus ponens to the theorems proved in steps 2 and 3, we see that (p → (p → p)) → (p → p) is a theorem.
  5. Applying modus ponens to the theorems proved in steps 1 and 4, we see that p → p is a theorem.

This proof is rather difficult to come up with, because the reasoning in it is totally unlike how people naturally do mathematical reasoning. A more natural informal proof that p → p is a theorem would probably go something like this:

  1. Assume that p is true.
  2. Then (repeating ourselves) we have that p is true.
  3. It follows that p → p is a theorem by applying the rule that of inference saying that if we assume some formula q is true and then derive some formula r, then q → r is a theorem.

The rule of inference mentioned in step 3 is called deduction. The rules of inference of modus ponens and deduction together encapsulate the interpretation of the conditional connective as meaning “if …then”. Whereas modus ponens tells us what we can prove from an “if …then” statement, deduction tells us how we can get to an “if …then” statement.

Although deduction isn’t a primitive rule of inference in Hilbert systems, it can be proven to be a valid derived rule of inference. This requires a formalization of the notion of assumption; fortunately this doesn’t require any extra machinery. The assumption of a formula p can be thought of as a movement from the original Hilbert system into another Hilbert system that has p as an extra axiom. Let us introduce some convenient notation: if we identify Hilbert systems with their sets of axioms, then given a Hilbert system Γ and a formula p, we can write the Hilbert system obtained by adding p as an axiom to Γ as Γ ∪ {p}, using set notation. Then we can formally state the deduction theorem like so:

Theorem 2. For every Hilbert system Γ and every pair of formulas p and q such that q is a theorem of Γ ∪ {p}, the formula p → q is a theorem of Γ.

The proof of the deduction theorem is quite straightforward if we use an inductive technique. Since modus ponens is the only rule of inference in Hilbert systems, the set of the theorems of a Hilbert system Γ can be defined as the smallest set X with the two properties listed below:

  1. Every axiom of Γ is a member of X.
  2. For every pair of members of X of the forms p → q and p, where p and q are formulas, the formula q is also a member of X.

Therefore, in order to prove that a set X contains every theorem of Γ, it suffices to prove that X has these two properties.

For the deduction theorem, if we are given some fixed formula p, we may consider the set X of the theorems q of Γ ∪ {p} such that p → q is also a theorem of Γ. Then, to prove the first property, we have to prove that X contains every axiom of Γ ∪ {p}. By Theorem 1 we have that p → p is a theorem of Γ, so X certainly contains p. As for the axioms of Γ, by axiom scheme 1 we have that for every formula q, including the axioms of Γ, the formula q → (p → q) is a theorem of Γ. Applying modus ponens, it follows that p → q is a theorem of Γ for every axiom q of Γ.

The second property is even more straightforward to prove. Suppose q and r are formulas and q → r and q are members of X, so that p → (q → r) and p → q are theorems of Γ. By axiom scheme 2, we have that (p → (q → r)) → ((p → q)→(p → r)) is a theorem of Γ. Applying modus ponens to this theorem and p → (q → r), it follows that (p → q)→(q → r) is a theorem of Γ. Applying modus ponens again to this theorem and p → q, it follows that p → r is a theorem of Γ and hence r ∈ X. This completes the proof of the deduction theorem.

Now, I realized just the other day that this proof can be used to explain where axiom schemes 1 and 2 come from. Suppose we were trying to prove the deduction theorem using modus ponens only without knowing what axiom schemes to start with. The proof would proceed as usual except for the final steps in proving each of the two properties. For the first property, we would need to be able to prove that p → p is a theorem of Γ in one case, and in the other case we would get to a state where we would have that a formula q is a theorem of Γ, and we would need to be able to conclude that p → q is also a theorem of Γ. For the second property, we would get to a state where we would have that formulas of the forms p → (q → r) and p → q, where q and r are formulas, are theorems of Γ, and we would need to be able to conclude that p → r is also a theorem of Γ. This would then naturally motivate us to introduce the three axiom schemes listed below, which are exactly the theorems we need to have available in order to draw the required conclusions straightforwardly by repeatedly applying modus ponens.

  1. For every formula p, we have p → p.
  2. For every pair of formulas p and q, we have p → (q → p).
  3. For every triple of formulas p, q and r, we have (p → (q → r)) → ((p → q)→(q → r)).

At some point we would have to realize that the first axiom scheme was unnecessary, of course, and this would probably have to happen as a surprising discovery after playing around with the system.

(One would also naturally wonder if it would be possible to prove the second axiom from the first and third, or the third axiom from the first and second, instead of the first axiom from the second and third; or even the second axiom from the third alone, or the third axiom from the second alone. Presumably it is impossible in all cases, but I don’t know how prove this.)

Nevertheless, I feel a lot more comfortable with Hilbert systems now that I can see how one might be directed towards their definition.

Tricks in computer arithmetic

(This post is also available as a PDF, with better typesetting for the mathematical formulas and syntax highlighting for the code listings.)

People who do a lot of mental arithmetic often make use of “tricks” to carry out calculations. An example familiar to most people is that you can multiply an integer represented by its decimal expansion by 10 by simply adding an extra 0 digit: for example, 321 times 10 is 3210. Another trick, which not so many people are familiar with, is that in order to determine whether an integer is divisible by 3 it suffices to examine the sum of the digits in its decimal expansion: the original integer is divisible by 3 if and only if this sum is. For example, 321 must be divisible by 3 because 3 + 2 + 1 = 6, and 6 is divisible by 3. The defining characteristic of tricks like these is that they enable people to do less calculations to reach their result, reducing the time taken, the cognitive effort required and the likelihood of error, while at the same time only being applicable in a limited range of circumstances, so that they are unable to fully replace the more cumbersome but more general algorithms that are taught to us in school.

It might come as a surprise to learn that computers also make use of such tricks. Computers can achieve much greater speeds and have a greater working memory capacity than humans, and it hardly matters how much effort they have to go to calculate things; and they hardly ever make errors, provided they are making use of a correct algorithm.1 So one might think they wouldn’t need to resort to tricks. But computers often need to perform lots of arithmetic calculations in a short amount of time, and in such circumstances, any slight speed up in one of the individual operations can have an arbitrarily large effect on the speed of the whole procedure, depending on how many times the operation needs to be repeated. So it’s primarily the fact that tricks increase the speed of calculation that makes them worthwhile for computers.

A big difference between computers and humans, when it comes to arithmetic, is that computers represent integers by their binary expansions, rather than their decimal expansions. But a lot of the tricks humans use can still be transferred fairly straightforwardly to the binary environment. For example, whereas adding a zero digit multiplies the value of a decimal expansion by 10, adding a zero digit multiplies the value of a binary expansion by 2. So computers are best at multiplying by 2, rather than by 10. This is actually a lot more useful—it’s more often necessary to double a quantity than to multiply it by 10.

What about the trick for checking whether an integer is divisible by 3? Does that have a binary counterpart? Well, let’s think about how this trick works. It’s basically a consequence of the fact that 10 is congruent to 1 modulo 3. If we have an integer x whose decimal expansion is made up of the digits d0, d1, …and dn (in that order, from least significant to most significant), then we have the equation
x = d_0 + 10 d_1 + \dotsb + 10^n d_n = \sum_{k = 0}^n 10^k d_k.

Now, if we only care about congruence modulo 3, we can replace the terms in the sum on the right-hand side with terms that are congruent modulo 3 to the original terms. We can also replace factors within those terms by factors that are congruent modulo 3 to the original factors. In particular, we can replace the 10s by 1s. Since 1 is the multiplicative identity, this allows us to eliminate the factors of 10. Therefore, we have the congruence
x \equiv \sum_{k = 0}^n d_k \pmod 3.

That is, the value of x modulo 3 is the same as the value modulo 3 of the sum of the digits in the decimal expansion of x. This proves that the trick works, because x being divisible by 3 is equivalent to x being congruent to 0 modulo 3. In fact it gives us two more tricks: an integer is 1 plus a multiple of 3 if and only if the sum of its digits is also 1 plus a multiple of 3, and an integer is 1 minus a multiple of 3 if and only if the sum of its digits is also 1 minus a multiple of 3.

Now, if d0, d1, …and dn are binary bits rather than decimal digits, then we must start with the equation
x = d_0 + 2 d_1 + \dotsb + 2_n d^n = \sum_{k = 0}^n 2^k d_k,
with the digits being multiplied by powers of 2 rather than powers of 10. The number 2 is congruent to −1, not 1, modulo 3. But this still allows us to do some simplification, since ( − 1)k is 1 for even integers k and −1 for odd integers k. The congruence simplifies to
x \equiv \sum_{k = 0}^n (-1)^k d_k \pmod 3,
showing that x is congruent modulo 3 to the alternating sum of its bits.

A computer could calculate this alternating sum by simply iterating over the bits of x, alternating between an adding and subtracting state. However, this wouldn’t be very efficient; it would require as many iterations as x has bits, which means it would likely be no quicker than simply using the division algorithm a lot of the time.

It is possible to do better. Rather than taking each bit as a unit, we can take each two-bit segment as a unit. This will eliminate the need to explicitly alternate between addition and subtraction. So, assuming n is odd (so that x has an even number of bits, including the one with place value 1), we may write
x \equiv \sum_{k = 0}^{\frac {n - 1} 2} (d_{2k} - d_{2k + 1}) \pmod 3.

Now here’s something neat: −1 is congruent to 2 modulo 3! So we can also write this congruence as
x \equiv \sum_{k = 0}^{\frac {n - 1} 2} (d_{2k} + 2d_{2k + 1}) \pmod 3.
Now, for every integer k between 0 and \frac {n - 1} 2 inclusive, the sum d2k + 2d2k + 1 is just the value of the two-bit integer whose bits are d2k (least significant) and d2k + 1 (most significant). So we can state the binary rule for divisibility by 3 as follows:

A binary expansion has a value divisible by 3 if and only if the sum of the values of its two-bit segments, interpreted as independent binary expansions, is divisible by 3. More generally, its value is congruent to this sum modulo 3.

This rule can be straightforwardly applied in a computer program. It’s just a matter of summing segments. Once we have the sum, we can use a lookup table to determine its value modulo 3, since the set of possible values of the sum will be much smaller than the set of possible values of the original integer. The summing of the segments can be done in parallel using bitwise operations in an unrolled loop. Here’s an implementation in the C programming language.

unsigned mod3(unsigned x) {
    static unsigned TABLE[48] = {
        0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2,
        0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2,
        0, 1, 2, 0, 1, 2

    /* Sum adjacent 2-bit segments in parallel. Note that 0x33333333 is
    8 repetitions of the bit sequence 0011, and 0xcccccccc is 8
    repetitions of the bit sequence 1100. */
    x = (x & 0x33333333) + ((x & 0xcccccccc) >> 2);
    /* Sum adjacent 4-bit segments in parallel. Note that 0x0f0f0f0f is
    4 repetitions of the bit sequence 00001111, and 0xf0f0f0f0 is 4
    repetitions of the bit sequence 11110000. */
    x = (x & 0x0f0f0f0f) + ((x & 0xf0f0f0f0) >> 4);

    /* Sum adjacent 8-bit segments in parallel. Note that 0xff00ff00 is
    2 repetitions of the bit sequence 0000000011111111, and 0xf0f0f0f0
    is 2 repetitions of the bit sequence 1111111100000000. */
    x = (x & 0xff00ff00) + ((x & 0x00ff00ff) >> 8);

    /* Sum the two 16-bit segments. */
    x = (x & 0x0000ffff) + ((x & 0xffff0000) >> 16);

    return TABLE[x];

Unfortunately, this implementation does not appear to actually be more efficient than a regular modulo operation. I wrote a profiler for the routine (the source code is available on GitHub at https://github.com/Andrew-Foote/odds-and-ends/blob/master/mod3.c—note that it is not written to be portable) and ran it on Windows using Microsoft’s Visual C++ compiler. I also profiled the calculation of values modulo 3 using the ordinary modulo operator for comparison. 16777216 calls to the my subroutine took about 200 milliseconds, but 16777216 ordinary modulo operations took only about 150 milliseconds.

Of course, it may be that the method could be more efficient than a regular modulo operation if my code was better. I’m not very experienced with this kind of programming.

Another trick

Although our trick of summing the 2-bit segments didn’t pay off, we can find a trick that does pay off by simply taking a C program that computes values modulo 3 in the obvious way, using the modulo operator, and compiling this program with an optimizing compiler. An optimizing compiler will optimize whatever it can, so if there is a trick that can be used to calculate values modulo 3 more efficiently, the compiler should make use of it.

To see the assembly output from a compiler, there’s no need to actually run a local compiler: Matt Godbolt’s Compiler Explorer tool at https://godbolt.org has got you covered. It’s a very neat website that lets you choose from an array of different compilers for different languages hosted on its own servers, so that you can quickly compare outputs.

Here’s the code for a C function which does an ordinary modulo operation. It deals with unsigned (non-negative) integers only, to keep things maximally simple.

unsigned mod3(unsigned x) {
    return x % 3;

Compiling with the GNU C Compiler (GCC), version 8.3, on an x86-64 architecture and using the -O3 flag (for maximal standards-compliant optimization), we get this assembly output:

  mov eax, edi
  mov edx, -1431655765
  mul edx
  mov eax, edx
  shr eax
  lea eax, [rax+rax*2]
  sub edi, eax
  mov eax, edi

This clearly isn’t just doing a div. So what’s going on? In case you can’t read x86-64 assembly, the assembly subroutine is effectively using the following formula2 to compute the value modulo 3 of the argument x:
x \bmod 3 = x - 3 \left \lfloor \frac {2863311531x} {2^{33}} \right \rfloor.

In general, the remainder of an integer a on division by another integer b can be calculated by subtracting the quotient yielded by the same division from a. So the assembly code is really calculating \lfloor \frac x 3 \rfloor first, and then calculating the remainder using this value. The interesting part is the way in which it computes \lfloor \frac x 3 \rfloor . Apparently, for nonnegative integers less than 232, we have
\left \lfloor \frac x 3 \right \rfloor = \left \lfloor \frac {2863311531x} {2^{33}} \right \rfloor. \quad (1)

Indeed, if you replace the modulo operation in our mod3 function with an integer division operation (and rename the function with the more appropriate name of quo3) you’ll see the assembly output below in the Compiler Explorer:

  mov eax, edi
  mov edx, -1431655765
  mul edx
  mov eax, edx
  shr eax

This is just equation (1) in x86 assembly language.

So, why does this equation hold? Well, let’s have a look at this mysterious constant 2863311531 that turns up in it. Often, when computers appear to be using a mysterious constant, things make more sense when you look at the binary expansion of the constant. The binary expansion of 2863311531 is this:
Aha! It’s just 15 repetitions of the two-bit sequence 10, with an extra two 1 bits on the end. Another way to put it is that it’s m + 1 where m is the integer whose binary expansion is 16 repetitions of the two-bit segment 10.

What can we do with this knowledge? Well, a repeating sequence of a number of bits or digits is nothing more than a geometric series. Let’s write m as a geometric series:
m = \sum_{k = 0}^{15} 2^{2k + 1} = \sum_{k = 0}^{15} 2 \cdot 2^{2k} = 2 \sum_{k = 0}^{15} 2 \cdot 4^k.
This geometric series has initial value 2, common ratio 4 and 16 terms. Therefore, it can be evaluated as the fraction
2 \frac {4^{16} - 1} {4 - 1} = 2 \frac {2^{32} - 1} 3 = \frac {2^{33} - 2} 3.
Now a divisor of 3 has turned up, which is promising.

In the formula, we actually multiply x by m + 1, not m itself. If we add 1 to the fraction above, we get
\frac {2^{33} + 1} 3.
Multiplying by 2−33x, this comes out as
\frac {x + 2^{-33} x} 3 = \frac x 3 + \frac {2^{-33} x} 3.
Now, since x < 232, we have
\frac {2^{-33} x} 3 < \frac 1 6.
Since \frac x 3 is an integer divided by 3, it is impossible for its floor to change when you add a real number less than 1/3 to it. Since \frac {2^{-33} x} 3 < \frac 1 6 , this proves equation (1). QED.

This technique readily generalizes to even word sizes other than 32, by the way. If the word size is an arbitrary even integer n, then to compute \lfloor \frac x n \rfloor we just have to calculate
\left \lfloor 2^{-(n + 1)} x \left( 1 + \sum_{k = 0}^{n/2 - 1} 2^{2k + 1} \right) \right \rfloor.

What about moduli other than 3? If you play around with the Compiler Explorer, you’ll see that GCC uses roughly the same sequence of operations to calculate values modulo any constant, so there is a general trick at work here. I may try to reverse-engineer it in another post soon (or I may not; I don’t have a great track record of completing planned sequences of posts on this blog 🙂 )

  1. Absolutes are rarely true, of course. All else being equal, it’s better for a computer to have a longer battery life, and this is facilitated by it not having to carry out too many complex operations. But this isn’t a critical concern, compared to things like the capacity of the computer to do what the user wants in a reasonable amount of time, considering that its battery can always be recharged. Likewise, there is a small chance of a freak mechanical failure with every operation, so the more operations are done, the more likely such errors are; however, the base chance is still so low that this hardly ever becomes a matter of concern to users.

  2. Careful readers might wonder whether this formula is oversimplified, since the multiplication of \lfloor \frac {2863311531x} {2^{33}} \rfloor by 3 might result in overflow, in which case it would be (3 \lfloor \frac {2863311531x} {2^{33}} \rfloor) \bmod {2^{32}} that would be subtracted from x, not the full product 3 \lfloor \frac {2863311531x} {2^{33}} \rfloor . However, this multiplication will actually never overflow. You can convince yourself of this by considering the case where x is as large as possible, i.e. x = 232 − 1.

The difference between key-value containers and functions

I was wondering why all of the programming languages I know treat key-value containers (i.e. sequences and associative arrays) differently from functions. From mathematics, I’m used to thinking of key-value containers as functions of one argument, taking a key as input and giving a value as output. Treating key-value containers as functions would have some nice effects, such as allowing the free use of infinite containers and allowing the use of function composition in place of the higher-order function normally called map.

Here’s my guess: one major difference between key-value containers and functions in most programming languages is that the values in containers are precomputed, while the values of functions are computed only when the function is called on the argument. This can be illustrated if we try using function composition in place of map in Racket:

(define (my-map f xs)
  (lambda (i)
    (f (xs i))))

(define (my-sequence i)
  (match i
    (0 1)
    (1 4)
    (2 9)))

(define (write-and-square x)
  (writeln x)
  (* x x))

(writeln "squared-list:")

; displays 1, 4 and 9
(define squared-list (map write-and-square (list 1 4 9)))

(writeln "my-sequence-squared created:")

; nothing displayed
(define my-sequence-squared (my-map write-and-square my-sequence))

(writeln "my-sequence-squared accessed:")

(writeln (my-sequence-squared 0)) ; displays 1 and 1
(writeln (my-sequence-squared 1)) ; displays 4 and 16
(writeln (my-sequence-squared 2)) ; displays 9 and 81

When we use Racket’s map function to define squared-list, the members of the list have write-and-square applied right away, and we see the side-effect of the function. But when we use our my-map function, the evaluation of write-and-square on a given key is delayed until the value the key maps to is actually requested. This is a sort of lazy evaluation. Although lazy evaluation has its uses, it can lead to inefficiency if a key is accessed repeatedly and the value it maps to is fully recomputed again and again needlessly without a cache being stored in memory.

To deal with this, we can use memoization. However, since an arbitrary function may have an infinite domain, we have to choose which values to memoize. Here’s a Racket function for carrying out the memoization (using some eval black magic to dynamically write a pattern match):

(define (memoize mapping keys)
  (define cases (map (lambda (key)
    (list key (mapping key))) keys))

  (lambda (key)
    (eval (append
      (list 'match key)
      (list (list '_ (list mapping key)))))))

And here’s a test showing the function has the expected behaviour.

(writeln "my-sequence-squared-2 created:")

; displays nothing
(define my-sequence-squared-2 (my-map write-and-square my-sequence))

(writeln "my-sequence-squared-3 created:")

; displays 1 and 4
(define my-sequence-squared-3 (memoize my-sequence-squared-2 (list 0 1)))

(writeln "my-sequence-squared-3 accessed:")

(my-sequence-squared-3 1) ; displays 16
(my-sequence-squared-3 2) ; displays 9 and 81

Now, for functions with finite domain, a natural default is to memoize all of the values. But how do we do that? Given an arbitrary function, we don’t know whether its domain is finite, and even if its domain is finite, we don’t know how to iterate through all of the domain’s members. So we can’t just add some extra code in my-map saying something like “if the function has a finite domain, memoize each of the values of the composition”. Our options are to either require users of my-map to explicitly memoize whatever specific values need memoizing, or add some extra information to the functions themselves specifying whether they have a finite domain and how they can be iterated over, which my-map can then take advantage of.

A natural way to supply this information is to use objects of a distinct type, a “container type”, instead of functions on finite domains. Any object with a container type can be assumed to be finite. Furthermore, we can have multiple container types, each associated with their own iteration protocol.

This, then, is the purpose, or at least a purpose, of having containers as a distinct concept from functions. Containers are functions together with the necessary additional information required to memoize all of their values behind the scenes whenever they are composed.

Source code for this blog post (note: the last couple of lines give an error when the file is run in DrRacket, but if you type them into the REPL after running the rest of the program they work fine, so I assume the problem is DrRacket’s and not mine 🙂 )

Reversing a linked list in place

The other day I finally realized how you reverse a linked list in place.

I had attempted to tackle this problem several times before, and had always ended up looking up the answer online, copying and pasting some code given in the answer and seeing that it worked, but not really understanding how it worked, as evidenced by my failing to remember how to do it the next time. Now, however, I’m confident that I can remember it for the long term.

It’s remarkable how unhelpful the online resources were here. Consider this description of the algorithm from GeeksforGeeks.com, which was the first result Google gave me for “reversing a linked list” as I was writing this post:

  1. Initialize three pointers prev as NULL, curr as head and next as NULL.

  2. Iterate trough the linked list. In loop, do following.
    // Before changing next of current,
    // store next node
    next = curr->next
    // Now change next of current
    // This is where actual reversing happens
    curr->next = prev
    // Move prev and curr one step forward
    prev = curr
    curr = next

This is a particularly bad explanation—it’s not actually complete, and even the spelling and grammar is poor—but in terms of how it describes the solution it’s not substantially different from the others I saw. It just says you have to use three pointers and do a particular sequence of reassignments until one of them is null. There is little explanation as to why this particular sequence of reassignments works. For the reader, it’s just a magic sequence that has to be memorized.

In order to understand anything complex, you have to use abstractions, to break the complexity into manageable chunks, and analogies, to allow you to apply your existing knowledge about other things. For me, the key to understanding how to reverse a linked list was realizing that what you do to the list in the loop above can be understood as simply carrying out the two operations below in succession:

  1. Remove the first item from the list.

  2. Attach this item to the front of a new list.

The processes of attachment and removal can be readily visualised, so they allow us to draw on our spatial intuition to reason about data structures. It’s obvious that if we keep removing items from the original list and attaching them to the new list until there are no items left to remove from the original list, then the new linked list will end up having the same items the original list originally had, but in reverse order. The new list can then be assigned to the variable holding the original list.

It’s really an incredibly straightforward algorithm. I think the reason it took me so long to figure it out was that I was disregarding all ideas for solutions that on a conceptual level involved moving items into a new list and then reassigning the variable. This was a case where spatial intuition was leading me astray: such solutions are conceptually not “in-place”, so I assumed any such solution would have be actually not in-place; an amount of space proportional to the size of the list would need to be used in order to store the new list. Of course, I was forgetting that you can remove items from the original list as you attach them to the new list, so that no additional space needs to be used that wasn’t already being used to store the original list.

Here’s the code I ended up writing (in C):

#include <stdlib.h>

struct node {
    int value;
    struct node *next;

void list_push(struct node **list, struct node *node) {
    node->next = *list;
    *list = node;

struct node *list_pop(struct node **list) {
    struct node *node = *list;
    *list = node->next;
    return node;

void list_reverse(struct node **list) {
    struct node *new_list = NULL;
    while (*list)
        list_push(&new_list, list_pop(list));
    *list = new_list;

Note that if you inline the list_push and list_pop functions in list_reverse, you get what is essentially the loop with three pointers that all the online resources were talking about:

void list_reverse(struct list **list) {
    struct list *new_list = NULL;
    while (*list) {
        struct list *node = *list;
        *list = node->next;
        node->next = new_list;
        new_list = node;
    *list = new_list;

Only the variable names are different: list, node and new_list correspond to the variables next, curr and prev respectively in the GeeksForGeeks.com description.

There are of course different ways of conceptualizing what happens during the loop. The conceptualization described above, where we move the items into a new list by attaching to the front, is just the one that came most naturally to me. From the variable names the writer of that GeeksForGeeks.com article used, it’s evident that their conceptualization was different—prev doesn’t make a lot of sense as a name for a new container we’re placing items into.

Now that I look at it again, I see that the GeeksforGeeks.com article includes an animation which illustrates the conceptualization they were probably using. You can think of the linked list as a bunch of items connected by arrows, where the arrows represent pointers. To reverse the list, you just flip the arrows!

Once you conceptualize the problem in this way, I think it’s fairly straightforward to work out how to do it. You have to iterate over the list, keeping both the current node and the previous node in memory so that you can point the current node to the previous one. The previous one either will have been already set to point to one before it, or is the first node in the list, in which case it needs to be pointed to NULL. We’ll also need make temporary copies of what the current node was originally pointing to on each iteration, so we can move on to the next node even after we change what the current node is pointing to. So with this conceptualization, we can easily see that we need three pointers, and that prev, curr and next are appropriate names for them.

void list_reverse(struct list **list) {
    struct list *prev = NULL;
    struct list *curr = *list;
    while (curr) {
        struct list *next = curr->next;
        curr->next = prev;
        prev = curr;
        curr = next;

This makes it more clear where the GeeksforGeeks.com article was coming from. To its writer, the sequence of assignments in the loop probably seemed like a simple atomic operation—“make the current item point to the previous one”. But the writer failed to explicitly point out that they were making use of this conceptualization, and the clarity of their explanation for a beginner suffered because of it.

It’s likely that they weren’t particularly aware that they were relying on this particular conceptualization, or that there were alternatives. I think this is a general problem with trying to explain things to other people. People tend to stick with the first conceptualization of a problem that proves helpful for them, and think only in those terms. Even if they are aware of alternative conceptualizations, they may think of those conceptualizations as misunderstandings of the problem, when it’s really just a case of different conceptualizations having their own advantages and disadvantages and suiting different thinking styles. Even if some conceptualizations are less helpful than others, it may be best to allow a beginner to work with the conceptualization they find most natural as a starting point.

Anyway, all this has got me curious—if there is anybody still reading who has ever tried to reverse a linked list, how do you prefer to conceptualize the linked list reversal problem? Moving items, or flipping pointers, or something else? Leave a comment—I’d be interested to hear your answer.

Voles and Orkney

What do voles and Orkney have to do with one another? One thing somebody knowledgeable about British wildlife might be able to tell you is that Orkney is home to a unique variety of the common European vole (Microtus arvalis) called the Orkney vole.

The most remarkable thing about the Orkney vole is that the common European vole isn’t found anywhere else in the British Isles, nor in Scandinavia—it’s a continental European animal. That raises the question of how a population of them ended up in Orkney. During the last ice age, Orkney was covered by a glacier and would have been uninhabitable by voles; and after the ice retreated, Orkney was separated from Great Britain straight away; there were never any land bridges that would have allowed voles from Great Britain to colonize Orkney. Besides, there is no evidence that M. arvalis was ever present on Great Britain, nor is there any evidence that voles other than M. arvalis were ever present on Orkney; none of the three species that inhabit Great Britain today (the field vole, Microtus agrestis, the bank vole, Myodes glareolus, and the water vole, Arvicola amphibius) were able to colonize Orkney, even though they were able to colonize some islands that were originally connected to Great Britain by land bridges (Haynes, Jaarola & Searle, 2003). The only plausible hypothesis is that the Orkney voles were introduced into Orkney by humans.

But if the Orkney voles were introduced, they were introduced at a very early date—the earliest discovered Orkney vole remains have been carbon-dated to ca. 3100 BC (Martínkova et al., 2013)—around the same time Skara Brae was first occupied, to put that in context. The only other mammals on the British Isles known to have been introduced at a similarly ancient date or earlier are the domestic dog and the domestic bovines (cattle, sheep, goats)—even the house mouse is not known to have been present before c. 500 BC (Montgomery, 2014)! The motivation for the introduction remains mysterious—voles might have been transported accidentally in livestock fodder imported from the Continent, or they might have been deliberately introduced as pets, food sources, etc.; we can only speculate. It’s interesting to note that the people of Orkney at this time seem to have been rather influential, as they introduced the Grooved Ware pottery style to other parts of the British Isles.

Anyway, there is in fact another interesting connection between voles and Orkney, which has to do with the word ‘vole’ itself. Something you might be aware of if you’ve looked at old books on British wildlife is that ‘vole’ is kind of a neologism. Traditionally, voles were not thought of as a different sort of animal from mice and rats. The relatively large animal we usually call the water vole today, Arvicola amphibius, was called the ‘water rat’ (as it still is sometimes today), or less commonly the ‘water mouse’. The smaller field vole, Microtus agrestis, was often just the ‘field mouse’, not distinguished from Apodemus sylvaticus, although it was sometimes distinguished as the ‘water mouse’ or the ‘short-tailed field mouse’ (as opposed to the ‘long-tailed field mouse’ A. sylvaticus—if you’ve ever wondered why people still call A. sylvaticus the ‘long-tailed field mouse’, even though its tail isn’t much longer than that of other British mice, that’s probably why!) The bank vole, Myodes glareolus, seems not to have been distinguished from the field vole before 1832 (the two species are similar in appearance, one distinction being that whereas the bank vole’s tail is about half its body length, the field vole’s tail is about 30% to 40% of its body length).

As an example, a reference to a species of vole as a ‘mouse’ can be found in the 1910 edition of the Encyclopedia Britannica:

The snow-mouse (Arvicola nivalis) is confined to the alpine and snow regions. (vol. 1, p. 754, under “Alps”)

Today that would be ‘the snow vole (Chionomys nivalis)’.

A number of other small British mammals were traditionally subsumed under the ‘mouse’ category, namely:

  • Shrews, which were often referred to as shrewmice from the 16th to the 19th centuries, although ‘shrew’ on its own is the older word (it is attested in Old English, but its ultimate origin is unknown).
  • Bats, which in older language could also be referred to by a number of whimsical compound words, the oldest and most common being rearmouse, from a now-obsolete verb meaning ‘stir’, but also rattlemouse, flindermouse, flickermouse, flittermouse and fluttermouse. The word rearmouse is still used today in the strange language of heraldry.
  • And, of course, dormice, which are still referred to by a compound ending in ‘-mouse’, although we generally don’t think of them as true mice today. The origin of the ‘dor-‘ prefix is uncertain; the word is attested first in c. 1425. There was an Old English word sisemūs for ‘dormouse’ whose origins are similarly mysterious, but the -mūs element is clearly ‘mouse’.

There is still some indeterminacy about the boundaries of the ‘mouse’ category when non-British rodent species are included: for example, are birch mice mice?

So, where did the word ‘vole’ come from? Well, according to the OED, it was first used in a book called History of the Orkney Islands (available from archive.org), published in 1805 and written by one George Barry, who was not a native of Orkney but a minister who preached there. In a list of the animals that inhabit Orkney, we find the following entry (alongside entries for the Shrew Mouse ſorex araneus, the [unqualified] Mouse mus muſculus, and the [unqualified] Field Mouse mus sylvaticus):

The Short-tailed Field Mouse, (mus agreſtis, Lin. Syſt.) which with us has the name of the vole mouſe, is very often found in marſhy grounds that are covered with moſs and ſhort heath, in which it makes roads or tracks of about three inches in breadth, and ſometimes miles in length, much worn by continual treading, and warped into a thouſand different directions. (p. 320)

So George Barry knew vole mouse as the local, Orkney dialectal word for the Orkney vole, which he was used to calling a ‘short-tailed field mouse’ (evidently he wasn’t aware that the Orkney voles were actually of a different species from the Scottish M. agrestis—I don’t know when the Orkney voles’ distinctiveness was first identified). Now, given that vole mouse was an Orkney dialect word, its further etymology is straightforward: the vole element is from Old Norse vǫllr ‘field’ (cf. English wold, German Wald ‘forest’), via the Norse dialect once spoken in Orkney and Shetland (sometimes known as ‘Norn’). So the Norse, like the English, thought of voles as ‘field mice’. The word vole is therefore the only English word I know, that isn’t about something particularly to do with Orkney or Shetland, that has been borrowed from Norn.

Of course, Barry only introduced vole mouse as a Orcadianism; he wasn’t proposing that the word be used to replace ‘short-tailed field mouse’. The person responsible for that seems to have been the author of the next quotation in the OED, from an 1828 book titled A History of British Animals by University of Edinburgh graduate John Fleming (available from archive.org). On p. 23, under an entry for the genus Arvicola, Fleming notes that

The species of this genus differ from the true mice, with which the older authors confounded them, by the superior size of the head, the shortness of the tail, and the coarseness of the fur.

He doesn’t explain where he got the name vole from, nor does he seem to reference Barry’s work at all, but he does list alternative common names of each of the two vole species he identifies. The species Arvicola aquatica, which he names the ‘Water Vole’ for the first time, is noted to also be called the ‘Water Rat’, ‘Llygoden y dwfr’ (in Welsh) or ‘Radan uisque’ (in Scottish Gaelic). The species Arvicola agrestis, which he names the ‘Field Vole’ for the first time, is noted to be also called the ‘Short-tailed mouse’, ‘Llygoden gwlla’r maes’ (in Welsh), or “Vole-mouse in Orkney”.

Fleming also separated the shrews, bats and dormice from the true mice, thus establishing division of the British mammals into basic one-word-labelled categories that we are familiar with today. With respect to the other British mammals, the naturalists seem to have found the traditional names to be sufficiently precise: for example, each of the three quite similar species of the genus Mustela has its own name—M. erminea being the stoat, M. nivalis being the weasel, and M. putorius being the polecat.

Fleming still didn’t distinguish the field vole and the bank vole; that innovation was made by one Mr. Yarrell in 1832, who exhibited specimens of each to the Zoological Society, demonstrated their distinctiveness and gave the ‘bank vole’ (his coinage) the Latin name Arvicola riparia. It was later found that the British bank vole was the same species as a German one described by von Schreber in 1780 as Clethrionomys glareolus, and so that name took priority (and just recently, during the 2010s, the name Myodes has come to be favoured for the genus over Clethrionomys—I don’t know why exactly).

In the report of Yarrell’s presentation in the Proceedings of the Zoological Society the animals are referred to as the ‘field Campagnol‘ and ‘bank Campagnol‘, so the French borrowing campagnol (‘thing of the field’, still the current French word for ‘vole’) seems to have been favoured by some during the 19th century, although Fleming’s recognition of voles as distinct from mice was universally accepted. The word ‘vole’ was used by other authors such as Thomas Bell in A History of British Quadrupeds including the Cetacea (1837), and eventually the Orcadian word seems to have prevailed and entered ordinary as well as naturalists’ usage.


Haynes, S., Jaarola, M., & Searle, J. B. (2003). Phylogeography of the common vole (Microtus arvalis) with particular emphasis on the colonization of the Orkney archipelago. Molecular Ecology, 12, 951–956.

Martínkova, N., Barnett, R., Cucchi, T., Struchen, R., Pascal, M., Pascal, M., Fischer, M. C., Higham, T., Brace, S., Ho, S. Y. W., Quéré, J., O’Higgins, P., Excoffier, L., Heckel, G., Rus Hoelzel, A., Dobney, K. M., & Searle, J. B. (2013). Divergent evolutionary processes associated with colonization of offshore islands. Molecular Ecology, 22, 5205–5220.

Montgomery, W. I., Provan, J., Marshal McCabe, A., & Yalden, D. W. (2014). Origin of British and Irish mammals: disparate post-glacial colonisation and species introductions. Quaternary Science Reviews, 98, 144–165.