
Set isomorphisms

There doesn’t seem to be any standard adjective to describe the property of a pair of sets having the same cardinality. Some authors use ‘equivalent’, but this adjective could be used for any equivalence relation; it might be useful to reserve it for the equivalence relation which is currently most prominent in the context. Others use ‘equinumerous’, and I guess that’s fine. There’s a natural counterpart of this adjective for the property of one set having cardinality less than or equal to the other, too, viz. ‘subnumerous’, although I don’t know if anyone actually uses this term. Anyway, if I were going to write a textbook on set theory, I think I would go for the adjective ‘isomorphic’. Consider the definition of a structure in abstract algebra, which can be stated as follows.

Definition 1. Structures are ordered tuples of the form (A, R_0, R_1, \dotsc, R_{n - 1}), where A is a set called the underlying set of the structure, and for every natural number i such that i < n, there is a natural number k, called the arity of R_i, such that R_i is a subset of the Cartesian product A^k. R_i is called the ith relation of the structure.

A set can be seen as a degenerate structure with no relations. Now, consider the definition of a homomorphism:

Definition 2. For every pair of structures (A, R_0, R_1, \dotsc, R_{n - 1}) and (B, S_0, S_1, \dotsc, S_{n - 1}) with the same number of relations and with the property that for every natural number i such that i < n, R_i and S_i have the same arity, homomorphisms are mappings f from A to B such that for every natural number i such that i < n and every k-tuple of members a_0, a_1, … and a_{k - 1} of A (where k is the common arity of R_i and S_i), (a_0, a_1, \dotsc, a_{k - 1}) \in R_i if and only if (f(a_0), f(a_1), \dotsc, f(a_{k - 1})) \in S_i.

In the trivial case where the two structures have no relations, a homomorphism between the structures is simply a function between their underlying sets. Accordingly, an isomorphism between the structures is simply a bijection between their underlying sets. Since we say that two structures are isomorphic if and only if there is an isomorphism between them, two sets are isomorphic if and only if there is a bijection between them, i.e. they have the same cardinality. That’s the reasoning behind using the adjective ‘isomorphic’. Since injective homomorphisms are sometimes called monomorphisms, and surjective homomorphisms are sometimes called epimorphisms, for every pair of sets A and B, we can also say that A is monomorphic to B if and only if |A| \le |B| and A is epimorphic to B if and only if |B| \le |A|. The only problem I can see with this terminology is that structures are often referred to by their underlying set when it is clear from context which relations should be taken as part of the structure; for example, people might speak of the ring \mathbb Z of integers and the ring \mathbb Q of rational numbers. So the statement that \mathbb Z and \mathbb Q are isomorphic might be taken to mean that the rings \mathbb Z and \mathbb Q are isomorphic, which is false. Of course, this problem can be resolved just by saying ‘the sets \mathbb Z and \mathbb Q are isomorphic’ or ‘\mathbb Z and \mathbb Q are isomorphic as sets’.


0.999…, 0.000…1 and other numbers with infinitely many digits

The ‘number’ 0.000…1 sometimes comes up in the discussion of the infamous question of whether 0.999… = 1 (in case you didn’t know, 0.999… is equal to 1 [1]). For every natural number {n}, {1 - 0.99\dotso9} (where there are {n} 9s in the subtracted number) is 0.00…1 (in which there are {n} 0s, including the one before the decimal point), so naturally, people generalise the pattern to the case where there are infinitely many 9s (although you can’t always generalise in this way), and conclude that {1 - 0.999\dotso} (where there are infinitely many 9s in the second number) is 0.000…1 (in which there are infinitely many 0s), which seems to be a positive number, not 0 as it should be if {0.999\dotso = 1}.

Even though it’s wrong, this argument is quite an intelligent one. Its problem is that it relies on a lack of understanding of exactly how decimal notation works, which is understandable, given that the subtleties of decimal notation are rarely given a proper explanation in maths education. It’s easy to understand how decimal notation works when there are only finitely many digits, and I think most people understand it on some level (though they may not be able to articulate it in the way I’m about to articulate it). Given a string of digits with a decimal point in between one pair of consecutive digits, one assigns an index to each digit {d} by looking at its position relative to the decimal point. If {d} is in front of the decimal point, the index is the number of digits between {d} and the decimal point, with {d} not being counted among these digits, and if {d} is behind the decimal point, the index is the negation of the number of digits between {d} and the decimal point, with {d} being counted among these digits. In other words, the digit in front of the decimal point has index 0, the digit behind it has index {-1}, and the other indices continue the sequence. For example, in 17.25, the digit 1 has index 1, the digit 7 has index 0, the digit 2 has index {-1} and the digit 5 has index {-2}. Then, one calculates the place value associated with {d} as {d \times 10^n}, where {n} is the index of {d}. The number denoted by the string of digits is simply the sum of all the place values. For example, 17.25 denotes the number {1 \times 10^1 + 7 \times 10^0 + 2 \times 10^{-1} + 5 \times 10^{-2}}. The number denoted is always non-negative, but we can denote its negation simply by putting a minus sign in front.
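This indexing procedure is easy to mechanise. Here is a minimal Python sketch of it; the function name and the use of exact fractions are my own choices, not anything standard.

```python
from fractions import Fraction

def decimal_value(s):
    """Compute the number denoted by a finite decimal string like '17.25'."""
    integer_part, _, fractional_part = s.partition('.')
    total = Fraction(0)
    # Digits in front of the point get indices 0, 1, 2, ... right to left.
    for index, digit in enumerate(reversed(integer_part)):
        total += int(digit) * Fraction(10) ** index
    # Digits behind the point get indices -1, -2, ... left to right.
    for offset, digit in enumerate(fractional_part):
        total += int(digit) * Fraction(10) ** (-(offset + 1))
    return total

print(decimal_value('17.25'))  # 69/4, i.e. 1*10 + 7*1 + 2/10 + 5/100
```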

Decimal notation with finitely many digits is not very powerful. Every integer can be denoted in this way, but not every rational number (rational numbers are numbers that can be expressed as fractions whose numerators and denominators are both integers, in case you didn’t know). For example, {\frac 1 3} cannot be denoted in this way. In fact, the only rational numbers which can be denoted in this way are rational numbers whose denominator is not divisible by any prime number other than 2 or 5 (the prime factors of 10). So, in order to denote every rational number, you need infinitely many digits.
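The claim about which rational numbers are denotable with finitely many digits can be checked mechanically: strip the factors of 2 and 5 from the lowest-terms denominator and see whether anything is left. A small Python sketch (the function name is mine):

```python
from fractions import Fraction

def terminates(q):
    """True iff q has a finite decimal expansion, i.e. its lowest-terms
    denominator is divisible by no prime other than 2 or 5."""
    d = q.denominator  # Fraction objects are always stored in lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

print(terminates(Fraction(1, 8)))  # True: 1/8 = 0.125
print(terminates(Fraction(1, 3)))  # False: 1/3 = 0.333...
```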

It’s in this case that the subtleties arise. Going from finite to infinite always brings a lot of complications with it. First of all, in order to be able to assign indices to each digit in the same way, we’re going to have to require that, even though there are infinitely many digits, there are finitely many digits between each digit and the decimal point. After all, if there are infinitely many digits between some digit {d} and the decimal point, then the number of digits between {d} and the decimal point does not exist, since infinity isn’t a number. Therefore, we can see a problem with the argument above right away. There are infinitely many digits between the digit 1 in 0.000…1 and the decimal point, so it is impossible to assign an index to this digit, and for this reason, 0.000…1 is not an example of well-formed decimal notation. There are ways to define alternative forms of decimal notation in which 0.000…1 is well-formed (and usually denotes some kind of infinitesimal number, i.e. one which is positive but smaller than every positive rational number), but I’m not going to go into them here. Certainly, none of these ways can be considered standard: if you just start talking about 0.000…1 to a mathematician without defining it first, they will ask you to clarify.

If there are finitely many digits between each digit and the decimal point, we can associate each of the infinitely many digits with a place value in the same way as before. So then the number denoted by the string of digits is just the sum of all these place values, right? Well, not quite. Remember, we have infinitely many digits, hence infinitely many place values, so this will be a sum of infinitely many terms! The ordinary definition of a sum is that it is the value obtained by letting the value be 0 at first, and then adding each term in the sum to the value, one at a time, until every term has been added. But if there are infinitely many terms, then no matter how long this process continues, terms will remain which still have not been added. So it doesn’t make sense to speak of sums of infinitely many terms, if we use this definition.

But just because this definition doesn’t work, that doesn’t mean we can’t define sums of infinitely many terms in some other way. For example, it certainly seems sensible to say that the infinite sum {0 + 0 + 0 + \dotsb} is 0, because the sum of any finite number of 0s is 0, so we can generalise this to the case where there are infinitely many 0s. In general, if only finitely many terms in an infinite sum are not equal to 0, we can just disregard the 0s, since they should have no effect on the value of the sum, and say that the infinite sum is equal to the finite sum consisting of its terms which are not equal to 0. For example, {1 + 2 + 3 + 0 + 0 + 0 + \dotsb = 1 + 2 + 3 = 6}.

These aren’t very interesting examples of infinite sums, since they are basically finite sums in disguise. It is possible to assign values to more interesting infinite sums. To see how this is done, first let me note that one way of stating the reasoning why an infinite sum of the form {x_1 + x_2 + \dotsb + x_n + 0 + 0 + 0 + \dotsb} is equal to the finite sum {x_1 + x_2 + \dotsb + x_n} is this: if we start with the value of 0, and add each term of the infinite sum in order, then, after we add {x_n}, the remaining additions do not change the value at all (since we are just adding 0). Therefore, the value stays at {x_1 + x_2 + \dotsb + x_n}, even if infinitely many additional additions are made, because these additional additions have no effect.

Now, if there are infinitely many terms which are not 0 it is impossible for the value of the incomplete sum to stay equal to a particular number after a finite number of terms are added. But it may be possible for the value to stay within a certain distance of a particular number after a finite number of terms are added, and if it does, then we can conclude that the distance of the sum of all the terms from that particular number must be less than or equal to this distance (by generalising to the infinite case). Remember, the distance between two numbers {x} and {y} is either {x - y} or {y - x}, whichever one is positive (or, if {x - y} and {y - x} are both 0, so that neither is positive, the distance is 0).

An example will help to make this clear. Consider the number denoted by 0.999…: {\frac 9 {10} + \frac 9 {100} + \frac 9 {1000} + \dotsb}. For every positive integer {n}, the sum of the first {n} terms in this sum is {\frac 9 {10} + \frac 9 {100} + \dotsb + \frac 9 {10^n}}. What is the distance of this finite sum from 1? The sum is clearly less than 1, because it can be written (using the rule for addition of fractions) as {\frac {10^n - 1} {10^n}}, i.e. {1 - \frac 1 {10^n}}. So the distance is the difference {1 - (1 - \frac 1 {10^n})}, which is just {\frac 1 {10^n}}. Now, for every positive integer {n'} such that {n < n'}, it can be seen in the same way that the distance of the sum of the first {n'} terms from 1 is {\frac 1 {10^{n'}}}, which is less than {\frac 1 {10^n}} since {n < n'}. That is, no matter how many terms are added to the incomplete sum after the {n}th term is added, the distance stays less than {\frac 1 {10^n}}. Therefore, we can conclude that the distance of 0.999… from 1 is less than {\frac 1 {10^n}} for every positive integer {n}. In fact, it is less than every positive rational number, since every positive rational number can be expressed as a fraction {\frac a b}, where {a} and {b} are positive integers, and {\frac 1 {10^b} < \frac a b} (since {b < 10^b}).
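The computation above can be checked exactly with Python’s Fraction type; this sketch just recomputes the partial sums and their distances from 1 for the first few values of {n}:

```python
from fractions import Fraction

# Partial sums 9/10 + 9/100 + ... + 9/10^n, and their distances from 1.
for n in range(1, 6):
    partial = sum(Fraction(9, 10**k) for k in range(1, n + 1))
    distance = 1 - partial
    # Both closed forms derived in the text:
    assert partial == Fraction(10**n - 1, 10**n)
    assert distance == Fraction(1, 10**n)
    print(n, partial, distance)
```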

Another example is {\frac 1 2 + \frac 1 4 + \frac 1 8 + \dotsb}, the number denoted by 0.111… in binary. By similar reasoning as above, it can be shown that its distance from 1 is less than every positive rational number. Or, you can just look at this neat diagram.

Take a square, divide it in half, divide one of the halves in half (producing two quarters of the original square), divide one of these quarters into half too, and so on, and we see that a whole unit square can be filled by squares and rectangles whose areas are reciprocals of powers of 2 greater than 1.

In cases like this, where we can show that for some rational number {L}, the distance of the infinite sum from {L} is less than {\varepsilon} for every positive rational number {\varepsilon}, it seems natural to assign the infinite sum the value of {L}, because, if the value is a rational number, its distance from {L} must also be a rational number, and this distance cannot be positive, otherwise it would have to be less than itself. In other words, {L} is the only rational number which the sum could be, and therefore, {L} is a very natural value to assign to the sum. In the case of 0.999…, {L} is 1, and that’s why we assign 0.999… the value 1. Of course, we could assign values to infinite sums in a different way, but this is the standard way we do it in mathematics. I’m not familiar with any other way of assigning values to infinite sums that is more natural and convenient than this one.

There are a few more remarks I’d like to make about decimal notation. First of all, there are strings of digits such that the sum of the place values cannot be assigned a rational number as a value in this way, since there is provably no rational number {L} such that for every positive rational number {\varepsilon}, the distance of the incomplete sum from {L} stays less than {\varepsilon} once a certain finite number of terms have been added. In fact, the only strings which can be assigned a rational number as a value in this way are those such that the digit at every index greater than some integer {n} is 0 and the other digits consist of a finite sequence repeated indefinitely (for example, 0.333… (\frac 1 3 ) or 0.142857142857142857… (\frac 1 7 )). However, as long as the digit at every index greater than some integer {n} is 0, a similar but slightly weaker condition is satisfied: for every positive rational number {\varepsilon}, once a certain finite number of terms have been added, if one picks any pair of incomplete sums which can be formed by adding on more terms, the distance between these two sums is less than {\varepsilon}. It’s as if the incomplete sums are approaching a number, but that number is not rational. We can therefore define a whole set of new numbers which are called the irrational numbers, with each irrational number being identified with one of these infinite sums which cannot be defined as a rational number. The numbers in the class that is obtained by adding these irrational numbers to the rational numbers are called real numbers. Although I won’t go into the details, it is possible to define how to add and multiply real numbers, by referring to the sequences that define them, in such a way that all the expected properties of addition and multiplication hold, so real numbers do deserve to be called numbers.
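For what it’s worth, the repeating case can be made concrete: summing the geometric series {\frac {\text{block}} {10^k} + \frac {\text{block}} {10^{2k}} + \dotsb} gives {\frac {\text{block}} {10^k - 1}}, where {k} is the length of the repeating block. A Python sketch (the helper name is mine, and it only handles expansions whose repetition starts immediately after the decimal point):

```python
from fractions import Fraction

def repeating_decimal(block):
    """The rational number denoted by 0.blockblockblock..., obtained by
    summing the geometric series block/10^k + block/10^(2k) + ..."""
    k = len(block)
    return Fraction(int(block), 10**k - 1)

print(repeating_decimal('3'))       # 1/3
print(repeating_decimal('142857'))  # 1/7
```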

Footnotes

  1. But some people just won’t accept it…

Definitions of finiteness

Perhaps the most intuitive definition of finiteness as a property of sets is that a set {A} is finite if and only if its cardinality is a natural number. Unfortunately, the most intuitive definition of the natural numbers, at least to me, is that the natural numbers are the cardinalities of finite sets. Therefore, in order to define both of these notions properly, one of these definitions must be altered. I think the most natural one to alter is the definition of finiteness, since, at least to me, finiteness seems like a more fundamental notion than the notion of the natural numbers.

The alternative definition which I’m most satisfied with is Kuratowski’s, given below. The essential idea behind it is that finite sets are sets that can be formed from the empty set by making a finite number of adjoinments (i.e. additions of a single object). The formal statement may seem a little peculiar if you haven’t seen this kind of definition before.

Definition 1 For every set {A}, {A} is Kuratowski finite if and only if it is a member of every family {\mathcal F} of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}.

The key thing to notice here is that it immediately follows from the definition that a variant of mathematical induction can be performed on Kuratowski finite sets. Suppose you wish to prove that every Kuratowski finite set has a certain property. Well, if you let {\mathcal F} be the family of all sets with the property, you can see from the definition that one way to do this is to show that {\emptyset} has the property (i.e. {\emptyset \in \mathcal F}) and that if you adjoin an object to a set with the property, it will still have the property (i.e. for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}). Definitions similar to the one above turn up in quite a few places in mathematics, and they are always associated with a principle of induction. For example, consider the definition of the subgroup generated by a subset {H} of a group {G}. The intuitive definition of this subgroup is that it is the set of all objects of the form {a_1 a_2 \dotsm a_n}, where {n} is a natural number and {a_1}, {a_2}, … and {a_n} are members of {H} or inverses of members of {H}. However, this definition can be awkward to work with and it is complex to state as a logical formula (it involves a quantification over all finite sets). Alternatively, one can define this subgroup as the intersection of all subgroups {H'} of {G} such that {H \subseteq H'}. The corresponding principle of induction is that if every member of {H} has a certain property, and the set of all members of the group with the property is a subgroup of {G}, then every member of the subgroup generated by {H} has the property too.

I should note that this definition is not possible in ZF: {\mathcal F} must be allowed to range over proper classes as well as sets, because for every possible value of {\mathcal F}, we have that for every object {x}, {\emptyset \cup \{x\} = \{x\} \in \mathcal F}, so {\mathcal F} is a superclass of the class of all singletons, which is proper by the axiom of replacement, since {x \in V \mapsto \{x\}} is an injection. Quantifiers can only range over sets in ZF. However, a minor amendment makes the definition possible in ZF: we can just require {x} to be a member of {A}. It can be proven that this amended definition (definition 2) is equivalent to the definition as stated above (definition 1).

Proof: Suppose {A} is a finite set by definition 2 and {\mathcal F} is a family of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every object {x}, {B \cup \{x\} \in \mathcal F}. Every member {x} of {A} is an object, so for every member {B} of {\mathcal F}, {B \cup \{x\} \in \mathcal F}, which means {A} is a member of {\mathcal F} (by definition 2). Therefore, {A} is finite by definition 1 as well.

As for the converse, well, {\emptyset} is clearly finite by definition 2. Now, suppose {A} is a finite set by definition 2, {x} is an object and {\mathcal F} is a family of sets such that {\emptyset \in \mathcal F} and for every member {B} of {\mathcal F} and every member {y} of {A \cup \{x\}}, {B \cup \{y\} \in \mathcal F}. Since {A} is finite by definition 2, {A \in \mathcal F}, and since {x} is a member of {A \cup \{x\}}, that means {A \cup \{x\} \in \mathcal F}. Therefore, {A \cup \{x\}} is finite by definition 2 as well. It follows by the principle of induction on finite sets by definition 1 that every set which is finite by definition 1 is finite by definition 2 as well. \Box

The reason I’ve chosen definition 1 is that it is easier to see the idea behind the definition.
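As a finite sanity check of the ZF-friendly amendment (adjoining only members of {A}): for a concrete finite set {A}, the least family containing the empty set and closed under adjoining members of {A} turns out to be exactly the power set of {A}, so in particular it contains {A} itself. A Python sketch of my own, using frozensets:

```python
from itertools import chain, combinations

A = frozenset({'a', 'b', 'c'})

# Close {emptyset} under B |-> B ∪ {x} for members x of A.
family = {frozenset()}
changed = True
while changed:
    changed = False
    for B in list(family):
        for x in A:
            C = B | {x}
            if C not in family:
                family.add(C)
                changed = True

# Compare with the full power set of A.
powerset = {frozenset(s) for s in chain.from_iterable(
    combinations(A, r) for r in range(len(A) + 1))}
print(A in family)         # True: A is "finite" by the amended definition
print(family == powerset)  # True: the closure is the whole power set
```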

Of course, we must ensure that Kuratowski finiteness is actually equivalent to ordinary finiteness. Since we can induct over natural numbers and we can induct over Kuratowski finite sets, the proof is straightforward.

Proof: {|\emptyset| = 0 \in \mathbb N}. Suppose {A} is a Kuratowski finite set whose cardinality is a natural number {n}, and {x} is an object. What is {|A \cup \{x\}|}? If {x \in A}, then {A \cup \{x\} = A}, so its cardinality is still {n}. Otherwise, there is a bijection {f} from {A} to {\{0, 1, \dotsc, n - 1\}}, and it is easy to see that the extension {g} of {f} to {A \cup \{x\}} such that {g(x) = n} is a bijection onto {\{0, 1, \dotsc, n\}}, so {|A \cup \{x\}| = n + 1}. Either way, {|A \cup \{x\}|} is a natural number. This completes the proof that the cardinality of every Kuratowski finite set is a natural number, i.e., every Kuratowski finite set is finite.

The only set whose cardinality is 0 is {\emptyset}, which is Kuratowski finite. Suppose {n} is a natural number, every set whose cardinality is {n} is Kuratowski finite, {A} is a set and {|A| = n + 1}. Then there is a bijection {f} from {A} to {\{0, 1, \dotsc, n\}}. Let {x} be {f^{-1}(n)}. It is easy to see that the restriction {g} of {f} to {A \setminus \{x\}} is a bijection onto {\{0, 1, \dotsc, n - 1\}}, so {|A \setminus \{x\}| = n}. Therefore, {A \setminus \{x\}} is Kuratowski finite. It follows that {(A \setminus \{x\}) \cup \{x\} = A} is Kuratowski finite as well. This completes the proof that every finite set is Kuratowski finite. \Box

The other commonly-seen definition of finiteness is due to Dedekind. The main advantage of this definition is that it is somewhat simpler to state than Kuratowski’s.

Definition 3 For every set {A}, {A} is Dedekind finite if and only if for every proper subset {B} of {A}, {|B| < |A|}.

It is probably possible to prove that every finite set is Dedekind finite by induction, but an easier way to prove this is to prove the contrapositive, that every Dedekind infinite set is infinite. This proof uses the fact that for every set {A}, {A} is infinite if {\aleph_0 \le |A|} ({\aleph_0} is the cardinality of {\mathbb N}), since {n < \aleph_0} for every natural number {n}, so {|A| \ne n}.

Proof: Suppose {A} is a Dedekind infinite set, so there is a proper subset {B} of {A} such that {|A| = |B|}. Since {B \subset A}, there is a member {a} of {A \setminus B}; since {|A| = |B|}, there is a bijection {f} from {A} to {B}. Consider the infinite sequence {a}, {f(a)}, {f^2(a)}, … We can show by induction that this sequence is injective, so {\aleph_0 \le |A|}, and therefore, since {\mathbb N} is infinite, {A} is infinite.

  • For every positive integer {n}, {f^n(a) = f(f^{n - 1}(a))}, so {f^n(a) \in B}, which means {a \ne f^n(a)} since {a \not \in B}.
  • Suppose {m} is a natural number and for every integer {n} such that {m < n}, {f^m(a) \ne f^n(a)}. Then for every integer {n} such that {m + 1 < n}, {m < n - 1}, so {f^m(a) \ne f^{n - 1}(a)}, which means {f^{m + 1}(a) \ne f^n(a)}, since {f} is injective.

\Box
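The special case {A = \mathbb N} can be played with directly: {f(n) = n + 1} is a bijection from {\mathbb N} onto its proper subset {\{1, 2, 3, \dotsc\}}, we can take {a = 0}, and the orbit {a}, {f(a)}, {f^2(a)}, … repeats no values. A Python sketch (only finitely many terms can be checked, of course):

```python
# f is a bijection from N onto its proper subset {1, 2, 3, ...},
# witnessing that N is Dedekind infinite.
f = lambda n: n + 1

a = 0  # a member of N outside the subset
orbit = []
x = a
for _ in range(20):
    orbit.append(x)
    x = f(x)

print(orbit[:5])                      # [0, 1, 2, 3, 4]
print(len(set(orbit)) == len(orbit))  # True: no repeats among these terms
```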

Now, if the converse of the fact we used holds, i.e. for every infinite set {A}, {\aleph_0 \le |A|}, then we can prove the converse: that every infinite set is Dedekind infinite. The proof relies on the fact that {\mathbb N} has the same cardinality as its proper subset {\{1, 2, 3, \dotsc\}}.

Proof: Suppose {A} is an infinite set. Then {\aleph_0 \le |A|}, so there is an injection {f} from {\mathbb N} to {A}. Let {g} be the mapping from {A} to {A \setminus \{f(0)\}} such that for every natural number {n}, {g(f(n)) = f(n + 1)}, and for every member {x} of {A} which is not in the image of {f}, {g(x) = x}. Then {g} is injective (because {f} is), and every member of {A \setminus \{f(0)\}} is a value of {g}, so {g} is a bijection and {|A| = |A \setminus \{f(0)\}|}. Since {A \setminus \{f(0)\}} is a proper subset of {A}, {A} is Dedekind infinite. \Box

The problem with Dedekind’s definition is that one runs into trouble when one tries to prove that for every infinite set {A}, {\aleph_0 \le |A|}, in ZF. The proof can be done easily using the axiom of choice.

Proof: Suppose {A} is an infinite set. If the axiom of choice holds, there is a function {f} on {\mathcal P(A) \setminus \{\emptyset\}} such that for every non-empty subset {B} of {A}, {f(B) \in B}. Let {g} be the sequence such that for every natural number {n}, {g(n) = f(A \setminus \{g(0), g(1), \dotsc, g(n - 1)\})} (this is always defined, because if {A \setminus \{g(0), g(1), \dotsc, g(n - 1)\} = \emptyset}, {A \subseteq \{g(0), g(1), \dotsc, g(n - 1)\}}, so {|A| \le n}, which means {A} is finite). It is clear that {g} is an injection into {A}, so {\aleph_0 \le |A|}. \Box
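For subsets of {\mathbb N}, no appeal to choice is actually needed, since min is a definable choice function; this Python sketch of my own runs the construction of {g} on a concrete set (a finite stand-in, so only an initial segment of the sequence is meaningful):

```python
def g_sequence(A, length):
    """First `length` values of g, where g(n) is the chosen member
    (here: the minimum) of A with g(0), ..., g(n-1) removed."""
    chosen = []
    for _ in range(length):
        remaining = A - set(chosen)
        chosen.append(min(remaining))  # min plays the role of the choice function
    return chosen

evens = set(range(0, 100, 2))  # finite stand-in for an infinite set
print(g_sequence(evens, 5))    # [0, 2, 4, 6, 8]
```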

This doesn’t necessarily mean that the statement is independent of ZF, but certainly it suggests the possibility, since there is no obvious way to amend this proof so that it does not require the axiom of choice.

Perhaps there is another way we could prove that every infinite set is Dedekind infinite? Unfortunately there isn’t, because if every infinite set is Dedekind infinite, then for every infinite set {A}, there is a proper subset {B} of {A} such that {|A| = |B|}, so there is a member of {A \setminus B}, and {B} is infinite too, so there is a proper subset {C} of {B} such that {|B| = |C|}, so there is a member of {B \setminus C}, and so on. This allows us to construct an injective sequence of members of {A}, so {\aleph_0 \le |A|}.

Ideally, I would finish off here with a proof that the statement is independent. Unfortunately, I don’t know how to do this. It may be possible to derive the axiom of choice from the fact that every infinite set is Dedekind infinite, in which case this fact cannot be a theorem of ZF (otherwise the axiom of choice would be a theorem of ZF, and the axiom of choice is known to be independent of ZF). On the other hand, this may not be true, in which case I’m not sure what the best way to prove independence would be. The most straightforward way would be to construct a model of ZF in which there are Dedekind finite infinite sets, but I don’t know how you might go about doing that.

The Kuratowski definition of sequences

Sequences are one of the most fundamental kinds of objects dealt with in mathematics, along with sets, relations and functions. The standard definition of sequences is that they are simply functions on an ordered domain. Although this definition is quite nice and simple, it has a couple of flaws. First of all, it doesn’t capture the intuition most people have that a sequence is a kind of object which is ontologically distinct from a function. Secondly, the usual way to formally define a function in set theory is as a set of ordered pairs in which no two members are of the form (a, b) and (a, c) unless b = c. In order to use this definition, we have to define what an ordered pair is first. An ordered pair is simply the simplest kind of sequence; we would like to be able to define it as a function on a set of order type 2, but this is impossible if we haven’t already defined what a function is. Therefore, we have to define an ordered pair in some other way, which means ordered pairs are ontologically distinct from all other kinds of sequence, and that seems very untidy.

The standard definition of ordered pairs in set theory is credited to Kuratowski. By this definition, (a, b) is simply {{a}, {a, b}}. The intersection of the pair is then {a}, while its union is {a, b}, so its first term can be defined as the unique member of its intersection and its second term can be defined as the unique member of its union other than the first term. Well, actually the definition of the second term isn’t quite right. If the first term and the second term are the same, the union of the pair will have no members other than the first term. So the proper definition of the second term is that it is the unique member of the union, or, if there is more than one member of the union, the one which is not the first term.
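The projections described above are easy to try out with frozensets. The function names below are my own; the degenerate case is the one where the first and second terms coincide.

```python
def kpair(a, b):
    """The Kuratowski pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def first(p):
    # The first term is the unique member of the intersection of the pair.
    (x,) = frozenset.intersection(*p)
    return x

def second(p):
    union = frozenset.union(*p)
    if len(union) == 1:       # degenerate case: (a, a) = {{a}}
        (x,) = union
        return x
    (y,) = union - {first(p)} # otherwise: the member which isn't the first
    return y

assert (first(kpair(1, 2)), second(kpair(1, 2))) == (1, 2)
assert (first(kpair(3, 3)), second(kpair(3, 3))) == (3, 3)
print(kpair(1, 2) == kpair(2, 1))  # False: order matters
```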

This complication is the reason why the Kuratowski definition can’t be generalised to finite sequences of greater length. Suppose we made the following definition.

Definition 1 For every natural number n and every collection of n objects x_0, x_1, … and x_{n − 1}, the sequence of length n whose terms are x_0, x_1, … and x_{n − 1} (in that order) is {{x_0}, {x_0, x_1}, …, {x_0, x_1, …, x_{n − 1}}}.

This would be fine if we only needed to deal with sequences whose terms are all distinct (i.e. injective sequences, if you view sequences as functions). But by this definition, the sequences (a, a) and (a, a, a) would be identical: they would both just be (a). Similarly, and perhaps more seriously, because the sequences are the same length, the sequences (a, b, a) and (a, b, b) would be identical: they would both just be (a, b). In general, you can remove duplicates from sequences defined this way just as you can with sets. The only reason the Kuratowski definition works for ordered pairs is that (a, a), which simplifies to (a), can be distinguished from the set {a}.

For this reason, I’m not really a fan of the Kuratowski definition. Another way of defining the ordered pair is somewhat similar to defining it as a function on a set of order type 2. We can define (a, b) simply as {{0, a}, {1, b}}. This is just as if it were a function on {0, 1}, save that the pairs are unordered. The first term of the pair can be defined as the unique object a such that {0, a} is a member of the pair. The second term of the pair can be defined as the unique object b such that {1, b} is a member of the pair. Unfortunately, we run into similar problems as we had with the Kuratowski definition if we try to extend this idea to define sequences of greater length. For example, we would have (1, 2, 0) = {{0, 1}, {1, 2}, {2, 0}} and (2, 0, 1) = {{0, 2}, {1, 0}, {2, 1}}, and these two sets are identical.
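Both collisions can be exhibited mechanically with frozensets; the encoder names below are my own.

```python
# The chain-of-initial-segments encoding of Definition 1 above:
def chain_encode(*xs):
    return frozenset(frozenset(xs[:i + 1]) for i in range(len(xs)))

print(chain_encode('a', 'b', 'a') == chain_encode('a', 'b', 'b'))  # True

# ...and the {index, term} encoding:
def tagged_encode(*xs):
    return frozenset(frozenset({i, x}) for i, x in enumerate(xs))

print(tagged_encode(1, 2, 0) == tagged_encode(2, 0, 1))  # True
```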

The problem is that the indices (0 and 1 in this case) need to be distinct from the terms in order for the definition to work. The solution, then, would be to find some way of making the indices always distinct. Anyway, today I had a shower thought about one possible solution.

Recall Von Neumann’s definition of the ordinals, in which each ordinal is defined as the set of all preceding ordinals. Let’s make a slight modification to this definition. Let ζ be an arbitrary object.

Definition 2 Ω_ζ, whose members are called the ζ-ordinals, is the intersection of all the sets S such that the following three conditions are satisfied.

  • ζ ∈ S.
  • For every member x of S, x ∪ {x} ∈ S.
  • For every subset T of S, the union of all the members of T is a member of S.
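To get a feel for the first two clauses, here is a sketch of my own which generates the first few ζ-ordinals via the successor clause, taking ζ to be the empty set (which should recover the usual finite von Neumann ordinals). The third clause only matters at limit stages, so it isn’t exercised here.

```python
zeta = frozenset()  # assumption: take zeta to be the empty set

# Apply the successor clause x |-> x ∪ {x} a few times.
ordinals = [zeta]
for _ in range(3):
    x = ordinals[-1]
    ordinals.append(x | {x})

# With zeta = the empty set: 0 = {}, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2},
# so the nth ordinal generated has exactly n members.
for n, o in enumerate(ordinals):
    print(n, len(o))
```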

It’s just like the usual von Neumann definition of the ordinals, except that we use ζ in place of the empty set. As long as ζ is a well-founded set, these ordinals are, I think, isomorphic to the ordinary ordinals. Be warned that I haven’t checked this (I may do so tomorrow). If this is the case, we can make the following definition.

Definition 3 A sequence is a set of the form {{0, x_0}, {1, x_1}, …, {n − 1, x_{n − 1}}}, where n is an ordinal called the length of the sequence and x_0, x_1, … and x_{n − 1} are objects called the terms of the sequence.

I suspect there may be problems with this definition too. Maybe it can be amended to make it work, but I’m going to have to leave this for some other day.