
Generating random probability distributions

I’ve been thinking about how to use a computer program to randomly generate a probability distribution of a given finite size. This has turned out to be an interesting problem.

My first idea was to generate n − 1 uniform variates in [0, 1] (where n is the desired size), sort them, add 0 to the front of the list and 1 to the back of the list, and then take the non-negative differences between the adjacent variates. In Python code:

import random

def randpd1(size):
    variates = [random.random() for i in range(size - 1)]
    variates.sort()
    return [j - i for i, j in zip([0] + variates, variates + [1])]

My second idea was to simply generate n uniform variates in [0, 1], add them together, and take the ratio of each individual variate to the sum.

def randpd2(size):
    variates = [random.random() for i in range(size)]
    s = sum(variates)
    return [i/s for i in variates]

Both of these functions do reliably generate probability distributions, i.e. lists of non-negative real numbers (encoded as Python float objects) that sum to 1, although very large lists generated by randpd2 sometimes sum to something slightly different from 1 due to floating-point imprecision.

>>> import math
>>> sample1 = [randpd1(2**(i//1000 + 1)) for i in range(10000)]
>>> all(all(p >= 0 for p in pd) for pd in sample1)
True
>>> all(sum(pd) == 1 for pd in sample1)
True
>>> sample2 = [randpd2(2**(i//1000 + 1)) for i in range(10000)]
>>> all(all(p >= 0 for p in pd) for pd in sample2)
True
>>> all(sum(pd) == 1 for pd in sample2)
False
>>> all(math.isclose(sum(pd), 1) for pd in sample2)
True

But do they both generate a random probability distribution? In precise terms: for a given size argument, are the probability distributions of the return values randpd1(size) and randpd2(size) always uniform?

I don’t really know how to answer this question. In fact, it’s not even clear to me that there is a uniform distribution over the probability distributions of size n for every positive integer n. The problem is that the probability distributions of size n are the solutions in [0, 1]^n of the equation p_1 + p_2 + \dotsb + p_n = 1, where p_1, p_2, … and p_n are dummy variables, and therefore they comprise a set S whose dimension is n − 1 (not n). Because S is missing a dimension, continuous probability distributions over it cannot be defined in the usual way via probability density mappings on \mathbb R^n. Any such mapping would have to assign probability density 0 to every point in S, because for every such point x, there’s a whole additional dimension’s worth of points in every neighbourhood of x which are not in S. But then the integral of the probability density mapping over S would be 0, not 1, and it would not be a probability density mapping.

But perhaps you can map S onto a subset of \mathbb R^{n - 1}, and do something with a uniform distribution over the image. In any case, I’m finding thinking about this very confusing, so I’ll leave it for readers to ponder over. Given that I don’t currently know what a uniform probability distribution over the probability distributions of size n even looks like, I don’t know how to test whether one exists.

I can look at the marginal distributions of the individual items in the returned values of randpd1 and randpd2. But these marginal distributions are not straightforwardly related to the joint distribution of the list as a whole. In particular, uniformity of the joint distribution does not imply uniformity of the marginal distributions, and uniformity of the marginal distributions does not imply uniformity of the joint distribution.

But it’s still interesting to look at the marginal distributions. First of all, they allow validation of another desirable property of the two functions: the marginal distributions are the same for each item (regardless of its position in the list). I’m not going to demonstrate this here because it would be tedious, but it does look like this is the case. Therefore we can speak of “the marginal distribution” without reference to any particular item. Second, they reveal that randpd1 and randpd2 do not do exactly the same thing. The marginal distributions are different for the two functions. Let’s first look just at the case where size is 2.

>>> import matplotlib.pyplot as plt
>>> data1 = [randpd1(2)[0] for i in range(100000)]
>>> plt.hist(data1)

[histogram: marginal distribution of randpd1, size 2]

>>> data2 = [randpd2(2)[0] for i in range(100000)]
>>> plt.hist(data2)

[histogram: marginal distribution of randpd2, size 2]

The first plot looks like it’s been generated from a uniform distribution over [0, 1]; the second plot looks like it’s been generated from a non-uniform distribution which concentrates the probability density at 1/2. It’s easy to see why the distribution is uniform for randpd1: the function works by generating a single uniform variate p and then returning [p, 1 - p], and given that the distribution of p is uniform the distribution of 1 - p is also uniform. The function randpd2, on the other hand, works by generating two uniform variates p and q and returning [p/(p + q), q/(p + q)]. However, I don’t know what the distribution of p/(p + q) and q/(p + q) is exactly, given that p and q are uniformly distributed. This is another thing I hope readers who know more about probability and statistics than me might be able to enlighten me on.
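Short of an exact derivation, the distribution of p/(p + q) can at least be probed numerically. The following sketch (self-contained, with variable names of my own choosing) tabulates how often the ratio lands near 1/2 versus near the edges of [0, 1]:

```python
import random

# Empirically probe the distribution of p/(p + q) for two independent
# uniform variates p and q (the size-2 case of randpd2).
random.seed(0)  # fixed seed so the run is reproducible
N = 100_000
ratios = []
for _ in range(N):
    p, q = random.random(), random.random()
    ratios.append(p / (p + q))

mean = sum(ratios) / N
near_half = sum(0.4 <= r <= 0.6 for r in ratios) / N
near_edges = sum(r <= 0.1 or r >= 0.9 for r in ratios) / N
# Swapping p and q maps the ratio to 1 minus itself, so by symmetry
# the mean must be exactly 1/2; the two counts show the peak at 1/2.
print(mean, near_half, near_edges)
```

Running this, the share of the mass within 0.1 of 1/2 comes out around three times the share within 0.1 of either endpoint, consistent with the peaked histogram above.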

Here are the graphs for size 3:

>>> data1 = [randpd1(3)[0] for i in range(100000)]
>>> plt.hist(data1)

[histogram: marginal distribution of randpd1, size 3]

>>> data2 = [randpd2(3)[0] for i in range(100000)]
>>> plt.hist(data2)

[histogram: marginal distribution of randpd2, size 3]

The marginal distribution for randpd1 is no longer uniform; rather, it’s right triangular, with the right angle at the point (0, 0) and height 2. That means that, roughly speaking, a given item in the list returned by randpd1(3) is twice as likely to be close to 0 as it is to be close to 1/2, while the density falls all the way to 0 at 1, so the item is very unlikely to be close to 1.

In general, the marginal distribution for randpd1 is the distribution of the minimum of a sample of uniform variates in [0, 1] of size n − 1, where n is the value of size. This is because randpd1 works by generating such a sample, and the minimum of that sample always ends up being the first item in the returned list, and the marginal distributions of the other items are the same as the marginal distribution of the first item.

It turns out to be not too difficult to derive an exact formula for this distribution. For every x \in [0, 1], the minimum is greater than x if and only if all n − 1 variates are greater than x. Therefore the probabilities of these two events are the same. The probability of an individual variate being greater than x is 1 − x (because, given that the variate is uniformly distributed, x is the probability that the variate is less than or equal to x) and therefore, given that the variates are independent of each other, the probability of all being greater than x is (1 - x)^{n - 1}. It follows that the probability of the minimum being less than or equal to x is 1 - (1 - x)^{n - 1}. That is, the cumulative distribution mapping (CDM) f of the marginal distribution for randpd1 is given by

\displaystyle f(x) = 1 - (1 - x)^{n - 1}.

The probability distribution defined by this CDM is a well-known one called the beta distribution with parameters (1, n − 1). That’s a nice result!
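This result is easy to check by simulation. Here’s a minimal sketch (restating randpd1 so the snippet is self-contained) comparing the empirical CDM of the first item against 1 − (1 − x)^(n − 1) at a few sample points:

```python
import random

def randpd1(size):
    # same definition as earlier in the post
    variates = [random.random() for i in range(size - 1)]
    variates.sort()
    return [j - i for i, j in zip([0] + variates, variates + [1])]

random.seed(0)  # fixed seed so the run is reproducible
n = 5
data = [randpd1(n)[0] for _ in range(100_000)]

# Empirical CDM vs. the Beta(1, n - 1) CDM.
for x in (0.1, 0.25, 0.5):
    empirical = sum(v <= x for v in data) / len(data)
    predicted = 1 - (1 - x) ** (n - 1)
    print(x, empirical, predicted)  # the two values should agree closely
```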

The marginal distribution for randpd2, on the other hand, is similar to the one for size 2 except that the mean is now something like 1/3 rather than 1/2, and because the support is still the whole interval [0, 1] the bulk of the distribution is shifted towards the left. Again, I don’t know how to characterize this distribution exactly. Here are the graphs for sizes 4 and 5:

>>> data = [randpd2(4)[0] for i in range(100000)]
>>> plt.hist(data)

[histogram: marginal distribution of randpd2, size 4]

>>> data = [randpd2(5)[0] for i in range(100000)]
>>> plt.hist(data)

[histogram: marginal distribution of randpd2, size 5]

It looks like the marginal distribution generally has mean 1/n, or something close to that, for every positive integer n, while still having density approaching 0 at the left limit.
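In fact the mean should be exactly 1/n, not just close to it: the n items returned by randpd2 are exchangeable (each is a uniform variate divided by the same total) and they sum to 1, so each one must have expectation 1/n (the same reasoning applies to randpd1). A quick check of the sample means, restating randpd2 so the snippet is self-contained:

```python
import random

def randpd2(size):
    # same definition as earlier in the post
    variates = [random.random() for i in range(size)]
    s = sum(variates)
    return [i/s for i in variates]

random.seed(0)  # fixed seed so the run is reproducible
for n in (2, 3, 4, 5):
    sample_mean = sum(randpd2(n)[0] for _ in range(100_000)) / 100_000
    print(n, sample_mean, 1 / n)  # sample mean should track 1/n
```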

In conclusion… this post doesn’t have a conclusion, it just has a bunch of unanswered questions which I’d like to know the answers to.

  1. Is the concept of a uniform distribution over the probability distributions of size n sensical?
  2. If so, do the returned values of randpd1 and randpd2 have that distribution?
  3. If not, what distributions do they have?
  4. What’s the exact form of the marginal distribution for randpd2?
  5. Which is better: randpd1 or randpd2? Or if one isn’t clearly better than the other, what is each one best suited to being used for?
  6. Are there any other algorithms one could use to generate a random probability distribution?
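On question 6: one further recipe that often comes up is to normalize exponential rather than uniform variates. The standard claim, which I’ll just report rather than prove, is that the resulting list is uniformly distributed over the size-n probability distributions in the sense of the flat Dirichlet distribution. A sketch, with randpd3 as a name of my own choosing:

```python
import math
import random

def randpd3(size):
    # Exponential variates via inverse-transform sampling; using
    # 1.0 - random.random() guards against taking log(0).
    variates = [-math.log(1.0 - random.random()) for i in range(size)]
    s = sum(variates)
    return [i/s for i in variates]

random.seed(0)  # fixed seed so the run is reproducible
pd = randpd3(4)
print(pd, sum(pd))  # non-negative entries summing to (nearly) 1
```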

That’s OK, but this’s not OK?

Here’s something peculiar I noticed the other day about the English language.

The word is (the third-person singular present indicative form of the verb be) can be ‘contracted’ with a preceding noun phrase, so that it is reduced to an enclitic form -‘s. This can happen after pretty much any noun phrase, no matter how syntactically complex:

(1) he’s here

/(h)iːz ˈhiːə/[1]

(2) everyone’s here

/ˈevriːwɒnz ˈhiːə/

(3) ten years ago’s a long time

/ˈtɛn ˈjiːəz əˈgəwz ə ˈlɒng ˈtajm/

However, one place where this contraction can’t happen is immediately after the proximal demonstrative this. This is strange, because it can certainly happen after the distal demonstrative that, and one wouldn’t expect these two very similar words to behave so differently:

(4) that’s funny
/ˈðats ˈfʊniː/

(5) *this’s funny

There is a complication here which I’ve kind of skirted over, though. Sure, this’s funny is unacceptable in writing. But what would it sound like, if it was said in speech? Well, the -’s enclitic form of is can actually be realized on the surface in a couple of different ways, depending on the phonological environment. You might already have noticed that it’s /-s/ in example (4), but /-z/ in examples (1)-(3). This allomorphy (variation in phonological form) is reminiscent of the allomorphy in the plural suffix: cats is /ˈkats/, dogs is /ˈdɒgz/, horses is /ˈhɔːsɪz/. In fact the distribution of the /-s/ and /-z/ realizations of -‘s is exactly the same as for the plural suffix: /-s/ appears after voiceless non-sibilant consonants and /-z/ appears after vowels and voiced non-sibilant consonants. The remaining environment, the environment after sibilants, is the environment in which the plural suffix appears as /-ɪz/. And this environment turns out to be exactly the same environment in which -’s is unacceptable in writing. Here are a couple more examples:

(6) *a good guess’s worth something (compare: the correct answer’s worth something)

(7) *The Clash’s my favourite band (compare: Pearl Jam’s my favourite band)

Now, if -‘s obeys the same rules as the plural suffix then we’d expect it to be realized as /-ɪz/ in this environment. However… this is exactly the same sequence of segments that the independent word is is realized as when it is unstressed. One might therefore suspect that in sentences like (8) below, the morpheme graphically represented as the independent word is is actually the enclitic -‘s, it just happens to be realized the same as the independent word is and therefore not distinguished from it in writing. (Or, perhaps it would be more elegant to say that the contrast between enclitic and independent word is neutralized in this environment.)

(8) The Clash is my favourite band

Well, this is (*this’s) a very neat explanation, and if you do a Google search for “this’s” that’s pretty much the explanation you’ll find given to the various other confused people who have gone to websites like English Stack Exchange to ask why this’s isn’t a word. Unfortunately, I think it can’t be right.

The problem is, there are some accents of English, including mine, which have /-əz/ rather than /-ɪz/ in the allomorph of the plural suffix that occurs after sibilants, while at the same time pronouncing unstressed is as /ɪz/ rather than /əz/. (There are minimal pairs, such as peace is upon us /ˈpiːsɪz əˈpɒn ʊz/ and pieces upon us /ˈpiːsəz əˈpɒn ʊz/.) If the enclitic form of is does occur in (8) then we’d expect it to be realized as /əz/ in these accents, just like the plural suffix would be in the same environment. This is not what happens, at least in my own accent: (8) can only have /ɪz/. Indeed, it can be distinguished from the minimally contrastive NP (9):

(9) The Clash as my favourite band

In fact this problem exists in more standard accents of English as well, because is is not the only word ending in /-z/ which can end a contraction. The third-person singular present indicative of the verb have, has, can also be contracted to -‘s, and it exhibits the expected allomorphy between voiceless and voiced realizations:

(10) it’s been a while /ɪts ˈbiːn ə ˈwajəl/

(11) somebody I used to know’s disappeared /ˈsʊmbɒdiː aj ˈjuːst tə ˈnəwz dɪsəˈpijəd/

But, like is, it does not contract, at least in writing, after sibilants, although it may drop the initial /h-/ whenever it’s unstressed:

(12) this has gone on long enough /ˈðɪs (h)əz gɒn ɒn lɒng əˈnʊf/

I am not a native speaker of RP, so, correct me if I’m wrong. But I would be very surprised if any native speaker of RP would ever pronounce has as /ɪz/ in sentences like (12).

What’s going on? I actually do think the answer given above—that this’s isn’t written because it sounds exactly the same as this is—is more or less correct, but it needs elaboration. Such an answer can only be accepted if we in turn accept that the plural -s, the reduced -‘s form of is and the reduced -‘s form of has do not all exhibit the same allomorph in the environment after sibilants. The reduced form of is has the allomorph /-ɪz/ in all accents, except in those such as Australian English in which unstressed /ɪ/ merges with schwa. The reduced form of has has the allomorph /-əz/ in all accents. The plural suffix has the allomorph /-ɪz/ in some accents, but /-əz/ in others, including some in which /ɪ/ is not merged completely with schwa and in particular is not merged with schwa in the unstressed pronunciation of is.

Introductory textbooks on phonology written in the English language are very fond of talking about the allomorphy of the English plural suffix. In pretty much every treatment I’ve seen, it’s assumed that /-z/ is the underlying form, and /-s/ and /-əz/ are derived by phonological rules of voicing assimilation and epenthesis respectively, with the voicing assimilation crucially coming after the epenthesis (otherwise we’d have an additional allomorph /-əs/ after voiceless sibilants, while /-əz/ would only appear after voiced sibilants). This is the best analysis when the example is taken in isolation, because positing an epenthesis rule allows the phonological rules to be assumed to be productive across the entire lexicon of English. If such a fully productive deletion rule were posited, then it would be impossible to account for the pronunciation of a word like Paulas (‘multiple people named Paula’) with /-əz/ on the surface, whose underlying form would be exactly the same, phonologically, as Pauls (‘multiple people named Paul’). (This example only works if your plural suffix post-sibilant allomorph is /-əz/ rather than /-ɪz/, but a similar example could probably be exhibited in the other case.) One could appeal to the differing placement of the morpheme boundary but this is unappealing.

However, the assumption that a single epenthesis rule operating between sibilants is productive across the entire English lexicon has to be given up, because ‘s < is and ‘s < has have different allomorphs after sibilants! Either they are accounted for by two different lexically-conditioned epenthesis rules (which is a very unappealing model) or the allomorphs with the vowels are actually the underlying ones, and the allomorphs without the vowels are produced by a not phonologically-conditioned but at least (sort of) morphologically-conditioned deletion rule that elides fully reduced unstressed vowels (/ə/, /ɪ/) before word-final obstruents. This rule only applies in inflectional suffixes (e.g. lettuce and orchid are immune), and even there it does not apply unconditionally because the superlative suffix -est is immune to it. But this doesn’t bother me too much. One can argue that the superlative is kind of a marginal inflectional category, when you put it in the company of the plural, the possessive and the past tense.

A nice thing about the synchronic rule I’m proposing here is that it’s more or less exactly the same as the diachronic rule that produced the whole situation in the first place. The Old English nom./acc. pl., gen. sg., pres. ind. pl. and past endings were, respectively, -as, -es, -aþ and -ede. In Middle English final schwa was elided unconditionally in absolute word-final position, while in word-final unstressed syllables where it was followed by a single obstruent it was gradually eliminated by a process of lexical diffusion from inflectional suffix to inflectional suffix, although “a full coverage of the process in ME is still outstanding” (Minkova 2013: 231). Even the superlative suffix was reduced to /-st/ by many speakers for a time, but eventually the schwa-ful form of this suffix prevailed.

I don’t see this as a coincidence. My inclination, when it comes to phonology, is to see the historical phonology as essential for understanding the present-day phonology. Synchronic phonological alternations are for the most part caused by sound changes, and trying to understand them without reference to these old sound changes is… well, you may be able to make some progress, but it seems like progress would come much more quickly by trying to understand the things that cause them, namely sound changes, at the same time. This is a pretty tentative paragraph, and I’m aware I’d need a lot more elaboration to make a convincing case for this stance. But this is where my inclination is headed.

[1] The transcription system is the one which I prefer to use for my own accent of English.

References

Minkova, D. 2013. A Historical Phonology of English. Edinburgh University Press.

A language with no word-initial consonants

I was having a look at some of the squibs in Linguistic Inquiry today, which are often fairly interesting (and have the redeeming quality that, when they’re not interesting, they’re at least short), and there was an especially interesting one in the April 1970 (second ever) issue by R. M. W. Dixon (Dixon 1970) which I’d like to write about for the benefit of those who can’t access it.

In Olgolo, a variety of Kunjen spoken on the Cape York Peninsula, there appears to have been a sound change that elided consonants in initial position. That is, not just consonants of a particular variety, but all consonants. As a result of this change, every word in the language begins with a vowel. Examples (transcriptions in IPA):

  • *báma ‘man’ > áb͡ma
  • *míɲa ‘animal’ > íɲa
  • *gúda ‘dog’ > úda
  • *gúman ‘thigh’ > úb͡man
  • *búŋa ‘sun’ > úg͡ŋa
  • *bíːɲa ‘aunt’ > íɲa
  • *gúyu ‘fish’ > úyu
  • *yúgu ‘tree, wood’ > úgu

(Being used to the conventions of Indo-Europeanists, I’m a little disturbed by the fact that Dixon doesn’t identify the linguistic proto-variety to which the proto-forms in these examples belong, nor does he cite cognates to back up his reconstruction. But I presume forms very similar to the proto-forms are found in nearby Paman languages. In fact, I know for a fact that the Uradhi word for ‘tree’ is /yúku/ because Black (1993) mentions it by way of illustrating the remarkable Uradhi phonological rule which inserts a phonetic [k] or [ŋ] after every vowel in utterance-final position. Utterance-final /yúku/ is by this means realized as [yúkuk] in Uradhi.)

(The pre-stopped nasals in some of these words [rather interesting segments in and of themselves, but fairly widely attested; see the Wikipedia article] have arisen due to a sound change occurring before the word-initial consonant elision sound change, which pre-stopped nasals immediately after word-initial syllables containing a stop or *w followed by a short vowel. This would have helped mitigate the loss of contrast resulting from the word-initial consonant elision sound change a little, but only a little, and between e.g. the words for ‘animal’ and ‘aunt’ homophony was not averted because ‘aunt’ had an originally long vowel [which was shortened in Olgolo by yet another sound change].)

Dixon says Olgolo is the only language he’s heard of in which there are no word-initial consonants, although it’s possible that more have been discovered since 1970. However, there is a caveat to this statement: there are monoconsonantal prefixes that can be optionally added to most nouns, so that they have an initial consonant on the surface. There are at least four of these prefixes, /n-/, /w-/, /y-/ and /ŋ-/; however, every noun seems to only take a single one of these prefixes, so we can regard these four forms as lexically-conditioned allomorphs of a single prefix. The conditioning is in fact more precisely semantic: roughly, y- is added to nouns denoting fish, n- is added to nouns denoting other animals, and w- is added to nouns denoting various inanimates. The prefixes therefore identify ‘noun classes’ in a sense (although these are probably not noun classes in a strict sense because Dixon gives no indication that there are any agreement phenomena which involve them). The prefix ŋ- was only seen on one word, /ɔ́jɟɔba/ ~ /ŋɔ́jɟɔba/ ‘wild yam’, and might be added to all nouns denoting fruits and vegetables, given that most Australian languages with noun classes have a noun class for fruits and vegetables, but there were no other such nouns in the dataset (Dixon only noticed the semantic conditioning after he left the field, so he didn’t have a chance to elicit any others). It must be emphasized, however, that these prefixes are entirely optional, and every noun which can have a prefix added to it can also be pronounced without the prefix. In addition some nouns, those denoting kin and body parts, appear to never take a prefix, although possibly this is just a limitation of the dataset given that their taking a prefix would be expected to be optional in any case. And words other than nouns, such as verbs, don’t take these prefixes at all.

Dixon hypothesizes that the y- and n- prefixes are reduced forms of /úyu/ ‘fish’ and /íɲa/ ‘animal’ respectively, while w- may be from /úgu/ ‘tree, wood’ or just an “unmarked” initial consonant (it’s not clear what Dixon means by this). These derivations are not unquestionable (for example, how do we get from /-ɲ-/ to /n-/ in the ‘animal’ prefix?). But it’s very plausible that the prefixes do originate in this way, even if the exact antecedent words are difficult to identify, because similar origins have been identified for noun class prefixes in other Australian languages (Dixon 1968, as cited by Dixon 1970). Just intuitively, it’s easy to see how nouns might come to be ever more frequently replaced by compounds of the original noun (as dependent) and a term denoting a superset (as head); cf. English koala ~ koala bear, oak ~ oak tree, gem ~ gemstone. In English these compounds are head-final but in other languages (e.g. Welsh) they are often head-initial, and presumably this would have to be the case in pre-Olgolo in order for the head elements to grammaticalize into noun class prefixes. The fact that the noun class prefixes are optional certainly suggests that the system is very much incipient, and still developing, and therefore of recent origin.

It might therefore be very interesting to see how the Olgolo language has changed after a century or so; we might be able to examine a noun class system as it develops in real time, with all of our modern equipment and techniques available to record each stage. It would also be very interesting to see how quickly this supposedly anomalous state of every word beginning with a vowel (in at least one of its freely-variant forms) is eliminated, especially since work on Australian language phonology since 1970 has established many other surprising findings about Australian syllable structure, including a language where the ‘basic’ syllable type appears to be VC rather than CV (Breen & Pensalfini 1999). Indeed, since Dixon wrote this paper 46 years ago Olgolo might have changed considerably already. Unfortunately, it might have changed in a somewhat more disappointing way. None of the citations of Dixon’s paper recorded by Google Scholar seem to examine Olgolo any further, and the documentation on Kunjen (the variety which includes Olgolo as a subvariety) recorded in the Australian Indigenous Languages Database isn’t particularly overwhelming. I can’t find a straight answer as to whether Kunjen is extinct today or not (never mind the Olgolo variety), but Dixon wasn’t optimistic about its future in 1970:

It would be instructive to study the development of Olgolo over the next few generations … Unfortunately, the language is at present spoken by only a handful of old people, and is bound to become extinct in the next decade or so.

References

Black, P. 1993 (post-print). Unusual syllable structure in the Kurtjar language of Australia. Retrieved from http://espace.cdu.edu.au/view/cdu:42522 on 26 September 2016.

Breen, G. & Pensalfini, R. 1999. Arrernte: A Language with No Syllable Onsets. Linguistic Inquiry 30 (1): 1-25.

Dixon, R. M. W. 1968. Noun Classes. Lingua 21: 104-125.

Dixon, R. M. W. 1970. Olgolo Syllable Structure and What They Are Doing about It. Linguistic Inquiry 1 (2): 273-276.

The insecurity of relative chronologies

One of the things historical linguists do is reconstruct relative chronologies: statements about whether one change in a language occurred before another change in the language. For example, in the history of English there was a change which raised the Middle English (ME) mid back vowel /oː/, so that it became high /uː/: boot, pronounced /boːt/ in Middle English, is now pronounced /buːt/. There was also a change which caused ME /oː/ to be reflected as short /ʊ/ before /k/ (among other consonants), so that book is now pronounced as /bʊk/. There are two possible relative chronologies of these changes: either the first happens before the second, or the second happens before the first. Now, because English has been well-recorded in writing for centuries, because these written records of the language often contain phonetic spellings, and because they also sometimes communicate observations about the language’s phonetics, we can date these changes quite precisely. The first probably began in the thirteenth century and continued through the fourteenth, while the second took place in the seventeenth century (Minkova 2015: 253-4, 272). In this particular case, then, no linguistic reasoning is needed to infer the relative chronology. But much, if not most, of the time in historical linguistics, we are not so lucky, and are dealing with the history of languages for which written records in the desired time period are much less extensive, or completely nonexistent. Relative chronologies can still be inferred under these circumstances; however, it is a methodologically trickier business. In this post, I want to point out some complications associated with inferring relative chronologies under these circumstances which I’m not sure historical linguists are always aware of.

Let’s begin by thinking again about the English example I gave above. If English was an unwritten language, could we still infer that the /oː/ > /uː/ change happened before the /oː/ > /ʊ/ change? (I’m stating these changes as correspondences between Middle English and Modern English sounds—obviously if /oː/ > /uː/ happened first then the second change would operate on /uː/ rather than /oː/.) A first answer might go something along these lines: if the /oː/ > /uː/ change in quality happens first, then the second change is /uː/ > /ʊ/, so it’s one of quantity only (long to short). On the other hand, if /oː/ > /ʊ/ happens first we have a shift of both quantity and quality at the same time, followed by a second shift of quality. The first scenario is simpler, and therefore more likely.

Admittedly, it’s only somewhat more likely than the other scenario. It’s not absolutely proven to be the correct one. Of course we never have truly absolute proofs of anything, but I think there’s a good order of magnitude or so of difference between the likelihood of /oː/ > /uː/ happening first, if we ignore the evidence of the written records and accept this argument, and the likelihood of /oː/ > /uː/ happening first once we consider the evidence of the written records.

But in fact we can’t even say it’s more likely, because the argument is flawed! The /uː/ > /ʊ/ change would involve some quality adjustment too, because /ʊ/ is a little lower and more central than /uː/.[1] Now, in modern European languages, at least, it is very common for minor quality differences to exist between long and short vowels, and for lengthening and shortening changes to involve the expected minor shifts in quality as well (if you like, you can think of persistent rules existing along the lines of /u/ > /ʊ/ and /ʊː/ > /uː/, which are automatically applied after any lengthening or shortening rules to “adjust” their outputs). We might therefore say that this isn’t really a substantive quality shift; it’s just a minor adjustment concomitant with the quantity shift. But sometimes, these quality adjustments following lengthening and shortening changes go in the opposite direction than might be expected based on etymology. For example, when /ʊ/ was affected by open syllable lengthening in Middle English, it became /oː/, not /uː/: OE wudu > ME wood /woːd/. This is not unexpected, because the quality difference between /uː/ and /ʊ/ is (or, more accurately, can be) such that /ʊ/ is about as close in quality to /oː/ as it is to /uː/. Given that /ʊ/ could lengthen into /oː/ in Middle English, it is hardly unbelievable that /oː/ could shorten into /ʊ/ as well.

I’m not trying to say that one should go the other way here, and conclude that /oː/ > /ʊ/ happened first. I’m just trying to argue that without the evidence of the written records, no relative chronological inference can be made here—not even an insecure-but-best-guess kind of relative chronological inference. To me this is surprising and somewhat disturbing, because when I first started thinking about it I was convinced that there were good intrinsic linguistic reasons for taking the /oː/ > /uː/-first scenario as the correct one. And this is something that happens with a lot of relative chronologies, once I start thinking about them properly.

Let’s now go to an example where there really is no written evidence to help us, and where my questioning of the general relative-chronological assumption might have real force. In Greek, the following two very well-known generalizations about the reflexes of Proto-Indo-European (PIE) forms can be made:

  1. The PIE voiced aspirated stops are reflected in Greek as voiceless aspirated stops in the general environment: PIE *bʰéroh₂ ‘I bear’ > Greek φέρω, PIE *dʰéh₁tis ‘act of putting’ > Greek θέσις ‘placement’, PIE *ǵʰáns ‘goose’ > Greek χήν.
  2. However, in the specific environment before another PIE voiced aspirated stop in the onset of the immediately succeeding syllable, they are reflected as voiceless unaspirated stops: PIE *bʰeydʰoh₂ ‘I trust’ > Greek πείθω ‘I convince’, PIE *dʰédʰeh₁mi ‘I put’ > Greek τίθημι. This is known as Grassmann’s Law. PIE *s (which usually became /h/ elsewhere) is elided in the same environment: PIE *segʰoh₂ ‘I hold’ > Greek ἔχω ‘I have’ (note the smooth breathing diacritic).

On the face of it, the fact that Grassmann’s Law produces voiceless unaspirated stops rather than voiced ones seems to indicate that it came into effect only after the sound change that devoiced the PIE voiced aspirated stops. For otherwise, the deaspiration of these voiced aspirated stops due to Grassmann’s Law would have produced voiced unaspirated stops at first, and voiced unaspirated stops inherited from PIE, as in PIE *déḱm̥ ‘ten’ > Greek δέκα, were not devoiced.

However, if we think more closely about the phonetics of the segments involved, this is not quite as obvious. The PIE voiced aspirated stops could surely be more accurately described as breathy-voiced stops, like their presumed unaltered reflexes in modern Indo-Aryan languages. Breathy voice is essentially a kind of voice which is closer to voicelessness than voice normally is: the glottis is more open (or less tightly closed, or open at one part and not at another part) than it is when a modally voiced sound is articulated. It therefore does not seem out of the question for breathy-voiced stops to deaspirate to voiceless stops if they are going to be deaspirated, much as ME /ʊ/ became /oː/ when it lengthened. Granted, I don’t know of any attested parallels for such a shift. And in Sanskrit, in which a version of Grassmann’s Law also applies, breathy-voiced stops certainly deaspirate to voiced stops: PIE *dʰédʰeh1mi ‘I put’ > Sanskrit dádhāmi. So Grassmann’s Law in Greek certainly has to be different in nature (and probably an entirely separate innovation) from Grassmann’s Law in Sanskrit.[2]
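The shape of this argument can be made concrete with a toy Python sketch. The rule formulations and segment labels below are my own drastic simplifications, not anything from the literature: the sketch models only the two onsets of *bʰeydʰ-, and shows that the reverse chronology derives the attested Greek onsets just as well, provided Grassmann’s Law deaspirates breathy-voiced stops to voiceless stops; only a Sanskrit-style deaspiration to voiced stops forces the devoicing-first ordering.

```python
def devoice_aspirates(onsets):
    # devoicing of the PIE voiced (breathy-voiced) aspirates only;
    # plain voiced stops (as in *dekm 'ten' > Greek δέκα) are unaffected
    table = {'bʰ': 'pʰ', 'dʰ': 'tʰ', 'gʰ': 'kʰ'}
    return tuple(table.get(o, o) for o in onsets)

def grassmann(onsets, output):
    # deaspirate an aspirated stop when the next syllable's onset is also
    # aspirated; 'output' chooses what a breathy stop deaspirates to
    aspirates = {'bʰ', 'dʰ', 'gʰ', 'pʰ', 'tʰ', 'kʰ'}
    to_voiced = {'bʰ': 'b', 'dʰ': 'd', 'gʰ': 'g',
                 'pʰ': 'p', 'tʰ': 't', 'kʰ': 'k'}
    to_voiceless = {'bʰ': 'p', 'dʰ': 't', 'gʰ': 'k',
                    'pʰ': 'p', 'tʰ': 't', 'kʰ': 'k'}
    table = to_voiceless if output == 'voiceless' else to_voiced
    out = list(onsets)
    for i in range(len(out) - 1):
        if out[i] in aspirates and out[i + 1] in aspirates:
            out[i] = table[out[i]]
    return tuple(out)

root = ('bʰ', 'dʰ')  # the two onsets of *bʰeydʰ-

# standard chronology: devoicing first, then Grassmann's Law
assert grassmann(devoice_aspirates(root), 'voiceless') == ('p', 'tʰ')  # πείθω
# reverse chronology also works, if deaspiration yields voiceless stops
assert devoice_aspirates(grassmann(root, 'voiceless')) == ('p', 'tʰ')
# ... but not if it yields voiced stops, as in Sanskrit
assert devoice_aspirates(grassmann(root, 'voiced')) == ('b', 'tʰ')  # not Greek
```

In other words, the data alone pin down the ordering only under the assumption that deaspiration preserves voicing, which is exactly the assumption being questioned here.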

Another example of a commonly-accepted relative chronology which I think is highly questionable is the idea that Grimm’s Law came into effect in Proto-Germanic before Verner’s Law did. To be honest, I’m not really sure what the rationale is for thinking this in the first place. Ringe (2006: 93) simply asserts that “Verner’s Law must have followed Grimm’s Law, since it operated on the outputs of Grimm’s Law”. This is unilluminating: certainly Verner’s Law only operates on voiceless fricatives in Ringe’s formulation of it, but Ringe does not justify formulating Verner’s Law as applying only to voiceless fricatives. In general, sound changes will appear to have operated on the outputs of a previous sound change if one assumes in the first place that the previous sound change came first. The key to justifying a relative chronology properly is to work out what alternative formulations of each sound change would be required to make the opposite chronology work (such alternative formulations can almost always be found), and then to establish that the sound changes thus formulated are substantially less natural than the sound changes as formulable under the chronology one wishes to justify.

If the PIE voiceless stops at some point became aspirated (which seems very likely, given that fricativization of voiceless stops normally follows aspiration, and given that stops immediately after obstruents, in precisely the same environment that voiceless stops are unaspirated in modern Germanic languages, are not fricativized), then Verner’s Law, formulated as voicing of obstruents in the usual environments, followed by Grimm’s Law formulated in the usual manner, accounts perfectly well for the data. A Wikipedia editor objects, or at least raises the objection, that a formulation of the sound change so that it affects the voiceless fricatives, specifically, rather than the voiceless obstruents as a whole, would be preferable—but why? What matters is the naturalness of the sound change—how likely it is to happen in a language similar to the one under consideration—not the sizes of the categories in phonetic space that it refers to. Some categories are natural, some are unnatural, and this is not well correlated with size. Both fricatives and obstruents are, as far as I am aware, about equally natural categories.

I do have one misgiving about the Verner’s Law-first scenario, which is that I’m not aware of any attested sound changes involving intervocalic voicing of aspirated stops. Perhaps voiceless aspirated stops voice less easily than voiceless unaspirated stops. But Verner’s Law is not just intervocalic voicing, of course: it also interacts with the accent (precisely, it voices obstruents only after unaccented syllables). If one thinks of it as a matter of the association of voice with low tone, rather than of lenition, then voicing of aspirated stops might be a more believable possibility.

My point here is not so much about the specific examples; I am not aiming to actually convince people to abandon the specific relative chronologies questioned here (there are likely to be points I haven’t thought of). My point is to raise these questions in order to show at what level the justification of the relative chronology needs to be done. I expect that it is deeper than many people would think. It is also somewhat unsettling that it relies so much on theoretical assumptions about what kinds of sound changes are natural, which are often not well-established.

Are there any relative chronologies which are very secure? Well, there is another famous Indo-European sound law associated with a specific relative chronology which I think is secure. This is the “law of the palatals” in Sanskrit. In Sanskrit, PIE *e, *a and *o merge as a; but PIE *k/*g/*gʰ and *kʷ/*gʷ/*gʷʰ are reflected as c/j/h before PIE *e (and *i), and k/g/gh before PIE *a and *o (and *u). The only credible explanation for this, as far as I can see, is that an earlier sound change palatalizes the dorsal stops before *e and *i, and then a later sound change merges *e with *a and *o. If *e had already merged with *a and *o by the time the palatalization occurred, then the palatalization would have to occur before *a, and it would have to be sporadic: and sporadic changes are rare, but not impossible (this is the Neogrammarian hypothesis, in its watered-down form). But what really clinches it is this: that sporadic change would have to apply to dorsal stops before a set of instances of *a which just happened to be exactly the same as the set of instances of *a which reflect PIE *e, rather than *a or *o. This is astronomically unlikely, and one doesn’t need any theoretical assumptions to see this.[3]
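The logic of the law of the palatals can be illustrated with another toy Python sketch (again my own schematic simplification: ‘c’ stands for the palatalized outcome, and the rules operate on bare stop + vowel strings). With palatalization ordered before the merger, the attested Sanskrit distribution falls out from two regular rules; with the merger first, the conditioning environment is destroyed before palatalization can apply, so a regular rule cannot produce the attested contrast at all.

```python
def palatalize(form):
    # dorsal stop k -> c before the front vowels e, i
    out = []
    for i, seg in enumerate(form):
        if seg == 'k' and i + 1 < len(form) and form[i + 1] in 'ei':
            out.append('c')
        else:
            out.append(seg)
    return ''.join(out)

def merge_vowels(form):
    # the merger of *e, *a and *o as a
    return form.replace('e', 'a').replace('o', 'a')

# palatalization first: *ke stays distinct from *ka and *ko, as attested
assert merge_vowels(palatalize('ke')) == 'ca'
assert merge_vowels(palatalize('ka')) == 'ka'
assert merge_vowels(palatalize('ko')) == 'ka'

# merger first: *ke has already fallen together with *ka and *ko,
# so a regular palatalization rule yields ka across the board
assert palatalize(merge_vowels('ke')) == 'ka'
assert palatalize(merge_vowels('ka')) == 'ka'
```

Under the merger-first order, deriving ca only in the old *ke words would require a sporadic rule that happens to pick out exactly those words, which is the astronomical coincidence described above.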

Now the question I really want to answer here is: what exactly are the relevant differences in this relative chronology that distinguish it from the three more questionable ones I examined above, and allow us to infer it with high confidence (based on the unlikelihood of a sporadic change happening to appear conditioned by an eliminated contrast)? It’s not clear to me what they are. Something to do with how the vowel merger counterbleeds the palatalization? (I hope this is the correct relation. The concepts of (counter)bleeding and (counter)feeding are very confusing for me.) But I don’t think this refers to the relevant things. Whether two phonological rules / sound changes (counter)bleed or (counter)feed each other is a function of the natures of the phonological rules / sound changes; but when we’re trying to establish relative chronologies we don’t know what the natures of the phonological rules / sound changes are! That has to wait until we’ve established the relative chronologies. I think that’s why I keep failing to compute whether there is also a counterbleeding in the other relative chronologies I talked about above: the question is ill-formed. (In case you can’t tell, I’m starting to mostly think aloud in this paragraph.) What we do actually know are the correspondences between the mother language and the daughter language[4], so the question should be stated in terms of those correspondences. Anyway, I think it is best to leave it here, for my readers to read and perhaps comment with their ideas, provided I’ve managed to communicate the question properly; I might make another post on this theme sometime if I manage to work out (or read) an answer that satisfies me.

Oh, but one last thing: is establishing the security of relative chronologies that important? I think it is quite important. For a start, relative chronological assumptions bear directly on assumptions about the natures of particular sound changes, and that means they affect our judgements of which types of sound changes are likely and which are not, which are of fundamental importance in historical phonology and perhaps of considerable importance in non-historical phonology as well (under e.g. the Evolutionary Phonology framework of Blevins 2004).[5] But perhaps even more importantly, they are important in establishing genetic linguistic relationships. Ringe & Eska (2013) emphasize in their chapter on subgrouping how much less likely it is for languages to share the same sequence of changes than the same unordered set of changes, and so how the establishment of secure relative chronologies is our saving grace when it comes to establishing subgroups in cases of quick diversification (where there might be only a few innovations common to a given subgroup). This seems reasonable, but if the relative chronologies are insecure and questionable, we have a problem (and the sequence of changes they cite as establishing the validity of the Germanic subgroup certainly contains some questionable relative chronologies—for example they have all three parts of Grimm’s Law in succession before Verner’s Law, but as explained above, Verner’s Law could have come before Grimm’s; the third part of Grimm’s Law may also have not happened separately from the first).

[1] This quality difference exists in present-day English for sure—modulo secondary quality shifts which have affected these vowels in some accents—and it can be extrapolated back into seventeenth-century English with reasonable certainty using the written records. If we are ignoring the evidence of the written records, we can postulate that the quality differentiation between long /uː/ and short /ʊ/ was even more recent than the /uː/ > /ʊ/ shift (which would now be better described as an /uː/ > /u/ shift). But the point is that such quality adjustment can happen, as explained in the rest of the paragraph.

[2] There is a lot of literature on Grassmann’s Law, a lot of it dealing with relative chronological issues and, in particular, the question of whether Grassmann’s Law can be considered a phonological rule that was already present in PIE. I have no idea why one would want to—there are certainly PIE forms inherited in Germanic that appear to have been unaffected by Grassmann’s Law, as in PIE *bʰeydʰ- > English bide; but I’ve hardly read any of this literature. My contention here is only that the generally-accepted relative chronology of Grassmann’s Law and the devoicing of the PIE voiced aspirated stops can be contested.

[3] One should bear in mind some subtleties though—for example, *e and *a might have gotten very, very phonetically similar, so that they were almost merged, before the palatalization occurred. If one wants to rule out that scenario, one has to appeal again to the naturalness of the hypothesized sound changes. But as long as we are talking about the full merger of *e and *a we can confidently say that it occurred after palatalization.

[4] Actually, in practice we don’t know these with certainty either, and the correspondences we postulate to some extent are influenced by our postulations about the natures of sound changes that have occurred and their relative chronologies… but I’ve been assuming they can be established more or less independently throughout these posts, and that seems a reasonable assumption most of the time.

[5] I realize I’ve been talking about phonological changes throughout this post, but obviously there are other kinds of linguistic changes, and relative chronologies of those changes can be established too. How far the discussion in this post applies outside of the phonological domain I will leave for you to think about.

References

Blevins, J. 2004. Evolutionary phonology: The emergence of sound patterns. Cambridge University Press.

Minkova, D. 2013. A historical phonology of English. Edinburgh University Press.

Ringe, D. 2006. A linguistic history of English: from Proto-Indo-European to Proto-Germanic. Oxford University Press.

Ringe, D. & Eska, J. F. 2013. Historical linguistics: toward a twenty-first century reintegration. Cambridge University Press.

Vowel-initial and vowel-final roots in Proto-Indo-European

A remarkable feature of Proto-Indo-European (PIE) is the restrictiveness of the constraints on its root structure. It is generally agreed that all PIE roots were monosyllabic, containing a single underlying vowel. In fact, the vast majority of roots are thought to have had the same underlying vowel, namely *e. (Some scholars reconstruct a small number of roots with underlying *a rather than *e; others do not, and reconstruct underlying *e in every PIE root.) It is also commonly supposed that every root had at least one consonant on either side of its vowel; in other words, that there were no roots which began or ended with a vowel (Fortson 2004: 71).

I have no dispute with the first of these constraints; though it is very unusual, it is not too difficult to understand in connection with the PIE ablaut system, and the Semitic languages are similar with their triconsonantal, vowel-less roots. However, I think the other constraint, the one against vowel-initial and vowel-final roots, is questionable. In order to talk about it with ease and clarity, it helps to have a name for it: I’m going to call it the trisegmental constraint, because it amounts to the constraint that every PIE root contains at least three segments: the vowel, a consonant before the vowel, and a consonant after the vowel.

The first thing that might make one suspicious of the trisegmental constraint is that it isn’t actually attested in any IE language, as far as I know. English has vowel-initial roots (e.g. ask) and vowel-final roots (e.g. fly); so do Latin, Greek and Sanskrit (cf. S. aj- ‘drive’, G. ἀγ- ‘lead’, L. ag- ‘do’, and L. dō-, G. δω-, S. dā-, all meaning ‘give’). And for much of the early history of IE studies, nobody suspected the constraint’s existence: the PIE roots meaning ‘drive’ and ‘give’ were reconstructed as *aǵ- and *dō- respectively, with an initial vowel in the case of the former and a final vowel in the case of the latter.

It was only with the development of the laryngeal theory that the reconstruction of the trisegmental constraint became possible. The initial motivation for the laryngeal theory was to simplify the system of ablaut reconstructed for PIE. I won’t go into the motivation in detail here; it’s one of the most famous developments in IE studies, so a lot of my readers are probably familiar with it already, and it’s not hard to find descriptions of it. The important thing to know, if you want to understand what I’m talking about here, is that the laryngeal theory posits the existence of three consonants in PIE which are called laryngeals and written *h1, *h2 and *h3, and that these laryngeals can be distinguished by their effects on adjacent vowels: *h2 turns adjacent underlying *e into *a, *h3 turns adjacent underlying *e into *o, and *h1 leaves its quality unchanged. In all of the IE languages other than the Anatolian languages (which are all extinct, and records of which were only discovered in the 20th century), the laryngeals are elided pretty much everywhere, and their presence is only discernible from their effects on adjacent segments. Note that as well as changing the quality of (“colouring”) underlying *e, they also lengthen preceding vowels. And between consonants, they are reflected as vowels, but as different vowels in different languages: in Greek *h1, *h2, *h3 become ε, α, ο respectively, in Sanskrit all three become i, and in the other languages all three generally became a.

So, the laryngeal theory allowed the old reconstructions *aǵ- and *dō- to be replaced by *h2éǵ- and *deh3- respectively, which conform to the trisegmental constraint. In fact every root reconstructed with an initial or final vowel by the 19th-century IEists could be reconstructed with an initial or final laryngeal instead. Concrete support for some of these new reconstructions with laryngeals came from the discovery of the Anatolian languages, which preserved some of the laryngeals in some positions as consonants. For example, the PIE word for ‘sheep’ was reconstructed as *ówis on the basis of the correspondence between L. ovis, G. ὄϊς, S. áviḥ, but the discovery of the Cuneiform Luwian cognate ḫāwīs confirmed without a doubt that the root must have originally begun with a laryngeal (although it is still unclear whether that laryngeal was *h2, preceding *o, or *h3, preceding *e).

There are also indirect ways in which the presence of a laryngeal can be evidenced. Most obviously, if a root exhibits the irregular ablaut alternations in the early IE languages which the laryngeal theory was designed to explain, then it should be reconstructed with a laryngeal in order to regularize the ablaut alternation in PIE. In the case of *h2eǵ-, for example, there is an o-grade derivative of the root, *h2oǵmos ‘drive’ (n.), which can be reconstructed on the evidence of Greek ὄγμος ‘furrow’ (Ringe 2006: 14). This shows that the underlying vowel of the root must have been *e, because (given the laryngeal theory) the PIE ablaut system did not involve alternations of *a with *o, only alternations of *e, *ō or ∅ (that is, the absence of the segment) with *o. But this underlying *e is reflected as if it were *a in all the e-grade derivatives of *h2eǵ- attested in the early IE languages (e.g. in the 3sg. present active indicative forms S. ájati, G. ἄγει, L. agit). In order to account for this “colouring” we must reconstruct *h2 next to the *e. Similar considerations allow us to be reasonably sure that *deh3- also contained a laryngeal, because the e-grade root is reflected as if it had *ō (S. dádāti, G. δίδωσι) and the zero-grade root in *dh3tós ‘given’ exhibits the characteristic reflex of interconsonantal *h3 (S. -ditáḥ, G. δοτός, L. datus).

But in many cases there does not seem to be any particular evidence for the reconstruction of the initial or final laryngeal other than the assumption that the trisegmental constraint existed. For example, *h1éḱwos ‘horse’ could just as well be reconstructed as *éḱwos, and indeed this is what Ringe (2006) does. Likewise, there is no positive evidence that the root *muH- of *muHs ‘mouse’ (cf. S. mūṣ, G. μῦς, L. mūs) contained a laryngeal: it could just as well be *mū-. Both of the roots *(h1)éḱ- and *muH/ū- are found, as far as I know, in these stems only, so there is no evidence for the existence of the laryngeal from ablaut. It is true that PIE has no roots that can be reconstructed as ending in a short vowel, and this could be seen as evidence for at least a constraint against vowel-final roots, because if all the apparent vowel-final roots actually had a vowel + laryngeal sequence, that would explain why the vowel appears to be long. But this is not the only possible explanation: there could just be a constraint against roots containing a light syllable. This seems like a very natural constraint. Although the circumstances aren’t exactly the same—because English roots appear without inflectional endings in most circumstances, while PIE roots mostly didn’t—the constraint is attested in English: short unreduced vowels like that of cat never appear in root-final (or word-final) position; only long vowels, diphthongs and schwa can appear in word-final position, and schwa does not appear in stressed syllables.

It could be argued that the trisegmental constraint simplifies the phonology of PIE, and therefore it should be assumed to exist pending the discovery of positive evidence that some root does begin or end with a vowel. It simplifies the phonology in the sense that it reduces the space of phonological forms which can conceivably be reconstructed. But I don’t think this is the sense of “simple” which we should be using to decide which hypotheses about PIE are better. I think a reconstructed language is simpler to the extent that it is synchronically not unusual, and that the existence of whatever features it has that are synchronically unusual can be justified by explanations of features in the daughter languages by natural linguistic changes (in other words, both synchronic unusualness and diachronic unusualness must be taken into account). The trisegmental constraint seems to me synchronically unusual, because I don’t know of any other languages that have something similar, although I have not made any systematic investigation. And as far as I know there are no features of the IE languages which the trisegmental constraint helps to explain.

(Perhaps a constraint against vowel-initial roots, at least, would be more natural if PIE had a phonemic glottal stop, because people, or at least English and German speakers, tend to insert subphonemic glottal stops before vowels immediately preceded by a pause. Again, I don’t know if there are any cross-linguistic studies which support this. The laryngeal *h1 is often conjectured to be a glottal stop, but it is also often conjectured to be a glottal fricative; I don’t know if there is any reason to favour either conjecture over the other.)

I think something like this disagreement over what notion of simplicity is most important in linguistic reconstruction underlies some of the other controversies in IE phonology. For example, the question of whether PIE had phonemic *a and *ā: the “Leiden school” says it didn’t, accepting the conclusions of Lubotsky (1989), most other IEists say it did. The Leiden school reconstruction certainly reduces the space of phonological forms which can be reconstructed in PIE and therefore might be better from a falsifiability perspective. Kortlandt (2003) makes this point with respect to a different (but related) issue, the sound changes affecting initial laryngeals in Anatolian:

My reconstructions … are much more constrained [than the ones proposed by Melchert and Kimball] because I do not find evidence for more than four distinct sequences (three laryngeals before *-e- and neutralization before *-o-) whereas they start from 24 possibilities (zero and three laryngeals before three vowels *e, *a, *o which may be short or long, cf. Melchert 1994: 46f., Kimball 1999: 119f.). …

Any proponent of a scientific theory should indicate the type of evidence required for its refutation. While it is difficult to see how a theory which posits *H2 for Hittite h- and a dozen other possible reconstructions for Hittite a- can be refuted, it should be easy to produce counter-evidence for a theory which allows no more than four possibilities … The fact that no such counter-evidence has been forthcoming suggests that my theory is correct.

Of course the problem with the Leiden school reconstruction is that for a language to lack phonemic low vowels is very unusual. Arapaho apparently lacks phonemic low vowels, but it’s the only attested example I’ve heard of. But … I don’t have any direct answer to Kortlandt’s concerns about non-falsifiability. My own and other linguists’ concerns about the unnaturalness of a lack of phonemic low vowels also seem valid, but I don’t know how to resolve these opposing concerns. So until I can figure out a solution to this methodological problem, I’m not going to be very sure about whether PIE had phonemic low vowels and, similarly, whether the trisegmental constraint existed.

References

Fortson, B., 2004. Indo-European language and culture: An introduction. Oxford University Press.

Kortlandt, F., 2003. Initial laryngeals in Anatolian. Orpheus 13-14 [Gs. Rikov] (2003-04), 9-12.

Lubotsky, A., 1989. Against a Proto-Indo-European phoneme *a. The New Sound of Indo-European: Essays in Phonological Reconstruction. Berlin–New York: Mouton de Gruyter, pp. 53–66.

Ringe, D., 2006. A Linguistic History of English: Volume I, From Proto-Indo-European to Proto-Germanic. Oxford University Press.

The Duke of York gambit in diachronic linguistics

I.

Pullum (1976) discusses a phenomenon he evocatively calls the “Duke of York gambit”—the postulation of a derivation of the form A → B → A, which takes the underlying structure A “up to the top of the hill” into a different form B and then takes it “down again” into A on the surface (usually in a more restricted environment, otherwise the postulation of this derivation would not be able to explain anything). Such derivations are called “Duke of York derivations”.

As an illustrative example, consider the case of word-final devoicing in Dutch. Like many other languages, Dutch distinguishes its voiceless and voiced stop phonemes only in non-word-final position. In word-final position, voiceless stops are found exclusively, so that, for example, goed, the cognate of English good, is pronounced [ɣut] in isolation. But morphologically related words like goede ‘good one’, pronounced [ɣudə], seem to indicate that the segment written d is in fact underlyingly /d/ and it becomes [t] by a phonological rule that word-final obstruents become voiceless. We therefore have a derivation /d/ → [t]. Now, in fast, connected speech, goed is not always pronounced [ɣut]. Before a word that begins with a voiced obstruent such as boek ‘book’, it may be pronounced with [d]: goed boek [ɣudbuk]. Some linguists like Brink (1974) have therefore proposed a second phonological rule that grants word-final obstruents the voicing of the obstruent beginning the next word (if there is such an obstruent) in fast, connected speech. This rule applies after the first phonological rule that devoices word-final obstruents, so that the pronunciation [d] of the d in goed boek is derived from underlying /d/ by two steps: /d/ → /t/ → [d]. This is a Duke of York derivation.
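The ordering can be made explicit with a little Python sketch. Both functions below are my own schematic renderings of the two rules just described (toy rules over broad transcriptions, handling only the d/t case at issue), not anyone’s actual formulation:

```python
def final_devoice(word):
    # rule 1: word-final obstruents become voiceless (here just d -> t)
    return word[:-1] + 't' if word.endswith('d') else word

def fast_speech_assimilation(word, next_word):
    # rule 2 (fast speech only): a word-final obstruent takes on the
    # voicing of a following word-initial voiced obstruent (here t -> d)
    if word.endswith('t') and next_word[:1] in ('b', 'd', 'g', 'v', 'z'):
        return word[:-1] + 'd'
    return word

goed = 'ɣud'                      # underlying form of goed

# in isolation: /d/ -> [t]
assert final_devoice(goed) == 'ɣut'

# fast speech before boek: rule 1 then rule 2, i.e. /d/ -> /t/ -> [d]
fast = fast_speech_assimilation(final_devoice(goed), 'buk')
assert fast == 'ɣud'              # back where we started: Duke of York
```

The final assertion is the Duke of York derivation in miniature: the surface form is identical to the underlying form, but only because two rules have applied in sequence.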

Many linguists, as Pullum documents, find Duke of York gambits like this objectionable. They question others’ analyses on the grounds that they postulate Duke of York derivations to take place, and they decide between analyses of their own by disfavouring those which involve Duke of York gambits. In this particular case, an objection is reasonable enough: why not simply propose that in fast, connected speech, the words in phrases run into one another and become unitary words? In that case, the rule devoicing word-final obstruents would not apply to the d in goed boek in the first place, because it would not be in word-final position; the word-final segment would be the k in boek.

Yet Pullum finds no principled reason to disfavour analyses involving Duke of York gambits just because they involve Duke of York gambits. Clearly some linguists find something unsavoury about such analyses: in the quotes in Pullum’s paper, we can find descriptions of them as “to be viewed with some suspicion” (Brame & Bordelois 1973: 115), “rather suspicious” (White 1973: 159), “theoretically quite illegitimate” (Hogg 1973: 10), “hardly an attractive solution” (Chomsky & Halle 1968: 270), “clearly farcical” (Smith 1973: 33), and “extremely implausible” (Johnson 1974: 98). (See Pullum’s paper for the full references.) But none of them articulate the problem explicitly. If an analysis can be replaced by a simpler one with equal or greater explanatory power, that’s one thing: that would be a problem by the well-established principle of Ockham’s Razor. But a Duke of York gambit does not necessarily make an analysis more complex than the alternative in any well-defined way. Even with the Dutch example above, the greater simplicity of the Duke of York gambit-less solution proposed can be questioned: is it really simpler to propose a process of allegro word unification (at the fairly deep underlying level at which the word-final devoicing rule must apply) which we might be able to do without otherwise?

Pullum mentions some other examples where a Duke of York gambit might even seem part of the obviously preferable analysis. In Nootka, according to Campbell (1973), there is a phonological rule of assimilation that turns /k/ into /kʷ/ immediately after /o/, and there is another phonological rule of word-final delabialization that turns /kʷ/ into /k/ in word-final position. And, in word-final position, the sequence /-ok/ appears and the sequence /-okʷ/ does not appear. If the word-final delabialization rule applies before the assimilation rule, then we would expect instead to find the sequence /-okʷ/ to the exclusion of /-ok/ in word-final position. The only possible analysis, if the rules are to be ordered, is to have assimilation before word-final delabialization: but this means that word-final /-ok/ undergoes the Duke of York derivation /-ok/ → /-okʷ/ → /-ok/. And the use of a model with ordered rules is not essential here, because a Duke of York derivation is obtained even in a very natural model with unordered rules: if we say that rules apply at any point that they can apply but they only apply at most once, then we again have that word-final /-ok/ is susceptible to the assimilation rule (but not the delabialization rule), so /-ok/ becomes /-okʷ/, but then /-okʷ/ is susceptible to the delabialization rule (but not the assimilation rule), so /-okʷ/ becomes /-ok/. One could propose that the assimilation rule is restricted in its application to the non-word-final environment. But this is peculiar: why should a progressive assimilation rule pay any heed to whether there is a word boundary after the assimilating segment? Any way of accounting for such a restriction could easily involve making complicated assumptions which would make the Duke of York gambit analysis preferable by Ockham’s Razor.
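Here is a small Python sketch of that unordered, apply-at-most-once model (the segment representations and rule formulations are schematic, based only on the two Nootka rules as summarized above). Even without rule ordering, word-final /-ok/ goes up the hill and comes back down:

```python
def assimilation(segs):
    # /k/ -> /kʷ/ immediately after /o/
    return ['kʷ' if s == 'k' and i > 0 and segs[i - 1] == 'o' else s
            for i, s in enumerate(segs)]

def delabialization(segs):
    # /kʷ/ -> /k/ in word-final position
    if segs and segs[-1] == 'kʷ':
        return segs[:-1] + ['k']
    return segs

def derive(segs, rules):
    # unordered model: each rule may apply at any point it can,
    # but each rule applies at most once
    remaining = list(rules)
    changed = True
    while changed:
        changed = False
        for rule in list(remaining):
            new = rule(segs)
            if new != segs:
                segs, changed = new, True
                remaining.remove(rule)
    return segs

# word-final /-ok/: assimilation feeds delabialization, giving the
# Duke of York derivation /-ok/ -> /-okʷ/ -> /-ok/
assert derive(['o', 'k'], [assimilation, delabialization]) == ['o', 'k']

# non-final /-ok-/ (here followed by /a/): only assimilation applies
assert derive(['o', 'k', 'a'], [assimilation, delabialization]) == ['o', 'kʷ', 'a']
```

Both attested patterns fall out without restricting the assimilation rule to non-word-final position, which is exactly the point: the Duke of York derivation is what lets both rules stay in their natural, unrestricted formulations.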

II.

Now, Pullum discusses only synchronic derivations in his paper. But diachronic derivations can also, of course, be Duke of York derivations. It is interesting, then, to consider how we should evaluate diachronic analyses that postulate Duke of York derivations. Such analyses are favoured or disfavoured for different reasons than synchronic analyses, so, even if one accepts Pullum’s conclusion that synchronic Duke of York gambits are unobjectionable in and of themselves, the situation could conceivably be different for diachronic Duke of York gambits.

My first intuition is that there is even less reason to object to Duke of York gambits in the diachronic context. After all, diachronic analyses deal with changes that we can actually see happening, over the course of years or decades or centuries, and whose intermediate stages we can observe. (Of course, this is only the case in practice for a very small subset of the diachronic change that we are interested in—until time travel is invented nobody can go and observe the real-time development of languages like Proto-Indo-European.) It is not inconceivable that a change might be “undone” on a short time-scale, and it seems inevitable that some changes will be undone on longer time-scales. There is some very strong evidence for such long-term Duke of York derivations having happened in a diachronic sense. The history of English provides a nice example. In Old English, front vowels were “broken” in certain environments (e.g. before h): *æ became ea, *e became eo, and *i became io. We do not, of course, know with absolute certainty exactly how these segments were pronounced, unbroken or broken, but it is at least fairly certain that unbroken *æ, *e and *i were pronounced as [æ], [e] and [i] or vowels of very similar quality. The broken vowels remained largely unchanged throughout the Old English period, except that io was everywhere replaced by eo. But by the Middle English period they had been once again “unbroken” to a and e respectively—the only eventual change was to pre-Old English broken *i, which eventually became Middle English e. There may or may not have been minor changes in the pronunciations of these letters in the meantime—[æ] to [a], [e] to [ɛ], [i] to [ɪ]—but these seem scarcely large enough for this sequence of changes not to count as a diachronic Duke of York derivation.

But there are indeed linguists who appear to object to the postulation of diachronic Duke of York derivations, just like the linguists Pullum mentions. Cercignani (1972) seems to rely on such an objection in his questioning of the hypothesis that Proto-Germanic *ē became *ā in stressed syllables in pre-Old English and pre-Old Frisian. The relevant facts here are as follows.

  1. The general reflexes of late Proto-Indo-European *ē in initial syllables in the Germanic languages are exemplified by the following example: Proto-Indo-European *dʰéh1tis ‘act of putting’ (cf. Greek θέσις; Sanskrit dádhāti and Greek τίθημι for the root *dʰeh1 ‘put’) ↣ Proto-Germanic *dēdiz ‘deed’ (with -d- [< Proto-Indo-European *-t-] levelled in from the Proto-Indo-European oblique stem *dʰh1téy-) > Gothic -deþs in missadeþs ‘misdeed’, Old Norse dáð, Old English (West Saxon) dǣd, Old English (non-West Saxon) and Old Frisian dēd, Old Saxon dād and Old High German tāt. One can see that Gothic, Old English (non-West Saxon) and Old Frisian reflect the vowel’s presumed original mid quality, Old Norse, Old Saxon and Old High German have shifted it to a low vowel, and Old English (West Saxon) is intermediate, having shifted it to a near-low front vowel. Length is preserved in every case (Gothic e is a long vowel, it’s just not marked with a macron diacritic because Gothic has no short /e/ phoneme). It is reasonable to reconstruct *ē for Proto-Germanic, reflecting the original Proto-Indo-European quality, and to assume that the shifts have taken place at a post-Proto-Germanic date.
  2. In Old English and Old Frisian, Proto-Germanic *ē is reflected as ō if it was immediately before an underlying nasal (including nasals before *h and *hʷ, which were allophonically elided in Proto-Germanic) in Proto-Germanic: Proto-Germanic *mēnō̄ ‘moon’ (cf. Old Saxon and Old High German māno; Gothic mena with -a [< Proto-Germanic *-a] levelled in from the stem *mēnan-; Old Norse máni with -i levelled in from nouns ending in -i < *-ija [with *-a levelled in as in Gothic] ← Proto-Germanic *-ijō̄) > Old English, Old Frisian mōna.
  3. In Old English, Proto-Germanic *ē is reflected as ā immediately before w: Proto-Germanic *sēgun ‘saw’ (3pl.) (cf. Old Norwegian and Old Swedish ságu, Old English [non-West Saxon] and Old Frisian sēgon; Gothic seƕun and Old High German sāhun with Gothic -ƕ- and Old High German -h- [< Proto-Germanic *-hʷ-] levelled in from the infinitive and present stems *sehʷ- and *sihʷ- and the past. sg. stem *sahʷ-; Old Saxon sāwun with -w- [< Proto-Germanic *-w-] levelled in from the past subj. stem *sēwī-) ↣ Old English (West Saxon) sāwon with -w- < Proto-Germanic *-w- levelled in as in Old Saxon.

The question is in which languages the shift from *ē to *ā reflected in Old Norse, Old Saxon, Old High German and (partially, at least) in Old English (West Saxon) took place. Cercignani argues that it took place only in the languages it is reflected in, with Old English and Old Frisian being partially or totally unaffected by this shift. Let us call this the restriction hypothesis. Other linguists propose that it took place in every Germanic language other than Gothic, including Old English and Old Frisian, and that later shifts are responsible for the reflection of Proto-Germanic *ē as ǣ or ē in Old English and Old Frisian. Let us call this the extension hypothesis (because it postulates a more extensive area for the *ē > *ā shift to take place in than the restriction hypothesis). The derivation *ē > *ā > ē which must have taken place in Old English (non-West Saxon) and Old Frisian if the extension hypothesis is to be accepted is, of course, a Duke of York derivation, and it is clear that Cercignani regards this as a major strike against the extension hypothesis.

The restriction hypothesis certainly appears simpler and, therefore, preferable at first glance. However, there are various pieces of evidence that complicate matters—most obviously points 2 and 3 above. If Proto-Germanic *ē became *ā in pre-Old English and pre-Old Frisian before shifting back to a higher quality, then we can explain the reflection of Proto-Germanic *ē as ō when nasalized or immediately before a nasal as the result of a shift *ā > *ō in this environment (paralleled by the present-day shift /ɑ̃/ > [ɔ̃] in some French dialects). This is more believable than a direct shift *ē > *ō and arguably simpler than a two-step shift *ē > *ā > *ō occurring exclusively in this nasal environment. Likewise, one might argue that the postulation of a slight restriction on the environment of the *ā-fronting sound change in Old English, allowing for retention of *ā before *w, is simpler than the postulation of an entirely separate sound change shifting *ē to *ā before *w in Old English. Neither of these arguments is at all conclusive, but they might be sufficient to make the reader adjust their estimations of the two hypotheses’ probabilities a little in favour of the extension hypothesis. As far as I can tell, the thrust of Cercignani’s argument is that, even if the consideration of points 2 and 3 does make the restriction hypothesis more complicated than it seems at first glance, the postulation of Duke of York derivations is preposterous enough that the restriction hypothesis is still by far the preferable one. Naturally I, not thinking that Duke of York derivations are necessarily preposterous, disagree.

In any case there is some more conclusive evidence for the extension hypothesis not mentioned by Cercignani, but mentioned by Ringe (2014: 13). The Proto-Germanic distal demonstrative and interrogative locative adverbs ‘there’ and ‘where’ can be reconstructed as *þar and *hʷar on the basis of Gothic þar and ƕar and Old Norse þar and hvar. Further support for these reconstructions comes from the fact that they can be transparently derived from the Proto-Germanic distal demonstrative and interrogative stems *þa- and *hʷa- by the addition of a locative suffix *-r (also found on other adverbs such as *aljar ‘elsewhere’ [cf. Gothic aljar, Old English ellor] ← *alja- ‘other’ + *-r). But in the West Germanic languages, the reflexes are as if they contained Proto-Germanic *ē: Old English (West Saxon) þǣr and hwǣr, Old English (non-West Saxon) þēr and hwēr, Old Frisian thēr and hwēr, Old Saxon thār and hwār, Old High German dār and wār. The simplest way to explain this is to propose that there has been an irregular lengthening of these words to *þār and *hwār in Proto-West Germanic, and that the *-ā- in these words was raised in Old English and Old Frisian by the same changes that raised *ā < Proto-Germanic *ē. Proponents of the restriction hypothesis must propose an irregular raising as well as a lengthening in these words. The lengthening itself is believable enough (one can imagine adverbs with the sense ‘here’ and ‘there’ being lengthened due to contrastive emphasis—Ringe alludes to “heavy deictic stress”, which may be the same thing, although he doesn’t explain the term), but the raising is harder to motivate and, most importantly, would have to have happened only in Old English and Old Frisian, with the identity of the reflexes of Proto-Germanic *a in these words with the reflexes of Proto-Germanic *ē in stressed syllables existing entirely by coincidence.
It is true that Proto-Germanic short *a in stressed syllables became *æ in Old English and Old Frisian, so if we propose that the irregular lengthening occurred after this change as an areal innovation among the West Germanic languages, we can account for Old English (West Saxon) þǣr and hwǣr; but this does not account for Old English (non-West Saxon) þēr and hwēr and Old Frisian thēr and hwēr, which have to be accounted for by an irregular raising.

To me this additional evidence seems fairly decisive. In that case, with the extension hypothesis accepted, we have a nice example of a diachronic Duke of York derivation which we know must have run its full course in a fairly short time, because we can date the Proto-Northwest Germanic *ē > *ā shift and the Old English *ǣ > ē shift (fed by the *ā > *ǣ shift, whose date is irrelevant here because it must have occurred in between these two) with reasonable precision. Ringe (p. 12), citing Grønvik (1998), says that the *ē > *ā shift is “attested from the second half of the 2nd century AD”. This is presumably based on runic evidence. As for the *ǣ > ē shift, it was one of the very early Old English sound changes in the dialects it took place in, being attested already in apparent completion in the oldest Old English texts (which date to the 8th century AD). The fact that it is shared with Old Frisian also suggests an early date. We can therefore say that there were at most five or six centuries between the two shifts, and quite likely considerably less.

III.

To summarize: though they may seem somehow untidy, Duke of York derivations, whether diachronic or synchronic, are not intrinsically implausible. The simplest hypothesis that accounts for the data should always be preferred, but this is not always the hypothesis that avoids the Duke of York gambit. On the diachronic side of things, Duke of York derivations can certainly take place over many centuries—which nobody would dispute—but they can also take place over periods of just a few centuries, as evidenced by the history of Proto-Germanic *ē in Old English and Old Frisian.

References

Brink, D., 1974. Characterizing the natural order of application of phonological rules. Lingua, 34(1), pp. 47-72.

Campbell, L., 1973. Extrinsic order lives. Bloomington, IN: Indiana University Linguistics Club Publications.

Cercignani, F., 1972. Indo-European ē in Germanic. Zeitschrift für vergleichende Sprachforschung, 86(1), pp. 104-110.

Grønvik, O., 1998. Untersuchungen zur älteren nordischen und germanischen Sprachgeschichte. Lang.

Pullum, G. K., 1976. The Duke of York gambit. Journal of Linguistics, 12(1), pp. 83-102.

Ringe, D. & Taylor, A., 2014. A Linguistic History of English: Volume II, The Development of Old English. OUP.

The Kra-Dai languages of Hainan

One of my favourite blogs on the Internet is Martin Lewis’s GeoCurrents, a consistently high-quality and information-dense blog about geography, especially geopolitics, cultural geography and economic geography. As a student in linguistics I’m especially interested in the posts about linguistic geography (which comes under cultural geography), but almost every GeoCurrents post is interesting, and tells me lots of things I didn’t already know. As an example post for interested readers which touches on cultural, linguistic and ethnic geography and the history of agriculture, I recommend The Lost World of the Sago Eaters. Unfortunately, Martin Lewis recently announced that he was going to have to stop making any more posts until, at least, next June. So I thought it might be a good idea to try and do some posts in the style of GeoCurrents on this blog—introducing the reader to some region of the world and telling them whatever interesting things I can find out about this region and the people who live there.

For this particular post, I’ve decided to write about the island of Hainan, and in particular the Kra-Dai languages spoken there, which are in my opinion pretty interesting for several reasons. Hainan is an island off the southern coast of China, in the South China Sea. If you look along the southern Chinese coast, you can see Taiwan off the southeast, and then, further towards the west, not far from the Indochinese peninsula, and just across a strait from a little peninsula jutting out to the south, there’s another island of similar size—that’s Hainan. Politically, it’s part of the People’s Republic of China, and it has generally been a possession of the various Chinese states for over two thousand years. That gives it a far longer Chinese history than Taiwan, which was not settled by Han Chinese until the 17th century, was actually claimed by the Netherlands and Spain before China, and was under Japanese rule for most of the first half of the 20th century. However, Hainan has always been on the periphery of China culturally and economically, as well as geographically. Interested readers are referred to Michalk (1986) for an overview of the island’s history.

Below I’ve included a map of the languages of Hainan, based mainly on Steven Huffman’s language maps.1 One remarkable feature of the island’s linguistic geography which you can see from this map is that languages of four out of the five language families of Southeast Asia are spoken on it: Sino-Tibetan, Hmong-Mien, Kra-Dai and Austronesian. Only Austro-Asiatic is absent, although an Austro-Asiatic language (Vietnamese) is spoken on the nearby island of Bạch Long Vĩ, which is politically part of Vietnam. That’s quite impressive for an island not much larger than Sicily.

[Map of the languages of Hainan]

All of these languages are interesting and worthy of discussion, but for the sake of not giving me too much to write about I’m going to focus in this post on those belonging to the Kra-Dai family: Be, Li, Cunhua and Jiamao. I will also—because it’s relevant—discuss the Austronesian language, Huihui, a little. These Kra-Dai languages are, most likely, the “indigenous” languages of the island, in the sense that they, or direct ancestors of them, were spoken on the island before the others. The Chinese language was obviously brought to Hainan by the Chinese settlers arriving mostly in the second millennium AD; the Mun language is closely related to (and sometimes considered the same as) the Kim Mun language spoken by some people of the Yao ethnicity in the mainland Chinese provinces of Guangxi and Hunan, and therefore these Mun-speakers are probably recent arrivals as well.

The most widespread and probably the best-known language spoken on Hainan other than Chinese is the Li language, which is spoken in the mountainous interior of Hainan. Chinese sources use the name Li 黎 “black” to refer to the frequently-rebellious indigenous people of Hainan as early as the time of the Song Dynasty (960–1279). Of course, it can’t be assumed that this name refers to exactly the same group of people as the modern name Li does. But the geographic location of the Li-speakers—in the most inaccessible parts of the island, with Chinese settlers occupying the more habitable coastal lowlands—and their language’s phylogenetic position within Kra-Dai (we’ll talk about this more below) does strongly suggest that their language was the main language spoken on Hainan before Chinese settlement. Linguists sometimes use the name Hlai instead, which is presumably based on a native self-appellation. They also sometimes speak of the “Hlai languages” rather than the “Hlai language”, because, much like Chinese, the various Hlai “dialects” are actually highly divergent and often mutually unintelligible. This again suggests that the Li language has long been present on Hainan—there must have been plenty of time for these dialects to differentiate from one another. Norquest (2007) has attempted a reconstruction of Proto-Hlai; you can look at his dissertation to get an idea of how different these dialects are from each other.

Two of the Li dialects—Cunhua and Nadouhua—have a special status. They are not much more distinct from neighbouring Li dialects than any other Li dialects are from the dialects neighbouring them. However, the speakers of these dialects are classified by the Chinese government as members of the Han Chinese ethnicity rather than the Li ethnicity, and they themselves identify more with the Han Chinese than the Li. Speakers of other Li dialects also refer to Han Chinese and Cunhua or Nadouhua speakers by the same name, Moi. Cunhua and Nadouhua do have borrowings from Chinese to a much greater extent than the other Li dialects, but according to Norquest (2007) their basic vocabulary is mostly of Li origin, which indicates that they should be regarded as Li dialects heavily influenced by Chinese, rather than Chinese dialects influenced by Li or mixed languages. The influence from Chinese is probably due to the fact that the speakers of these dialects live in the coastal lowlands, where contact with Chinese settlers is greater, rather than in the mountains like the speakers of the other Li dialects. It is also likely that the speakers have significant Han Chinese ancestry as well as Li ancestry, but I don’t know if any genetic studies have been done. In any case, because of their different ethnic status Cunhua and Nadouhua are often regarded as comprising a separate language from Li, usually referred to as Cunhua or Cun after the better-known of the two dialects (Cunhua has many times more speakers than Nadouhua). This is reflected on the map above.

Another Li “dialect” is special because it is in the opposite situation to Cunhua and Nadouhua: its speakers do not have a separate ethnic identity from the Li, but the language is clearly divergent and may not even be genetically a Li language at all. This is Jiamao, which is also shown as a distinct language on the map above. Less than half of its lexicon appears to be of Li origin—that is, more than half of its words cannot be identified as similar to words in other Li dialects. Moreover—and more significantly—linguists have been unable to establish regular sound correspondences between the Jiamao words that do look similar to those in other Li dialects, and those Li dialects. In the words of Thurgood (1992a):

The Jiamao tones do not correspond with the tones of Proto-Hlai at all. The Jiamao initials and finals correspond, but with a pervasive, unsystematic irregularity that raised more questions than it answered. The Jiamao initials often have two relatively-frequent unconditioned reflexes, with other less-frequent reflexes thrown in apparently randomly. The more comparative work that was done, the more obvious it became that a comparative approach was not going to explain the “extreme (and apparently unsystematic) aberrancy” of Jiamao.

Some information given to Thurgood by a Chinese linguist, Ni Dabai (it’s not clear where Ni Dabai got the information from) gave him an idea as to why this might be the case. Ni Dabai said that the Jiamao were originally Muslims, and they arrived in two waves, the first in 986 AD and 988 AD and the second in 1486. Thurgood concluded from this that the Jiamao were originally speakers of an Austro-Asiatic language, who migrated to Hainan and thus ended up in close contact with Li speakers. The Jiamao ignored the tone of the Li words they borrowed, and instead decided which tone to pronounce them with based on their initial consonants; this explains the apparently random tone correspondences. And they borrowed words in several strata; this explains the one-to-many correspondences among the non-tonal segments.

I’m not entirely sure how Thurgood gets straight to “they must have been Austro-Asiatic speakers” from “they were originally Muslims,” though. Unfortunately the copy of Thurgood’s paper that I can access online is inexplicably cut off after the fourth page, so I don’t know if he elaborates on the scenario later on in the paper. I’m not aware of any Austro-Asiatic-speaking ethnic group whose members are mostly Muslim. My understanding is that most of the Muslims in Southeast Asia are the Malays, and their close relatives, the Chams, who speak Austronesian languages. To my uninformed, non-Southeast Asian expert, not-having-access-to-the-full-Thurgood-paper self, the Chams seem like the obvious candidates. The Cham kingdom (Champa), situated in what is now southern Vietnam, was for a millennium and a half an integral part of the political landscape of continental Southeast Asia. Its history is one of constant conflict with the Vietnamese kingdom to its north, in which it tended to be something of the underdog. The Vietnamese sacked the Cham capital in 982, 1044, 1068, 1069 (clearly, the 11th century wasn’t a good time for Champa), 1252, 1446, and 1471; after the last and most catastrophic sacking in 1471, the Vietnamese emperor finally annexed the capital and reduced Champa to a rump state occupying only what were originally just its southern regions. Then these regions, too, were chipped away over the next few centuries, and Champa finally vanished from the map in 1832. Some Cham still live in these regions, but they are no longer the dominant ethnic group there, having mostly either been massacred or fled—mostly to Cambodia in the west, but also, in relatively small numbers, to Hainan in the east. This is how the Austronesian language you can see on the map, Huihui, ended up being spoken in Hainan. Huihui is simply an old-fashioned Chinese word for “Muslim”2, and the speakers of Huihui are indeed Muslims. 
The Huihui themselves call themselves and their language Tsat (which is cognate to Cham). According to Thurgood (1992b), the Tsat came to Hainan after the sacking of 982, and were mostly merchants who had established connections in the area, which explains their Muslim faith (most Cham at the time were Hindu, but much of the merchant class was Muslim; the Cham only became majority-Muslim during the 15th century, which is about the same time that the Malays converted). More Chams might have migrated to join the Tsat after the subsequent sackings.

Now, the dates Ni Dabai gave for the waves of Jiamao settlement—986 AD, 988 AD, 1486—are just a few years after the sackings of 982 AD and 1471 AD respectively, and that suggests to me that Jiamao, like Huihui, may have a Cham origin. But whereas the Cham origin of Huihui explains almost everything about it, there are still a lot of unanswered questions with respect to Jiamao even if we accept that it has a Cham origin. Most obviously, what would have led the Jiamao to take up residence in the highlands of the southeast, rather than the southern coast where Cham traders would have established the most contacts, and to assimilate so much into the Li culture that they gave up Islam (they are now animists like the Li and Be) and extensively relexified their language with Li loanwords?

Then there’s the problem of the actual linguistic evidence. Norquest in his dissertation examined the Jiamao lexicon and found a grand total of… 2 possible words of Austronesian origin (ɓaŋ˥ ɓɯa˩ ‘butterfly’ and pəj˦ ‘pig’; cf. Proto-Austronesian *qari-baŋbaŋ and *babuy), and none of Austro-Asiatic or of any other identifiable origin, apart from Li. He therefore regards the language as a provisional language isolate. Now, I don’t know how well Norquest knows Austronesian and Austro-Asiatic. He doesn’t explicitly rule out a connection with either of those families; he’s more concerned with simply listing the non-Li Jiamao vocabulary than identifying its origin. So it’s not impossible that Jiamao’s non-Li vocabulary is from one of the main Southeast Asian families, but this is certainly something on which more research needs to be done. I have included below some of the Jiamao and Proto-Hlai words for various body parts, to illustrate the difference; this data is taken from Norquest’s dissertation.

Proto-Hlai   Jiamao    Sense
*dʱəŋ        pʰan1     ‘face’
*ʋaːɦ        vet10     ‘shoulder’
*kʰiːn       tɯːn1     ‘arm’
*ɦaːŋ        tsʰɔːŋ1   ‘chin’

In any case, I assume Thurgood had a good reason for proposing the Austro-Asiatic connection (I just can’t figure out by myself what that reason would be). Another caveat to bear in mind here is that Ni Dabai’s information might be incorrect—even if the story of Jiamao being descended from Muslim immigrants arriving in 986 AD, 988 AD and 1486 isn’t completely false, it could be wrong in some details: perhaps they were Hindus rather than Muslims, and perhaps the dates are inaccurate. In short, it’s a mystery. But an interesting one, don’t you think? It’s just a shame that there has been so little investigation into it, so far—Thurgood’s not-wholly-accessible paper and Norquest’s dissertation are the only two papers I can find which go into any detail about Jiamao.

[Tree of the Kra-Dai language family, adapted from Blench (2013)]

Moving on… there is one other Kra-Dai language spoken on Hainan, which is completely different, both linguistically and ethnically, from Li. The Be language constitutes a branch of Kra-Dai of its own, and it does not appear to be much more closely related to the Li languages than it is to other Kra-Dai languages. The subgrouping of the branches of the Kra-Dai family is not particularly certain (as usual for language families—subgrouping is a hard problem in linguistics); Wikipedia gives a nice overview, and I’ve included a tree above, adapted from Blench (2013), which appears to be just the Edmondson and Solnit classification mentioned in the Wikipedia article. As you can see, Be is often considered the closest relative of the Tai branch (the one that contains the one Kra-Dai language most people have heard of, Thai, the official language of Thailand). In fact, Norquest in his dissertation mentions that it shows the greatest lexical similarity with the Northern Tai subgroup, specifically, meaning it might actually be a Tai language; unfortunately, this cannot be verified until more comparative work on Kra-Dai languages is done (no full reconstruction of Proto-Tai or Proto-Northern Tai is yet available).

This suggests that Be is a more recent arrival on Hainan than Li, because it must have arrived after or close to the time that the Tai subgroup separated from the other Kra-Dai languages, whereas Li could have split off straight from Proto-Kra-Dai. Shintani (1991) has some phonological evidence which he says supports this: the Hainanese dialect of Chinese has undergone a sound change s > t (that is, s in other Chinese dialects corresponds to Hainanese t), and the Be language reflects this sound change in borrowings from Chinese such as tuan “garlic” (cf. Mandarin suan). That means it must have borrowed these words from Hainanese, and Shintani takes this as indicating that Be speakers arrived on Hainan after Chinese settlers were established on the island (that would be no earlier than the time of the Song Dynasty, 960–1279). But I don’t quite follow this inference—couldn’t the Be have arrived first, and borrowed these words only after the Chinese arrived?

That a Tai-speaking group might have migrated to Hainan in the historical period is not implausible, however. Although the political prominence of Thai in modern times might lead you to think otherwise, the Tai languages originated in southern China—more precisely, in the area of the modern provinces of Guizhou and Guangxi, probably extending into adjacent regions of Yunnan and Vietnam as well—and were restricted to that region for much of the historical period. Around 1000 AD, some of their speakers began to migrate to the southwest, perhaps to escape Chinese political domination, although this doesn’t seem like a complete explanation—though the Chinese population in the area has surely been growing over time, the Chinese had held political power there since long before 1000 AD. (Also, plenty of Tai-speaking peoples remained in their homeland—in fact, the Tai-speaking Zhuang people still comprise over a quarter of the population of Guangxi). These migrations continued for the next couple of centuries, and by the 13th century the familiar Tai kingdoms of the historical record were being established (Sukhothai in the central part of modern Thailand; Lanna in the northern part of modern Thailand; the Shan states in the eastern part of modern Burma; and Ahom way over in the Brahmaputra valley just east of modern Bangladesh). The Lao people of Laos established their kingdom, known then as Lan Xang “[land of the] million elephants”, in the following century. Over the centuries these evolved into the modern Tai states of Thailand and Laos. Now, if the Tai migrated to the southwest because they wished to leave southern China (rather than being attracted by some particular feature of the southwest), we might well expect some of them to take the alternative route directly south and end up on Hainan. Perhaps this, then, is the origin of the Be.

There is an alternative scenario I can think of which is probably less plausible, but a bit more exciting. Maybe the Be have always been on Hainan—or at least, they have been there as long as the Li have. Be being part of or most closely related to the Tai branch isn’t incompatible with this hypothesis. There’s a useful heuristic in linguistics that a region where a language family is most diverse is likely to be its place of origin, because the longer the presence of a speech variety in a given area, the more time it has to diversify into divergent but genetically related daughters. It’s a heuristic, not a rule, so exceptions are possible, and in fact one of the obvious ways an exception could arise is if external pressure repeatedly pushes speakers of languages in the family into a particular small cul-de-sac region (a “refugium”), which is what would have happened in Hainan in the scenario described in the above paragraph. And of course, the diversity of Kra-Dai in Hainan, with just two independent branches represented, isn’t that much greater than anywhere else (there are four independent branches in Guangxi, namely Kra, Lakkia, Kam-Sui and Tai, and by including an adjacent region of Guangdong the remaining Biao branch can be included as well; of course, Guangxi is a lot bigger than Hainan, and depending on how deep you imagine some of the proposed subgroups are, your perception of each region’s diversity might be altered). But I don’t think it’s ludicrous to think that the Kra-Dai languages, or at least a sub-clade of them excluding Kra, might have originated on Hainan. They might have differentiated first into a southern variety (pre-Li) and a northern variety; a first wave of migration onto the mainland, by the speakers of the northern variety, would have brought about the split between Proto-Lakkia-Biao-Kam-Sui and Proto-Be-Tai; and a second wave would have brought about the split between Proto-Tai and Be.

This is especially interesting to consider in the light of the Austro-Tai hypothesis, one of the most plausible macrofamily proposals floating around. Essentially it proposes a genetic relationship between the Kra-Dai languages and the Austronesian languages, although opinions among proponents differ as to whether Kra-Dai is coordinate to Austronesian (that is, Proto-Kra-Dai and Proto-Austronesian share a common ancestor, but neither is the ancestor of the other) or subordinate to Austronesian (that is, Proto-Austronesian is the ancestor of Proto-Kra-Dai). Sagart (2004) is of the opinion that it is subordinate. If Kra-Dai is subordinate to Austronesian then the possibility arises that Austronesians migrated to Hainan, just as they migrated to essentially all of the islands in southeast Asia and Oceania (plus Madagascar!). Unfortunately, the facts do not seem friendly to this neat hypothesis: nobody, so far as I know, goes so far as to say that Kra-Dai is subordinate to Malayo-Polynesian (the subgroup of Austronesian which includes all of the Austronesian languages outside of Taiwan), and the Austronesians probably hadn’t developed their island-hopping habits so extensively at the point where they were still in Taiwan. The more likely scenario, if the Austro-Tai hypothesis is correct, is that Proto-Kra-Dai was the result of a migration from Taiwan onto mainland China; and in order to reconcile this with the Hainan homeland hypothesis we’d have to propose a migration onto Hainan and then multiple migrations back out again, which is kind of untidy. So, for various reasons, I don’t really think the Hainan homeland hypothesis is likely to be correct. I’d say it’s more likely that the homeland of the Kra-Dai languages is on the mainland, somewhere in Guangxi. But it’s not impossible.

Footnotes

  1. ^ Huffman’s maps do not always make it clear which language is spoken within a given boundary; in order to identify the languages spoken in scattered pockets in the northern part of the Li-speaking area and to the north and east of that area, I had to refer to the wonderful but not entirely reliable map at Muturzikin. Unfortunately the boundaries on Muturzikin’s map are not entirely the same as those on Huffman’s, and even on Muturzikin’s map, it is sometimes not entirely clear what language is spoken within a particular boundary, so I have had to make some guesses in identifying all of these pockets as Mun-speaking.
  2. ^ The modern Chinese word for “Muslim” is Musilin, but the unreduplicated word Hui, which strictly speaking refers only to Chinese Muslims, is often colloquially used to refer to Muslims of any nationality.

References

Blench, R., 2013. The prehistory of the Daic (Tai-Kadai) speaking peoples and the hypothesis of an Austronesian connection. In Unearthing Southeast Asia’s past: Selected Papers from the 12th International Conference of the European Association of Southeast Asian Archaeologists (Vol. 1, pp. 3-15).

Michalk, D.L., 1986. Hainan Island: A brief historical sketch. Journal of the Hong Kong Branch of the Royal Asiatic Society, pp.115-143.

Norquest, P.K., 2007. A phonological reconstruction of Proto-Hlai. ProQuest.

Sagart, L., 2004. The higher phylogeny of Austronesian and the position of Tai-Kadai. Oceanic Linguistics, 43(2), pp.411-444.

Shintani, T., 1991. Preglottalized consonants in the languages of Hainan Island, China. Journal of Asian and African Studies, (41), pp.1-10.

Thurgood, G., 1992. The aberrancy of the Jiamao dialect of Hlai: speculation on its origins and history. Southeast Asian Linguistics Society I, pp.417-433.

Thurgood, G., 1992b. From Atonal to Tonal in Utsat (A Chamic Language of Hainan). In Proceedings of the Eighteenth Annual Meeting of the Berkeley Linguistics Society: Special Session on the Typology of Tone Languages (pp. 145-146).

The perfect pathway

Anybody who knows French or German will be familiar with the fact that the constructions in these languages described as “perfects” tend to be used in colloquial speech as simple pasts1 rather than true perfects. This can be illustrated by the fact that the English sentence (1) is ungrammatical, whereas the French and German sentences (2) and (3) are perfectly grammatical.

  1. *I have left yesterday.
  2. Je suis parti hier.
    I am leave-PTCP yesterday
    “I left yesterday.”
  3. Ich habe gestern verlassen.
    I have-1SG yesterday leave-PTCP
    “I left yesterday.”

The English perfect is a true perfect, referring to a present state which is the result of a past event. So, for example, the English sentence (4) is paraphrased by (5).

  4. I have left.
  5. I am in the state of not being present resulting from having left.

As it is specifically present states which are referred to by perfects, it makes no sense for a verb in the perfect to be modified by an adverb of past time like ‘yesterday’. That’s why (1) is ungrammatical. In order for ‘yesterday’ to modify the verb in (1), the verb would have to refer to a past state resulting from an event further in the past; the appropriate category for such a verb is not the perfect but rather the pluperfect or past perfect, which is formed in the same way as the perfect in English except that the auxiliary verb have takes the past tense. It’s perfectly fine for adverbs of past time to modify the main verbs of pluperfect constructions; cf. (6).

  6. I had left yesterday.

If the French and German “perfects” were true perfects like the English perfect, (2) and (3) would have to be ungrammatical too, and as they are not in fact ungrammatical we can conclude that these “perfects” are not true perfects. (Of course one could also conclude this from asking native speakers about the meaning of these “perfects”, and one has to take this step to be able to conclude that they are in fact simple pasts; the above is just a neat way of demonstrating their non-true perfect nature via the medium of writing.)

French and German verbs do have simple past forms which have a distinctive inflection; for example, partis and verließ are the first-person singular inflected simple past forms of the verbs meaning ‘leave’ in sentences (2) and (3) respectively, corresponding to the first-person singular present forms pars and verlasse. But these inflected simple past forms are not used in colloquial speech; their function has been taken over by the “perfect”. If you take French or German lessons you are taught how to use the “perfect” before you are taught how to use the simple past, because the “perfect” is more commonly used; it’s the other way round if you take English lessons, because in English the simple past is not restricted to literary speech, and is more common than the perfect as it has a more basic meaning.

The French and German “perfects” were originally true perfects even in colloquial speech, just as in English. So how did this change in meaning from perfect to simple past occur? One way to understand it is as a simple case of generalization. The perfect is a kind of past; if one were to translate (4) into a language such as Turkish which does not have any sort of perfect construction, but does have a distinction between present and past tense, one would translate it as a simple past, as in (7).

  7. Ayrıldım.
    leave-PST-1SG
    “I left / have left.”

The distinction in meaning between the perfect and the simple past is rather subtle, so it is not hard to imagine the two meanings being confused with each other frequently enough that the perfect came eventually to be used with the same meaning as the simple past. This could have been a gradual process. After all, it is often more or less a matter of arbitrary perspective whether one chooses to focus on the state of having done something, and accordingly use the perfect, or on the doing of the thing itself, and accordingly use the simple past. Here’s an example: if somebody tells you to look up the answer to a question which was raised in a discussion of yours with them, and you go away and look up the answer, and then you meet this person again, you might say either “I looked up the answer” or “I’ve looked up the answer”. At least to me, neither utterance seems any more expected in that situation than the other. French and German speakers may have tended over time to err more and more on the side of focusing on the state, so that the perfect construction became more and more common, and this would encourage reanalysis of the meaning of the perfect as the same as that of the simple past.

But it might help to put this development in some further context. It’s not only in French and German that this development from perfect to simple past has occurred. In fact, it seems to be pretty common. Well, I don’t know about other families, but it is definitely common among the Indo-European (IE) languages. There is, in fact, evidence that the development occurred in the history of English, during the development of Proto-Germanic from Proto-Indo-European (PIE). (This means German has undergone the development twice!) I’ll talk a little bit about this pre-Proto-Germanic development, because it’s a pretty interesting one, and it ties in with some of the other cases of the development attested from IE languages.

PIE (or at least a late stage of it; we’ll talk more about that issue below) distinguished three different aspect categories, which are traditionally called the “present”, “aorist” and “perfect”. The names of these aspects do not have their usual meanings—if you know about the distinction between tense and aspect, you probably already noticed that “present” is normally the name of a tense, rather than an aspect. (Briefly, tense is an event or state’s relation in time to the speech act, aspect is the structure of the event on the timeline without any reference to the speech act; for example, aspect includes things like whether the event is completed or not. But this isn’t especially important to our discussion.) The better names for the “present” and “aorist” aspects are imperfective and perfective, respectively. The difference between them is the same as that between the French imperfect and the French simple past: the perfective (“aorist”) refers to events as completed wholes and the imperfective (“present”) refers to other events, such as those which are iterated, habitual or ongoing. Note that present events cannot be completed yet and therefore can only be referred to by imperfectives (“presents”). But past events can be referred to by either imperfectives or perfectives. So, although PIE did distinguish two tenses, present and past, in addition to the three aspects, the distinction was only made in the imperfective (“present”, although that name is getting especially confusing here) aspect because the perfective (“aorist”) aspect entailed past tense. The past tense of the imperfective aspect is called the imperfect rather than the past “present” (I guess even IEists would find that terminology too ridiculous).

So what was the meaning of the PIE “perfect”? Well, the PIE “perfect” is reflected as a true perfect in Classical Greek. The system of Classical Greek, with the imperfect, aorist and true perfect all distinguished from one another, was more or less the same as that of modern literary French. However, according to Ringe (2006: 25, 155), the “perfect” in the earlier Greek of Homer’s poems is better analyzed as a simple stative, referring to a present state without any implication of this state being the result of a past event. Now, I’m not sure exactly what the grounds for this analysis are. Ringe doesn’t elaborate on it very much and the further sources it refers to (Wackernagel 1904; Chantraine 1927) are in German and French, respectively, so I can’t read them very easily. The thing is, every state has a beginning, which can be seen as an event whose result is the state, and thus every simple stative can be seen as a perfect. English does distinguish simple statives from perfects (predicative adjectives are stative, as are certain verbs in the present tense, such as “know”). The difference seems to me to be something to do with how salient the event that begins the state—the state’s inception—is. Compare sentences (8) and (9), which have more or less the same meaning except that the state’s inception is more salient in (9) (although still not as salient as it is in (10)).

  8. He is dead.
  9. He has died.
  10. He died.

But I don’t know if there are any more concrete diagnostic tests that can distinguish a simple stative from a perfect. Homeric and Classical Greek are extinct languages, and it seems like it would be difficult to judge the salience of inceptions of states in sentences of these languages without having access to native speaker intuitions.

It is perhaps the case that some states are crosslinguistically more likely than others to be referred to by simple statives, rather than perfects. Perhaps the change was just that the “perfect” came to be used more often to refer to states that crosslinguistically tend to be referred to by perfects. Ringe (2006: 155) says:

… a large majority of the perfects in Classical Attic are obvious innovations and have meanings like that of a Modern English perfect; that is, they denote a past action and its present result. We find ἀπεκτονέναι /apektonénai/ ‘to have killed’, πεπομφέναι /pepompʰénai/ ‘to have sent’, κεκλοφέναι /keklopʰénai/ ‘to have stolen’, ἐνηνοχέναι /enęːnokʰénai/ ‘to have brought’, δεδωκέναι /dedǫːkénai/ ‘to have given’, γεγραφέναι /gegrapʰénai/ ‘to have written’, ἠχέναι /ęːkʰénai/ ‘to have led’, and many dozens more. Most are clearly new creations, but a few appear to be inherited stems that have acquired the new ‘resultative’ meaning, such as λελοιπέναι /leloipʰénai/ ‘to have left behind’ and ‘to be missing’ (the old stative meaning).

These newer perfects could still be glossed as simple statives (‘to be a thief’ instead of ‘to have stolen’, etc.) but the states they refer to do seem to me to be ones which inherently tend to involve a salient reference to the inception of the state.

There is a pretty convincing indication that the “perfect” was a simple stative at some point in the history of Greek: some Greek verbs whose meanings are conveyed by lexically stative verbs or adjectives in English, such as εἰδέναι ‘to know’ and δεδιέναι ‘to be afraid of’, only appear in the perfect and pluperfect. These verbs are sometimes described as using the perfect in place of the present and the pluperfect in place of the imperfect, although at least in Homeric Greek their appearance in only the perfect and pluperfect is perfectly natural in respect of their meaning and does not need to be treated as a special case. These verbs continued to appear only in the perfect and pluperfect during the Classical period, so they do not tell us anything about when the Greek “perfect” became a true perfect.

Anyway, it is on the basis of the directly attested meaning of the “perfect” in Homeric Greek that the PIE “perfect” is reconstructed as a simple stative. Other IE languages do preserve relics of the simple stative meaning which add to the evidence for this reconstruction. There are in fact relics of the simple stative meaning in the Germanic languages which have survived, to this day, in English. These are the “preterite-present” or “modal” verbs: can, dare, may, must, need, ought, shall and will. Unlike other English verbs, these verbs do not take an -s ending in the third person singular (dare and need can take this ending, but only when their complements are to-infinitives rather than bare infinitives). Apart from will (which has a slightly more complicated history), the preterite-present verbs are precisely those whose presents are reflexes of PIE “perfects” rather than PIE “presents” (although some of them have unknown etymologies). It is likely that they were originally verbs that appeared only in the perfect, like Greek εἰδέναι ‘to know’.2

Most of the PIE “perfects”, however, ended up as the simple pasts of Proto-Germanic strong verbs. (That’s why the preterite-present verbs are called preterite-presents: “preterite” is just another word for “past”, and the presents of preterite-present verbs are inflected like the pasts of other verbs.) Presumably these “perfects” underwent the whole two-step development from simple stative to perfect to simple past. There was plenty of time for this to occur: remember that the Germanic languages are unattested before 100 AD, and the development of the true perfect in Greek had already occurred by 500 BC. Just as the analytical simple pasts of colloquial French and German, which are the reflexes of former perfects, have completely replaced the older inflected simple pasts, so the PIE “perfects” completely replaced the PIE “aorists” in Proto-Germanic. According to Ringe (2006: 157) there is absolutely no trace of the PIE “aorist” in any Germanic language. Proto-Germanic also lost the PIE imperfective-perfective opposition, and again the simple pasts reflecting the PIE “perfects” completely replaced the PIE imperfects—with a single exception. This was the verb *dōną ‘to do’, whose past stem *ded- is a reflex of the PIE present stem *dʰédʰeh₁- ‘put’. Admittedly, the development of this verb as a whole is somewhat mysterious (it is not clear where its present stem comes from; proposals have been put forward, but Ringe 2006: 160 finds none of them convincing) but given its generic meaning and probable frequent use it is not surprising to find it developing in an exceptional way. One reason we can be quite sure it was used very frequently is that the *ded- stem is the same one which is thought to be reflected in the past tense endings of Proto-Germanic weak verbs. There’s a pretty convincing correspondence between the Gothic weak past endings and the Old High German (OHG) past endings of tuon ‘to do’:

                          Past of Gothic waúrkjan ‘to make’   Past of OHG tuon ‘to do’
Singular  First-person    waúrhta ‘I made’                    tëta ‘I did’
          Second-person   waúrhtēs ‘you (sg.) made’           tāti ‘you (sg.) did’
          Third-person    waúrhta ‘(s)he made’                tëta ‘(s)he did’
Plural    First-person    waúrhtēdum ‘we made’                tātum ‘we did’
          Second-person   waúrhtēduþ ‘you (pl.) made’         tātut ‘you (pl.) did’
          Third-person    waúrhtēdun ‘they made’              tātun ‘they did’

Note that Proto-Germanic *ē is reflected as ē in Gothic but as ā in the other Germanic languages, so the alternation between -t- and -tēd- at the start of each ending in Gothic corresponds exactly, phonologically and morphologically, to the alternation between the stems tët- and tāt- in OHG.

The pasts of Germanic weak verbs must have originally been formed by an analytical construction with syntax similar to that of the English, French and German perfect constructions, involving the auxiliary verb *dōną ‘to do’ in the past tense (probably in the sense ‘to make’) and probably the past participle of the main verb. As pre-Proto-Germanic had SOV word order, the auxiliary verb could then be reinterpreted as an ending on the past participle, which would take us (with a little haplology) from (11) to (12).

  11. *Ek wēpną wurhtą dedǭ.
    I weapon wrought-NSG made-1SG
    “I wrought a weapon” (lit. “I made a weapon wrought”)
  12. *Ek wēpną wurht(ąd)edǭ
    I weapon wrought-1SG
    “I wrought a weapon”

(The stem *wurht- is glossed here by the archaic ‘wrought’ to distinguish it from *ded- ‘make’, although ‘make’ is the ideal gloss for both verbs. I should probably have just used a verb other than waúrkjan in the example to avoid this confusion, but oh well.)

Why couldn’t the pasts of weak verbs have been formed from PIE “perfects”, like those of strong verbs? The answer is that the weak verbs were those that did not have perfects in PIE to use as pasts. Many PIE verbs never appeared in one or more of the three aspects (“present”, “aorist” and “perfect”). I already mentioned the verbs like εἰδέναι < PIE *weyd- ‘to know’ which only appeared in the perfect in Greek, and probably in PIE as well. One very significant and curious restriction in this vein was that all PIE verbs which were derived from roots by the addition of a derivational suffix appeared only in the present aspect. There is no semantic reason why this restriction should have existed, and it is therefore one of the most convincing indications that PIE did not originally have morphological aspect marking on verbs. Instead, aspect was marked by the addition of derivational suffixes. There must have been a constraint on the addition of multiple derivational suffixes to a single root (perhaps because it would mess up the ablaut system, or perhaps just because it’s a crosslinguistically common constraint), and that would account for this curious restriction. Other indications that aspect was originally marked by derivational suffixes in PIE are the fact that the “present”, “aorist” and “perfect” stems of each PIE verb do not have much of a consistent formal relation to one another (there are some consistencies, e.g. all verbs which have a perfect stem form it by reduplication of the initial syllable, although *weyd- ‘know’, which has no present or aorist stem, is not reduplicated; but the general rule is one of inconsistency); there is no single present or aorist suffix, for example, and one pretty much has to learn each stem of each verb off by heart. Also, I think I’ve read, although I can’t remember where I read it, that aspect is still marked (wholly or largely) by derivational suffixes only in Hittite.

The class of derived verbs naturally expanded over time, while the class of basic verbs became smaller. The inability of derived verbs to have perfect stems is therefore perhaps the main reason why it was necessary to use an alternative strategy for forming the pasts of some verbs in Proto-Germanic, and thus to create a new class of weak verbs separate from the strong verbs.

So that’s the history of the PIE “perfect” in Germanic (with some tangential, but hopefully interesting elaboration). A similar development occurred in Latin. A few PIE “perfects” were preserved in Latin as statives, just like the Germanic preterite-presents (meminisse ‘to remember’, ōdisse ‘to hate’, nōvisse ‘to recognize, to know (someone)’); the others became simple pasts. But I don’t know much about the details of the developments in Latin.

Conclusions

We’ve seen evidence from Indo-European languages that there’s a kind of developmental pathway going on: statives develop into perfects, and perfects develop into simple pasts. In order for the first step to occur there has to be some kind of stative category, and it looks like this might be a relatively uncommon feature: most of the languages I’ve seen have a class of lexically stative verbs or tend to use entirely different syntax for events and states (e.g. verbs for events, adjectives for states). (English does a bit of both.) The existence of the stative category in PIE might be associated with the whole aspectual system’s recent genesis via morphologization of derivational suffixes. Of course the second part of the pathway can occur on its own, as it did in French and German after perfects were innovated via an analytical construction. It is also possible for simple pasts to be innovated straight away via analytical constructions, as we saw with the Germanic weak past inflection.

It would be interesting to hear if there are any other examples of developments occurring along this pathway, or, even more interestingly, examples where statives, perfects or simple pasts have developed or have been developed in completely different ways, from non-Indo-European languages (or Indo-European languages that weren’t mentioned here).

Notes

  1. ^ I’m using the phrase “simple past” here to refer to the past tense without the additional meaning of the true perfect (that of a present state resulting from the past event). In French the simple past can be distinguished from the imperfect as well as the perfect: the simple past refers to events as completed wholes (and is therefore said to have perfective aspect), while the imperfect refers either to iterated or habitual events, or to part of an event without the entailment that the event was completed (and is therefore said to have imperfective aspect). The perfect also refers to events as completed wholes, but it also refers to the state resulting from the completion of such events, more or less at the same time (arguably the state is the more primary reference). In colloquial French, the perfect is used in place of the simple past, so that no distinction is made between the simple past and perfect (and the merged category takes the name of the simple past), but the distinction from the imperfect is preserved. Thus the “simple past” in colloquial French is a little different from the “simple past” in colloquial German; German does not distinguish the imperfect from the simple past in either its literary or colloquial varieties. The name “aorist” can be used to refer to a simple past category like the one in literary French, i.e., a simple past which is distinct from both the perfect and the imperfect.
  2. ^ Of course, εἰδέναι appears in the pluperfect as well as the perfect, but the Greek pluperfect was an innovated formation, not inherited from PIE, and there is no reason to think Proto-Germanic ever had a pluperfect. The Proto-Germanic perfect might well have referred to a state of indeterminate tense resulting from a past event, in which case verbs in the perfect probably could be modified with adverbs of past time like ‘yesterday’. It is a curious thing that the present and past tenses were not distinguished in the PIE “perfect”; there is no particular reason why they should not have been (simple stative meaning is perfectly compatible with both tenses, cf. English “know” and “knew”) and it is therefore perhaps an indication that tense distinction was a recent innovation in PIE, which had not yet had time to spread to aspects other than the imperfective (“present”). The nature of the endings distinguishing the present and past tense is also suggestive of this; for example the first-person, second-person and third-person singular endings are *-mi, *-si and *-ti respectively in the present and *-m, *-s and *-t respectively in the past, so the present endings can be derived from the past endings by the addition of an *-i element. This *-i element has been hypothesised to have originally been a particle indicating present tense; it’s called the hic et nunc (‘here and now’) particle. I don’t know how the other endings are accounted for though.

Reference

Ringe, D., 2006. From Proto-Indo-European to Proto-Germanic (A Linguistic History of English, Vol. 1). Oxford University Press.

Dirichlet’s approximation theorem

The definition of rational numbers is usually expressed as follows.

Definition 1 For every real number {x}, {x} is rational if and only if there are integers {p} and {q} such that {q \ne 0} and {x = p/q}.

Remark 1 For every pair of integers {p} and {q} such that {q \ne 0}, {p/q = (p'/\gcd(p, q))/(|q|/\gcd(p, q))}, where {p' = p} if {q > 0} and {p' = -p} if {q < 0}. Therefore, the definition which is the same as Definition 1 except that {q} is required to be positive and {p} is required to be coprime to {q} is equivalent to Definition 1.
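Incidentally, this normalization to a positive denominator and a coprime numerator is exactly what Python’s standard-library `fractions.Fraction` performs on construction; a small illustration:

```python
from fractions import Fraction
from math import gcd

# Fraction reduces to the canonical representative described in Remark 1:
# the sign moves to the numerator, and numerator and denominator are
# divided through by their greatest common divisor.
f = Fraction(6, -4)
assert (f.numerator, f.denominator) == (-3, 2)
assert f.denominator > 0
assert gcd(abs(f.numerator), f.denominator) == 1
```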

However, there’s a slightly different way one can express the definition, which uses the fact that the equations {x = p/q} and {q x = p} are equivalent.

Definition 2 For every real number {x}, {x} is rational if and only if there is a nonzero integer {q} such that {q x} is an integer.

Remark 2 The definition which is the same as Definition 2 except that {q} is required to be positive and {q x} is required to be coprime to {q} is equivalent to Definition 2.

The nice thing about Definition 2 is that it immediately brings to mind the following algorithm for verifying that a real number {x} is rational: iterate through the positive integers in ascending order, and for each positive integer {q} check whether {q x} is an integer. (It’s assumed that it is easy to check whether an arbitrary real number is an integer.) If it is an integer, stop the iteration. The algorithm terminates if and only if {x} is rational. The algorithm is obviously not very useful if it is actually used by a computer to check for rationality—one obvious problem is that it cannot verify irrationality, it can only falsify it. But it is useful as a guide to thought. Mathematical questions are often easier to think about if they are understood in terms of processes, rather than in terms of relationships between static objects.
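The algorithm can be sketched in Python (the function name `find_denominator` is mine; a search bound and a floating-point tolerance stand in for the idealized “run forever” and “check integrality exactly” steps, which a real computer can only perform with exact arithmetic such as `Fraction`):

```python
from fractions import Fraction

def find_denominator(x, max_q=10**6, tol=1e-9):
    """Iterate through the positive integers q in ascending order and stop
    at the first q for which q*x is (within tol of) an integer.
    Returns that q, or None if the search bound max_q is exhausted."""
    for q in range(1, max_q + 1):
        if abs(q * x - round(q * x)) < tol:
            return q
    return None

print(find_denominator(0.75))            # stops at q = 4, since 4 * 0.75 = 3
print(find_denominator(Fraction(3, 7)))  # exact arithmetic: stops at q = 7
```

With a float argument the tolerance makes this a heuristic rather than a proof of rationality; with a `Fraction` argument the integrality check is exact.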

In particular, there’s a natural way in which some irrational numbers can be said to be “closer to rational” than others, in terms of this algorithm. If {x} is irrational, then none of the terms in the sequence {\langle x, 2 x, 3 x, \dotsc \rangle} are integers. But how close to integers are the terms? The closer they are to integers, the closer to rational {x} can be said to be.

But how is the closeness of the integers to the terms of the sequence to be measured? There are different ways this can be done. Perhaps the most natural way to start off with is to measure it by the minimum of the distances of the terms from the closest integers to them—that is, the minimum of the set {\{|q x - p|: p \in \mathbf Z, q \in \mathbf N\}}. Of course, this minimum may not even exist—it may be possible to make {|q x - p|} arbitrarily small by choosing appropriate integers {p} and {q} such that {q > 0}. So the first question to answer is this: for which values of {x} does the minimum exist?

The answer to this question is given by Dirichlet’s approximation theorem.

Theorem 3 (Dirichlet’s approximation theorem) For every real number {x} and every positive integer {n}, there are integers {p} and {q} such that {q > 0} and

\displaystyle  |q x - p| < \frac 1 n. \ \ \ \ \ (1)

Proof: First, let us define some notation. For every real number {x}, let {[x]} denote the greatest integer less than or equal to {x} and let {\{x\}} denote {x - [x]}. Note that the inequality {0 \le \{x\} < 1} always holds.

Now, suppose {x} is a real number and {n} is a positive integer. The {n + 1} real numbers 0, {\{x\}}, {\{2 x\}}, … and {\{n x\}} are all in the half-open interval {I = [0, 1)}. This interval can be partitioned into the {n} sub-intervals {I_1 = [0, 1/n)}, {I_2 = [1/n, 2/n)}, … and {I_n = [1 - 1/n, 1)}, each of length {1/n}. These {n + 1} real numbers are distributed among these {n} sub-intervals, and since there are more real numbers than sub-intervals at least one of the sub-intervals contains more than one of the real numbers. That is, there are integers {r} and {s} such that {\{r x\}} and {\{s x\}} are in the same sub-interval and hence {|\{r x\} - \{s x\}| < 1/n}. Or, equivalently:

\begin{array}{rcl} 1/n &>& |\{r x\} - \{s x\}| \\  &=& |(r x - [r x]) - (s x - [s x])| \\  &=& |(r - s) x - ([r x] - [s x])|, \end{array}

so if we let {q = r - s} and {p = [r x] - [s x]} we have {|q x - p| < 1/n}. And {r} and {s} can be chosen so that {r > s} and hence {q} is positive. \Box
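The proof is constructive, so it translates directly into code. Here is a sketch (the function name `dirichlet_pair` is mine; floating-point rounding limits it to moderate {n}):

```python
import math

def dirichlet_pair(x, n):
    """Pigeonhole construction from the proof: drop the fractional parts of
    0, x, 2x, ..., nx into n bins of width 1/n; two multiples must share a
    bin, and their difference yields p and q with 1 <= q <= n and
    |q*x - p| < 1/n."""
    seen = {}  # bin index -> multiplier k whose fractional part landed there
    for k in range(n + 1):
        frac = k * x - math.floor(k * x)
        b = min(int(frac * n), n - 1)  # clamp in case frac * n rounds up to n
        if b in seen:
            s, r = seen[b], k  # r > s, because we iterate k upward
            return math.floor(r * x) - math.floor(s * x), r - s
        seen[b] = k
    raise AssertionError("the pigeonhole principle guarantees a collision")

p, q = dirichlet_pair(math.sqrt(2), 1000)
assert 1 <= q <= 1000 and abs(q * math.sqrt(2) - p) < 1 / 1000
```

Note that the construction also gives the bonus bound {q \le n}, which the statement of the theorem above does not mention.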

Dirichlet’s approximation theorem says that for every real number {x}, {|q x - p|} can be made arbitrarily small by choosing appropriate integers {p} and {q} such that {q > 0}. Hence if {x} is irrational, so that {|q x - p|} is never 0, the minimum of the set {\{|q x - p|: p \in \mathbf Z, q \in \mathbf N\}} does not exist.

It may not be immediately obvious from the way in which it has been presented here why Dirichlet’s approximation theorem is called an “approximation theorem”. The reason is that if the inequality {|q x - p| < 1/n} is divided through by {q} (which produces an equivalent inequality, given that {q} is positive), the result is

\displaystyle  \left| x - \frac p q \right| < \frac 1 {n q}. \ \ \ \ \ (2)

So Dirichlet’s approximation theorem can also be interpreted as saying that for every real number {x} and every positive integer {n}, it is possible to find a rational approximation {p/q} to {x} (where {p} and {q} are integers and {q > 0}) whose error is less than {1/(nq)}. In fact, this is how the theorem is usually presented. When it’s presented in this way, Dirichlet’s approximation theorem can be seen as an addendum to the fact that for every positive integer {n}, it is possible to find a rational approximation {p/q} to {x} whose error is less than {1/n}—that is, rational approximations with arbitrarily small error exist for every real number. (This is very easily proven—it’s really just another way of expressing the fact that the set of all rational numbers, {\mathbf Q}, is dense in the set of all real numbers, {\mathbf R}.) After obtaining that result, one might naturally think, “well, in this sense all real numbers are equally well approximable by rational numbers, but perhaps if I make the condition more strict by adding a factor of {1/q} into the quantity the error has to be less than, I can uncover some interesting differences in the rational approximability of different real numbers.” But the relevance of Dirichlet’s approximation theorem can also be understood in a more direct way, and that’s what I wanted to show with this post.

Of course putting this extra factor in doesn’t lead to the discovery of any interesting differences in the rational approximability of different real numbers. In order to get to the interesting differences, you have to add in yet another factor of {1/q}. A real number {x} is said to be well approximable if and only if for every positive integer {n}, there are integers {p} and {q} such that {q > 0} and

\displaystyle  |q x - p| < \frac 1 {n q}, \ \ \ \ \ (3)

or, equivalently,

\displaystyle  \left| x - \frac p q \right| < \frac 1 {n q^2}. \ \ \ \ \ (4)

Otherwise, {x} is said to be badly approximable. Some real numbers are well approximable, and some are badly approximable.

There is in fact a very neat characterisation of the distinction in terms of continued fractions. The real numbers that are well approximable are precisely those that have arbitrarily large terms in their continued fraction expansion. For example, {e} is well approximable because its continued fraction expansion is

\displaystyle  [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, \dotsc]

(Note that the pattern only appears from the third term onwards, so it’s really {(e - 2)/(3 - e)} that has the interesting continued fraction expansion.) Every multiple of 2 appears in this continued fraction expansion, so there are arbitrarily large terms. The real numbers that are badly approximable, on the other hand, are those that have a maximum term in their continued fraction expansion. They include all quadratic irrational numbers (since those numbers have continued fraction expansions which are eventually periodic), as well as others. For example, the real number with the continued fraction expansion

\displaystyle  [1, 2, 1, 1, 2, 1, 1, 1, 2, \dotsc]

is badly approximable. This distinction is the topic of the final year project I’m currently doing for my mathematics course at university.
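The contrast between bounded and unbounded terms can be watched numerically by expanding continued fractions with naive float arithmetic (a sketch; the function name `cf_terms` is mine, and rounding error makes the output trustworthy only for roughly the first dozen terms):

```python
import math

def cf_terms(x, count):
    """First `count` continued fraction terms of x, computed by repeatedly
    taking the integer part and reciprocating the remainder. Floats only,
    so only the early terms are reliable; assumes x is irrational (no
    remainder is ever exactly 0)."""
    terms = []
    for _ in range(count):
        a = math.floor(x)
        terms.append(a)
        x = 1 / (x - a)
    return terms

# sqrt(2) = [1; 2, 2, 2, ...] is badly approximable: its terms are bounded.
print(cf_terms(math.sqrt(2), 8))
# e = [2; 1, 2, 1, 1, 4, 1, 1, 6, ...] is well approximable: the terms 4, 6, 8, ... grow without bound.
print(cf_terms(math.e, 9))
```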

I guess it would be possible to motivate the well approximable-badly approximable distinction in a similar way: note that a real number {x} is rational if and only if there is an integer {q} such that {q^2 x} is an integer divisible by {q}, and then go on to say that the closeness to rationality of an irrational number {x} can be judged by how close each term {q^2 x} of the sequence {\langle x, 4 x, 9 x, \dotsc \rangle} is to a multiple of {q}. The well approximable numbers would be those for which there are terms of the sequence arbitrarily close to the corresponding multiples. Of course, this is a lot more contrived.

Status update

You might have noticed that I hadn’t posted on this blog for about two months before yesterday. This is partly because I’ve been trying not to slack off too much at university, and partly because I’ve been posting a lot of stuff on my Tumblr. My writing there is generally more short-form and of less lasting value, but some of it might be of interest to readers of this blog—I’ve made a list of some of the past posts that I consider more worthwhile here. I’d also add to that list the following more recent posts (all linguistics-related):