
Voles and Orkney

What do voles and Orkney have to do with one another? One thing somebody knowledgeable about British wildlife might be able to tell you is that Orkney is home to a unique variety of the common European vole (Microtus arvalis) called the Orkney vole.

The most remarkable thing about the Orkney vole is that the common European vole isn’t found anywhere else in the British Isles, nor in Scandinavia—it’s a continental European animal. That raises the question of how a population of them ended up in Orkney. During the last ice age, Orkney was covered by a glacier and would have been uninhabitable by voles; and after the ice retreated, Orkney was separated from Great Britain straight away; there were never any land bridges that would have allowed voles from Great Britain to colonize Orkney. Besides, there is no evidence that M. arvalis was ever present on Great Britain, nor is there any evidence that voles other than M. arvalis were ever present on Orkney; none of the three species that inhabit Great Britain today (the field vole, Microtus agrestis, the bank vole, Myodes glareolus, and the water vole, Arvicola amphibius) were able to colonize Orkney, even though they were able to colonize some islands that were originally connected to Great Britain by land bridges (Haynes, Jaarola & Searle, 2003). The only plausible hypothesis is that the Orkney voles were introduced into Orkney by humans.

But if the Orkney voles were introduced, they were introduced at a very early date—the earliest discovered Orkney vole remains have been carbon-dated to ca. 3100 BC (Martínkova et al., 2013)—around the same time Skara Brae was first occupied, to put that in context. The only other mammals on the British Isles known to have been introduced at a similarly ancient date or earlier are the domestic dog and the domestic bovines (cattle, sheep, goats)—even the house mouse is not known to have been present before c. 500 BC (Montgomery, 2014)! The motivation for the introduction remains mysterious—voles might have been transported accidentally in livestock fodder imported from the Continent, or they might have been deliberately introduced as pets, food sources, etc.; we can only speculate. It’s interesting to note that the people of Orkney at this time seem to have been rather influential, as they introduced the Grooved Ware pottery style to other parts of the British Isles.

Anyway, there is in fact another interesting connection between voles and Orkney, which has to do with the word ‘vole’ itself. Something you might be aware of if you’ve looked at old books on British wildlife is that ‘vole’ is kind of a neologism. Traditionally, voles were not thought of as a different sort of animal from mice and rats. The relatively large animal we usually call the water vole today, Arvicola amphibius, was called the ‘water rat’ (as it still is sometimes today), or less commonly the ‘water mouse’. The smaller field vole, Microtus agrestis, was often just the ‘field mouse’, not distinguished from Apodemus sylvaticus, although it was sometimes distinguished as the ‘water mouse’ or the ‘short-tailed field mouse’ (as opposed to the ‘long-tailed field mouse’ A. sylvaticus—if you’ve ever wondered why people still call A. sylvaticus the ‘long-tailed field mouse’, even though its tail isn’t much longer than that of other British mice, that’s probably why!) The bank vole, Myodes glareolus, seems not to have been distinguished from the field vole before 1832 (the two species are similar in appearance, one distinction being that whereas the bank vole’s tail is about half its body length, the field vole’s tail is about 30% to 40% of its body length).

As an example, a reference to a species of vole as a ‘mouse’ can be found in the 1910 edition of the Encyclopedia Britannica:

The snow-mouse (Arvicola nivalis) is confined to the alpine and snow regions. (vol. 1, p. 754, under “Alps”)

Today that would be ‘the snow vole (Chionomys nivalis)’.

A number of other small British mammals were traditionally subsumed under the ‘mouse’ category, namely:

  • Shrews, which were often referred to as shrewmice from the 16th to the 19th centuries, although ‘shrew’ on its own is the older word (it is attested in Old English, but its ultimate origin is unknown).
  • Bats, which in older language could also be referred to by a number of whimsical compound words, the oldest and most common being rearmouse, from a now-obsolete verb meaning ‘stir’, but also rattlemouse, flindermouse, flickermouse, flittermouse and fluttermouse. The word rearmouse is still used today in the strange language of heraldry.
  • And, of course, dormice, which are still referred to by a compound ending in ‘-mouse’, although we generally don’t think of them as true mice today. The origin of the ‘dor-’ prefix is uncertain; the word is attested first in c. 1425. There was an Old English word sisemūs for ‘dormouse’ whose origins are similarly mysterious, but the -mūs element is clearly ‘mouse’.

There is still some indeterminacy about the boundaries of the ‘mouse’ category when non-British rodent species are included: for example, are birch mice mice?

So, where did the word ‘vole’ come from? Well, according to the OED, it was first used in a book called History of the Orkney Islands, published in 1805 and written by one George Barry, who was not a native of Orkney but a minister who preached there. In a list of the animals that inhabit Orkney, we find the following entry (alongside entries for the Shrew Mouse ſorex araneus, the [unqualified] Mouse mus muſculus, and the [unqualified] Field Mouse mus sylvaticus):

The Short-tailed Field Mouse, (mus agreſtis, Lin. Syſt.) which with us has the name of the vole mouſe, is very often found in marſhy grounds that are covered with moſs and ſhort heath, in which it makes roads or tracks of about three inches in breadth, and ſometimes miles in length, much worn by continual treading, and warped into a thouſand different directions. (p. 320)

So George Barry knew vole mouse as the local, Orkney dialectal word for the Orkney vole, which he was used to calling a ‘short-tailed field mouse’ (evidently he wasn’t aware that the Orkney voles were actually of a different species from the Scottish M. agrestis—I don’t know when the Orkney voles’ distinctiveness was first identified). Now, given that vole mouse was an Orkney dialect word, its further etymology is straightforward: the vole element is from Old Norse vǫllr ‘field’ (cf. English wold, German Wald ‘forest’), via the Norse dialect once spoken in Orkney and Shetland (sometimes known as ‘Norn’). So the Norse, like the English, thought of voles as ‘field mice’. The word vole is therefore the only English word I know, that isn’t about something particularly to do with Orkney or Shetland, that has been borrowed from Norn.

Of course, Barry only introduced vole mouse as an Orcadianism; he wasn’t proposing that the word be used to replace ‘short-tailed field mouse’. The person responsible for that seems to have been the author of the next quotation in the OED, from an 1828 book titled A History of British Animals by University of Edinburgh graduate John Fleming. On p. 23, under an entry for the genus Arvicola, Fleming notes that

The species of this genus differ from the true mice, with which the older authors confounded them, by the superior size of the head, the shortness of the tail, and the coarseness of the fur.

He doesn’t explain where he got the name vole from, nor does he seem to reference Barry’s work at all, but he does list alternative common names of each of the two vole species he identifies. The species Arvicola aquatica, which he names the ‘Water Vole’ for the first time, is noted to also be called the ‘Water Rat’, ‘Llygoden y dwfr’ (in Welsh) or ‘Radan uisque’ (in Scottish Gaelic). The species Arvicola agrestis, which he names the ‘Field Vole’ for the first time, is noted to be also called the ‘Short-tailed mouse’, ‘Llygoden gwlla’r maes’ (in Welsh), or “Vole-mouse in Orkney”.

Fleming also separated the shrews, bats and dormice from the true mice, thus establishing the division of the British mammals into the basic one-word-labelled categories that we are familiar with today. With respect to the other British mammals, the naturalists seem to have found the traditional names to be sufficiently precise: for example, each of the three quite similar species of the genus Mustela has its own name, M. erminea being the stoat, M. nivalis being the weasel, and M. putorius being the polecat.

Fleming still didn’t distinguish the field vole and the bank vole; that innovation was made by one Mr. Yarrell in 1832, who exhibited specimens of each to the Zoological Society, demonstrated their distinctiveness and gave the ‘bank vole’ (his coinage) the Latin name Arvicola riparia. It was later found that the British bank vole was the same species as a German one described by von Schreber in 1780 as Clethrionomys glareolus, and so that name took priority (and just recently, during the 2010s, the name Myodes has come to be favoured for the genus over Clethrionomys—I don’t know why exactly).

In the report of Yarrell’s presentation in the Proceedings of the Zoological Society the animals are referred to as the ‘field Campagnol’ and ‘bank Campagnol’, so the French borrowing campagnol (‘thing of the field’, still the current French word for ‘vole’) seems to have been favoured by some during the 19th century, although Fleming’s recognition of voles as distinct from mice was universally accepted. The word ‘vole’ was used by other authors such as Thomas Bell in A History of British Quadrupeds including the Cetacea (1837), and eventually the Orcadian word seems to have prevailed and entered ordinary as well as naturalists’ usage.


Haynes, S., Jaarola, M., & Searle, J. B. (2003). Phylogeography of the common vole (Microtus arvalis) with particular emphasis on the colonization of the Orkney archipelago. Molecular Ecology, 12, 951–956.

Martínkova, N., Barnett, R., Cucchi, T., Struchen, R., Pascal, M., Pascal, M., Fischer, M. C., Higham, T., Brace, S., Ho, S. Y. W., Quéré, J., O’Higgins, P., Excoffier, L., Heckel, G., Rus Hoelzel, A., Dobney, K. M., & Searle, J. B. (2013). Divergent evolutionary processes associated with colonization of offshore islands. Molecular Ecology, 22, 5205–5220.

Montgomery, W. I., Provan, J., Marshal McCabe, A., & Yalden, D. W. (2014). Origin of British and Irish mammals: disparate post-glacial colonisation and species introductions. Quaternary Science Reviews, 98, 144–165.

Some of the phonological history of English vowels, illustrated by failed rhymes in English folk songs


  • ModE = Modern English (18th century–present)
  • EModE = Early Modern English (16th–17th centuries)
  • ME = Middle English (12th–15th centuries)
  • OE = Old English (7th–11th centuries)
  • OF = Old French (9th–14th centuries)

All of this information is from the amazingly comprehensive book English Pronunciation, 1500–1700 (Volume II) by E. J. Dobson, published in 1968, which I will unfortunately have to return to the library soon.

The transcriptions of ModE pronunciations are not meant to reflect any one accent in particular, but to provide enough information to allow the pronunciation in any given accent to be deduced, given sufficient knowledge about that accent.

I use the acute accent to indicate primary stress and the grave accent to indicate secondary stress in phonetic transcriptions. I don’t like the standard IPA notation.

Oh, the holly bears a blossom
As white as the lily flower
And Mary bore sweet Jesus Christ
To be our sweet saviour
— “The Holly and the Ivy”, as sung by Shirley Collins and the Young Tradition

In ModE flower is [fláwr], but saviour is [séjvjər]; the two words don’t rhyme. But they rhymed in EModE, because saviour was pronounced with secondary stress on its final syllable, as [séjvjə̀wr], while flower was pronounced [flə́wr].

The OF suffix -our (often spelt -or in English, as in emperor and conqueror) was pronounced /-ur/; I don’t know if it was phonetically short or long, and I don’t know whether it had any stress in OF, but it was certainly borrowed into ME as long [-ùːr] quite regularly, and regularly bore a secondary stress. In general borrowings into ME and EModE seem to have always been given a secondary stress somewhere, in a position chosen so as to minimize the number of adjacent unstressed syllables in the word. The [-ùːr] ending became [-ə̀wr] by the Great Vowel Shift in EModE, and then would have become [-àwr] in ModE, except that it (universally, as far as I know) lost its secondary stress.

English shows a consistent tendency for secondary stress to disappear over time. Native English words don’t generally have secondary stress, and you could see secondary stress as a sort of protection against the phonetic degradation brought about by English’s native vowel reduction processes, serving to prevent the word from getting too dissimilar from its foreign pronunciation too quickly. Eventually, however, the word (or really suffix, in this case, since saviour, emperor and conqueror all develop in the same way) gets fully nativized, which means loss of the secondary stress and concomitant vowel reduction. According to Dobson, words probably acquired their secondary stress-less variants more or less immediately after borrowing if they were used in ordinary speech at all, but educated speech betrays no loss of secondary stress until the 17th century (he’s speaking generally here, not just about the [-ə̀wr] suffix). Disyllabic words were quickest to lose their secondary stresses, trisyllabic words (such as saviour) a bit slower, and in words with more than three syllables secondary stress often survives to the present day (there are some dialect differences, too: the suffix -ary, as in necessary, is pronounced [-ɛ̀ri] in General American but [-əri] in RP, and often just [-ri] in more colloquial British English).

The pronunciation [-ə̀wr] is recorded as late as 1665 by Owen Price (The Vocal Organ). William Salesbury (1547–1567) spells the suffix as -wr in Welsh orthography, which could reflect a pronunciation [-ùːr] or [-ur]; the former would be the result of occasional failure of the Great Vowel Shift before final [r] as in pour, tour, while the latter would be the probable initial result of vowel reduction. John Hart (1551–1570) has [-urz] in governors. So the [-ə̀wr] pronunciation was in current use throughout the 17th century, although the reduced forms were already being used occasionally in Standard English during the 16th. Exactly when [-ə̀wr] became obsolete, I don’t know (because Dobson doesn’t cover the ModE period).

Bold General Wolfe to his men did say
Come lads and follow without delay
To yonder mountain that is so high
Don’t be down-hearted
For we’ll gain the victory
— “General Wolfe” as sung by the Copper Family

Our king went forth to Normandy
With grace and might of chivalry
The God for him wrought marvelously
Wherefore England may call and cry
— “Agincourt Carol” as sung by Maddy Prior and June Tabor

This is another case where loss of secondary stress is the culprit. The words victory, Normandy and chivalry are all borrowings of OF words ending in -ie /-i/. They would therefore have ended up having [-àj] in ModE, like cry, had it not been for the loss of the secondary stress. For the -y suffix this occurred quite early in everyday speech, already in late ME, but the secondarily stressed variants survived to be used in poetry and song for quite a while longer. Alexander Gil’s Logonomia Anglica (1619) explicitly remarks that pronouncing three-syllable, initially-stressed words ending in -y with [-ə̀j] is something that can be done in poetry but not in prose. Dobson says that apart from Gil’s, there are few mentions of this feature of poetic speech during the 17th century; we can perhaps take this as an indication that it was becoming unusual to pronounce -y as [-ə̀j] even in poetry. I don’t know exactly how long the feature lasted. But General Wolfe is a folk song whose exact year of composition can be identified (1759, the date of General Wolfe’s death), so the feature seems to have been present well into the 18th century.

They’ve let him stand till midsummer day
Till he looked both pale and wan
And Barleycorn, he’s grown a beard
And so become a man
— “John Barleycorn” as sung by The Young Tradition

In ModE wan is pronounced [wɒ́n], with a different vowel from man [mán]. But wan used to have the same vowel as man; the influence of the preceding [w] resulted in rounding to an o-vowel. The origins of this change are traced by Dobson to the East of England during the 15th century. There is evidence of the change from the Paston Letters (a collection of correspondence between members of the Norfolk gentry between 1422 and 1509) and the Cely Papers (a collection of correspondence between wealthy wool merchants owning estates in Essex between 1475 and 1488); the Cely Papers only exhibit the change in the word was, but the change is more extensive in the Paston Letters and in fact seems to have applied before the other labial consonants [b], [f] and [v] too for these letters’ writers.

There is no evidence of the change in Standard English until 1617, when Robert Robinson in The Art of Pronunciation notes that was, wast (as in thou wast) and what have [ɒ́] rather than [á]. The initial restriction of the change to unstressed function words, as in the Cely Papers, suggests the change did indeed spread from the Eastern dialects. Later phoneticians during the 17th century record the [ɒ́] pronunciation in more and more words, but the change is not regular at this point; for example, Christopher Cooper (1687) has [ɒ́] in watch but not in wan. According to Dobson, relatively literary words such as wan and quality, not often used in everyday speech, did not reliably have [ɒ́] until the late 18th century.

Note that the change also applied after [wr] in wrath, and that words in which a velar consonant ([k], [g] or [ŋ]) followed the vowel were regular exceptions (cf. wax, wag, twang).

I’ll go down in some lonesome valley
Where no man on earth shall e’er me find
Where the pretty little small birds do change their voices
And every moment blows blusterous winds
— “The Banks of the Sweet Primroses” as sung by the Copper family

The expected ModE pronunciation of OE wind ‘wind’ would be [wájnd], resulting in homophony with find. Indeed, as far as I know, every other monosyllabic word with OE -ind has [-ájnd] in Modern English (mind, grind, bind, kind, hind, rind, …), resulting from an early ME sound change that lengthened final-syllable vowels before [nd] and various other clusters containing two voiced consonants at the same place of articulation (e.g. [-ld] as in wild).

It turns out that [wájnd] did use to be the pronunciation of wind for a long time. The OED entry for wind, written in the early 20th century, actually says that the word is still commonly taken to rhyme with [-ajnd] by “modern poets”; and Bob Copper and co. can be heard pronouncing winds as [wájndz] in their recording of “The Banks of the Sweet Primroses”. The [wínd] pronunciation reportedly became usual in Standard English only in the 17th century. It is hypothesized to be a result of backformation from the derivatives windy and windmill, in which lengthening never occurred because the [nd] cluster was not in word-final position. It is unlikely to be due to avoidance of homophony with the verb wind, because the words spent several centuries being homophonous without any issues arising.

Meeting is pleasure but parting is a grief
And an inconstant lover is worse than a thief
A thief can but rob me and take all I have
But an inconstant lover sends me to the grave
— “The Cuckoo”, as sung by Anne Briggs

As the spelling suggests, the word have used to rhyme with grave. The word was confusingly variable in form in ME, but one of its forms was [haːvə] (rhyming with grave) and another one was [havə]. The latter could have been derived from the former by vowel reduction when the word was unstressed, but this is not the only possible source of it (e.g. another one would be analogy with the second-person singular form hast, where the a was in a closed syllable and therefore would have been short); there does not seem to be any consistent conditioning by stress in the forms recorded by 16th- and 17th-century phoneticians, who use both forms quite often. There are some who have conditioning by stress, such as Gil, who explicitly describes [hǽːv] as the stressed form and [hav] as the unstressed form. I don’t know how long [hǽːv] (and its later forms, [hɛ́ːv], [héːv], [héjv]) remained a variant usable in Standard English, but according to the Traditional Ballad Index, “The Cuckoo” is attested no earlier than 1769.

Now the day being gone and the night coming on
Those two little babies sat under a stone
They sobbed and they sighed, they sat there and cried
Those two little babies, they laid down and died
— “Babes in the Wood” as sung by the Copper family

In EModE there was occasional shortening of stressed [ɔ́ː], so that it developed into ModE [ɒ́] rather than [ów] as normal. It is a rather irregular and mysterious process; examples of it which have survived into ModE include gone (< OE ġegān), cloth (< OE clāþ) and hot (< OE hāt). The 16th- and 17th-century phoneticians record many other words which once had variants with shortening that have not survived to the present day, such as both, loaf, rode, broad and groat. Dobson mentions that Elisha Coles (1675–1679) “knew some variant, perhaps ŏ in stone”; the verse from “Babes in the Wood” above would be additional evidence that stone at some point by some people was pronounced as [stɒn], thus rhyming with on. As far as I know, there is no way it could have been the other way round, with on having [ɔ́ː]; the word on has always had a short vowel.

“So come riddle to me, dear mother,” he said
“Come riddle it all as one
Whether I should marry with Fair Eleanor
Or bring the brown girl home” (× 2)

“Well, the brown girl, she has riches and land
Fair Eleanor, she has none
And so I charge you do my bidding
And bring the brown girl home” (× 2)
— “Lord Thomas and Fair Eleanor” as sung by Peter Bellamy

In “Lord Thomas and Fair Eleanor”, the rhymes on the final consonant are often imperfect (although the consonants are always phonetically similar). These two verses, however, are the only ones where the vowels aren’t the same in the modern pronunciation—and there’s good reason to think they were the same once.

The words one and none are closely related. The OE word for ‘one’ was ān; the OE word for ‘none’ was nān; the OE word for ‘not’ was ne; the second is simply the result of adding the third as a prefix to the first: ‘not one’.

OE ā normally becomes ME [ɔ́ː] and then ModE [ów] in stressed syllables. If it had done that in one and none, it’d be a near-rhyme with home today, save for the difference in the final nasals’ places of articulation. Indeed, in only, which is a derivative of one with the -ly suffix added, we have [ów] in ModE. But the standard ModE pronunciations of one and none are [wʌ́n] and [nʌ́n] respectively. There are also variant forms [wɒ́n] and [nɒ́n] widespread across England. How did this happen? As usual, Dobson has answers.

The [nɒ́n] variant is the easiest one to explain, at least if we consider it in isolation from the others. It’s just the result of sporadic [ɔ́ː]-shortening before [n], as in gone (see above on the on/stone rhyme). As for [nʌ́n], well, ModE [ʌ] is the ordinary reflex of short ME [u], but there is a sporadic [úː]-shortening change in EModE besides the sporadic [ɔ́ː]-shortening one. This change is quite common and reflected in many ModE words such as blood, flood, good, book, cook, wool, although I don’t think there are any where it happens before n. So perhaps [nɔ́ːn] underwent a shift to [nóːn] somehow during the ME period, which would become [núːn] by the Great Vowel Shift. As it happens there is some evidence for such a shift in ME from occasional rhymes in ME texts, such as hoom ‘home’ with doom ‘doom’ and forsothe ‘forsooth’ with bothe ‘both’ in the Canterbury Tales. However, there is especially solid evidence for it in the environment after [w], in which environment most instances of ME [ɔ́ː] exhibit raising that has passed into Standard English (e.g. who < OE hwā, two < OE twā, ooze < OE wāse; woe is an exception in ModE, although it, too, is listed as a homophone of woo occasionally by Early Modern phoneticians). Note that although all these examples happen to have lost the [w], presumably by absorption into the following [úː] after the Great Vowel Shift occurred, there are words such as womb with EModE [úː] which have retained their [w], and phoneticians in the 16th and 17th centuries record pronunciations of who and two with retained [w]. So if ME [ɔ́ːn] ‘one’ somehow became [wɔ́ːn], and then raising to [wóːn] occurred due to the [w], then this vowel would be likely to spread by analogy to its derivative [nɔ́ːn], allowing for the emergence of [wʌ́n] and [nʌ́n] in ModE. The ModE [wɒ́n] and [nɒ́n] pronunciations can be accounted for by assuming the continued existence of an un-raised [wɔ́ːn] variant in EModE alongside [wúːn].

As it happens there is a late ME tendency for [j] to be inserted before long mid front vowels and, a little less commonly, for [w] to be inserted before word-initial long mid back vowels. This glide insertion only happened in initial syllables, and usually only when the vowel was word-initial or the word began with [h]; but there are occasional examples before other consonants such as John Hart’s [mjɛ́ːn] for mean. The Hymn of the Virgin (uncertain date, 14th century), which is written in Welsh orthography and therefore more phonetically transparent than usual, evidences [j] in earth. John Hart records [j] in heal and here, besides mean, and [w] in whole (< OE hāl). 17th-century phoneticians record many instances of [j]- and [w]-insertion, giving spellings such as yer for ‘ere’, yerb for ‘herb’, wuts for ‘oats’ (this one also has shortening), but they frequently condemn these pronunciations as “barbarous”. Christopher Cooper (1687) even mentions a pronunciation wun for ‘one’, although not without condemning it for its barbarousness. The general picture seems to be that glide insertion was widespread in dialects, and filtered into Standard English to some degree during the 16th century, but there was a strong reaction against it during the 17th century and it mostly disappeared, except, of course, in the word one, for which, according to Dobson, the [wʌ́n] pronunciation became normal around 1700. The [nʌ́n] pronunciation for ‘none’ is first recorded by William Turner in The Art of Spelling and Reading English (1710).

Finally, I should mention that sporadic [úː]-shortening is also recorded as applying to home, resulting in the pronunciation [hʌ́m]; and Turner has this pronunciation, as do many English traditional dialects. So it’s possible that the rhyme in “Lord Thomas and Fair Eleanor” is due to this change having applied to home, rather than preservation of the conservative [-ówn] forms of one and none.

Modelling communication systems

One of the classes I’m taking this term is about modelling the evolution of communication systems. Everything in the class is done via simulation, which is probably the best way to do it, and certainly necessary at the point where it starts to involve genetic algorithms and such. However, some of the earlier content in the class dealt with problems that I suspected were solvable by a purely mathematical approach, so as somebody with a maths degree I felt it necessary to rise to the challenge and try to derive the solutions mathematically. This post is my attempt to do that.

Let us begin by thinking very abstractly about a system which takes something in and gives something out. Suppose there is a finite, positive number m of things which may be taken in (possible inputs), which we shall call input 1, input 2, … and input m. Suppose likewise that there is a finite, positive number n of things which may be given out (possible outputs), which we shall call output 1, output 2, … and output n.

One way in which the behavior of such a system could be modelled is as a straightforward mapping from inputs to outputs. However, this might be too deterministic: perhaps the system doesn’t always output the same output for a given input. So let’s use a more general model, and think of the system as a mapping from inputs to probability distributions over outputs. For every pair (i, j) of integers such that 1 ≤ i ≤ m and 1 ≤ j ≤ n, let p_{i, j} denote the probability that input i is mapped to output j. The mapping as a whole is determined by the mn probabilities of the form p_{i, j}, and therefore it can be thought of as an m-by-n matrix A:

\displaystyle \mathbf A = \left( \begin{matrix} p_{1, 1} & p_{1, 2} & \hdots & p_{1, n} \\ p_{2, 1} & p_{2, 2} & \hdots & p_{2, n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m, 1} & p_{m, 2} & \hdots & p_{m, n} \end{matrix} \right).

The rows of A correspond to the possible inputs and the columns of A correspond to the possible outputs. Probabilities are non-negative real numbers, so A is a non-negative real matrix. Also, the probabilities of mutually exclusive, exhaustive outcomes sum to 1, so the sum of each row of A is 1. This condition can be expressed as a system of linear equations:

\displaystyle \begin{aligned} p_{1, 1} &+ p_{1, 2} &+ \hdots &+ p_{1, n} &= 1 \\ p_{2, 1} &+ p_{2, 2} &+ \hdots &+ p_{2, n} &= 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ p_{m, 1} &+ p_{m, 2} &+ \hdots &+ p_{m, n} &= 1. \end{aligned}

Alternatively, and more compactly, it may be expressed as the matrix equation

\displaystyle (1) \quad \mathbf A \mathbf x = \mathbf y,

where x is the n-dimensional vector whose components are all equal to 1 and y is the m-dimensional vector whose components are all equal to 1.
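Condition (1) is easy to check mechanically. The following is a minimal Python sketch, not taken from the class material: the function names `mat_vec` and `is_row_stochastic`, the tolerance `tol` and the example matrix are all assumptions made for illustration.

```python
def mat_vec(A, x):
    """Multiply matrix A (a list of rows) by vector x (a list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def is_row_stochastic(A, tol=1e-9):
    """Check that A is non-negative and satisfies A x = y for all-ones x and y."""
    n = len(A[0])
    ones_n = [1.0] * n                     # the vector x in equation (1)
    non_negative = all(p >= 0 for row in A for p in row)
    row_sums = mat_vec(A, ones_n)          # should equal y, the all-ones m-vector
    # tol is an assumed tolerance that absorbs floating-point rounding.
    return non_negative and all(abs(s - 1.0) <= tol for s in row_sums)


# Hypothetical system with m = 2 inputs and n = 3 outputs.
A = [[0.5, 0.25, 0.25],
     [0.0, 1.0, 0.0]]
print(is_row_stochastic(A))  # True
```

Multiplying by the all-ones vector is just a compact way of summing each row, which is why equation (1) and the system of linear equations above say the same thing.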

In general, if x is an n-dimensional vector, and we think of x as a random variable determined by the output of the system, then Ax is the vector of expected values of x conditional on each input. That is, for every integer i such that 1 ≤ i ≤ m, the ith component of Ax is the expected value of x conditional on input i being the input to the system.
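To make the expected-value reading of Ax concrete, here is a small Python sketch. The function name `expected_values` and the numbers are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def expected_values(A, x):
    """Return A x: entry i is the expected value of x given input i."""
    return [sum(p * v for p, v in zip(row, x)) for row in A]


# Hypothetical system with 2 inputs and 3 outputs (each row sums to 1).
A = [[0.5, 0.5, 0.0],
     [0.0, 0.25, 0.75]]
# Hypothetical value attached to each of the 3 outputs.
x = [10.0, 20.0, 40.0]

print(expected_values(A, x))  # [15.0, 35.0]
```

For the first input the outputs worth 10 and 20 are equally likely, so the expected value is 15; for the second it is 0.25 × 20 + 0.75 × 40 = 35.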

Accordingly, if we have not just one, but p n-dimensional vectors x_1, x_2, … and x_p (where p is a positive integer), we can think of these p vectors as the columns of an n-by-p matrix B, and then we can read off all the expected values from the matrix product, whose columns are the vectors Ax_1, Ax_2, … and Ax_p:

\displaystyle \mathbf A \mathbf B = \left( \begin{matrix} \mathbf A \mathbf x_1 & \mathbf A \mathbf x_2 & \hdots & \mathbf A \mathbf x_p \end{matrix} \right)

like so: for every pair (i, k) of integers such that 1 ≤ i ≤ m and 1 ≤ k ≤ p, the (i, k) entry of AB is the expected value of x_k conditional on input i being the input to the system.

In the case where B happens to be another non-negative real matrix such that

\displaystyle \mathbf B \mathbf x = \mathbf y,

where x and y now denote the all-ones vectors of dimension p and n respectively, so that the entries of B can be interpreted as probabilities, the matrix B as a whole can be interpreted as another input-output system whose possible inputs happen to be the same as the possible outputs of A. In order to emphasize this identity, let us now call the possible outputs of A (= the possible inputs of B) the signals: signal 1, signal 2, … and signal n. The other things (the possible inputs of A, and the possible outputs of B) can be thought of as meanings. Note that there is no need at the moment for the input meanings (the possible inputs of A) to be the same as the output meanings (the possible outputs of B); we make a distinction between the input meanings and the output meanings.

Together, A and B can be thought of as comprising a “product system” which works like this: an input meaning goes into A, a signal comes out of A, the signal goes into B, and an output meaning comes out of B. For every integer k such that 1 ≤ k ≤ p, the random variable x_k (the kth column of B) can now be interpreted as the probability of the product system outputting output meaning k, as a random variable whose value is determined by the signal. That is, for every integer j such that 1 ≤ j ≤ n, the jth component of x_k (the (j, k) entry of B) is the probability of output meaning k coming out if the signal happens to be signal j. It follows by the law of total probability that the probability of output meaning k coming out, if i is the input meaning, is the expected value of x_k conditional on i being the input meaning. Now, by what we said a couple of paragraphs above, we have that for every integer i such that 1 ≤ i ≤ m, the expected value of x_k conditional on i being the input meaning is the (i, k) entry of AB. So the “product system”, as a matrix, is the matrix product AB. That’s why we call it the “product system”, see? 🙂
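The composition can be sketched in a few lines of Python. This is an illustration under assumptions of my own: the helper `mat_mul` and the two example matrices are hypothetical, and exact fractions are used so that no rounding obscures the result. Note that Python indexes from 0, so entry (i, k) in the text is `AB[i - 1][k - 1]` in the code.

```python
from fractions import Fraction as F


def mat_mul(A, B):
    """Matrix product of A (m-by-n) and B (n-by-p), both given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))  # sum over signals j
             for k in range(len(B[0]))]
            for i in range(len(A))]


# Hypothetical first system: 2 input meanings -> 2 signals.
A = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]
# Hypothetical second system: 2 signals -> 2 output meanings.
B = [[F(1), F(0)],
     [F(1, 3), F(2, 3)]]

AB = mat_mul(A, B)
# Each entry is the probability of an output meaning given an input meaning.
for row in AB:
    print([str(p) for p in row])
# ['5/6', '1/6']
# ['2/3', '1/3']
```

Each row of the product still sums to 1, as it must if AB is itself to be read as an input-output system.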

In the case where the possible input meanings are the same as the possible output meanings and m = p, we may think about the “product system” as a communicative dyad. The speaker is A, the hearer is B. The speaker is trying to express a meaning, the input meaning, and producing a signal in order to do so, and the hearer is interpreting that signal to have some meaning, the output meaning. The output meaning the hearer understands is not necessarily the same as the input meaning the speaker was trying to express. If it is different, we may regard the communication as unsuccessful; if it is the same, we may regard the communication as successful.

The key question is: what is the probability that the communication is successful? Given the considerations above, it’s very easy to answer. If the input meaning is i, we’re just looking for the probability that output meaning i comes out given this input meaning. That probability is simply the (i, i) entry of AB, i.e. the ith entry along AB’s main diagonal.

What if the input meaning isn’t fixed? Then the answer will in general depend on the probability distribution over the possible input meanings. But in the simplest case, where the distribution is uniform (no input meaning is any more probable than any other), the probability of successful communication is just the mean of the input meaning-specific probabilities, that is, the sum of the main diagonal entries of AB, divided by m (the number of the main diagonal entries, i.e. the number of meanings). In linear algebra, we call the sum of the main diagonal entries of a square matrix its trace, and we denote it by tr(C) where C is the matrix. So our formula for the communication success probability p is

\displaystyle (2) \quad p = \frac {\mathrm{tr}(\mathbf A \mathbf B)} m.

If the probability distribution over the input meanings isn’t uniform, the probability of successful communication is just the weighted average of the input meaning-specific probabilities, with the weights being the respective input meaning probabilities. The general formula can therefore be written as

\displaystyle (3) \quad p = \mathrm{tr}(\mathbf A \mathbf B \mathbf D) = \mathrm{tr}(\mathbf D \mathbf A \mathbf B)

where D is the diagonal matrix of size m whose main diagonal is the probability distribution over the input meanings (i.e. for every integer i such that 0 ≤ i < m, the ith diagonal entry of D is the probability of input meaning i being the one the speaker tries to express). It doesn’t matter whether D is left-multiplied or right-multiplied, because the trace of the product is the same in either case. In the case where the probability distribution over the input meanings is uniform, the diagonal entries of D are all equal to 1/m, i.e. \mathbf D = \mathbf I_m/m, where I_m is the identity matrix of size m, and therefore (3) reduces to (2).
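Here is a small Python sketch of formulas (2) and (3). The matrices A and B and the distribution d are invented for illustration, and the helper names (matmul, trace, diagonal, success_probability) are my own, not taken from any library.

```python
def matmul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][j] * Y[j][k] for j in range(len(Y)))
             for k in range(len(Y[0]))]
            for i in range(len(X))]


def trace(M):
    """Sum of the main diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def diagonal(values):
    """Diagonal matrix whose main diagonal is the given list."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def success_probability(A, B, d):
    """Formula (3): tr(A B D), where D is the diagonal matrix built from d."""
    return trace(matmul(matmul(A, B), diagonal(d)))


# Made-up speaker matrix, hearer matrix and input-meaning distribution.
A = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
B = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]
d = [0.5, 0.3, 0.2]
m = len(d)

p_right = success_probability(A, B, d)              # tr(A B D)
p_left = trace(matmul(diagonal(d), matmul(A, B)))   # tr(D A B)
p_uniform = success_probability(A, B, [1 / m] * m)  # should equal tr(A B) / m
print(p_right, p_left, p_uniform, trace(matmul(A, B)) / m)
```

The first two printed values should agree up to rounding error, and so should the last two.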

To leave you fully convinced that this formula works, here are some simulations. The 5 graphs below were generated using a Python script which you can view on GitHub. Each one involves 3 possible meanings, 3 possible signals, randomly-generated speaker and hearer matrices and a randomly-generated probability distribution over the input meanings. If you look at the code, you’ll see that the blue line is generated by simulating communication in the obvious way, by randomly drawing an input meaning, randomly drawing a signal based on that particular input meaning, and finally randomly drawing an output meaning based on that particular signal. The position on the x-axis corresponds to the number of trials (individual simulated communicative acts) carried out so far and the position on the y-axis corresponds to the proportion of those trials involving a successful communication (one where the output meaning ended up being the same as the input meaning). For each graph, there were 10 sets of 500 trials; each individual set of trials corresponds to one of the light blue lines, while the darker blue line gives the results averaged over those ten sets. The horizontal green line indicates the success probability as calculated by our formula. This should be close to the success proportion for a large number of trials, so we should see the blue and green lines converging on the right side of each graph. That is what we see, so the formula works.
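For readers who would rather not dig through the GitHub script, the following is a stripped-down sketch of the same idea. It is not that script: the matrices, the distribution, the seed and the function names are all assumptions made for this illustration, and it prints numbers instead of drawing graphs.

```python
import random


def success_probability(A, B, d):
    """tr(D A B): weighted sum over meanings i of d[i] * (AB)[i][i]."""
    return sum(d[i] * sum(A[i][j] * B[j][i] for j in range(len(B)))
               for i in range(len(d)))


def simulate(A, B, d, trials, rng):
    """Proportion of simulated communicative acts that succeed."""
    successes = 0
    for _ in range(trials):
        # Draw an input meaning, then a signal, then an output meaning.
        i = rng.choices(range(len(d)), weights=d)[0]
        j = rng.choices(range(len(A[i])), weights=A[i])[0]
        k = rng.choices(range(len(B[j])), weights=B[j])[0]
        if k == i:
            successes += 1
    return successes / trials


# Made-up speaker matrix, hearer matrix and input-meaning distribution.
A = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
B = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]
d = [0.5, 0.3, 0.2]

rng = random.Random(0)  # fixed seed so the run is repeatable
print("formula:  ", success_probability(A, B, d))
print("simulated:", simulate(A, B, d, 20000, rng))
```

With enough trials the simulated proportion should settle near the value given by the formula, which is the convergence the graphs are meant to show.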


A very simple stochastic model of diachronic change

1. The discrete process

1.1. The problem

Consider an entity (for example, a language) which may or may not have a particular property (for example, obligatory coding of grammatical number). For convenience and interpretation-neutrality, we shall say that the entity is positive if it has this property and negative if it does not have this property. Consider the entity as it changes over the course of a number of events (for example, transmissions of the language from one generation to another) in which the entity’s state (whether it is positive or negative) may or may not change. For every nonnegative integer {n}, let {X_n} represent the entity’s state after exactly {n} events have occurred, with negativity being represented by 0 and positivity being represented by 1. The initial state {X_0} is a constant parameter of the model, but the states at other times are random variables whose “success” probabilities (i.e. values of 1 under their probability mass functions) are determined by {X_0} and the other parameters of the model.

The other parameters of the model, besides {X_0}, are denoted by {p} and {q}. These represent the probabilities that an event will change the state from negative to positive or from positive to negative, respectively. They are assumed to be constant across events—this assumption can be thought of as an interpretation of the uniformitarian principle familiar from historical linguistics and other fields. I shall call a change of state from negative to positive a gain and a change of state from positive to negative a loss, so that {p} can be thought of as the gain rate per event and {q} can be thought of as the loss rate per event.

Note that the gain resp. loss probability is {p}/{q} only if the state is negative resp. positive as the event begins. If the state is already positive resp. negative as the event begins then it is impossible for a further gain resp. loss to occur and therefore the gain resp. loss probability is 0 (but the loss resp. gain probability is {q}/{p}). Thus the random variables {X_1}, {X_2}, {X_3}, … are not necessarily independent of one another.

I am aware that there’s a name for a sequence of random variables that are not necessarily independent of one another, namely “stochastic process”. However, that is about the extent of what I know about stochastic processes. I think the thing I’m talking about in this post is a very simple example of a stochastic process–an appropriate name for it would be the gain-loss process. If you know something about stochastic processes it might seem very trivial, but it was an interesting problem for me to try to figure out knowing nothing already about stochastic processes.

1.2. The solution

Suppose {n} is a nonnegative integer and consider the state {X_{n + 1}} after exactly {n + 1} events have occurred. If the entity is negative as the {(n + 1)}th event begins, the probability of gain during the {(n + 1)}th event is {p}. If the entity is positive as the {(n + 1)}th event begins, the probability of loss during the {(n + 1)}th event is {q}. Now, as the {(n + 1)}th event begins, exactly {n} events have already occurred. Therefore the probability that the entity is negative as the {(n + 1)}th event begins is {\mathrm P(X_n = 0)} and the probability that the entity is positive as the {(n + 1)}th event begins is {\mathrm P(X_n = 1)}. It follows by the law of total probability that

\displaystyle \begin{aligned} \mathrm P(X_{n + 1} = 1) &= p (1 - \mathrm P(X_n = 1)) + (1 - q) \mathrm P(X_n = 1) \\ &= p - p \mathrm P(X_n = 1) + \mathrm P(X_n = 1) - q \mathrm P(X_n = 1) \\ &= p - (p - 1 + q) \mathrm P(X_n = 1) \\ &= p + (1 - p - q) \mathrm P(X_n = 1). \end{aligned}

This recurrence relation can be solved using the highly sophisticated method of “use it to find general equations for the first few terms in the sequence, extrapolate the pattern, and confirm that the extrapolation is valid using a proof by induction”. I’ll spare you the laborious first phase, and just show you the second and third. The solution is

\displaystyle \begin{aligned} \mathrm P(X_n = 1 | X_0 = 0) &= p \sum_{i = 0}^{n - 1} (1 - p - q)^i, \\ \mathrm P(X_n = 1 | X_0 = 1) &= 1 - q \sum_{i = 0}^{n - 1} (1 - p - q)^i. \end{aligned}

Just so you can check that this is correct, the proofs by induction for the separate cases are given below.

Case 1 ({X_0 = 0}). Base case. The expression

\displaystyle p \sum_{i = 0}^{n - 1} (1 - p - q)^i

evaluates to 0 if {n = 0}, because the sum is empty.

Successor case. For every nonnegative integer {n} such that

\displaystyle \mathrm P(X_n = 1 | X_0 = 0) = p \sum_{i = 0}^{n - 1} (1 - p - q)^i,

we have

\displaystyle \begin{aligned} \mathrm P(X_{n + 1} = 1 | X_0 = 0) &= p + (1 - p - q) \mathrm P(X_n = 1 | X_0 = 0) \\ &= p + (1 - p - q) p \sum_{i = 0}^{n - 1} (1 - p - q)^i \\ &= p + p (1 - p - q) \sum_{i = 0}^{n - 1} (1 - p - q)^i \\ &= p \left( 1 + \sum_{i = 0}^{n - 1} (1 - p - q)^{i + 1} \right) \\ &= p \sum_{j = 0}^n (1 - p - q)^j. \end{aligned}

Case 2 ({X_0 = 1}). Base case. The expression

\displaystyle 1 - q \sum_{i = 0}^{n - 1} (1 - p - q)^i

evaluates to 1 if {n = 0}, because the sum is empty.

Successor case. For every nonnegative integer {n} such that

\displaystyle \mathrm P(X_n = 1 | X_0 = 1) = 1 - q \sum_{i = 0}^{n - 1} (1 - p - q)^i,

we have

\displaystyle \begin{aligned} \mathrm P(X_{n + 1} = 1 | X_0 = 1) &= p + (1 - p - q) \mathrm P(X_n = 1 | X_0 = 1) \\ &= p + (1 - p - q) \left( 1 - q \sum_{i = 0}^{n - 1} (1 - p - q)^i \right) \\ &= p + 1 - p - q - (1 - p - q) q \sum_{i = 0}^{n - 1} (1 - p - q)^i \\ &= 1 - q - q (1 - p - q) \sum_{i = 0}^{n - 1} (1 - p - q)^i \\ &= 1 - q \left( 1 + \sum_{i = 0}^{n - 1} (1 - p - q)^{i + 1} \right) \\ &= 1 - q \sum_{j = 0}^n (1 - p - q)^j. \end{aligned}
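As a numerical complement to the induction proofs, here is a minimal Python sketch that iterates the recurrence directly and compares the result with the two sum formulas. The function names and the parameter values are arbitrary choices of mine.

```python
def by_recurrence(x0, p, q, n):
    """P(X_n = 1) obtained by applying the recurrence n times,
    starting from P(X_0 = 1) = x0."""
    prob = float(x0)
    for _ in range(n):
        prob = p + (1 - p - q) * prob
    return prob


def by_sum_formula(x0, p, q, n):
    """P(X_n = 1 | X_0 = x0) according to the sum formulas."""
    geometric_sum = sum((1 - p - q) ** i for i in range(n))
    if x0 == 0:
        return p * geometric_sum
    return 1 - q * geometric_sum


# Arbitrary example rates; any probabilities p and q should work.
p, q = 0.3, 0.5
for n in range(6):
    print(n,
          by_recurrence(0, p, q, n), by_sum_formula(0, p, q, n),
          by_recurrence(1, p, q, n), by_sum_formula(1, p, q, n))
```

For each n, the first pair of values and the second pair of values should agree up to floating-point rounding.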

I don’t know if there is any way to make sense of why exactly these equations are the way they are; if you have any ideas, I’d be interested to hear your comments. There is a nice way I can see of understanding the difference between the two cases. Consider an additional gain-loss process {B} which changes in tandem with the gain-loss process {A} that we’ve been considering up till just now, so that its state is always the opposite of that of {A}. Then the gain rate of {B} is {q} (because if {B} gains, {A} loses) and the loss rate of {B} is {p} (because if {B} loses, {A} gains). And for every nonnegative integer {n}, if we let {Y_n} denote the state of {B} after exactly {n} events have occurred, then

\displaystyle \mathrm P(Y_n = 1) = 1 - \mathrm P(X_n = 1)

because {Y_n = 1} if and only if {X_n = 0}. Of course, we can also rearrange this equation as {\mathrm P(X_n = 1) = 1 - \mathrm P(Y_n = 1)}.

Now, we can use the equation for Case 1 above, but with the appropriate variable names for {B} substituted in, to see that

\displaystyle \mathrm P(Y_n = 1 | Y_0 = 0) = q \sum_{i = 0}^{n - 1} (1 - q - p)^i,

and it then follows that

\displaystyle \mathrm P(X_n = 1 | X_0 = 1) = 1 - q \sum_{i = 0}^{n - 1} (1 - p - q)^i.

Anyway, you may have noticed that the sum

\displaystyle \sum_{i = 0}^{n - 1} (1 - p - q)^i

which appears in both of the equations for {\mathrm P(X_n = 1)} is a geometric progression whose common ratio is {1 - p - q}. If {1 - p - q = 1}, then {p + q = 0} and therefore {p = q = 0} (because {p} and {q} are probabilities, and therefore non-negative). The probability {\mathrm P(X_n = 1)} is then simply constant at 0 if {X_0 = 0} (because gain is impossible) and constant at 1 if {X_0 = 1} (because loss is impossible). Outside of this very trivial case, we have {1 - p - q \ne 1}, and therefore the geometric progression may be written as a fraction as per the well-known formula:

\displaystyle \begin{aligned} \sum_{i = 0}^{n - 1} (1 - p - q)^i &= \frac {1 - (1 - p - q)^n} {1 - (1 - p - q)} \\ &= \frac {1 - (1 - p - q)^n} {p + q}. \end{aligned}

It follows that

\displaystyle \begin{aligned} \mathrm P(X_n = 1 | X_0 = 0) &= \frac {p (1 - (1 - p - q)^n)} {p + q}, \\ \mathrm P(X_n = 1 | X_0 = 1) &= 1 - \frac {q (1 - (1 - p - q)^n)} {p + q} \\ &= \frac {p + q - q (1 - (1 - p - q)^n)} {p + q} \\ &= \frac {p + q - q + q (1 - p - q)^n} {p + q} \\ &= \frac {p + q (1 - p - q)^n} {p + q}. \end{aligned}
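If you want to convince yourself that the geometric-series step did not introduce a mistake, the fraction forms can be compared with the sum forms numerically. This is only a sketch with invented function names and example rates; it assumes p + q > 0, since the fraction forms are not defined in the trivial case p = q = 0.

```python
def sum_form(x0, p, q, n):
    """P(X_n = 1 | X_0 = x0) written with the explicit geometric sum."""
    s = sum((1 - p - q) ** i for i in range(n))
    return p * s if x0 == 0 else 1 - q * s


def closed_form(x0, p, q, n):
    """P(X_n = 1 | X_0 = x0) written as a fraction; requires p + q > 0."""
    r = 1 - p - q  # common ratio of the geometric progression
    if x0 == 0:
        return p * (1 - r ** n) / (p + q)
    return (p + q * r ** n) / (p + q)


# Example rates (the same ones used for the graph further down).
p, q = 1 / 100, 1 / 50
for n in (0, 1, 10, 100):
    print(n,
          sum_form(0, p, q, n), closed_form(0, p, q, n),
          sum_form(1, p, q, n), closed_form(1, p, q, n))
```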

From these equations it is easy to see the limiting behaviour of the gain-loss process as the number of events approaches {\infty}. If {1 - p - q = -1}, then {p + q = 2} and therefore {p = q = 1} (because {p} and {q} are probabilities, and therefore not greater than 1). The equations in this case reduce to

\displaystyle \begin{aligned} \mathrm P(X_n = 1 | X_0 = 0) &= \frac {1 - (-1)^n} 2, \\ \mathrm P(X_n = 1 | X_0 = 1) &= \frac {1 + (-1)^n} 2, \end{aligned}

which show that the state simply alternates deterministically back and forth between positive and negative (because {(1 - (-1)^n)/2} is 0 if {n} is even and 1 if {n} is odd and {(1 + (-1)^n)/2} is 1 if {n} is even and 0 if {n} is odd).

Otherwise, we have {|1 - p - q| < 1} and therefore

\displaystyle \lim_{n \rightarrow \infty} (1 - p - q)^n = 0.

Now the equations for {\mathrm P(X_n = 1 | X_0 = 0)} and {\mathrm P(X_n = 1 | X_0 = 1)} above are the same apart from the term in the numerator which contains {(1 - p - q)^n} as a factor, as well as another factor which is independent of {n}. Therefore, regardless of the value of {X_0},

\displaystyle \lim_{n \rightarrow \infty} \mathrm P(X_n = 1) = \frac p {p + q}.

This is a nice result: if {n} is sufficiently large, the dependence of {X_n} on {X_0}, {X_1}, … and {X_{n - 1}} is negligible and its success probability is negligibly different from {p/(p + q)}. That it is this exact quantity sort of makes sense: it’s the ratio of the gain rate to the theoretical rate of change of state in either direction that we would get if both a gain and loss could occur in a single event.

In case you like graphs, here’s a graph of the process with {X_0 = 0}, {p = 1/100}, {q = 1/50} and 500 events. The x-axis is the number of events that have occurred and the y-axis is the observed frequency, divided by 1000, of the state being positive after this number of events has occurred (for the blue line) or the probability of the state being positive according to the equations described in this post (for the green line). If you want to, you can view the Python code that I used to generate this graph (which is actually capable of simulating multiple-trait interactions, although I haven’t tried solving it in that case) on GitHub.
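The following is a much simpler sketch than the linked code, written only for this illustration (single trait, no plotting, invented function names, fixed seed). It runs the process 1000 times with the parameters above and compares the observed frequency of positivity with the probability given by the equations.

```python
import random


def simulate_frequencies(x0, p, q, n_events, n_runs, rng):
    """Observed frequency of the positive state after 0, 1, ..., n_events
    events, averaged over n_runs independent runs of the process."""
    counts = [0] * (n_events + 1)
    for _ in range(n_runs):
        state = x0
        counts[0] += state
        for n in range(1, n_events + 1):
            if state == 0:
                if rng.random() < p:   # gain
                    state = 1
            elif rng.random() < q:     # loss
                state = 0
            counts[n] += state
    return [count / n_runs for count in counts]


def predicted(x0, p, q, n):
    """P(X_n = 1 | X_0 = x0) from the fraction formulas (p + q > 0)."""
    r = 1 - p - q
    if x0 == 0:
        return p * (1 - r ** n) / (p + q)
    return (p + q * r ** n) / (p + q)


p, q = 1 / 100, 1 / 50
frequencies = simulate_frequencies(0, p, q, 500, 1000, random.Random(0))
for n in (0, 50, 100, 250, 500):
    print(n, frequencies[n], predicted(0, p, q, n))
print("limit p/(p + q):", p / (p + q))
```

By 500 events the predicted probability is already very close to the limiting value p/(p + q) = 1/3, and the observed frequency should fluctuate around it.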


2. The continuous process

2.1. The problem

Let us now consider the same process, but continuous rather than discrete. That is, rather than the gains and losses occurring over the course of a discrete sequence of events, we now have a continuous interval in time, during which at any point losses and gains might occur instantaneously. The state of the process at time {t} shall be denoted {X(t)}. Although multiple gains and losses may occur during an arbitrary subinterval, we may assume for the purpose of approximation that during sufficiently short subintervals only one gain or loss, or none, may occur, and the probabilities of gain and loss are directly proportional to the length of the subinterval. Let {\lambda} be the constant of proportionality for gain and let {\mu} be the constant of proportionality for loss. These are the continuous model’s analogues of the {p} and {q} parameters in the discrete model. Note that they may be greater than 1, unlike {p} and {q}.

2.2. The solution

Suppose {t} is a non-negative real number and {n} is a positive integer. Let {\Delta t = t/n}. The interval in time from time 0 to time {t} can be divided up into {n} subintervals of length {\Delta t}. If {\Delta t} is small enough, so that the approximating assumptions described in the previous paragraph can be made, then the subintervals can be regarded as discrete events, during each of which gain occurs with probability {\lambda \Delta t} if the state at the start point of the subinterval is negative and loss occurs with probability {\mu \Delta t} if the state at the start point of the subinterval is positive. For every integer {k} between 0 and {n} inclusive, let {Y_k} denote the state of this discrete approximation of the process at time {k \Delta t}. Then for every integer {k} between 0 and {n} (inclusive) we have

\displaystyle \begin{aligned} \mathrm P(Y_k = 1 | Y_0 = 0) &= \frac {\lambda \Delta t (1 - (1 - \lambda \Delta t - \mu \Delta t)^k)} {\lambda \Delta t + \mu \Delta t}, \\ \mathrm P(Y_k = 1 | Y_0 = 1) &= \frac {\lambda \Delta t + \mu \Delta t (1 - \lambda \Delta t - \mu \Delta t)^k} {\lambda \Delta t + \mu \Delta t}, \end{aligned}

provided {\lambda} and {\mu} are not both equal to 0 (in which case, just as in the discrete case, the state remains constant at whatever the initial state was).

Many of the {\Delta t} factors in this equation can be cancelled out, giving us

\displaystyle \begin{aligned} \mathrm P(Y_k = 1 | Y_0 = 0) &= \frac {\lambda (1 - (1 - (\lambda + \mu) \Delta t)^k)} {\lambda + \mu}, \\ \mathrm P(Y_k = 1 | Y_0 = 1) &= \frac {\lambda + \mu (1 - (\lambda + \mu) \Delta t)^k} {\lambda + \mu}. \end{aligned}

Now consider the case where {k = n}, in the limit as {n} approaches {\infty}. Note that {\Delta t} approaches 0 at the same time, because {\Delta t = t/n}, and therefore the limit of {(1 - (\lambda + \mu) \Delta t)^n} is not simply 0 as in the discrete case. If we rewrite the expression as

\displaystyle \left( 1 - \frac {t (\lambda + \mu)} n \right)^n

and make the substitution {n = -mt(\lambda + \mu)}, giving us

\displaystyle \left( 1 + \frac 1 m \right)^{-mt(\lambda + \mu)} = \left( \left( 1 + \frac 1 m \right)^m \right)^{-t(\lambda + \mu)},

then we see that the limit is in fact {e^{-t(\lambda + \mu)}}, an exponential function of {t}. It follows that

\displaystyle \begin{aligned} \mathrm P(X(t) = 1 | X(0) = 0) = \lim_{n \rightarrow \infty} \mathrm P(Y_n = 1 | Y_0 = 0) &= \frac {\lambda (1 - e^{-t(\lambda + \mu)})} {\lambda + \mu}, \\ \mathrm P(X(t) = 1 | X(0) = 1) = \lim_{n \rightarrow \infty} \mathrm P(Y_n = 1 | Y_0 = 1) &= \frac {\lambda + \mu e^{-t(\lambda + \mu)}} {\lambda + \mu}. \end{aligned}
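The limit can also be checked numerically. The sketch below evaluates the discrete approximation with {n} subintervals for increasing {n} and compares it with the exponential formula; the values of {\lambda}, {\mu} and {t} are arbitrary examples and the function names are my own.

```python
import math


def discrete_approximation(x0, lam, mu, t, n):
    """P(Y_n = 1 | Y_0 = x0) when [0, t] is cut into n subintervals."""
    dt = t / n
    r = 1 - (lam + mu) * dt
    if x0 == 0:
        return lam * (1 - r ** n) / (lam + mu)
    return (lam + mu * r ** n) / (lam + mu)


def continuous_solution(x0, lam, mu, t):
    """P(X(t) = 1 | X(0) = x0) from the exponential formula."""
    decay = math.exp(-t * (lam + mu))
    if x0 == 0:
        return lam * (1 - decay) / (lam + mu)
    return (lam + mu * decay) / (lam + mu)


# Arbitrary example rates and time; lam + mu must be positive.
lam, mu, t = 0.3, 0.6, 2.0
for n in (10, 100, 1000, 10 ** 6):
    print(n,
          discrete_approximation(0, lam, mu, t, n),
          discrete_approximation(1, lam, mu, t, n))
print("limit",
      continuous_solution(0, lam, mu, t),
      continuous_solution(1, lam, mu, t))
```

The discrete values should creep towards the last line as {n} grows.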

This is a pretty interesting result. I initially thought that the continuous process would just have the solution {\mathrm P(X(t) = 1) = \lambda/(\lambda + \mu)}, completely independent of {X(0)} and {t}, based on the idea that it could be viewed as a discrete process with an infinitely large number of events within every interval of time, so that it would constantly behave like the discrete process does in the limit as the number of events approaches infinity. In fact it turns out that it still behaves like the discrete process, with the effect of the initial state never quite disappearing, although it does of course disappear in the limit as {t} approaches {\infty}, because {e^{-t(\lambda + \mu)}} approaches 0:

\displaystyle \lim_{t \rightarrow \infty} \mathrm P(X(t) = 1) = \frac {\lambda} {\lambda + \mu}.

Greenberg’s Universal 38 and its diachronic implications

This post is being written hastily and therefore may be more incomprehensible than usual.

Greenberg’s Universal 38 says:

Where there is a case system, the only case which ever has only zero allomorphs is the one which includes among its meanings that of the subject of the intransitive verb. (Greenberg, 1963, 59)

A slightly better way to put this (more clearly clarifying what the universal says about languages that code case by means of non-concatenative morphology) might be: if a language makes a distinction between nominative/absolutive and ergative/accusative case by means of concatenative morphology, then there is always at least one ergative/accusative suffix form with nonzero phonological substance. Roughly, there’s a preference for there to be an ergative/accusative affix rather than a nominative/absolutive affix (but it’s OK if there are phonologically substantive affixes for both cases, or if ergative/accusative is zero-coded in some but not all environments).

On the other hand, Greenberg’s statement of the universal makes clear a rather interesting property of it: if you’re thinking about which argument can be zero-coded in a transitive sentence, Universal 38 actually says that it depends on what happens in an intransitive sentence: the one which can be zero-coded is the one which takes the same case that arguments in intransitive sentences take. If the language is accusative, then the nominative, the agenty argument, can be zero-coded, and the accusative, the patienty argument, can’t. If the language is ergative, then the absolutive, the patienty argument, can be zero-coded, and the ergative, the agenty argument, can’t. (I mean can’t as in can’t be zero-coded in all environments.)

This is a problem, perhaps, for those who think of overt coding preferences and other phenomena related to “markedness” (see Haspelmath, 2006, for a good discussion of the meaning of markedness in linguistics) as related to the semantics of the category values in question. Agenty vs. patienty is the semantic classification of the arguments, but depending on the morphosyntactic alignment of the language, it can be either the agenty or patienty arguments which are allowed to be zero-coded. This seems like a case where Haspelmath’s preferred explanation of all phenomena related to markedness (differences in usage frequency) is far preferable, although I don’t think he mentions it in his paper (but I might have missed it; I’m not going to check, because I’m trying to not spend too long writing this post).

Anyway, one thing I wonder about this universal (and a thing it’s generally interesting to wonder about with respect to any universal) is how it’s diachronically preserved. For it’s quite easy to imagine ways in which a language could end up in a situation where it has a zero-coded ergative/accusative alongside an overtly coded nominative/absolutive due to natural changes. Let’s say it has both cases overtly coded to start with; let’s say the nominative suffix is -ak and the accusative suffix is -an. Now final -n gets lost, with compensatory nasalization, and then vowels in absolute word-final position get elided. (That’s a perfectly natural sequence of sound changes; it happened in the history of English, cf. Proto-Indo-European *yugóm > Proto-Germanic *juką > English yoke.) The language would then end up with nominative -ak and a zero-coded accusative, thus violating Universal 38. So… well, I don’t actually know how absolute Universal 38 is, perhaps it has some exceptions (though I don’t know of any), and if there are enough exceptions we might be able to just say that it’s these kinds of developments that are responsible for the exceptions. But if the exceptions are very few, then there’s probably some way in which languages which end up with zero-coded accusatives like this are hastily “corrected” to keep them in line with the universal. Otherwise we’d expect to see more exceptions. Here’s one interesting question: how would that correction happen? It could just be that a postposition gets re-accreted or something and the accusative ends up being overtly coded once again. But it could also be that subjects of intransitive sentences start not getting the -ak suffix added to them, so that you get a shift from accusative to ergative morphosyntactic alignment, with the zero-coded accusative becoming a perfectly Universal 38-condoned zero-coded absolutive. That’d be pretty cool: a shift in morphosyntactic alignment triggered simply by a coincidence of sound change. Is any such development attested? Somebody should have it happen in a conlang family.

According to Wichmann & Holman (2009), morphosyntactic alignment is a “stable” feature, which might be a problem if alignment shifts can occur in the manner described above. But then again, I wonder how common overt coding of both nominative/absolutive and ergative/accusative actually is. Most Indo-European languages that mark cases have it, but I did a quick survey of some non-IE languages with case marking, both accusative (Finnish, Hungarian, Turkish, Tamil, Quechua) and ergative (Basque, Dyirbal), and they all seem to code nominative/absolutive by zero (well, Basque codes absolutive overtly in one of its declensions, but not in the other two). If it’s pretty rare for both to be overtly coded, then this correction doesn’t have to happen very often, but it would surely need to happen sometimes if Universal 38 is absolute or close to it.


Greenberg, J. H. (1963). Some universals of grammar with particular reference to the order of meaningful elements. In Greenberg, J. H. (ed.), Universals of Language, 73–113. MIT Press.

Haspelmath, M. (2006). Against markedness (and what to replace it with). Journal of Linguistics, 42(1), 25–70.

Wichmann, S. & Holman, E. W. (2009). Temporal stability of linguistic typological features. Retrieved from

That’s OK, but this’s not OK?

Here’s something peculiar I noticed the other day about the English language.

The word is (the third-person singular present indicative form of the verb be) can be ‘contracted’ with a preceding noun phrase, so that it is reduced to an enclitic form -’s. This can happen after pretty much any noun phrase, no matter how syntactically complex:

(1) he’s here

/(h)iːz ˈhiːə/[1]

(2) everyone’s here

/ˈevriːwɒnz ˈhiːə/

(3) ten years ago’s a long time

/ˈtɛn ˈjiːəz əˈgəwz ə ˈlɒng ˈtajm/

However, one place where this contraction can’t happen is immediately after the proximal demonstrative this. This is strange, because it can certainly happen after the distal demonstrative that, and one wouldn’t expect these two very similar words to behave so differently:

(4) that’s funny
/ˈðats ˈfʊniː/

(5) *this’s funny

There is a complication here which I’ve kind of glossed over, though. Sure, this’s funny is unacceptable in writing. But what would it sound like, if it was said in speech? Well, the -’s enclitic form of is can actually be realized on the surface in a couple of different ways, depending on the phonological environment. You might already have noticed that it’s /-s/ in example (4), but /-z/ in examples (1)-(3). This allomorphy (variation in phonological form) is reminiscent of the allomorphy in the plural suffix: cats is /ˈkats/, dogs is /ˈdɒgz/, horses is /ˈhɔːsɪz/. In fact the distribution of the /-s/ and /-z/ realizations of -’s is exactly the same as for the plural suffix: /-s/ appears after voiceless non-sibilant consonants and /-z/ appears after vowels and voiced non-sibilant consonants. The remaining environment, the environment after sibilants, is the environment in which the plural suffix appears as /-ɪz/. And this environment turns out to be exactly the same environment in which -’s is unacceptable in writing. Here are a couple more examples:

(6) *a good guess’s worth something (compare: the correct answer’s worth something)

(7) *The Clash’s my favourite band (compare: Pearl Jam’s my favourite band)
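The three-way distribution of the plural suffix described above can be written down as a tiny Python function. This is only an illustrative sketch: the segment sets are a simplified inventory I have assumed for the purpose, words are represented as lists of segments without stress marks, and the post-sibilant allomorph is given as /-ɪz/ (some accents have /-əz/ instead).

```python
# Simplified, assumed segment classes for illustration.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_segment):
    """Pick the plural allomorph from the stem-final segment."""
    if final_segment in SIBILANTS:
        return "ɪz"
    if final_segment in VOICELESS_NON_SIBILANTS:
        return "s"
    # Vowels and voiced non-sibilant consonants.
    return "z"


def pluralize(segments):
    """Attach the appropriate allomorph to a stem given as a segment list."""
    return "".join(segments) + plural_allomorph(segments[-1])


print(pluralize(["k", "a", "t"]))   # cat
print(pluralize(["d", "ɒ", "g"]))   # dog
print(pluralize(["h", "ɔː", "s"]))  # horse
```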

Now, if -’s obeys the same rules as the plural suffix then we’d expect it to be realized as /-ɪz/ in this environment. However… this is exactly the same sequence of segments that the independent word is is realized as when it is unstressed. One might therefore suspect that in sentences like (8) below, the morpheme graphically represented as the independent word is is actually the enclitic -’s; it just happens to be realized the same as the independent word is and is therefore not distinguished from it in writing. (Or, perhaps it would be more elegant to say that the contrast between enclitic and independent word is neutralized in this environment.)

(8) The Clash is my favourite band

Well, this is (*this’s) a very neat explanation, and if you do a Google search for “this’s” that’s pretty much the explanation you’ll find given to the various other confused people who have gone to websites like English Stack Exchange to ask why this’s isn’t a word. Unfortunately, I think it can’t be right.

The problem is, there are some accents of English, including mine, which have /-əz/ rather than /-ɪz/ in the allomorph of the plural suffix that occurs after sibilants, while at the same time pronouncing unstressed is as /ɪz/ rather than /əz/. (There are minimal pairs, such as peace is upon us /ˈpiːsɪz əˈpɒn ʊz/ and pieces upon us /ˈpiːsəz əˈpɒn ʊz/.) If the enclitic form of is does occur in (8) then we’d expect it to be realized as /əz/ in these accents, just like the plural suffix would be in the same environment. This is not what happens, at least in my own accent: (8) can only have /ɪz/. Indeed, it can be distinguished from the minimally contrastive NP (9):

(9) The Clash as my favourite band

In fact this problem exists in more standard accents of English as well, because is is not the only word ending in /-z/ which can end a contraction. The third-person singular present indicative of the verb have, has, can also be contracted to -’s, and it exhibits the expected allomorphy between voiceless and voiced realizations:

(10) it’s been a while /ɪts ˈbiːn ə ˈwajəl/

(11) somebody I used to know’s disappeared /ˈsʊmbɒdiː aj ˈjuːst tə ˈnəwz dɪsəˈpijəd/

But, like is, it does not contract, at least in writing, after sibilants, although it may drop the initial /h-/ whenever it’s unstressed:

(12) this has gone on long enough /ˈðɪs (h)əz gɒn ɒn lɒng əˈnʊf/

I am not a native speaker of RP, so, correct me if I’m wrong. But I would be very surprised if any native speaker of RP would ever pronounce has as /ɪz/ in sentences like (12).

What’s going on? I actually do think the answer given above (that this’s isn’t written because it sounds exactly the same as this is) is more or less correct, but it needs elaboration. Such an answer can only be accepted if we in turn accept that the plural -s, the reduced -’s form of is and the reduced -’s form of has do not all exhibit the same allomorph in the environment after sibilants. The reduced form of is has the allomorph /-ɪz/ in all accents, except in those such as Australian English in which unstressed /ɪ/ merges with schwa. The reduced form of has has the allomorph /-əz/ in all accents. The plural suffix has the allomorph /-ɪz/ in some accents, but /-əz/ in others, including some in which /ɪ/ is not merged completely with schwa and in particular is not merged with schwa in the unstressed pronunciation of is.

Introductory textbooks on phonology written in the English language are very fond of talking about the allomorphy of the English plural suffix. In pretty much every treatment I’ve seen, it’s assumed that /-z/ is the underlying form, and /-s/ and /-əz/ are derived by phonological rules of voicing assimilation and epenthesis respectively, with the voicing assimilation crucially coming after the epenthesis (otherwise we’d have an additional allomorph /-əs/ after voiceless sibilants, while /-əz/ would only appear after voiced sibilants). This is the best analysis when the example is taken in isolation, because positing an epenthesis rule allows the phonological rules to be assumed to be productive across the entire lexicon of English. If the form with the vowel were instead taken as underlying and a fully productive deletion rule were posited, then it would be impossible to account for the pronunciation of a word like Paulas (‘multiple people named Paula’) with /-əz/ on the surface, whose underlying form would be exactly the same, phonologically, as Pauls (‘multiple people named Paul’). (This example only works if your plural suffix post-sibilant allomorph is /-əz/ rather than /-ɪz/, but a similar example could probably be exhibited in the other case.) One could appeal to the differing placement of the morpheme boundary but this is unappealing.
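The point about rule ordering in the textbook analysis can be illustrated with a toy derivation in Python. Everything here is an assumption made for the sketch: the segment sets are simplified, stems are lists of segments, and each rule only looks at the last two segments of the word, which is enough for a suffix consisting of /-z/.

```python
# Simplified, assumed segment classes for illustration.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}


def epenthesis(segments):
    """Insert schwa between a stem-final sibilant and a sibilant suffix."""
    out = list(segments)
    if len(out) >= 2 and out[-2] in SIBILANTS and out[-1] in SIBILANTS:
        out.insert(-1, "ə")
    return out


def voicing_assimilation(segments):
    """Devoice the suffix /z/ when it directly follows a voiceless consonant."""
    out = list(segments)
    if len(out) >= 2 and out[-1] == "z" and out[-2] in VOICELESS:
        out[-1] = "s"
    return out


def pluralize(stem, rules):
    """Add underlying /-z/ to the stem and apply the rules in order."""
    segments = list(stem) + ["z"]
    for rule in rules:
        segments = rule(segments)
    return "".join(segments)


TEXTBOOK_ORDER = [epenthesis, voicing_assimilation]
REVERSED_ORDER = [voicing_assimilation, epenthesis]

for stem in (["k", "a", "t"], ["d", "ɒ", "g"],
             ["h", "ɔː", "s"], ["k", "w", "ɪ", "z"]):
    print(pluralize(stem, TEXTBOOK_ORDER), pluralize(stem, REVERSED_ORDER))
```

With the textbook order, both sibilant-final stems come out with /-əz/; with the reversed order, the stem ending in voiceless /s/ comes out with /-əs/ instead, which is the unattested allomorph mentioned above.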

However, the assumption that a single epenthesis rule operating between sibilants is productive across the entire English lexicon has to be given up, because ‘s < is and ‘s < has have different allomorphs after sibilants! Either they are accounted for by two different lexically-conditioned epenthesis rules (which is a very unappealing model) or the allomorphs with the vowels are actually the underlying ones, and the allomorphs without the vowels are produced by a not phonologically-conditioned but at least (sort of) morphologically-conditioned deletion rule that elides fully reduced unstressed vowels (/ə/, /ɪ/) before word-final obstruents. This rule only applies in inflectional suffixes (e.g. lettuce and orchid are immune), and even there it does not apply unconditionally because the superlative suffix -est is immune to it. But this doesn’t bother me too much. One can argue that the superlative is kind of a marginal inflectional category, when you put it in the company of the plural, the possessive and the past tense.

A nice thing about the synchronic rule I’m proposing here is that it’s more or less exactly the same as the diachronic rule that produced the whole situation in the first place. The Old English nom./acc. pl., gen. sg., and past endings were, respectively, -as, -es, -aþ and -ede. In Middle English final schwa was elided unconditionally in absolute word-final position, while in word-final unstressed syllables where it was followed by a single obstruent it was gradually eliminated by a process of lexical diffusion from inflectional suffix to inflectional suffix, although “a full coverage of the process in ME is still outstanding” (Minkova 2013: 231). Even the superlative suffix was reduced to /-st/ by many speakers for a time, but eventually the schwa-ful form of this suffix prevailed.

I don’t see this as a coincidence. My inclination, when it comes to phonology, is to see the historical phonology as essential for understanding the present-day phonology. Synchronic phonological alternations are for the most part caused by sound changes, and trying to understand them without reference to these old sound changes is… well, you may be able to make some progress but it seems like it’d be much easier to make progress more quickly by trying to understand the things that cause them—sound changes—at the same time. This is a pretty tentative paragraph, and I’m aware I’d need a lot more elaboration to make a convincing case for this stance. But this is where my inclination is headed.

[1] The transcription system is the one which I prefer to use for my own accent of English.


Minkova, D. 2013. A Historical Phonology of English. Edinburgh University Press.

A language with no word-initial consonants

I was having a look at some of the squibs in Linguistic Inquiry today, which are often fairly interesting (and have the redeeming quality that, when they’re not interesting, they’re at least short), and there was an especially interesting one in the April 1970 (second ever) issue by R. M. W. Dixon (Dixon 1970) which I’d like to write about for the benefit of those who can’t access it.

In Olgolo, a variety of Kunjen spoken on the Cape York Peninsula, there appears to have been a sound change that elided consonants in initial position. That is, not just consonants of a particular variety, but all consonants. As a result of this change, every word in the language begins with a vowel. Examples (transcriptions in IPA):

  • *báma ‘man’ > áb͡ma
  • *míɲa ‘animal’ > íɲa
  • *gúda ‘dog’ > úda
  • *gúman ‘thigh’ > úb͡man
  • *búŋa ‘sun’ > úg͡ŋa
  • *bíːɲa ‘aunt’ > íɲa
  • *gúyu ‘fish’ > úyu
  • *yúgu ‘tree, wood’ > úgu

(Being used to the conventions of Indo-Europeanists, I’m a little disturbed by the fact that Dixon doesn’t identify the linguistic proto-variety to which the proto-forms in these examples belong, nor does he cite cognates to back up his reconstruction. But I presume forms very similar to the proto-forms are found in nearby Paman languages. In fact, I know for a fact that the Uradhi word for ‘tree’ is /yúku/ because Black (1993) mentions it by way of illustrating the remarkable Uradhi phonological rule which inserts a phonetic [k] or [ŋ] after every vowel in utterance-final position. Utterance-final /yúku/ is by this means realized as [yúkuk] in Uradhi.)

(The pre-stopped nasals in some of these words [rather interesting segments in and of themselves, but fairly widely attested, see the Wikipedia article] have arisen due to a sound change occurring before the word-initial consonant elision sound change, which pre-stopped nasals immediately after word-initial syllables containing a stop or *w followed by a short vowel. This would have helped mitigate the loss of contrast resulting from the word-initial consonant elision sound change a little, but only a little, and between e.g. the words for ‘animal’ and ‘aunt’ homophony was not averted because ‘aunt’ had an originally long vowel [which was shortened in Olgolo by yet another sound change].)
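The three sound changes just described can be sketched as an ordered pipeline in Python. This is only an illustration under stated assumptions: words are given as pre-segmented lists, the set of triggering consonants is limited to the stops seen in the examples plus *w, the table of pre-stopped nasals covers only the two outcomes attested in the examples, and vowel shortening is placed last (the text only fixes pre-stopping before elision). All names are hypothetical.

```python
import unicodedata

# Only the pre-stopped nasals attested in the examples (tie bar is U+0361).
PRESTOPPED = {"m": "b\u0361m", "ŋ": "g\u0361ŋ"}
# Stops occurring in the examples, plus *w (an assumption about the full set).
TRIGGERS = {"b", "d", "g", "w"}


def is_vowel(seg):
    """True if the segment's base letter is a, i or u (accent and length ignored)."""
    return unicodedata.normalize("NFD", seg)[0] in "aiu"


def is_long(seg):
    return seg.endswith("ː")


def prestop_nasal(word):
    """Pre-stop a nasal right after an initial stop/*w + short vowel syllable."""
    if (len(word) >= 3 and word[0] in TRIGGERS
            and is_vowel(word[1]) and not is_long(word[1])
            and word[2] in PRESTOPPED):
        return word[:2] + [PRESTOPPED[word[2]]] + word[3:]
    return word


def drop_initial_consonant(word):
    """Elide any word-initial consonant."""
    return word[1:] if word and not is_vowel(word[0]) else word


def shorten_vowels(word):
    return [seg.replace("ː", "") for seg in word]


def derive(proto):
    word = list(proto)
    for change in (prestop_nasal, drop_initial_consonant, shorten_vowels):
        word = change(word)
    return "".join(word)


for proto in (["b", "á", "m", "a"], ["m", "í", "ɲ", "a"], ["b", "íː", "ɲ", "a"]):
    print("*" + "".join(proto), ">", derive(proto))
```

Run on the proto-forms for ‘animal’ and ‘aunt’, the sketch returns the same string for both, which is the homophony noted above: the long vowel blocks pre-stopping in ‘aunt’, and the initial nasal fails to trigger it in ‘animal’.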

Dixon says Olgolo is the only language he’s heard of in which there are no word-initial consonants, although it’s possible that more have been discovered since 1970. However, there is a caveat to this statement: there are monoconsonantal prefixes that can be optionally added to most nouns, so that they have an initial consonant on the surface. There are at least four of these prefixes, /n-/, /w-/, /y-/ and /ŋ-/; however, every noun seems to only take a single one of these prefixes, so we can regard these forms as lexically-conditioned allomorphs of a single prefix. The conditioning is in fact more precisely semantic: roughly, y- is added to nouns denoting fish, n- is added to nouns denoting other animals, and w- is added to nouns denoting various inanimates. The prefixes therefore identify ‘noun classes’ in a sense (although these are probably not noun classes in a strict sense because Dixon gives no indication that there are any agreement phenomena which involve them). The prefix ŋ- was only seen on one word, /ɔ́jɟɔba/ ~ /ŋɔ́jɟɔba/ ‘wild yam’, and might be added to all nouns denoting fruits and vegetables, given that most Australian languages with noun classes have a noun class for fruits and vegetables, but there were no other such nouns in the dataset (Dixon only noticed the semantic conditioning after he left the field, so he didn’t have a chance to elicit any others). It must be emphasized, however, that these prefixes are entirely optional, and every noun which can have a prefix added to it can also be pronounced without the prefix. In addition some nouns, those denoting kin and body parts, appear to never take a prefix, although possibly this is just a limitation of the dataset given that their taking a prefix would be expected to be optional in any case. And words other than nouns, such as verbs, don’t take these prefixes at all.

Dixon hypothesizes that the y- and n- prefixes are reduced forms of /úyu/ ‘fish’ and /íɲa/ ‘animal’ respectively, while w- may be from /úgu/ ‘tree, wood’ or just an “unmarked” initial consonant (it’s not clear what Dixon means by this). These derivations are not unquestionable (for example, how do we get from /-ɲ-/ to /n-/ in the ‘animal’ prefix?). But it’s very plausible that the prefixes do originate in this way, even if the exact antecedent words are difficult to identify, because similar origins have been identified for noun class prefixes in other Australian languages (Dixon 1968, as cited by Dixon 1970). Just intuitively, it’s easy to see how nouns might come to be ever more frequently replaced by compounds of the dependent original noun and a term denoting a superset; cf. English koala ~ koala bear, oak ~ oak tree, gem ~ gemstone. In English these compounds are head-final but in other languages (e.g. Welsh) they are often head-initial, and presumably this would have to be the case in pre-Olgolo in order for the head elements to grammaticalize into noun class prefixes. The fact that the noun class prefixes are optional certainly suggests that the system is very much incipient, and still developing, and therefore of recent origin.

It might therefore be very interesting to see how the Olgolo language has changed after a century or so; we might be able to examine a noun class system as it develops in real time, with all of our modern equipment and techniques available to record each stage. It would also be very interesting to see how quickly this supposedly anomalous state of every word beginning with a vowel (in at least one of its freely-variant forms) is eliminated, especially since work on Australian language phonology since 1970 has established many other surprising findings about Australian syllable structure, including a language where the “basic” syllable type appears to be VC rather than CV (Breen & Pensalfini 1999). Indeed, since Dixon wrote this paper 46 years ago Olgolo might have changed considerably already. Unfortunately, it might have changed in a somewhat more disappointing way. None of the citations of Dixon’s paper recorded by Google Scholar seem to examine Olgolo any further, and the documentation on Kunjen (the variety which includes Olgolo as a subvariety) recorded in the Australian Indigenous Languages Database isn’t particularly overwhelming. I can’t find a straight answer as to whether Kunjen is extinct today or not (never mind the Olgolo variety), but Dixon wasn’t optimistic about its future in 1970:

It would be instructive to study the development of Olgolo over the next few generations … Unfortunately, the language is at present spoken by only a handful of old people, and is bound to become extinct in the next decade or so.


Black, P. 1993 (post-print). Unusual syllable structure in the Kurtjar language of Australia. Retrieved on 26 September 2016.

Breen, G. & Pensalfini, R. 1999. Arrernte: A Language with No Syllable Onsets. Linguistic Inquiry 30 (1): 1-25.

Dixon, R. M. W. 1968. Noun Classes. Lingua 21: 104-125.

Dixon, R. M. W. 1970. Olgolo Syllable Structure and What They Are Doing about It. Linguistic Inquiry 1 (2): 273-276.

The insecurity of relative chronologies

One of the things historical linguists do is reconstruct relative chronologies: statements about whether one change in a language occurred before another change in the language. For example, in the history of English there was a change which raised the Middle English (ME) mid back vowel /oː/, so that it became high /uː/: boot, pronounced /boːt/ in Middle English, is now pronounced /buːt/. There was also a change which caused ME /oː/ to be reflected as short /ʊ/ before /k/ (among other consonants), so that book is now pronounced as /bʊk/. There are two possible relative chronologies of these changes: either the first happens before the second, or the second happens before the first. Now, because English has been well-recorded in writing for centuries, because these written records of the language often contain phonetic spellings, and because they also sometimes communicate observations about the language’s phonetics, we can date these changes quite precisely. The first probably began in the thirteenth century and continued through the fourteenth, while the second took place in the seventeenth century (Minkova 2013: 253-4, 272). In this particular case, then, no linguistic reasoning is needed to infer the relative chronology. But much, if not most, of the time in historical linguistics, we are not so lucky, and are dealing with the history of languages for which written records in the desired time period are much less extensive, or completely nonexistent. Relative chronologies can still be inferred under these circumstances; however, it is a methodologically trickier business. In this post, I want to point out some complications associated with inferring relative chronologies under these circumstances which I’m not sure historical linguists are always aware of.

Let’s begin by thinking again about the English example I gave above. If English was an unwritten language, could we still infer that the /oː/ > /uː/ change happened before the /oː/ > /ʊ/ change? (I’m stating these changes as correspondences between Middle English and Modern English sounds—obviously if /oː/ > /uː/ happened first then the second change would operate on /uː/ rather than /oː/.) A first answer might go something along these lines: if the /oː/ > /uː/ change in quality happens first, then the second change is /uː/ > /ʊ/, so it’s one of quantity only (long to short). On the other hand, if /oː/ > /ʊ/ happens first we have a shift of both quantity and quality at the same time, followed by a second shift of quality. The first scenario is simpler, and therefore more likely.
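To see why the choice between the two scenarios cannot be read off the outcomes, here is a minimal Python sketch of both orderings. The string-replacement formulation of each change and the function names are assumptions made purely for illustration; only the environment before /k/ is modelled.

```python
def raise_long_o(word):
    """/oː/ > /uː/ (change of quality only)."""
    return word.replace("oː", "uː")


def shorten_before_k(word, long_vowel):
    """Long vowel > /ʊ/ before /k/; which long vowel it targets depends on the ordering."""
    return word.replace(long_vowel + "k", "ʊk")


def scenario_a(word):
    # Raising first, so the shortening operates on /uː/ (quantity only).
    return shorten_before_k(raise_long_o(word), "uː")


def scenario_b(word):
    # Shortening first, so it operates on /oː/ (quantity and quality together).
    return raise_long_o(shorten_before_k(word, "oː"))


for middle_english in ("boːt", "boːk"):
    print(middle_english, scenario_a(middle_english), scenario_b(middle_english))
```

Both functions map the Middle English forms for boot and book to the same modern forms, so any preference for one ordering has to come from judgements about which intermediate changes are more plausible.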

Admittedly, it’s only somewhat more likely than the other scenario. It’s not absolutely proven to be the correct one. Of course we never have truly absolute proofs of anything, but I think there’s a good order of magnitude or so of difference between the likelihood of /oː/ > /uː/ happening first, if we ignore the evidence of the written records and accept this argument, and the likelihood of /oː/ > /uː/ happening first once we consider the evidence of the written records.

But in fact we can’t even say it’s more likely, because the argument is flawed! The /uː/ > /ʊ/ change would involve some quality adjustment, because /ʊ/ is a little lower and more central than /uː/.[1] Now, in modern European languages, at least, it is very common for minor quality differences to exist between long and short vowels, and for lengthening and shortening changes to involve the expected minor shifts in quality as well (if you like, you can think of persistent rules existing along the lines of /u/ > /ʊ/ and /ʊː/ > /uː/, which are automatically applied after any lengthening or shortening rules to “adjust” their outputs). We might therefore say that this isn’t really a substantive quality shift; it’s just a minor adjustment concomitant with the quantity shift. But sometimes, these quality adjustments following lengthening and shortening changes go in the opposite direction than might be expected based on etymology. For example, when /ʊ/ was affected by open syllable lengthening in Middle English, it became /oː/, not /uː/: OE wudu > ME wood /woːd/. This is not unexpected, because the quality difference between /uː/ and /ʊ/ is (or, more accurately, can be) such that /ʊ/ is about as close in quality to /oː/ as it is to /uː/. Given that /ʊ/ could lengthen into /oː/ in Middle English, it is hardly unbelievable that /oː/ could shorten into /ʊ/ as well.

I’m not trying to say that one should go the other way here, and conclude that /oː/ > /ʊ/ happened first. I’m just trying to argue that without the evidence of the written records, no relative chronological inference can be made here—not even an insecure-but-best-guess kind of relative chronological inference. To me this is surprising and somewhat disturbing, because when I first started thinking about it I was convinced that there were good intrinsic linguistic reasons for taking the /oː/ > /uː/-first scenario as the correct one. And this is something that happens with a lot of relative chronologies, once I start thinking about them properly.

Let’s now go to an example where there really is no written evidence to help us, and where my questioning of the general relative-chronological assumption might have real force. In Greek, the following two very well-known generalizations about the reflexes of Proto-Indo-European (PIE) forms can be made:

  1. The PIE voiced aspirated stops are reflected in Greek as voiceless aspirated stops in the general environment: PIE *bʰéroh₂ ‘I bear’ > Greek φέρω, PIE *dʰéh₁tis ‘act of putting’ > Greek θέσις ‘placement’, PIE *ǵʰáns ‘goose’ > Greek χήν.
  2. However, in the specific environment before another PIE voiced aspirated stop in the onset of the immediately succeeding syllable, they are reflected as voiceless unaspirated stops: PIE *bʰeydʰoh₂ ‘I trust’ > Greek πείθω ‘I convince’, PIE *dʰédʰeh₁mi ‘I put’ > Greek τίθημι. This is known as Grassmann’s Law. PIE *s (which usually became /h/ elsewhere) is elided in the same environment: PIE *segʰoh₂ ‘I hold’ > Greek ἔχω ‘I have’ (note the smooth breathing diacritic).

On the face of it, the fact that Grassmann’s Law produces voiceless unaspirated stops rather than voiced ones seems to indicate that it came into effect only after the sound change that devoiced the PIE voiced aspirated stops. For otherwise, the deaspiration of these voiced aspirated stops due to Grassmann’s Law would have produced voiced unaspirated stops at first, and voiced unaspirated stops inherited from PIE, as in PIE *déḱm̥ ‘ten’ > Greek δέκα, were not devoiced.

However, if we think more closely about the phonetics of the segments involved, this is not quite as obvious. The PIE voiced aspirated stops could surely be more accurately described as breathy-voiced stops, like their presumed unaltered reflexes in modern Indo-Aryan languages. Breathy voice is essentially a kind of voice which is closer to voicelessness than voice normally is: the glottis is more open (or less tightly closed, or open at one part and not at another part) than it is when a modally voiced sound is articulated. Therefore it does not seem out of the question for breathy-voiced stops to deaspirate to voiceless stops if they are going to be deaspirated, in a similar manner as ME /ʊ/ becoming /oː/ when it lengthens. Granted, I don’t know of any attested parallels for such a shift. And in Sanskrit, in which a version of Grassmann’s Law also applies, breathy-voiced stops certainly deaspirate to voiced stops: PIE *dʰédʰeh₁mi ‘I put’ > Sanskrit dádhāmi. So the Grassmann’s Law in Greek certainly has to be different in nature (and probably an entirely separate innovation) from the Grassmann’s Law in Sanskrit.[2]
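The competing formulations discussed above can be compared in a short Python sketch. It is deliberately crude and rests on several assumptions: words are pre-segmented lists, the dissimilation ignores syllable structure (any later aspirate in the word counts as a trigger), laryngeals and vowel changes are left out, and all function names are hypothetical.

```python
ASPIRATES = {"bʰ", "dʰ", "gʰ", "pʰ", "tʰ", "kʰ"}
DEVOICE = {"bʰ": "pʰ", "dʰ": "tʰ", "gʰ": "kʰ"}
VOICELESS_PLAIN = {"b": "p", "d": "t", "g": "k"}


def devoice_aspirates(word):
    """Voiced aspirates become voiceless aspirates; everything else is untouched."""
    return [DEVOICE.get(seg, seg) for seg in word]


def to_plain(seg):
    """Deaspirate, keeping the original voicing (bʰ > b, pʰ > p)."""
    return seg[0]


def to_plain_voiceless(seg):
    """Deaspirate straight to a voiceless stop (bʰ > p), the alternative formulation."""
    return VOICELESS_PLAIN.get(seg[0], seg[0])


def grassmann(word, deaspirate):
    """Deaspirate every aspirate that is followed later in the word by another aspirate."""
    out = list(word)
    for i, seg in enumerate(word):
        if seg in ASPIRATES and any(later in ASPIRATES for later in word[i + 1:]):
            out[i] = deaspirate(seg)
    return out


proto = ["bʰ", "e", "y", "dʰ", "oː"]  # simplified stand-in for the 'I trust' form

devoicing_first = grassmann(devoice_aspirates(proto), to_plain)
grassmann_first = devoice_aspirates(grassmann(proto, to_plain))
grassmann_first_voiceless = devoice_aspirates(grassmann(proto, to_plain_voiceless))

print("".join(devoicing_first), "".join(grassmann_first), "".join(grassmann_first_voiceless))
```

Under these assumptions, the devoicing-first ordering and the Grassmann-first ordering with direct deaspiration to voiceless stops give the same result, with initial p as in πείθω; only the Grassmann-first ordering with voicing-preserving deaspiration yields an initial b. The data therefore constrain the pair (ordering, formulation) rather than the ordering alone.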

Another example of a commonly-accepted relative chronology which I think is highly questionable is the idea that Grimm’s Law comes into effect in Proto-Germanic before Verner’s Law does. To be honest, I’m not really sure what the rationale is for thinking this in the first place. Ringe (2006: 93) simply asserts that “Verner’s Law must have followed Grimm’s Law, since it operated on the outputs of Grimm’s Law”. This is unilluminating: certainly Verner’s Law only operates on voiceless fricatives in Ringe’s formulation of it, but Ringe does not justify his formulation of Verner’s Law as applying only to voiceless fricatives. In general, sound changes will appear to have operated on the outputs of a previous sound change if one assumes in the first place that the previous sound change comes first: the key to justifying the relative chronology properly is to think about what alternative formulations of each sound change are required in order to make the alternative chronology work (such alternative formulations can almost always be found), and to establish the high relative unnaturalness of the sound changes thus formulated compared to the sound changes as formulable under the relative chronology which one wishes to justify.

If the PIE voiceless stops at some point became aspirated (which seems very likely, given that fricativization of voiceless stops normally follows aspiration, and given that stops immediately after obstruents, in precisely the same environment that voiceless stops are unaspirated in modern Germanic languages, are not fricativized), then Verner’s Law, formulated as voicing of obstruents in the usual environments, followed by Grimm’s Law formulated in the usual manner, accounts perfectly well for the data. A Wikipedia editor objects, or at least raises the objection, that a formulation of the sound change so that it affects the voiceless fricatives, specifically, rather than the voiceless obstruents as a whole, would be preferable—but why? What matters is the naturalness of the sound change—how likely it is to happen in a language similar to the one under consideration—not the sizes of the categories in phonetic space that it refers to. Some categories are natural, some are unnatural, and this is not well correlated with size. Both fricatives and obstruents are, as far as I am aware, about equally natural categories.

I do have one misgiving with the Verner’s Law-first scenario, which is that I’m not aware of any attested sound changes involving intervocalic voicing of aspirated stops. Perhaps voiceless aspirated stops voice less easily than voiceless unaspirated stops. But Verner’s Law is not just intervocalic voicing, of course: it also interacts with the accent (precisely, it voices obstruents only after unaccented syllables). If one thinks of it as a matter of the association of voice with low tone, rather than of lenition, then voicing of aspirated stops might be a more believable possibility.

My point here is not so much about the specific examples; I am not aiming to actually convince people to abandon the specific relative chronologies questioned here (there are likely to be points I haven’t thought of). My point is to raise these questions in order to show at what level the justification of the relative chronology needs to be done. I expect that it is deeper than many people would think. It is also somewhat unsettling that it relies so much on theoretical assumptions about what kinds of sound changes are natural, which are often not well-established.

Are there any relative chronologies which are very secure? Well, there is another famous Indo-European sound law associated with a specific relative chronology which I think is secure. This is the “law of the palatals” in Sanskrit. In Sanskrit, PIE *e, *a and *o merge as a; but PIE *k/*g/*gʰ and *kʷ/*gʷ/*gʷʰ are reflected as c/j/h before PIE *e (and *i), and k/g/gh before PIE *a and *o (and *u). The only credible explanation for this, as far as I can see, is that an earlier sound change palatalizes the dorsal stops before *e and *i, and then a later sound change merges *e with *a and *o. If *e had already merged with *a and *o by the time the palatalization occurred, then the palatalization would have to occur before *a, and it would have to be sporadic: and sporadic changes are rare, but not impossible (this is the Neogrammarian hypothesis, in its watered-down form). But what really clinches it is this: that sporadic change would have to apply to dorsal stops before a set of instances of *a which just happened to be exactly the same as the set of instances of *a which reflect PIE *e, rather than *a or *o. This is astronomically unlikely, and one doesn’t need any theoretical assumptions to see this.[3]
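The argument for the law of the palatals can be sketched in Python by running the two changes in both orders over consonant-vowel pairs. This is a toy model under stated assumptions: only plain velars are included (the labiovelars are left out), syllables are (onset, vowel) tuples, and the function names are invented for the illustration.

```python
PALATALIZED = {"k": "c", "g": "j", "gʰ": "h"}


def palatalize(syllable):
    """Dorsal stops become palatals before the front vowels *e and *i."""
    onset, vowel = syllable
    if vowel in ("e", "i"):
        onset = PALATALIZED.get(onset, onset)
    return (onset, vowel)


def merge_vowels(syllable):
    """*e and *o merge with *a."""
    onset, vowel = syllable
    return (onset, "a" if vowel in ("e", "o") else vowel)


def apply_changes(syllable, changes):
    for change in changes:
        syllable = change(syllable)
    return syllable


palatalization_first = [palatalize, merge_vowels]
merger_first = [merge_vowels, palatalize]

for proto in (("k", "e"), ("k", "a"), ("k", "o")):
    print(proto, apply_changes(proto, palatalization_first), apply_changes(proto, merger_first))
```

With palatalization first, the contrast between original *e and original *a or *o survives as a contrast between c and k before a. With the merger first, a regular palatalization rule has nothing left to condition it, so all three inputs come out identical and the attested distribution could only be produced by a sporadic change that happened to pick out exactly the old *e cases.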

Now the question I really want to answer here is: what exactly are the relevant differences in this relative chronology that distinguish it from the three more questionable ones I examined above, and allow us to infer it with high confidence (based on the unlikelihood of a sporadic change happening to appear conditioned by an eliminated contrast)? It’s not clear to me what they are. Something to do with how the vowel merger counterbleeds the palatalization? (I hope this is the correct relation. The concepts of (counter)bleeding and (counter)feeding are very confusing for me.) But I don’t think this is referring to the relevant things. Whether two phonological rules / sound changes (counter)bleed or (counter)feed each other is a function of the natures of the phonological rules / sound changes; but when we’re trying to establish relative chronologies we don’t know what the natures of the phonological rules / sound changes are! That has to wait until we’ve established the relative chronologies. I think that’s why I keep failing to compute whether there is also a counterbleeding in the other relative chronologies I talked about above: the question is non-well-formed. (In case you can’t tell, I’m starting to mostly think aloud in this paragraph.) What we do actually know are the correspondences between the mother language and the daughter language[4], so an answer to the question should state it in terms of those correspondences. Anyway, I think it is best to leave it here, for my readers to read and perhaps comment with their ideas, providing I’ve managed to communicate the question properly; I might make another post on this theme sometime if I manage to work out (or read) an answer that satisfies me.

Oh, but one last thing: is establishing the security of relative chronologies that important? I think it is quite important. For a start, relative chronological assumptions bear directly on assumptions about the natures of particular sound changes, and that means they affect our judgements of which types of sound changes are likely and which are not, which are of fundamental importance in historical phonology and perhaps of considerable importance in non-historical phonology as well (under e.g. the Evolutionary Phonology framework of Blevins 2004).[5] But perhaps even more importantly, they are important in establishing genetic linguistic relationships. Ringe & Eska (2013) emphasize in their chapter on subgrouping how much less likely it is for languages to share the same sequence of changes than the same unordered set of changes, and so how the establishment of secure relative chronologies is our saving grace when it comes to establishing subgroups in cases of quick diversification (where there might be only a few innovations common to a given subgroup). This seems reasonable, but if the relative chronologies are insecure and questionable, we have a problem (and the sequence of changes they cite as establishing the validity of the Germanic subgroup certainly contains some questionable relative chronologies; for example, they have all three parts of Grimm’s Law in succession before Verner’s Law, but as explained above, Verner’s Law could have come before Grimm’s; the third part of Grimm’s Law may also have not happened separately from the first).

[1] This quality difference exists in present-day English for sure—modulo secondary quality shifts which have affected these vowels in some accents—and it can be extrapolated back into seventeenth-century English with reasonable certainty using the written records. If we are ignoring the evidence of the written records, we can postulate that the quality differentiation between long /uː/ and short /ʊ/ was even more recent than the /uː/ > /ʊ/ shift (which would now be better described as an /uː/ > /u/ shift). But the point is that such quality adjustment can happen, as explained in the rest of the paragraph.

[2] There is a lot of literature on Grassmann’s Law, a lot of it dealing with relative chronological issues and, in particular, the question of whether Grassmann’s Law can be considered a phonological rule that was already present in PIE. I have no idea why one would want to (there are certainly PIE forms inherited in Germanic that appear to have been unaffected by Grassmann’s Law, as in PIE *bʰeydʰ- > English bide), but I’ve hardly read any of this literature. My contention here is only that the generally-accepted relative chronology of Grassmann’s Law and the devoicing of the PIE voiced aspirated stops can be contested.

[3] One should bear in mind some subtleties though: for example, *e and *a might have gotten very, very phonetically similar, so that they were almost merged, before the palatalization occurred. If one wants to rule out that scenario, one has to appeal again to the naturalness of the hypothesized sound changes. But as long as we are talking about the full merger of *e and *a we can confidently say that it occurred after palatalization.

[4] Actually, in practice we don’t know these with certainty either, and the correspondences we postulate to some extent are influenced by our postulations about the natures of sound changes that have occurred and their relative chronologies… but I’ve been assuming they can be established more or less independently throughout these posts, and that seems a reasonable assumption most of the time.

[5] I realize I’ve been talking about phonological changes throughout this post, but obviously there are other kinds of linguistic changes, and relative chronologies of those changes can be established too. How far the discussion in this post applies outside of the phonological domain I will leave for you to think about.


Blevins, J. 2004. Evolutionary phonology: The emergence of sound patterns. Cambridge University Press.

Minkova, D. 2013. A historical phonology of English. Edinburgh University Press.

Ringe, D. 2006. A linguistic history of English: from Proto-Indo-European to Proto-Germanic. Oxford University Press.

Ringe, D. & Eska, J. F. 2013. Historical linguistics: toward a twenty-first century reintegration. Cambridge University Press.

Animacy and the meanings of ‘in front of’ and ‘behind’

The English prepositions ‘in front of’ and ‘behind’ behave differently in an interesting way depending on whether they have animate or inanimate objects.

To illustrate, suppose there are two people—let’s call them John and Mary—who are standing colinear with a ball. Three parts of the line can be distinguished: the segment between John’s and Mary’s positions (let’s call it the middle segment), the ray with John at its endpoint (let’s call it John’s ray), and the ray with Mary at its endpoint (let’s call it Mary’s ray). Note that John may be in front of or behind his ray, or at the side of it, depending on which way he faces; likewise with Mary, although, let’s assume that Mary is either in front of or behind her ray. What determines whether John describes the position of the ball, relative to Mary, as “in front of Mary” or “behind Mary”? First, note that it doesn’t matter which way John is facing. The relevant parameters are the way Mary is facing, and whether the ball is on the middle segment or Mary’s ray. So there are four different situations to consider:

  1. The ball is on the middle segment, and Mary is facing the middle segment. In this case, John can say, “Mary, the ball is in front of you.” But if he said, “Mary, the ball is behind you,” that statement would be false.
  2. The ball is on the middle segment, and Mary is facing her ray. In this case, John can say, “Mary, the ball is behind you.” But if he said, “Mary, the ball is in front of you,” that statement would be false.
  3. The ball is on Mary’s ray, and Mary is facing her ray. In this case, John can say, “Mary, the ball is in front of you.” But if he said, “Mary, the ball is behind you,” that statement would be false.
  4. The ball is on Mary’s ray, and Mary is facing the middle segment. In this case, John can say, “Mary, the ball is behind you.” But if he said, “Mary, the ball is in front of you,” that statement would be false.

So, the relevant variable is whether the ball’s position, and the position towards which Mary is facing, match up: if Mary faces the part of the line the ball is on, it’s in front of her, and if Mary faces away from the part of the line the ball is on, it’s behind her.
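The four situations reduce to a single comparison, which a few lines of Python make explicit. The function name and the string labels for the two parts of the line are assumptions chosen for this sketch.

```python
def describe_animate(ball_on, mary_faces):
    """Preposition John uses for the ball relative to Mary.

    Both arguments are either "middle" (the middle segment) or "ray" (Mary's ray).
    John's own orientation is not a parameter, because it makes no difference.
    """
    return "in front of" if ball_on == mary_faces else "behind"


for ball_on in ("middle", "ray"):
    for mary_faces in ("middle", "ray"):
        print(ball_on, mary_faces, "->", describe_animate(ball_on, mary_faces))
```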

This all probably seems very obvious and trivial. But consider what happens if we replace Mary with a lamppost. A lamppost doesn’t have a face; it doesn’t even have clearly distinct front and back sides. So one of the parameters here—the way Mary is facing—has disappeared. But one has also been added—because now the way that John is facing is relevant. So there are still four situations:

  1. The ball is on the middle segment, and John is facing the middle segment. In this case, John can say, “The ball is in front of the lamppost.”
  2. The ball is on the middle segment, and John is facing his ray. In this case, I don’t think it really makes sense for John to say either, “The ball is in front of the lamppost,” or, “The ball is behind the lamppost,” unless he is implicitly taking the perspective of some other person who is facing the middle segment. The most he can say is, “The ball is between me and the lamppost.”
  3. The ball is on Mary’s (or rather, the lamppost’s) ray, and John is facing the middle segment. In this case, John can say, “The ball is behind the lamppost.”
  4. The ball is on Mary’s (or rather, the lamppost’s) ray, and John is facing his ray. In this case, I don’t think it really makes sense for John to say either, “The ball is in front of the lamppost,” or, “The ball is behind the lamppost,” unless he is implicitly taking the perspective of some other person who is facing the middle segment. The most he can say is, “The ball is behind me, and past the lamppost.”

A preliminary hypothesis: it seems that the prepositions ‘in front of’ and ‘behind’ can only be understood with reference to the perspective of a (preferably) animate being who has a face and a back, located on opposite sides of their body. If the object is animate, then this being is the object. The preposition ‘in front of’ means ‘on the ray extending from [the object]’s face’. The preposition ‘behind’ means ‘on the ray extending from [the object]’s back’. But if the object is inanimate, then … well, it seems to me that there are two analyses you could make:

  • The definitions just become completely different. The prepositions ‘in front of’ and ‘behind’ now presuppose that the object is on the ray extending from the speaker’s face. If the subject (the referent of the noun to which the prepositional phrase is attached, e.g. the ball above) is between the speaker and the object, it’s in front of the object. Otherwise (given the presupposition), it’s behind the object.
  • If the speaker is facing the object, the speaker imagines that the object has a face and a back and is looking back at the speaker. Then the regular definitions apply, so ‘in front of’ means ‘on the ray extending from [the object]’s face, i.e. on the ray extending from [the speaker]’s back or on the middle segment’, and ‘behind’ means ‘on the ray extending from [the object]’s back, i.e. on the ray extending from [the speaker]’s face but not on the middle segment’. On the other hand, if the speaker isn’t facing the object, then (for some reason) they fail to imagine the object as having a face and a back.
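The two analyses can be written out side by side as a rough Python sketch. As before, it assumes positions are numbers (John at 0, the lamppost at 10) and facing directions are +1 or -1; all function names are hypothetical. It only compares the two analyses on the four situations listed above, where the ball is either on the middle segment or on the lamppost's ray; it makes no claim about a ball on John's own ray, which those situations do not cover.

```python
def animate_relation(subject, obj, obj_facing):
    """Regular definitions: 'in front of' = on the ray from the object's face."""
    offset = (subject - obj) * obj_facing
    if offset > 0:
        return "in front of"
    if offset < 0:
        return "behind"
    return None


def analysis_one(subject, obj, speaker, speaker_facing):
    """Separate definitions for inanimate objects."""
    # Presupposition: the object is on the ray extending from the speaker's face.
    if (obj - speaker) * speaker_facing <= 0:
        return None  # presupposition fails; neither preposition applies
    # Subject between speaker and object: in front of. Otherwise: behind.
    if min(speaker, obj) < subject < max(speaker, obj):
        return "in front of"
    return "behind"


def analysis_two(subject, obj, speaker, speaker_facing):
    """The object is imagined as having a face and looking back at the speaker."""
    if (obj - speaker) * speaker_facing <= 0:
        return None  # speaker not facing the object; no face is imagined
    imagined_facing = -speaker_facing  # the object looks back at the speaker
    return animate_relation(subject, obj, imagined_facing)


JOHN, LAMPPOST = 0, 10
# (ball position, direction John faces): +1 is towards the middle segment.
situations = [(5, +1), (5, -1), (15, +1), (15, -1)]
for ball, facing in situations:
    one = analysis_one(ball, LAMPPOST, JOHN, facing)
    two = analysis_two(ball, LAMPPOST, JOHN, facing)
    print(ball, facing, one, two)
```

On these four situations both functions return "in front of", nothing, "behind" and nothing, in that order, which is what the list of situations says John can and cannot say.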

The first analysis feels more intuitively correct to me, when I think about what ‘in front of’ and ‘behind’ mean with inanimate objects. But the second analysis makes the same predictions, does not require the postulation of separate definitions in the animate-object and inanimate-object cases and goes some way towards explaining the presupposition that the object is on the ray extending from the speaker’s face (though it does not explain it completely, because it is still puzzling to me why the speaker imagines in particular that the object is facing the speaker, and why no such imagination takes place when the speaker does not face the object). Perhaps it should be preferred, then, although I definitely don’t intuitively feel like phrases like ‘in front of the lamppost’ are metaphors involving an imagination of the lamppost as having a face and a back.

Now, I’ve been talking above like all animate objects have a face and a back and all inanimate objects don’t, but this isn’t quite the case. Although the prototypical members of the categories certainly correlate in this respect, there are inanimate objects like cars, which can be imagined as having a face and a back, and certainly at least have distinct front and back sides. (It’s harder to think of examples of animates that don’t have a front and a back. Jellyfish, perhaps—but if a jellyfish is swimming towards you, you’d probably implicitly imagine its front as being the side closer to you. Given that animates are by definition capable of movement, perhaps animates necessarily have fronts and backs in this sense.)

With respect to these inanimate objects, I think they can be regarded either as animates/faced-and-backed beings or as inanimates/unfaced-and-unbacked beings, with free variation as to whether they are so regarded. I can imagine John saying, “The ball is in front of the car,” if John is facing the boot of the car and the ball is in between him and the boot. But I can also imagine him saying, “The ball is behind the car.” He’d really have to say something more specific to make it clear where the ball is. This is much like how non-human animates are sometimes referred to as “he” or “she” and sometimes referred to as “it”.

The reason I started thinking about all this was that I read a passage in Claude Hagège’s 2010 book, Adpositions. Hagège gives the following three example sentences in Hausa:

(1) ƙwallo ya‐na gaba-n Audu
ball 3SG.PRS.S‐be in.front.of-3SG.O Audu
‘the ball is in front of Audu’

(2) ƙwallo ya‐na baya‐n Audu
ball 3SG.PRS.S‐be behind-3SG.O Audu
‘the ball is behind Audu’

(3) ƙwallo ya‐na baya-n telefo
ball 3SG.PRS.S‐be behind-3SG.O telephone
‘the ball is in front of the telephone’ (lit. ‘the ball is behind the telephone’)

He then writes (I’ve adjusted the numbers of the examples; emphasis original):

If the ball is in front of someone whom ego is facing, as well as if the ball is behind someone and ego is also behind this person and the ball, Hausa and English both use an Adp [adposition] with the same meaning, respectively “in front of” in (1), and “behind” in (2). On the contrary, if the ball is in front of a telephone whose form is such that one can attribute this set a posterior face, which faces ego, and an anterior face, oriented in the opposite direction, the ball being between ego and the telephone, then English no longer uses the intrinsic axis from front to back, and ignores the fact that the telephone has an anterior and a posterior face: it treats it as a human individual, in front of which the ball is, whatever the face presented to the ball by the telephone, hence (3). As opposed to that, Hausa keeps to the intrinsic axis, in conformity to the more or less animist conception, found in many African cultures and mythologies, which views objects as spatial entities possessing their own structure. We thus have, here, a case of animism in grammar.

I don’t entirely agree with Hagège’s description here. I think a telephone is part of the ambiguous category of inanimate objects that have clearly distinct fronts and backs, and which can therefore be treated either way with respect to ‘in front of’ and ‘behind’. It might be true that Hausa speakers show a much greater (or a universal) inclination to treat inanimate objects like this in the manner of animates, but I’m not convinced from the wording here that Hagège has taken into account the fact that there might be variation on this point within both languages. And even if there is a difference, I would caution against assuming it has any correlation with religious differences (though it’s certainly a possibility which should be investigated!).

But it’s an interesting potential cross-linguistic difference in adpositional semantics. And regardless, I’m glad to have read the passage because it’s made me aware of this interesting complexity in the meanings of ‘in front of’ and ‘behind’, which I had never noticed before.

Vowel-initial and vowel-final roots in Proto-Indo-European

A remarkable feature of Proto-Indo-European (PIE) is the restrictiveness of the constraints on its root structure. It is generally agreed that all PIE roots were monosyllabic, containing a single underlying vowel. In fact, the vast majority of the roots are thought to have had the same underlying vowel, namely *e. (Some scholars reconstruct a small number of roots with underlying *a rather than *e; others do not, and reconstruct underlying *e in every PIE root.) It is also commonly supposed that every root had at least one consonant on either side of its vowel; in other words, that there were no roots which began or ended with the vowel (Fortson 2004: 71).

I have no dispute with the first of these constraints; though it is very unusual, it is not too difficult to understand in connection with the PIE ablaut system, and the Semitic languages are similar with their triconsonantal, vowel-less roots. However, I think the other constraint, the one against vowel-initial and vowel-final roots, is questionable. In order to talk about it with ease and clarity, it helps to have a name for it: I’m going to call it the trisegmental constraint, because it amounts to the constraint that every PIE root contains at least three segments: the vowel, a consonant before the vowel, and a consonant after the vowel.

The first thing that might make one suspicious of the trisegmental constraint is that it isn’t actually attested in any IE language, as far as I know. English has vowel-initial roots (e.g. ask) and vowel-final roots (e.g. fly); so do Latin, Greek and Sanskrit (cf. S. aj- ‘drive’, G. ἀγ- ‘lead’, L. ag- ‘do’, and L. dō-, G. δω-, S. dā-, all meaning ‘give’). And for much of the early history of IE studies, nobody suspected the constraint’s existence: the PIE roots meaning ‘drive’ and ‘give’ were reconstructed as *aǵ- and *dō-, respectively, with an initial vowel in the case of the former and a final vowel in the case of the latter.

It was only with the development of the laryngeal theory that the reconstruction of the trisegmental constraint became possible. The initial motivation for the laryngeal theory was to simplify the system of ablaut reconstructed for PIE. I won’t go into the motivation in detail here; it’s one of the most famous developments in IE studies so a lot of my readers are probably familiar with it already, and it’s not hard to find descriptions of it. The important thing to know, if you want to understand what I’m talking about here, is that the laryngeal theory posits the existence of three consonants in PIE which are called laryngeals and written *h1, *h2 and *h3, and that these laryngeals can be distinguished by their effects on adjacent vowels: *h2 turns adjacent underlying *e into *a and *h3 turns adjacent underlying *e into *o. In all of the IE languages other than the Anatolian languages (which are all extinct, and records of which were only discovered in the 20th century), the laryngeals are elided pretty much everywhere, and their presence is only discernible from their effects on adjacent segments. Note that as well as changing the quality of (“colouring”) underlying *e, they also lengthen preceding vowels. And between consonants, they are reflected as vowels, but as different vowels in different languages: in Greek *h1, *h2, *h3 become ε, α, ο respectively, in Sanskrit all three become i, and in the other languages all three generally become a.
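As a rough illustration of the three effects just described (colouring, lengthening of a preceding vowel, and vocalisation between consonants), here is a small Python sketch. It is deliberately simplified: it models only the laryngeal effects stated above, ignores accent and every other sound change in the daughter languages, writes the Greek reflexes in Latin letters, and treats each form as a plain list of segments. The function name `laryngeal_reflex` and the language labels are my own.

```python
LARYNGEALS = {"h1", "h2", "h3"}
# Colouring of an adjacent underlying *e (h1 leaves *e unchanged).
COLOUR = {"h2": "a", "h3": "o"}
LONG = {"e": "ē", "a": "ā", "o": "ō", "i": "ī", "u": "ū"}
VOWELS = set(LONG) | set(LONG.values())

# Reflexes of a laryngeal between two consonants, per language group.
VOCALISED = {
    "greek": {"h1": "e", "h2": "a", "h3": "o"},
    "sanskrit": {"h1": "i", "h2": "i", "h3": "i"},
    "other": {"h1": "a", "h2": "a", "h3": "a"},
}


def laryngeal_reflex(segments, language="other"):
    segs = list(segments)

    # Step 1: colour every *e that stands next to *h2 or *h3.
    coloured = []
    for i, seg in enumerate(segs):
        if seg == "e":
            for neighbour in segs[max(i - 1, 0):i] + segs[i + 1:i + 2]:
                if neighbour in COLOUR:
                    seg = COLOUR[neighbour]
                    break
        coloured.append(seg)

    # Step 2: remove the laryngeals, leaving their traces behind.
    out = []
    for i, seg in enumerate(coloured):
        if seg not in LARYNGEALS:
            out.append(seg)
            continue
        prev = coloured[i - 1] if i > 0 else None
        nxt = coloured[i + 1] if i + 1 < len(coloured) else None
        if prev is not None and nxt is not None \
                and prev not in VOWELS and nxt not in VOWELS:
            out.append(VOCALISED[language][seg])  # between consonants
        elif prev in VOWELS:
            out[-1] = LONG.get(out[-1], out[-1])  # lengthen preceding vowel
        # otherwise the laryngeal is simply lost
    return "".join(out)


print(laryngeal_reflex(["h2", "e", "ǵ"]))                     # aǵ
print(laryngeal_reflex(["d", "e", "h3"]))                     # dō
print(laryngeal_reflex(["d", "h3", "t", "o", "s"], "greek"))  # dotos
```

The first two outputs are the shapes that the pre-laryngeal reconstructions *aǵ- and *dō- were based on.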

So, the laryngeal theory allowed the old reconstructions *aǵ- and *dō- to be replaced by *h2éǵ- and *deh3- respectively, which conform to the trisegmental constraint. In fact every root reconstructed with an initial or final vowel by the 19th century IEists could be reconstructed with an initial or final laryngeal instead. Concrete support for some of these new reconstructions with laryngeals came from the discovery of the Anatolian languages, which preserved some of the laryngeals in some positions as consonants. For example, the PIE word for ‘sheep’ was reconstructed as *ówis on the basis of the correspondence between L. ovis, G. ὄϊς, S. áviḥ, but the discovery of the Cuneiform Luwian cognate ḫāwīs confirmed without a doubt that the root must have originally begun with a laryngeal (although it is still unclear whether that laryngeal was *h2, preceding *o, or *h3, preceding *e).

There are also indirect ways in which the presence of a laryngeal can be evidenced. Most obviously, if a root exhibits the irregular ablaut alternations in the early IE languages which the laryngeal theory was designed to explain, then it should be reconstructed with a laryngeal in order to regularize the ablaut alternation in PIE. In the case of *h2eǵ-, for example, there is an o-grade derivative of the root, *h2oǵmos ‘drive’ (n.), which can be reconstructed on the evidence of Greek ὄγμος ‘furrow’ (Ringe 2006: 14). This shows that the underlying vowel of the root must have been *e, because (given the laryngeal theory) the PIE ablaut system did not involve alternations of *a with *o, only alternations of *e, *ō or ∅ (that is, the absence of the segment) with *o. But this underlying *e is reflected as if it was *a in all the e-grade derivatives of *h2eǵ- attested in the early IE languages (e.g. in the 3sg. present active indicative forms S. ájati, G. ἄγει, L. agit). In order to account for this “colouring” we must reconstruct *h2 next to the *e. Similar considerations allow us to be reasonably sure that *deh3- also contained a laryngeal, because the e-grade root is reflected as if it had *ō (S. dádāti, G. δίδωσι) and the zero-grade root in *dh3tós ‘given’ exhibits the characteristic reflex of interconsonantal *h3 (S. -ditáḥ, G. δοτός, L. datus).

But in many cases there does not seem to be any particular evidence for the reconstruction of the initial or final laryngeal other than the assumption that the trisegmental constraint existed. For example, *h1éḱwos ‘horse’ could just as well be reconstructed as *éḱwos, and indeed this is what Ringe (2006) does. Likewise, there is no positive evidence that the root *muH- of *muHs ‘mouse’ (cf. S. mūṣ, G. μῦς, L. mūs) contained a laryngeal: it could just as well be *mū-. Both of the roots *(h1)éḱ- and *muH/ū- are found, as far as I know, in these stems only, so there is no evidence for the existence of the laryngeal from ablaut. It is true that PIE has no roots that can be reconstructed as ending in a short vowel, and this could be seen as evidence for at least a constraint against vowel-final roots, because if all the apparent vowel-final roots actually had a vowel + laryngeal sequence, that would explain why the vowel appears to be long. But this is not the only possible explanation: there could just be a constraint against roots containing a light syllable. This seems like a very natural constraint. Although the circumstances aren’t exactly the same—because English roots appear without inflectional endings in most circumstances, while PIE roots mostly didn’t—the constraint is attested in English: short unreduced vowels like that of cat never appear in root-final (or word-final) position; only long vowels, diphthongs and schwa can appear in word-final position, and schwa does not appear in stressed syllables.
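The difference between the trisegmental constraint and the weaker alternative can be sketched as two predicates over candidate roots. This is only an illustration under simplifying assumptions: a root is a list of segments containing exactly one vowel, the vowel inventory is the short and long versions of e, a, o, i, u, anything else (including a laryngeal) counts as a consonant, and "not light" is taken to mean that the root either has a long vowel or ends in a consonant. The function names are hypothetical.

```python
SHORT_VOWELS = set("eaoiu")
LONG_VOWELS = set("ēāōīū")
VOWELS = SHORT_VOWELS | LONG_VOWELS


def vowel_index(root):
    """Return the position of the root's single vowel, or None if not exactly one."""
    positions = [i for i, seg in enumerate(root) if seg in VOWELS]
    return positions[0] if len(positions) == 1 else None


def obeys_trisegmental(root):
    """At least one consonant on each side of the vowel."""
    i = vowel_index(root)
    return i is not None and 0 < i < len(root) - 1


def obeys_no_light_syllable(root):
    """The root has a long vowel or ends in a consonant."""
    i = vowel_index(root)
    return i is not None and (root[i] in LONG_VOWELS or i < len(root) - 1)


candidates = {
    "*h1eḱ-": ["h1", "e", "ḱ"],
    "*eḱ-": ["e", "ḱ"],
    "*muH-": ["m", "u", "H"],
    "*mū-": ["m", "ū"],
    "*deh3-": ["d", "e", "h3"],
    "*de-": ["d", "e"],
}
for label, root in candidates.items():
    print(label, obeys_trisegmental(root), obeys_no_light_syllable(root))
```

Under these assumptions *eḱ- and *mū- fail the trisegmental constraint but pass the weaker one, while a root ending in a short vowel like the made-up *de- fails both, so the weaker constraint is enough to rule out short-vowel-final roots.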

It could be argued that the trisegmental constraint simplifies the phonology of PIE, and therefore it should be assumed to exist pending the discovery of positive evidence that some root does begin or end with a vowel. It simplifies the phonology in the sense that it reduces the space of phonological forms which can conceivably be reconstructed. But I don’t think this is the sense of “simple” which we should be using to decide which hypotheses about PIE are better. I think a reconstructed language is simpler to the extent that it is synchronically not unusual, and that the existence of whatever features it has that are synchronically unusual can be justified by explanations of features in the daughter languages by natural linguistic changes (in other words, both synchronic unusualness and diachronic unusualness must be taken into account). The trisegmental constraint seems to me synchronically unusual, because I don’t know of any other languages that have something similar, although I have not made any systematic investigation. And as far as I know there are no features of the IE languages which the trisegmental constraint helps to explain.

(Perhaps a constraint against vowel-initial roots, at least, would be more natural if PIE had a phonemic glottal stop, because people, or at least English and German speakers, tend to insert subphonemic glottal stops before vowels immediately preceded by a pause. Again, I don’t know if there are any cross-linguistic studies which support this. The laryngeal *h1 is often conjectured to be a glottal stop, but it is also often conjectured to be a glottal fricative; I don’t know if there is any reason to favour either conjecture over the other.)

I think something like this disagreement over what notion of simplicity is most important in linguistic reconstruction underlies some of the other controversies in IE phonology. For example, the question of whether PIE had phonemic *a and *ā: the “Leiden school” says it didn’t, accepting the conclusions of Lubotsky (1989); most other IEists say it did. The Leiden school reconstruction certainly reduces the space of phonological forms which can be reconstructed in PIE and therefore might be better from a falsifiability perspective. Kortlandt (2003) makes this point with respect to a different (but related) issue, the sound changes affecting initial laryngeals in Anatolian:

My reconstructions … are much more constrained [than the ones proposed by Melchert and Kimball] because I do not find evidence for more than four distinct sequences (three laryngeals before *-e- and neutralization before *-o-) whereas they start from 24 possibilities (zero and three laryngeals before three vowels *e, *a, *o which may be short or long, cf. Melchert 1994: 46f., Kimball 1999: 119f.). …

Any proponent of a scientific theory should indicate the type of evidence required for its refutation. While it is difficult to see how a theory which posits *H2 for Hittite h- and a dozen other possible reconstructions for Hittite a- can be refuted, it should be easy to produce counter-evidence for a theory which allows no more than four possibilities … The fact that no such counter-evidence has been forthcoming suggests that my theory is correct.
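The two counts in the quotation, 24 and four, can be reproduced by simple enumeration. The sketch below just multiplies out the options as Kortlandt describes them; the labels (an empty string for "zero", "H" for the neutralized laryngeal before *o) are my own shorthand.

```python
from itertools import product

# Melchert and Kimball's starting point, as described by Kortlandt:
# zero or one of three laryngeals, before *e, *a or *o, short or long.
onsets = ["", "h1", "h2", "h3"]
vowels = ["e", "a", "o"]
lengths = ["short", "long"]
melchert_kimball = list(product(onsets, vowels, lengths))

# Kortlandt's own system: three laryngeals before *e,
# plus a single neutralized laryngeal before *o.
kortlandt = [(h, "e") for h in ("h1", "h2", "h3")] + [("H", "o")]

print(len(melchert_kimball))  # 24
print(len(kortlandt))         # 4
```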

Of course the problem with the Leiden school reconstruction is that for a language to lack phonemic low vowels is very unusual. Arapaho apparently lacks phonemic low vowels, but it’s the only attested example I’ve heard of. But … I don’t have any direct answer to Kortlandt’s concerns about non-falsifiability. My own and other linguists’ concerns about the unnaturalness of a lack of phonemic low vowels also seem valid, but I don’t know how to resolve these opposing concerns. So until I can figure out a solution to this methodological problem, I’m not going to be very sure about whether PIE had phonemic low vowels and, similarly, whether the trisegmental constraint existed.


Fortson, B., 2004. Indo-European language and culture: An introduction. Oxford University Press.

Kortlandt, F., 2003. Initial laryngeals in Anatolian. Orpheus 13-14 [Gs. Rikov] (2003-04), 9-12.

Lubotsky, A., 1989. Against a Proto-Indo-European phoneme *a. In The New Sound of Indo-European: Essays in Phonological Reconstruction. Berlin–New York: Mouton de Gruyter, pp. 53–66.

Ringe, D., 2006. A Linguistic History of English: Volume I, From Proto-Indo-European to Proto-Germanic. Oxford University Press.