Some of the phonological history of English vowels, illustrated by failed rhymes in English folk songs


  • ModE = Modern English (18th century–present)
  • EModE = Early Modern English (16th–17th centuries)
  • ME = Middle English (12th–15th centuries)
  • OE = Old English (7th–11th centuries)
  • OF = Old French (9th–14th centuries)

All of this information is from the amazingly comprehensive book English Pronunciation, 1500–1700 (Volume II) by E. J. Dobson, published in 1968, which I will unfortunately have to return to the library soon.

The transcriptions of ModE pronunciations are not meant to reflect any particular accent, but to provide enough information to allow the pronunciation in a given accent to be deduced, given sufficient knowledge about that accent.

I use the acute accent to indicate primary stress and the grave accent to indicate secondary stress in phonetic transcriptions. I don’t like the standard IPA notation.

Oh, the holly bears a blossom
As white as the lily flower
And Mary bore sweet Jesus Christ
To be our sweet saviour
— “The Holly and the Ivy”, as sung by Shirley Collins and the Young Tradition

In ModE flower is [fláwr], but saviour is [séjvjər]; the two words don’t rhyme. But they rhymed in EModE, because saviour was pronounced with secondary stress on its final syllable, as [séjvjə̀wr], while flower was pronounced [flə́wr].

The OF suffix -our (often spelt -or in English, as in emperor and conqueror) was pronounced /-ur/; I don’t know if it was phonetically short or long, and I don’t know whether it had any stress in OF, but it was certainly borrowed into ME as long [-ùːr] quite regularly, and regularly bore a secondary stress. In general borrowings into ME and EModE seem to have always been given a secondary stress somewhere, in a position chosen so as to minimize the number of adjacent unstressed syllables in the word. The [-ùːr] ending became [-ə̀wr] by the Great Vowel Shift in EModE, and then would have become [-àwr] in ModE, except that it (universally, as far as I know) lost its secondary stress.

English shows a consistent tendency for secondary stress to disappear over time. Native English words don’t generally have secondary stress, and you could see secondary stress as a sort of protection against the phonetic degradation brought about by English’s native vowel reduction processes, serving to prevent the word from getting too dissimilar from its foreign pronunciation too quickly. Eventually, however, the word (or really suffix, in this case, since saviour, emperor and conqueror all develop in the same way) gets fully nativized, which means loss of the secondary stress and concomitant vowel reduction. According to Dobson, words probably acquired their secondary stress-less variants more or less immediately after borrowing if they were used in ordinary speech at all, but educated speech betrays no loss of secondary stress until the 17th century (he’s speaking generally here, not just about the [-ə̀wr] suffix). Disyllabic words were quickest to lose their secondary stresses, trisyllabic words (such as saviour) a bit slower, and in words with more than three syllables secondary stress often survives to the present day (there are some dialect differences, too: the suffix -ary, as in necessary, is pronounced [-ɛ̀ri] in General American but [-əri] in RP, and often just [-ri] in more colloquial British English).

The pronunciation [-ə̀wr] is recorded as late as 1665 by Owen Price (The Vocal Organ). William Salesbury (1547–1567) spells the suffix as -wr in Welsh orthography, which could reflect a pronunciation [-ùːr] or [-ur]; the former would be the result of occasional failure of the Great Vowel Shift before final [r] as in pour, tour, while the latter would be the probable initial result of vowel reduction. John Hart (1551–1570) has [-urz] in governors. So the [-ə̀wr] pronunciation was in current use throughout the 17th century, although the reduced forms were already being used occasionally in Standard English during the 16th. Exactly when [-ə̀wr] became obsolete, I don’t know (because Dobson doesn’t cover the ModE period).

Bold General Wolfe to his men did say
Come lads and follow without delay
To yonder mountain that is so high
Don’t be down-hearted
For we’ll gain the victory
— “General Wolfe” as sung by the Copper Family

Our king went forth to Normandy
With grace and might of chivalry
The God for him wrought marvelously
Wherefore England may call and cry
— “Agincourt Carol” as sung by Maddy Prior and June Tabor

This is another case where loss of secondary stress is the culprit. The words victory, Normandy and chivalry are all borrowings of OF words ending in -ie /-i/. They would therefore have ended up having [-àj] in ModE, like cry, had it not been for the loss of the secondary stress. For the -y suffix this occurred quite early in everyday speech, already in late ME, but the secondarily stressed variants survived to be used in poetry and song for quite a while longer. Alexander Gil’s Logonomia Anglica (1619) explicitly remarks that pronouncing three-syllable, initially-stressed words ending in -y with [-ə̀j] is something that can be done in poetry but not in prose. Dobson says that apart from Gil’s, there are few mentions of this feature of poetic speech during the 17th century; we can perhaps take this as an indication that it was becoming unusual to pronounce -y as [-ə̀j] even in poetry. I don’t know exactly how long the feature lasted. But General Wolfe is a folk song whose exact year of composition can be identified (1759, the date of General Wolfe’s death), so the feature seems to have been present well into the 18th century.

They’ve let him stand till midsummer day
Till he looked both pale and wan
And Barleycorn, he’s grown a beard
And so become a man
— “John Barleycorn” as sung by The Young Tradition

In ModE wan is pronounced [wɒ́n], with a different vowel from man [man]. But both of them used to have the same vowel as man; in wan the influence of the preceding [w] resulted in rounding to an o-vowel. The origins of this change are traced by Dobson to the East of England during the 15th century. There is evidence of the change from the Paston Letters (a collection of correspondence between members of the Norfolk gentry between 1422 and 1509) and the Cely Papers (a collection of correspondence between wealthy wool merchants owning estates in Essex between 1475 and 1488); the Cely Papers only exhibit the change in the word was, but the change is more extensive in the Paston Letters and in fact seems to have applied before the other labial consonants [b], [f] and [v] too for these letters’ writers.

There is no evidence of the change in Standard English until 1617, when Robert Robinson in The Art of Pronunciation notes that was, wast (as in thou wast) and what have [ɒ́] rather than [á]. The initial restriction of the change to unstressed function words, as in the Cely Papers, suggests the change did indeed spread from the Eastern dialects. Later phoneticians during the 17th century record the [ɒ́] pronunciation in more and more words, but the change is not regular at this point; for example, Christopher Cooper (1687) has [ɒ́] in watch but not in wan. According to Dobson, relatively literary words such as wan and quality, not often used in everyday speech, did not reliably have [ɒ́] until the late 18th century.

Note that the change also applied after [wr] in wrath, and that words in which a velar consonant ([k], [g] or [ŋ]) followed the vowel were regular exceptions (cf. wax, wag, twang).

I’ll go down in some lonesome valley
Where no man on earth shall e’er me find
Where the pretty little small birds do change their voices
And every moment blows blusterous winds
— “The Banks of the Sweet Primroses” as sung by the Copper family

The expected ModE pronunciation of OE wind ‘wind’ would be [wájnd], resulting in homophony with find. Indeed, as far as I know, every other monosyllabic word with OE -ind has [-ájnd] in Modern English (mind, grind, bind, kind, hind, rind, …), resulting from an early ME sound change that lengthened final-syllable vowels before [nd] and various other clusters containing two voiced consonants at the same place of articulation (e.g. [-ld] as in wild).

It turns out that [wájnd] did use to be the pronunciation of wind for a long time. The OED entry for wind, written in the early 20th century, actually says that the word is still commonly taken to rhyme with [-ajnd] by “modern poets”; and Bob Copper and co. can be heard pronouncing winds as [wájndz] in their recording of “The Banks of the Sweet Primroses”. The [wínd] pronunciation reportedly became usual in Standard English only in the 17th century. It is hypothesized to be a result of backformation from the derivatives windy and windmill, in which lengthening never occurred because the [nd] cluster was not in word-final position. It is unlikely to be due to avoidance of homophony with the verb wind, because the words spent several centuries being homophonous without any issues arising.

Meeting is pleasure but parting is a grief
And an inconstant lover is worse than a thief
A thief can but rob me and take all I have
But an inconstant lover sends me to the grave
— “The Cuckoo”, as sung by Anne Briggs

As the spelling suggests, the word have used to rhyme with grave. The word was confusingly variable in form in ME, but one of its forms was [haːvə] (rhyming with grave) and another one was [havə]. The latter could have been derived from the former by vowel reduction when the word was unstressed, but this is not the only possible source of it (e.g. another one would be analogy with the second-person singular form hast, where the a was in a closed syllable and therefore would have been short); there does not seem to be any consistent conditioning by stress in the forms recorded by 16th- and 17th-century phoneticians, who use both forms quite often. There are some who have conditioning by stress, such as Gil, who explicitly describes [hǽːv] as the stressed form and [hav] as the unstressed form. I don’t know how long [hǽːv] (and its later forms, [hɛ́ːv], [héːv], [héjv]) remained a variant usable in Standard English, but according to the Traditional Ballad Index, “The Cuckoo” is attested no earlier than 1769.

Now the day being gone and the night coming on
Those two little babies sat under a stone
They sobbed and they sighed, they sat there and cried
Those two little babies, they laid down and died
— “Babes in the Wood” as sung by the Copper family

In EModE there was occasional shortening of stressed [ɔ́ː], so that it developed into ModE [ɒ́] rather than [ów] as normal. It is a rather irregular and mysterious process; examples of it which have survived into ModE include gone (< OE ġegān), cloth (< OE clāþ) and hot (< OE hāt). The 16th- and 17th-century phoneticians record many other words which once had variants with shortening that have not survived to the present day, such as both, loaf, rode, broad and groat. Dobson mentions that Elisha Coles (1675–1679) “knew some variant, perhaps ŏ in stone”; the verse from “Babes in the Wood” above would be additional evidence that stone was at some point pronounced by some people as [stɒn], thus rhyming with on. As far as I know, there is no way it could have been the other way round, with on having [ɔ́ː]; the word on has always had a short vowel.

“So come riddle to me, dear mother,” he said
“Come riddle it all as one
Whether I should marry with Fair Eleanor
Or bring the brown girl home” (× 2)

“Well, the brown girl, she has riches and land
Fair Eleanor, she has none
And so I charge you do my bidding
And bring the brown girl home” (× 2)
— “Lord Thomas and Fair Eleanor” as sung by Peter Bellamy

In “Lord Thomas and Fair Eleanor”, the rhymes on the final consonant are often imperfect (although the consonants are always phonetically similar). These two verses, however, are the only ones where the vowels aren’t the same in the modern pronunciation—and there’s good reason to think they were the same once.

The words one and none are closely related. The OE word for ‘one’ was ān; the OE word for ‘none’ was nān; the OE word for ‘not’ was ne; the second is simply the result of adding the third as a prefix to the first: ‘not one’.

OE ā normally becomes ME [ɔ́ː] and then ModE [ów] in stressed syllables. If it had done that in one and none, it’d be a near-rhyme with home today, save for the difference in the final nasals’ places of articulation. Indeed, in only, which is a derivative of one with the -ly suffix added, we have [ów] in ModE. But the standard ModE pronunciations of one and none are [wʌ́n] and [nʌ́n] respectively. There are also variant forms [wɒ́n] and [nɒ́n] widespread across England. How did this happen? As usual, Dobson has answers.

The [nɒ́n] variant is the easiest one to explain, at least if we consider it in isolation from the others. It’s just the result of sporadic [ɔ́ː]-shortening before [n], as in gone (see above on the on/stone rhyme). As for [nʌ́n], well, ModE [ʌ] is the ordinary reflex of short ME [u], but there is a sporadic [úː]-shortening change in EModE besides the sporadic [ɔ́ː]-shortening one. This change is quite common and reflected in many ModE words such as blood, flood, good, book, cook, wool, although I don’t think there are any where it happens before n. So perhaps [nɔ́ːn] underwent a shift to [nóːn] somehow during the ME period, which would become [núːn] by the Great Vowel Shift. As it happens there is some evidence for such a shift in ME from occasional rhymes in ME texts, such as hoom ‘home’ with doom ‘doom’ and forsothe ‘forsooth’ with bothe ‘both’ in the Canterbury Tales. However, there is especially solid evidence for it in the environment after [w], in which environment most instances of ME [ɔ́ː] exhibit raising that has passed into Standard English (e.g. who < OE hwā, two < OE twā, ooze < OE wāse; woe is an exception in ModE, although it, too, is listed as a homophone of woo occasionally by Early Modern phoneticians). Note that although all these examples happen to have lost the [w], presumably by absorption into the following [úː] after the Great Vowel Shift occurred, there are words such as womb with EModE [úː] which have retained their [w], and phoneticians in the 16th and 17th centuries record pronunciations of who and two with retained [w]. So if ME [ɔ́ːn] ‘one’ somehow became [wɔ́ːn], and then raising to [wóːn] occurred due to the /w/, then this vowel would be likely to spread by analogy to its derivative [nɔ́ːn], allowing for the emergence of [wʌ́n] and [nʌ́n] in ModE. The ModE [wɒ́n] and [nɒ́n] pronunciations can be accounted for by assuming the continued existence of an un-raised [wɔ́ːn] variant in EModE alongside [wúːn].

As it happens there is a late ME tendency for [j] to be inserted before long mid front vowels and, a little less commonly, for [w] to be inserted before word-initial long mid back vowels. This glide insertion only happened in initial syllables, and usually only when the vowel was word-initial or the word began with [h]; but there are occasional examples before other consonants such as John Hart’s [mjɛ́ːn] for mean. The Hymn of the Virgin (uncertain date, 14th century), which is written in Welsh orthography and therefore more phonetically transparent than usual, evidences [j] in earth. John Hart records [j] in heal and here, besides mean, and [w] in whole (< OE hāl). 17th-century phoneticians record many instances of [j]- and [w]-insertion, giving spellings such as yer for ‘ere’, yerb for ‘herb’, wuts for ‘oats’ (this one also has shortening), but they frequently condemn these pronunciations as “barbarous”. Christopher Cooper (1687) even mentions a pronunciation wun for ‘one’, although not without condemning it for its barbarousness. The general picture seems to be that glide insertion was widespread in dialects, and filtered into Standard English to some degree during the 16th century, but there was a strong reaction against it during the 17th century and it mostly disappeared. The exception, of course, is the word one, for which, according to Dobson, the [wʌ́n] pronunciation became normal around 1700. The [nʌ́n] pronunciation for ‘none’ is first recorded by William Turner in The Art of Spelling and Reading English (1710).

Finally, I should mention that sporadic [úː]-shortening is also recorded as applying to home, resulting in the pronunciation [hʌ́m]; and Turner has this pronunciation, as do many English traditional dialects. So it’s possible that the rhyme in “Lord Thomas and Fair Eleanor” is due to this change having applied to home, rather than preservation of the conservative [-ówn] forms of one and none.

The insecurity of relative chronologies

One of the things historical linguists do is reconstruct relative chronologies: statements about whether one change in a language occurred before another change in the language. For example, in the history of English there was a change which raised the Middle English (ME) mid back vowel /oː/, so that it became high /uː/: boot, pronounced /boːt/ in Middle English, is now pronounced /buːt/. There was also a change which caused ME /oː/ to be reflected as short /ʊ/ before /k/ (among other consonants), so that book is now pronounced as /bʊk/. There are two possible relative chronologies of these changes: either the first happens before the second, or the second happens before the first. Now, because English has been well-recorded in writing for centuries, because these written records of the language often contain phonetic spellings, and because they also sometimes communicate observations about the language’s phonetics, we can date these changes quite precisely. The first probably began in the thirteenth century and continued through the fourteenth, while the second took place in the seventeenth century (Minkova 2015: 253-4, 272). In this particular case, then, no linguistic reasoning is needed to infer the relative chronology. But much of if not most of the time in historical linguistics, we are not so lucky, and are dealing with the history of languages for which written records in the desired time period are much less extensive, or completely nonexistent. Relative chronologies can still be inferred under these circumstances; however, it is a methodologically trickier business. In this post, I want to point out some complications associated with inferring relative chronologies under these circumstances which I’m not sure historical linguists are always aware of.

Let’s begin by thinking again about the English example I gave above. If English were an unwritten language, could we still infer that the /oː/ > /uː/ change happened before the /oː/ > /ʊ/ change? (I’m stating these changes as correspondences between Middle English and Modern English sounds; obviously if /oː/ > /uː/ happened first then the second change would operate on /uː/ rather than /oː/.) A first answer might go something along these lines: if the /oː/ > /uː/ change in quality happens first, then the second change is /uː/ > /ʊ/, so it’s one of quantity only (long to short). On the other hand, if /oː/ > /ʊ/ happens first we have a shift of both quantity and quality at the same time, followed by a second shift of quality. The first scenario is simpler, and therefore more likely.
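To make the two candidate chronologies concrete, here is a minimal sketch in Python. The segment-list representation and all function names are my own illustrative assumptions; only the forms /boːt/ and /boːk/ and the two changes come from the discussion above. It shows that both orderings yield the same modern forms, so the outputs alone cannot decide between them.

```python
# Words are lists of segments; the ME forms follow the text's /boːt/ and /boːk/.
ME_FORMS = {"boot": ["b", "oː", "t"], "book": ["b", "oː", "k"]}

def raise_long_o(word):
    """The raising change: /oː/ > /uː/ in every position."""
    return ["uː" if seg == "oː" else seg for seg in word]

def shorten_before_k(word, target):
    """The shortening change: `target` > /ʊ/ when the next segment is /k/.

    `target` is /uː/ if raising has already applied and /oː/ otherwise;
    restricting the environment to /k/ is a simplification for this sketch.
    """
    out = list(word)
    for i in range(len(word) - 1):
        if word[i] == target and word[i + 1] == "k":
            out[i] = "ʊ"
    return out

def raising_first(word):
    # Scenario 1: quality shift first, then a shortening of /uː/.
    return shorten_before_k(raise_long_o(word), target="uː")

def shortening_first(word):
    # Scenario 2: /oː/ shortens straight to /ʊ/, then the remaining /oː/ raises.
    return raise_long_o(shorten_before_k(word, target="oː"))

for gloss, form in ME_FORMS.items():
    print(gloss, "".join(raising_first(form)), "".join(shortening_first(form)))
```

The only difference between the scenarios is the formulation of the shortening rule (its `target`), which is exactly the thing whose naturalness has to be argued for.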

Admittedly, it’s only somewhat more likely than the other scenario. It’s not absolutely proven to be the correct one. Of course we never have truly absolute proofs of anything, but I think there’s a good order of magnitude or so of difference between the likelihood of /oː/ > /uː/ happening first, if we ignore the evidence of the written records and accept this argument, and the likelihood of /oː/ > /uː/ happening first once we consider the evidence of the written records.

But in fact we can’t even say it’s more likely, because the argument is flawed! The /uː/ > /ʊ/ change would involve some quality adjustment, because /ʊ/ is a little lower and more central than /uː/.[1] Now, in modern European languages, at least, it is very common for minor quality differences to exist between long and short vowels, and for lengthening and shortening changes to involve the expected minor shifts in quality as well (if you like, you can think of persistent rules existing along the lines of /u/ > /ʊ/ and /ʊː/ > /uː/, which are automatically applied after any lengthening or shortening rules to “adjust” their outputs). We might therefore say that this isn’t really a substantive quality shift; it’s just a minor adjustment concomitant with the quantity shift. But sometimes, these quality adjustments following lengthening and shortening changes go in the opposite direction than might be expected based on etymology. For example, when /ʊ/ was affected by open syllable lengthening in Middle English, it became /oː/, not /uː/: OE wudu > ME wood /woːd/. This is not unexpected, because the quality difference between /uː/ and /ʊ/ is (or, more accurately, can be) such that /ʊ/ is about as close in quality to /oː/ as it is to /uː/. Given that /ʊ/ could lengthen into /oː/ in Middle English, it is hardly unbelievable that /oː/ could shorten into /ʊ/ as well.

I’m not trying to say that one should go the other way here, and conclude that /oː/ > /ʊ/ happened first. I’m just trying to argue that without the evidence of the written records, no relative chronological inference can be made here—not even an insecure-but-best-guess kind of relative chronological inference. To me this is surprising and somewhat disturbing, because when I first started thinking about it I was convinced that there were good intrinsic linguistic reasons for taking the /oː/ > /uː/-first scenario as the correct one. And this is something that happens with a lot of relative chronologies, once I start thinking about them properly.

Let’s now go to an example where there really is no written evidence to help us, and where my questioning of the general relative-chronological assumption might have real force. In Greek, the following two very well-known generalizations about the reflexes of Proto-Indo-European (PIE) forms can be made:

  1. The PIE voiced aspirated stops are reflected in Greek as voiceless aspirated stops in the general environment: PIE *bʰéroh₂ ‘I bear’ > Greek φέρω, PIE *dʰéh₁tis ‘act of putting’ > Greek θέσις ‘placement’, PIE *ǵʰáns ‘goose’ > Greek χήν.
  2. However, in the specific environment before another PIE voiced aspirated stop in the onset of the immediately succeeding syllable, they are reflected as voiceless unaspirated stops: PIE *bʰeydʰoh₂ ‘I trust’ > Greek πείθω ‘I convince’, PIE *dʰédʰeh₁mi ‘I put’ > Greek τίθημι. This is known as Grassmann’s Law. PIE *s (which usually became /h/ elsewhere) is elided in the same environment: PIE *segʰoh₂ ‘I hold’ > Greek ἔχω ‘I have’ (note the smooth breathing diacritic).

On the face of it, the fact that Grassmann’s Law produces voiceless unaspirated stops rather than voiced ones seems to indicate that it came into effect only after the sound change that devoiced the PIE voiced aspirated stops. For otherwise, the deaspiration of these voiced aspirated stops due to Grassmann’s Law would have produced voiced unaspirated stops at first, and voiced unaspirated stops inherited from PIE, as in PIE *déḱm̥ ‘ten’ > Greek δέκα, were not devoiced.

However, if we think more closely about the phonetics of the segments involved, this is not quite as obvious. The PIE voiced aspirated stops could surely be more accurately described as breathy-voiced stops, like their presumed unaltered reflexes in modern Indo-Aryan languages. Breathy voice is essentially a kind of voice which is closer to voicelessness than voice normally is: the glottis is more open (or less tightly closed, or open at one part and not at another part) than it is when a modally voiced sound is articulated. Therefore it does not seem out of the question for breathy-voiced stops to deaspirate to voiceless stops if they are going to be deaspirated, in a similar manner to ME /ʊ/ becoming /oː/ when it lengthens. Granted, I don’t know of any attested parallels for such a shift. And in Sanskrit, in which a version of Grassmann’s Law also applies, breathy-voiced stops certainly deaspirate to voiced stops: PIE *dʰédʰeh₁mi ‘I put’ > Sanskrit dádhāmi. So the Grassmann’s Law in Greek certainly has to be different in nature (and probably an entirely separate innovation) from the Grassmann’s Law in Sanskrit.[2]
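The three possibilities just discussed can be compared in a small Python sketch. Everything in it is a schematic assumption of mine: the stem is a simplified stand-in for PIE *bʰeydʰ- with laryngeals and endings ignored, and the deaspiration environment is reduced to "some aspirate follows later in the word" rather than "in the next syllable onset".

```python
DEVOICE = {"bʰ": "pʰ", "dʰ": "tʰ", "gʰ": "kʰ"}        # voiced aspirate > voiceless aspirate
TO_VOICED = {"bʰ": "b", "dʰ": "d", "gʰ": "g"}         # deaspiration keeping voice
TO_VOICELESS = {"bʰ": "p", "dʰ": "t", "gʰ": "k",      # deaspiration to a voiceless stop
                "pʰ": "p", "tʰ": "t", "kʰ": "k"}

def is_aspirate(seg):
    return seg.endswith("ʰ")

def devoice(word):
    return [DEVOICE.get(seg, seg) for seg in word]

def deaspirate(word, outcome):
    """Deaspirate any aspirate that is followed later by another aspirate.

    `outcome` maps each aspirate to its deaspirated result, so the same
    function covers the voiced and the voiceless formulation of the change.
    """
    out = list(word)
    for i, seg in enumerate(word):
        if is_aspirate(seg) and any(is_aspirate(s) for s in word[i + 1:]):
            out[i] = outcome[seg]
    return out

stem = ["bʰ", "e", "i", "dʰ", "o"]      # schematic for the source of Greek πείθω
greek = ["p", "e", "i", "tʰ", "o"]      # schematic target

# (a) devoicing first, then deaspiration: the usual chronology
usual = deaspirate(devoice(stem), TO_VOICELESS)
# (b) deaspiration to voiced stops first, then devoicing: wrong output
voiced_first = devoice(deaspirate(stem, TO_VOICED))
# (c) deaspiration of breathy-voiced stops straight to voiceless stops, then devoicing
voiceless_first = devoice(deaspirate(stem, TO_VOICELESS))

print(usual == greek, voiced_first == greek, voiceless_first == greek)
```

Scenarios (a) and (c) both reach the Greek form; what rules out a deaspiration-first chronology is only the assumption that deaspiration must preserve voicing, as in (b).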

Another example of a commonly-accepted relative chronology which I think is highly questionable is the idea that Grimm’s Law comes into effect in Proto-Germanic before Verner’s Law does. To be honest, I’m not really sure what the rationale is for thinking this in the first place. Ringe (2006: 93) simply asserts that “Verner’s Law must have followed Grimm’s Law, since it operated on the outputs of Grimm’s Law”. This is unilluminating: certainly Verner’s Law only operates on voiceless fricatives in Ringe’s formulation of it, but Ringe does not justify his formulation of Verner’s Law as applying only to voiceless fricatives. In general, sound changes will appear to have operated on the outputs of a previous sound change if one assumes in the first place that the previous sound change comes first: the key to justifying the relative chronology properly is to think about what alternative formulations of each sound change are required in order to make the alternative chronology work (such alternative formulations can almost always be found), and establish the high relative unnaturalness of the sound changes thus formulated compared to the sound changes as formulable under the relative chronology which one wishes to justify.

If the PIE voiceless stops at some point became aspirated (which seems very likely, given that fricativization of voiceless stops normally follows aspiration, and given that stops immediately after obstruents, in precisely the same environment that voiceless stops are unaspirated in modern Germanic languages, are not fricativized), then Verner’s Law, formulated as voicing of obstruents in the usual environments, followed by Grimm’s Law formulated in the usual manner, accounts perfectly well for the data. A Wikipedia editor objects, or at least raises the objection, that a formulation of the sound change so that it affects the voiceless fricatives, specifically, rather than the voiceless obstruents as a whole, would be preferable—but why? What matters is the naturalness of the sound change—how likely it is to happen in a language similar to the one under consideration—not the sizes of the categories in phonetic space that it refers to. Some categories are natural, some are unnatural, and this is not well correlated with size. Both fricatives and obstruents are, as far as I am aware, about equally natural categories.

I do have one misgiving with the Verner’s Law-first scenario, which is that I’m not aware of any attested sound changes involving intervocalic voicing of aspirated stops. Perhaps voiceless aspirated stops voice less easily than voiceless unaspirated stops. But Verner’s Law is not just intervocalic voicing, of course: it also interacts with the accent (precisely, it voices obstruents only after unaccented syllables). If one thinks of it as a matter of the association of voice with low tone, rather than of lenition, then voicing of aspirated stops might be a more believable possibility.
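Here is a rough Python sketch of the two chronologies for Grimm’s and Verner’s Laws, under several assumptions of my own: the PIE voiceless stops are taken to be already aspirated in the input, the voiced aspirates are written as becoming voiced fricatives, accent is marked with a prefixed ˈ on the vowel, Verner’s environment is reduced to "immediately after an unaccented vowel", and the two forms are schematic stand-ins for the ‘father’ and ‘brother’ words with vowel changes ignored.

```python
UNACCENTED_VOWELS = {"a", "aː", "eː"}   # an accented vowel is written with a leading ˈ

# Grimm's Law, simplified: aspirated voiceless stops and voiced aspirates become fricatives.
GRIMM = {"pʰ": "f", "tʰ": "þ", "kʰ": "x", "bʰ": "β", "dʰ": "ð", "gʰ": "ɣ"}

# Two formulations of Verner's Law, differing only in what they operate on.
VERNER_ON_FRICATIVES = {"f": "β", "þ": "ð", "x": "ɣ"}
VERNER_ON_ASPIRATES = {"pʰ": "bʰ", "tʰ": "dʰ", "kʰ": "gʰ"}

def grimm(word):
    return [GRIMM.get(seg, seg) for seg in word]

def verner(word, voicing):
    """Voice a segment listed in `voicing` when it directly follows an unaccented vowel."""
    out = list(word)
    for i in range(1, len(word)):
        if word[i] in voicing and word[i - 1] in UNACCENTED_VOWELS:
            out[i] = voicing[word[i]]
    return out

FORMS = {
    "father": ["pʰ", "a", "tʰ", "ˈeː", "r"],        # accent follows the medial stop
    "brother": ["bʰ", "r", "ˈaː", "tʰ", "eː", "r"],  # accent precedes the medial stop
}

def grimm_first(word):
    return verner(grimm(word), VERNER_ON_FRICATIVES)

def verner_first(word):
    return grimm(verner(word, VERNER_ON_ASPIRATES))

for gloss, form in FORMS.items():
    print(gloss, "".join(grimm_first(form)), "".join(verner_first(form)))
```

On these two forms the orderings are indistinguishable: the medial consonant comes out voiced in ‘father’ and voiceless in ‘brother’ either way. Choosing between them therefore comes down to whether voicing of aspirated stops is a believable change, which is the misgiving raised above.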

My point here is not so much about the specific examples; I am not aiming to actually convince people to abandon the specific relative chronologies questioned here (there are likely to be points I haven’t thought of). My point is to raise these questions in order to show at what level the justification of the relative chronology needs to be done. I expect that it is deeper than many people would think. It is also somewhat unsettling that it relies so much on theoretical assumptions about what kinds of sound changes are natural, which are often not well-established.

Are there any relative chronologies which are very secure? Well, there is another famous Indo-European sound law associated with a specific relative chronology which I think is secure. This is the “law of the palatals” in Sanskrit. In Sanskrit, PIE *e, *a and *o merge as a; but PIE *k/*g/*gʰ and *kʷ/*gʷ/*gʷʰ are reflected as c/j/h before PIE *e (and *i), and k/g/gh before PIE *a and *o (and *u). The only credible explanation for this, as far as I can see, is that an earlier sound change palatalizes the dorsal stops before *e and *i, and then a later sound change merges *e with *a and *o. If *e had already merged with *a and *o by the time the palatalization occurred, then the palatalization would have to occur before *a, and it would have to be sporadic: and sporadic changes are rare, but not impossible (this is the Neogrammarian hypothesis, in its watered-down form). But what really clinches it is this: that sporadic change would have to apply to dorsal stops before a set of instances of *a which just happened to be exactly the same as the set of instances of *a which reflect PIE *e, rather than *a or *o. This is astronomically unlikely, and one doesn’t need any theoretical assumptions to see this.[3]
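The asymmetry in the palatals case can be seen in a short Python sketch. The forms are schematic assumptions of mine (the dorsals are written simply as k, standing in for the sources of Sanskrit ca ‘and’ and kas ‘who’), as are the function names.

```python
PALATALIZE = {"k": "c", "g": "j", "gʰ": "h"}
FRONT_VOWELS = {"e", "i"}

def palatalize(word):
    """Palatalize a dorsal stop when the next segment is a front vowel."""
    out = list(word)
    for i in range(len(word) - 1):
        if word[i] in PALATALIZE and word[i + 1] in FRONT_VOWELS:
            out[i] = PALATALIZE[word[i]]
    return out

def merge_vowels(word):
    """Merge *e and *o into a (*a is already a)."""
    return ["a" if seg in {"e", "o"} else seg for seg in word]

FORMS = {"and": ["k", "e"], "who": ["k", "o", "s"]}

for gloss, form in FORMS.items():
    palatalization_first = merge_vowels(palatalize(form))
    merger_first = palatalize(merge_vowels(form))
    print(gloss, "".join(palatalization_first), "".join(merger_first))
```

With palatalization first, the outputs are ca and kas, as attested. With the merger first, the conditioning vowel is gone before palatalization can see it, so a regular rule produces ka and kas; no reformulation of the palatalization as a regular change can recover the attested split.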

Now the question I really want to answer here is: what exactly are the relevant differences in this relative chronology that distinguish it from the three more questionable ones I examined above, and allow us to infer it with high confidence (based on the unlikelihood of a sporadic change happening to appear conditioned by an eliminated contrast)? It’s not clear to me what they are. Something to do with how the vowel merger counterbleeds the palatalization? (I hope this is the correct relation. The concepts of (counter)bleeding and (counter)feeding are very confusing for me.) But I don’t think this is referring to the relevant things. Whether two phonological rules / sound changes (counter)bleed or (counter)feed each other is a function of the natures of the phonological rules / sound changes; but when we’re trying to establish relative chronologies we don’t know what the natures of the phonological rules / sound changes are! That has to wait until we’ve established the relative chronologies. I think that’s why I keep failing to compute whether there is also a counterbleeding in the other relative chronologies I talked about above: the question is non-well-formed. (In case you can’t tell, I’m starting to mostly think aloud in this paragraph.) What we do actually know are the correspondences between the mother language and the daughter language[4], so an answer to the question should state it in terms of those correspondences. Anyway, I think it is best to leave it here, for my readers to read and perhaps comment with their ideas, providing I’ve managed to communicate the question properly; I might make another post on this theme sometime if I manage to work out (or read) an answer that satisfies me.

Oh, but one last thing: is establishing the security of relative chronologies that important? I think it is quite important. For a start, relative chronological assumptions bear directly on assumptions about the natures of particular sound changes, and that means they affect our judgements of which types of sound changes are likely and which are not, which are of fundamental importance in historical phonology and perhaps of considerable importance in non-historical phonology as well (under e.g. the Evolutionary Phonology framework of Blevins 2004).[5] But perhaps even more importantly, they are important in establishing genetic linguistic relationships. Ringe & Eska (2014) emphasize in their chapter on subgrouping how much less likely it is for languages to share the same sequence of changes than the same unordered set of changes, and so how the establishment of secure relative chronologies is our saving grace when it comes to establishing subgroups in cases of quick diversification (where there might be only a few innovations common to a given subgroup). This seems reasonable, but if the relative chronologies are insecure and questionable, we have a problem (and the sequence of changes they cite as establishing the validity of the Germanic subgroup certainly contains some questionable relative chronologies—for example they have all three parts of Grimm’s Law in succession before Verner’s Law, but as explained above, Verner’s Law could have come before Grimm’s; the third part of Grimm’s Law may also have not happened separately from the first).

[1] This quality difference exists in present-day English for sure—modulo secondary quality shifts which have affected these vowels in some accents—and it can be extrapolated back into seventeenth-century English with reasonable certainty using the written records. If we are ignoring the evidence of the written records, we can postulate that the quality differentiation between long /uː/ and short /ʊ/ was even more recent than the /uː/ > /ʊ/ shift (which would now be better described as an /uː/ > /u/ shift). But the point is that such quality adjustment can happen, as explained in the rest of the paragraph.

[2] There is a lot of literature on Grassmann’s Law, a lot of it dealing with relative chronological issues and, in particular, the question of whether Grassmann’s Law can be considered a phonological rule that was already present in PIE. I have no idea why one would want to, since there are certainly PIE forms inherited in Germanic that appear to have been unaffected by Grassmann’s Law, as in PIE *bʰeydʰ- > English bide; but I’ve hardly read any of this literature. My contention here is only that the generally-accepted relative chronology of Grassmann’s Law and the devoicing of the PIE voiced aspirated stops can be contested.

[3] One should bear in mind some subtleties, though: for example, *e and *a might have gotten very, very phonetically similar, so that they were almost merged, before the palatalization occurred. If one wants to rule out that scenario, one has to appeal again to the naturalness of the hypothesized sound changes. But as long as we are talking about the full merger of *e and *a we can confidently say that it occurred after palatalization.

[4] Actually, in practice we don’t know these with certainty either, and the correspondences we postulate to some extent are influenced by our postulations about the natures of sound changes that have occurred and their relative chronologies… but I’ve been assuming they can be established more or less independently throughout these posts, and that seems a reasonable assumption most of the time.

[5] I realize I’ve been talking about phonological changes throughout this post, but obviously there are other kinds of linguistic changes, and relative chronologies of those changes can be established too. How far the discussion in this post applies outside of the phonological domain I will leave for you to think about.


An example of metathesis of features

Metathesis is generally understood as sound change involving the switching in position of two segments, or sequences of segments. For example, the non-standard English word ax ‘ask’ is related to the standard form by metathesis. But there are also some arguable cases where metathesis has involved the switching of individual features of segments, rather than the segments themselves.

For example, consider the Tocharian (Toch.) words for ‘tongue’: käntu in Toch. A, kantwo in Toch. B. From these two words we can reconstruct Proto-Tocharian (PToch.) *kəntwó; note that, following the convention of Ringe 1996, *ə denotes a high central vowel, not a mid central one as it does in the IPA. Now, the Proto-Indo-European word for ‘tongue’ is reconstructed as *dn̥ǵʰwáh₂.[1] The development of *-n̥- into *-ən- and *-wáh₂ into *-wo in PToch. is regular. However, the regular development of *d- in PToch. would be *ts-, and the regular development of *-ǵʰ- in PToch. would be *-k-. In other words, the expected PToch. form is *tsənkwó, not *kəntwó.

How can we explain this outcome? The first thing one might notice about the two forms is that where the PIE form has a coronal stop, the PToch. form has a dorsal stop, and where the PIE form has a dorsal stop, the PToch. form has a coronal stop. One might therefore suggest that the PToch. form comes from a metathesized version of the PIE form, with the coronal stop *d and the dorsal stop *ǵʰ having changed places: *ǵʰn̥dwáh₂. If *kəntwó is the expected outcome of PIE *ǵʰn̥dwáh₂ in PToch., then this hypothesis explains the outcome in the sense that it makes its irregularity no longer surprising; changes of metathesis are well-known exceptions to the general rule that sound change is regular.

Unfortunately, there’s a problem with this hypothesis: the regular outcome of PIE *ǵʰn̥dwáh₂ in PToch. is *kənwó, not *kəntwó, because PIE *d is regularly elided in PToch. before consonants. In fact there are no circumstances under which PIE *d becomes PToch. *t; if, by some exceptional circumstance, *d failed to be elided in *ǵʰn̥dwáh₂, it would probably become *ts, rather than *t, resulting in PToch. *kəntswó.

The solution proposed by Ringe (1996: 45-6) is to suppose that what was metathesized was not the segments *d and *ǵʰ themselves, but rather their place of articulation features. So *d became [-coronal] and [+dorsal] (like *ǵʰ), while *ǵʰ became [+coronal] and [-dorsal] (like *d). But the laryngeal features of the two segments were unchanged: *d remained [-spread glottis], and *ǵʰ remained [+spread glottis]. Therefore, the outcomes of the metathesis were *ǵ and *dʰ, respectively. And *kəntwó is, indeed, the expected outcome in PToch. of PIE *ǵn̥dʰwáh₂, because PIE *dʰ becomes *t in PToch. (There’s the interesting question of why *d becomes an affricate *ts, but its aspirated counterpart *dʰ is unaffected—but let’s not get into that.)
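The feature-swapping idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the feature names, the dictionary layout and the helper swap_place are mine, not Ringe’s, and the palatal/velar distinction within the dorsals is ignored.

```python
# Hypothetical feature bundles for the two stops in *dn̥ǵʰwáh₂.
# Feature names are illustrative; only place and laryngeal features are modelled.
PLACE_FEATURES = ("coronal", "dorsal")

d = {"coronal": True, "dorsal": False, "voice": True, "spread_glottis": False}
gh = {"coronal": False, "dorsal": True, "voice": True, "spread_glottis": True}

# Lookup from (coronal, dorsal, voice, spread_glottis) to a reconstruction symbol.
SYMBOLS = {
    (True, False, True, False): "*d",
    (True, False, True, True): "*dʰ",
    (False, True, True, False): "*ǵ",
    (False, True, True, True): "*ǵʰ",
}


def swap_place(a, b):
    """Return copies of a and b with place features exchanged.

    Laryngeal features (voice, spread_glottis) stay where they were.
    """
    new_a, new_b = dict(a), dict(b)
    for feature in PLACE_FEATURES:
        new_a[feature], new_b[feature] = b[feature], a[feature]
    return new_a, new_b


def symbol(seg):
    """Look up the reconstruction symbol for a feature bundle."""
    key = (seg["coronal"], seg["dorsal"], seg["voice"], seg["spread_glottis"])
    return SYMBOLS[key]


first, second = swap_place(d, gh)
print(symbol(d), symbol(gh), "->", symbol(first), symbol(second))
```

Swapping the segments themselves would just exchange the two dictionaries; swapping only the place features is what produces the *ǵ ... *dʰ sequence that the hypothesis needs.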

I did a search of the literature using Google Scholar, but I couldn’t find any other explanations of the development of PIE *dn̥ǵʰwáh₂ into PToch. *kəntwó. And I can’t think of any myself. Still, the scenario posited above is perhaps too speculative to allow us to say that metathesis of features is definitely possible. It would be better to have an example of metathesis of features which is still taking place, or which occurred recently enough that we can be very sure that a metathesis of features took place. Ringe & Eska (2014: 110-111) give a couple of other examples, but both are from the development of Proto-Indo-European, and therefore not much less speculative than the scenario above. (It might be of interest that one of their examples is Oscan fangva, a cognate of PToch. *kəntwó; PIE *dʰ- becomes Oscan f-, so what seems to have happened here is the same kind of metathesis as in PToch., but with the laryngeal features switching places, rather than the place of articulation features.) Ringe & Eska do also mention that one of their daughters, at the age of 2, pronounced the word grape as [breɪk], thus exhibiting the same kind of metathesis as hypothesized for pre-PToch., i.e. with the place of articulation features being switched with each other but with the laryngeal features remaining in place.


  1. ^ Normally I would cite other reflexes of the proto-form in IE, but the reflexes of *dn̥ǵʰwáh₂ exhibit an amazing variety of irregularities, so that to do so would probably break the flow of the text too much. It has been proposed that *dn̥ǵʰwáh₂ might have been susceptible to taboo deformation, although it’s hard to imagine why the word ‘tongue’, in particular, would have been tabooed; then again, the fact that only a single IE branch (Germanic) appears to preserve the regular reflex of the root does cry out for explanation. I’m not sure how secure the reconstruction of *dn̥ǵʰwáh₂ (given by Ringe & Eska) is, although I don’t recall seeing any alternative reconstructions. The main basis for this reconstruction seems to be Gothic tuggō (which has become an n-stem, cf. gen. sg. tuggōns, but is otherwise unchanged) and Latin lingva (which has the irregular d- to l- change observed in a few other Latin words). But Old Irish tengae seems to reflect *t- rather than *d- (this is without precedent in Celtic as far as I know, but I don’t know much about Celtic), and Old Prussian insuwis seems to have lost the initial consonant entirely. And as for Sanskrit jihvā́, the second syllable of this word is the perfectly regular outcome of PIE *-wáh₂, but the first syllable is either completely unrelated to PIE *dn̥ǵʰ- or has undergone more than one irregular development.


The relative chronology of Grimm’s Law and Verner’s Law, part 1: aspiration in the Germanic languages.

Grimm’s Law and Verner’s Law are possibly the two most famous sound laws in historical linguistics. Despite this, there are some aspects of these two laws which we know little about. One of these is the question of the relative chronology of the sound changes described by these laws. That is: which came first? The sound changes described by Grimm’s Law, or the sound changes described by Verner’s Law? Handbooks, such as Ringe (2006), tend to subscribe to the view that those described by Grimm’s Law came first, and those described by Verner’s Law came second. But as I’m going to attempt to show, this is not a completely well-established fact.

Now, strictly speaking, Grimm’s Law and Verner’s Law describe correspondences between the sounds of Proto-Indo-European (PIE) and Proto-Germanic (PGmc); the actual sound changes that have resulted in these correspondences are another matter. The correspondences are very well-established; there is little disagreement over them. So one might well say that the question posed here is uninteresting, because we know which PGmc sounds reflect which PIE sounds in which positions, and that’s all we need to know. This is true to some extent, but I do think it is interesting in its own right to know more about the relative chronology of the sound changes that turned PIE into PGmc. Besides, our understanding of what a sound change must have been, in phonetic terms, can be affected by our understanding of its relative chronology, and this understanding may help us to understand the nature of other sound changes, or of the phonology of the language at an earlier or later date. More knowledge is usually a good thing, after all. (But it doesn’t surprise me that I can’t find much literature dealing with this issue specifically.)

With that said, let’s begin by reminding ourselves of the correspondences described by Grimm’s Law, which are listed in the table below.

Proto-Indo-European | Proto-Germanic | Example
*p | *f | PIE *pl̥h₁nós ‘full’ (cf. Skt pūrṇás, Lith. pìlnas) ↣ PGmc *fullaz (with the -az ending generalised from thematic nominals without stress on the ending) (cf. Goth. fulls, OE full [> NE full])
*t | *þ | PIE *tréyes ‘three’ (cf. Skt trayaḥ, Gk treîs) > PGmc *þrīz (cf. Goth. þreis, OE þrī [↣ NE three])
*ḱ | *h | PIE *swéḱuros ‘father-in-law’ (cf. Skt śvaśuraḥ, OCS svekrŭ) > PGmc *swehuraz (cf. OE swēor, OHG swehur)
*k | *h | PIE *kóryos ‘army’ (cf. dialectal Lith. kãrias ‘army’, OIr. cuire ‘troop’) > PGmc *harjaz (cf. Goth. harjis, OE here)
*kʷ | *hʷ | PIE *ákʷah₂ ‘running water’ (cf. Lat. aqua ‘water’) > PGmc *ahʷō ‘river’ (cf. Goth. aƕa, OE ēa)
*b | *p | post-PIE *gʰreyb- ‘grab’ (cf. dialectal Lith. greĩbti [infinitive in -ti]) ↣ PGmc *grīpaną (infinitive in -aną) (cf. Goth. greipan, OE grīpan [> NE grip])
*d | *t | PIE *dóru ‘tree’ (cf. Skt dā́ru, Gk dóru ‘wood’), gen. sg. *dréws (cf. Skt drós) ↣ PGmc *trewą (with the neuter a-stem ending -ą) (cf.
*ǵ | *k | PIE *h₂áǵros ‘pasture’ (cf. Skt ájras ‘field’, Lat. ager) > PGmc *akraz (cf. Goth. akrs ‘field’, OE æcer ‘field’)
*g | *k | PIE *yugóm ‘yoke’ (cf. Skt yugám, Lat. iugum) > PGmc *juką (cf. Goth. juk, OE ġeoc)
*gʷ | *kʷ | PIE *gʷih₃wós ‘alive’ (cf. Skt jīváḥ, Gk zōós) > PGmc *kʷikʷaz (cf. ON kvikr, OE cwic)
*bʰ | *b | PIE *bʰéreti ‘(s)he is carrying’ (cf. Skt bhárati, Lat. fert) > PGmc *beraną (infinitive in -aną) (cf. Goth. baíran, OE beran)
*dʰ | *d | PIE *dʰédʰēm ‘I was putting’ (cf. Skt ádadhām [with the augment á-]) > PGmc *dedǭ ‘(s)he did’ (cf. OS deda, OHG teta)
*ǵʰ | *g | PIE *ǵʰáns ‘goose’ (cf. Gk khḗn, Lith. žąsìs [with the i-stem ending -is]) > PGmc *gans
*gʰ | *g | PIE *gʰóstis ‘stranger’ (cf. Lat. hostis ‘enemy’, OCS gostĭ ‘guest’) > PGmc *gastiz ‘guest’ (cf. Goth. gasts, OE ġiest)
*gʷʰ | *gʷ | PIE *sengʷʰ- ‘chant’ (cf. collective *songʷʰáh₂ > Gk omphḗ ‘voice of the gods’) > PGmc infinitive *singʷaną ‘to sing’

Basically, the PIE voiceless unaspirated stops become fricatives, the PIE voiced unaspirated stops lose their voice, and the PIE voiced aspirated stops lose their aspiration. (But this is not quite a complete description of what happened, as we will see.)
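For readers who like to see correspondences as data, here is a minimal Python sketch of the table above as a lookup. The dictionary simply restates the first two columns of the table; the name apply_grimm and the idea of passing a word as a pre-segmented list are my own assumptions, and vowels, endings and every other change between PIE and PGmc are deliberately left untouched.

```python
# PIE stop -> PGmc reflex, copied from the table above (asterisks omitted).
GRIMM = {
    "p": "f", "t": "þ", "ḱ": "h", "k": "h", "kʷ": "hʷ",
    "b": "p", "d": "t", "ǵ": "k", "g": "k", "gʷ": "kʷ",
    "bʰ": "b", "dʰ": "d", "ǵʰ": "g", "gʰ": "g", "gʷʰ": "gʷ",
}


def apply_grimm(segments):
    """Map every PIE stop in a pre-segmented word to its PGmc reflex.

    Each segment is looked up once in the original input, so the output
    of one correspondence is never fed into another.
    """
    return [GRIMM.get(segment, segment) for segment in segments]


# The three series side by side: voiced aspirated, voiced, voiceless.
print(apply_grimm(["bʰ", "b", "p"]))
```

The single lookup per segment matters: running the three correspondences one after another as string replacements, in this order, would turn *bʰ into *b, then into *p, then into *f, which is not what the table says.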

The correspondences described by Grimm’s Law do not hold in every position. One position in which they do not hold is after a voiceless obstruent. In this position, PIE voiceless unaspirated stops do not become fricatives in PGmc, and thus end up being reflected as the same kind of sound that the PIE voiced unaspirated stops are reflected as in other positions. Here is a full list of the clusters affected by this change, with examples.

Proto-Indo-European | Proto-Germanic | Example
*sp | *sp | PIE *spŕ̥dhs ‘contest’ (cf. Skt spṛdh) > PGmc *spurdz ‘racecourse’ (cf. Goth. spaúrds)
*st | *st | PIE *gʰóstis ‘stranger’ (cf. Lat. hostis ‘enemy’, OCS gostĭ ‘guest’) > PGmc *gastiz ‘guest’ (cf. Goth. gasts, OE ġiest)
*sḱ | *sk | PIE *sḱinédsti ‘(s)he cuts (it) off’ (cf. Skt chinátti), aor. sbjv. *skéydeti ↣ PGmc infinitive *skītaną ‘to defecate’ (cf. ON skíta, OE scītan)
*sk | *sk | PIE *skabʰeti ‘(s)he is scratching’ (cf. Lat. scabit) > PGmc *skabidi or *skabiþi (cf. Goth. skabiþ, OE scæfþ)
*skʷ | *skʷ | (no examples that I know of, but this outcome can be assumed on the basis of the others)
*pt | *ft | PIE *kh₂ptós ‘grabbed’ (cf. Lat. captus ‘caught’) > PGmc *haftaz (cf. OE hæft, OHG haft)
*ḱt | *ht | PIE *oḱtṓw ‘eight’ (cf. Skt aṣṭā́u, Lat. octō) > PGmc *ahtōu (cf. Goth. ahtau, OE eahta)
*kt | *ht | PIE *mogʰ- ‘be able to’ (cf. Skt maghám ‘possessions’ [a-stem pl. in -ám], OCS mošti ‘I can’ [infinitive in -ti]) → nominal *mógʰtis > PGmc *mahtiz ‘power’
*kʷt | *ht | PIE *nókʷts ‘night’ (cf. Gk núx, Lat. nox) > PGmc *nahts (cf. Goth. nahts, OHG naht)

Now, here’s an interesting observation: in English, there is a rule that voiceless stops (which are, in English, directly inherited from Proto-Germanic for the most part) are aspirated except after another voiceless obstruent: hence in my dialect of English tale is pronounced [ˈtʰejəɫ] while stale is pronounced [ˈstejəɫ]. There may be other environments where there is no aspiration, depending on dialect and perhaps individual variation (for example, word-final voiceless stops can be aspirated, glottalised, unreleased or none of these things; and my own dialect tends to fricativise them, although this is one of its more idiosyncratic features). Also, it is possible for there to be different degrees of aspiration, which complicates matters further. But there is definitely no aspiration after a voiceless obstruent, and there is definitely a maximal level of aspiration when a stop is word-initial, or in the onset of a stressed syllable (as in attack).

The same rule is observable in most of the other Germanic languages. The only exception I know of is Dutch, in which voiceless stops are unaspirated in all positions, but this may be attributable to the influence of French. The case of German is particularly interesting, because in German, the stop t does not reflect Proto-Germanic *t; that phoneme became either z (the affricate /t͡s/) or s, depending on its position, in German due to the High German consonant shift. German t instead reflects Proto-Germanic *d, which filled the gap in the consonant system left by the loss of *t by losing its voice. Yet German t obeys the aspiration rule just like the other plosives. It is of course possible that the aspiration rule is simply something that came into effect after the separation of the Germanic languages and after the devoicing of *d in the High German dialects. But in that case, it would have had to come into effect in all of the non-Dutch Germanic languages independently. Furthermore, the development of the PGmc voiceless stops in the High German consonant shift suggests that these voiceless stops were aspirated at the time of the shift, because as far as I know, the development of voiceless stops into affricates, when not motivated by palatalisation, tends to occur only when they are aspirated. After all, affrication under these circumstances can be explained as assimilation of the phonetic [h] that follows the release of voiceless aspirated stops to the place of articulation of the preceding stop; I know of no reason why unaspirated stops could be expected to turn into affricates. Lenition alone cannot account for affrication, because affricates involve just as much stricture, during their initial stop articulation, as stops.

For these reasons, I think it is more likely that this aspiration rule was inherited from Proto-Germanic into all of the Germanic languages, and that it persisted in German after the High German consonant shift, applying to the new instances of t produced by this shift. It is entirely possible for phonological rules to persist in this way. For example, Sievers’ Law, the phonological rule that caused underlyingly non-syllabic PIE sonorants to become syllabic after heavy syllables, persisted into Proto-Germanic, as can be seen from the example of PIE *wr̥ǵjéti ‘(s)he is working’ (cf. Av. vərəziieiti) > *wurkijiþi > PGmc *wurkīþi (cf. Goth. waúrkeiþ, OE wyrcþ).

Now, if you accept that the aspiration rule could have persisted in applying after the High German consonant shift, it’s no stretch to suppose that the aspiration rule took effect before the sound changes described by Grimm’s Law occurred, and it persisted in applying to the new voiceless stops produced by these changes. Why would we want to suppose this? Because it allows us to neatly explain the fact that the PIE voiceless stops did not become fricatives after voiceless obstruents. Position after voiceless obstruents is exactly the position where these voiceless stops did not become aspirated by the aspiration rule. So if the aspiration rule did take effect before the sound changes described by Grimm’s Law, those sound changes applied precisely to the aspirated voiceless stops, in all positions, and not the unaspirated voiceless stops. And fricativization of voiceless aspirated stops but not voiceless unaspirated stops is well-attested from languages such as Greek (consider: theós = classical [tʰeós], modern [θɛˈɔs], treîs = classical [tré͡es], modern [ˈtris]).
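The ordering proposed in the paragraph above can be made concrete with a small Python sketch. Everything here is a simplification I am assuming for illustration: only *p, *t, *k and *s are modelled as voiceless obstruents, words are given as pre-segmented lists, and the function names aspirate, fricativize and derive are hypothetical.

```python
# Simplified inventory: the labiovelars, palatals and all non-stop changes are ignored.
VOICELESS_STOPS = {"p", "t", "k"}
VOICELESS_OBSTRUENTS = VOICELESS_STOPS | {"s"}
FRICATIVE_OF = {"pʰ": "f", "tʰ": "þ", "kʰ": "h"}


def aspirate(segments):
    """Step 1: aspirate voiceless stops, except right after a voiceless obstruent."""
    result = []
    for index, segment in enumerate(segments):
        after_voiceless = index > 0 and segments[index - 1] in VOICELESS_OBSTRUENTS
        if segment in VOICELESS_STOPS and not after_voiceless:
            result.append(segment + "ʰ")
        else:
            result.append(segment)
    return result


def fricativize(segments):
    """Step 2: turn aspirated voiceless stops, and only those, into fricatives."""
    return [FRICATIVE_OF.get(segment, segment) for segment in segments]


def derive(segments):
    """Apply the aspiration rule first, then the fricativization."""
    return fricativize(aspirate(segments))


# Stems only; vowel changes are not modelled.
print("".join(derive(list("kaptos"))))
print("".join(derive(list("gostis"))))
```

With this ordering the fricativization step needs no environment of its own: the exception after voiceless obstruents falls out of the aspiration rule, which is the point of the proposal.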

Readers (if I have any?) might remember that I already proposed this scenario in an earlier post. But I don’t have any formal qualifications in linguistics (yet!), so I can’t be regarded as a reliable source. However, I did find a reassuring paper by Iverson & Salmons (1995) which proposes the same scenario. What’s more, they also provide convincing phonetic motivations for why it was the voiceless aspirated stops that became fricatives, rather than the voiceless unaspirated stops or both kinds of stops, and for why voiceless stops after voiceless obstruents failed to become aspirated in the first place.

In phonetic terms, voiceless aspirated stops are distinguished from voiceless unaspirated stops by the fact that the open state of the glottis which is required in order to produce a voiceless sound persists for a short period after the release of a voiceless aspirated stop (this might be achieved by closing the glottis more slowly, beginning with a wider glottal opening in the first place, or a combination of the two). This results in the production of a phonetic [h] sound ([h] being the sound obtained when air passes through the open glottis and out of the mouth without being obstructed in the oral tract), although this [h] sound is considered part of the aspirated stop, in phonological terms. (Languages which have a /h/ phoneme as well as voiceless aspirated stops may distinguish phonemic /h/ by its longer duration; compare the English near-minimal pair deckhand and decad.) Hence voiceless aspirated stops endure for some time after their release. Voiceless unaspirated stops, on the other hand, do not; after their release, the glottis shifts almost immediately to the state required for the production of the next sound (or comes to rest, if a pause follows). Now, if we assume that there is a tendency for stop phonemes to have similar durations, it follows that we should expect voiceless aspirated stops to have a shorter duration up to the release, that is, a shorter period of obstruction, than voiceless unaspirated stops. And this has been backed up by empirical observations. Because the period of obstruction is shorter in voiceless aspirated stops, there is a greater tendency for the obstruction to be weakened, for whatever reason (e.g. a natural tendency towards weakening of shorter sounds, or assimilation to neighbouring sounds whose production involves less obstruction in the oral tract). That is why the obstruction tends to be weakened from the complete closure required for a stop to mere close approximation, which results in a fricative sound.

As for the question of why the PIE voiceless unaspirated stops did not become aspirated after voiceless obstruents in pre-PGmc, Iverson & Salmons answer this by proposing that the [+spread glottis] feature (i.e. the feature of extending the period of glottal opening by closing the glottis more slowly, or beginning with a wider glottal opening in the first place, or a combination of both of these things) is shared between the constituent consonants in a cluster of two consonants in which the first is an obstruent. That means that the extended period of voicelessness, which normally manifests as the phonetic [h] sound that follows an aspirated stop, is absorbed by the second constituent consonant in the cluster. Clusters like /st/ start off being pronounced with a glottis which is as widely spread as it is at the start of an aspirated stop, and over the course of the cluster the glottis closes just as slowly; by the time the end of the cluster is reached, the glottis is closed enough that there is no discernible [h] sound at the end.

This is not the only phenomenon observable in the Germanic languages that can be explained by this proposal that the [+spread glottis] feature is shared in biconsonantal obstruent-initial clusters. In English, for example, sonorant consonants after tautosyllabic voiceless obstruents are, generally, devoiced. But they are not devoiced after tautosyllabic /s/ + voiceless stop clusters (e.g. in /spl/ and /spr/). If this devoicing is just a matter of perseverant assimilation, this is difficult to explain. But if the devoicing is the effect of the extended period of voicelessness following a voiceless aspirated stop, it is exactly what we would expect. Iverson & Salmons don’t mention if the same pattern is found in other Germanic languages, but we would expect it to be found in all of them except Dutch.

So, that’s the first exception to Grimm’s Law. The second exception is the one described by Verner’s Law. But this seems like a good point to pause for now; I’ll cover Verner’s Law, and its relative chronology, in another post. (This post hasn’t been wholly unrelated to that topic; the observation that PIE voiceless unaspirated stops probably became aspirated in most positions before the sound changes described by Grimm’s Law is going to be relevant.)

The phonetic motivation for Grimm’s Law

…is not as clear as I had thought.

According to the standard reconstruction of Proto-Indo-European (PIE), the language had three series of stops. One of the series is thought to have consisted of voiceless unaspirated stops: *p, *t, *ḱ, *k, and *kʷ. Another is thought to have consisted of voiced unaspirated stops: *b, *d, *ǵ, *g and *gʷ. And the other is thought to have consisted of voiced aspirated stops: *bʰ, *dʰ, *ǵʰ, *gʰ and *gʷʰ. These series were preserved in this form in Sanskrit, although Sanskrit also innovated a fourth series of voiceless aspirated stops out of clusters consisting of voiceless stops followed by laryngeals. In Proto-Germanic, however, the situation is different. The PIE voiceless unaspirated stops have become voiceless fricatives; cf. Proto-Germanic *þū (> English thou) and Sanskrit tvám ‘you (singular)’. The PIE voiced unaspirated stops have become voiceless unaspirated stops; cf. Proto-Germanic *twō (> English two) and Sanskrit dvā́ ‘two’. And the PIE voiced aspirated stops have become voiced unaspirated stops; cf. Proto-Germanic *meduz (> English mead) and Sanskrit mádhu ‘honey’.

I had always assumed that the change went something like this. First, the voiceless unaspirated stops fricativised, retaining their lack of voice and aspiration and becoming voiceless fricatives. Changes of stops into fricatives are common and unremarkable; phonologists disagree on whether this is due to a natural tendency towards lenition (weakening) or due to assimilation to neighbouring phonemes which are more sonorous, but there is no dispute that such a change can be phonetically motivated. The change can be written formally in terms of distinctive features as follows.

[-continuant, -voice] > [+continuant]

Second, the voiced unaspirated stops devoiced, retaining their lack of frication and aspiration and becoming voiceless unaspirated stops. This change would be unusual if it occurred on its own. However, the previous change had left the language with no voiceless unaspirated stops, only voiced unaspirated stops and voiced aspirated stops. The [±voice] feature which had been used to distinguish the three original series was now redundant. For obstruents the unmarked value of this feature is [-voice] (voicelessness); that is, obstruents tend to be voiceless unless something forces them to be voiced. Therefore, it was natural for the voiced unaspirated stops to be devoiced. The change can be written formally in terms of distinctive features as follows.

[-continuant, +voice, -spread glottis] > [-voice]

Third, the voiced aspirated stops deaspirated, retaining their voicing and lack of frication and becoming voiced unaspirated stops. This change was increased in likelihood due to the fact that the previous change left the language with two series of stops, one of which was voiceless unaspirated and one of which was voiced aspirated; the two features [±voice] and [±spread glottis] were therefore redundant against each other. [-spread glottis] is the unmarked value of the [±spread glottis] feature on stops, so it was natural to resolve this by deaspirating the voiced aspirated stops (although devoicing the voiced aspirated stops would have worked just as well). The change can be written formally in terms of distinctive features as follows.

[-continuant, +spread glottis] > [-spread glottis]
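The three rules can be run in sequence over the three PIE stop series with a short Python sketch. The representation is an assumption of mine (each series is a dictionary of three binary features, and a rule is a pair of condition and change); the rules themselves are the ones written out above.

```python
# Each rule is (condition, change), transcribed from the three feature rules above.
RULES = [
    ({"continuant": False, "voice": False}, {"continuant": True}),
    ({"continuant": False, "voice": True, "spread_glottis": False}, {"voice": False}),
    ({"continuant": False, "spread_glottis": True}, {"spread_glottis": False}),
]

# The three PIE stop series as feature bundles (labels are for readability only).
SERIES = {
    "voiceless unaspirated": {"continuant": False, "voice": False, "spread_glottis": False},
    "voiced unaspirated": {"continuant": False, "voice": True, "spread_glottis": False},
    "voiced aspirated": {"continuant": False, "voice": True, "spread_glottis": True},
}


def apply_rules(segment, rules):
    """Apply each rule in order; a rule fires only if all its conditions match."""
    result = dict(segment)
    for condition, change in rules:
        if all(result[feature] == value for feature, value in condition.items()):
            result.update(change)
    return result


for name, features in SERIES.items():
    print(name, "->", apply_rules(features, RULES))

# Relative chronology matters: devoicing before fricativization merges two series.
reordered = [RULES[1], RULES[0], RULES[2]]
merged = apply_rules(SERIES["voiced unaspirated"], reordered)
print(merged == apply_rules(SERIES["voiceless unaspirated"], reordered))
```

In the stated order the three series stay distinct. In the reordered run the devoiced stops are caught by the fricativization rule and fall together with the original voiceless series, which is one way of seeing why the first two changes have to be ordered as they are in this account.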

But there are two questions I have about this account.

  1. Why did the second change involve the voiced unaspirated stops devoicing, rather than the voiced aspirated ones? The redundancy of the [±voice] feature could have been resolved either way. In fact, why didn’t both kinds of stop devoice? Since [-voice] is unmarked for obstruents there is nothing stopping this from happening.
  2. Why did the third change involve the voiced aspirated stops deaspirating rather than devoicing? Since the [±voice] and [±spread glottis] features were redundant against each other devoicing would have worked just as well as a means of resolving the redundancy.

Now, sound change is not a deterministic process, so perhaps the answers to these questions are just that out of all of the different ways the redundancies in question could be resolved, these were the ways that were chosen, more or less at random. I am satisfied with this as the answer to question 2. In fact, with respect to question 2 it seems like deaspiration would be a more likely occurrence than devoicing because it is much more common for languages to distinguish stops using the [±voice] feature than it is for them to distinguish stops using the [±spread glottis] feature; contrasts of voice are therefore probably favoured over contrasts of aspiration (although this is only a tendency, and there are plenty of languages like Mandarin Chinese where [±spread glottis] is distinctive but [±voice] is not).

But I am less satisfied with this as an answer to question 1. As I mentioned above, the redundancy of the [±voice] feature could have been solved in three different ways:

  1. devoicing of the voiced unaspirated stops, resulting in a contrast between voiceless unaspirated stops and voiced aspirated stops.
  2. devoicing of the voiced aspirated stops, resulting in a contrast between voiced unaspirated stops and voiceless aspirated stops.
  3. devoicing of both kinds of stop, resulting in a contrast between voiceless unaspirated stops and voiceless aspirated stops.

There are languages with a contrast between voiced unaspirated stops and voiceless aspirated stops, as would result from option 2. English is such a language. There are also languages with a contrast between voiceless unaspirated stops and voiceless aspirated stops, as would result from option 3. Mandarin Chinese is such a language. But I know of no language which has a contrast between voiceless unaspirated stops and voiced aspirated stops, as would result from option 1. Yet option 1 seems to have been the option that was taken. This is odd.

I think there are phonetic reasons why we would expect options 2 or 3 to be favoured over option 1. If you examine the articulatory mechanisms which are used to produce voiced aspirated stops, you can see them as half-voiced stops, closer to voiceless stops than voiced unaspirated stops (but still voiced). If you think about voiced aspirated stops in this way, option 1 is weird, because it involves change of the voiced unaspirated (i.e. fully voiced) stops directly into voiceless unaspirated stops without passing through the intermediate stage where they would be voiced aspirated (i.e. half-voiced) and end up merging with the voiced aspirated stops. If the characterisation of voiced aspirated stops as half-voiced already makes sense to you, you can skip the next few paragraphs, because I’m now going to try and explain why this is an accurate characterisation.

The first thing that I want to explain is what voiced aspirated stops are. In terms of distinctive features, they are parallel to voiceless aspirated stops. Voiced aspirated stops are [+voice] and [+spread glottis], voiceless aspirated stops are [-voice] and [+spread glottis]. But the meaning of [+spread glottis] is different in the two cases. As a feature of voiceless stops, [+spread glottis] corresponds to increased duration of the period during which the vocal folds are prevented from vibrating (normally by keeping the vocal folds apart from each other, hence the name of the feature, although reducing the airflow is also an option). The time between the release of a stop and the beginning of vocal fold vibration in order to voice the following voiced phoneme is called the voice onset time (VOT). For voiceless unaspirated stops, the VOT is close to 0, while for voiceless aspirated stops the VOT is larger, so that there is an audible period after the stop has been released where air flows through the glottis but the vocal folds do not vibrate. This results in a sound being produced during this period which is in fact exactly [h], the voiceless glottal continuant (although speakers of languages which have aspirated stops don’t usually perceive the [h], instead perceiving it as part of the preceding stop).

During the production of voiced stops, the vocal folds are already vibrating (that’s what it means for a stop to be voiced). So it is impossible for voiced stops to be aspirated if aspiration is defined as having a positive VOT.[1] Instead, [+spread glottis] as a feature of voiced stops corresponds to the vocal folds being held further apart than is normal for voiced stops, roughly speaking. The vocal folds are still close enough that they vibrate during the production of voiced aspirated stops, so such stops are not completely voiceless, but they are closer to voiceless than voiced unaspirated stops. The kind of voice that accompanies voiced aspirated stops is called breathy voice, as opposed to the modal voice that accompanies voiced unaspirated stops. It might help to look at the following diagram, which illustrates the relationship between the degree of closure of the glottis and different kinds of voicing. The diagram is adapted from Gordon & Ladefoged (2001).

Voiceless sounds have the least glottal closure. The glottal stop has the most glottal closure (complete closure). Modally-voiced sounds have a degree of glottal closure midway between these two extremes. Breathy-voiced sounds have a degree of glottal closure between that of voiceless sounds and that of modally-voiced sounds. Creaky-voiced sounds have a degree of glottal closure between that of modally-voiced sounds and that of the glottal stop.

(I should note that talking about the degree of closure of the glottis as if this was a scalar variable is an oversimplification. When the vocal folds vibrate, what happens is that the glottis alternates between a state where it is more or less fully open (as when a voiceless sound is being produced) and a state where it is more or less fully closed (as when a glottal stop is being produced). Closure occurs due to tension from the laryngeal muscles and opening occurs due to pressure from the flow of air through the trachea; closure results in buildup of air below the glottis, resulting in increased pressure, while opening allows air to flow through a greater area, resulting in decreased pressure, and this is why the alternation occurs. For a given rate of flow of air, there is a maximal tension above which opening cannot occur and a minimal tension below which closure cannot occur, and in between these two extremes there is an optimal tension which results in maximal vibration; this tension is approached during the production of modally-voiced sounds. If the tension is below the optimal tension but above the minimal tension, the result is a breathy-voiced sound. If the tension is above the optimal tension but below the maximal tension, the result is a creaky-voiced sound. Alternatively, creaky-voiced sounds can be produced by having the glottis completely closed at one end, with modal voice at the other end, and breathy-voiced sounds can be produced by having the glottis open so that the vocal folds do not vibrate at one end, with modal voice at the other end. But regardless of how these sounds are produced, they sound the same, so the distinction is not important. Either way, it is still accurate to say that breathy-voiced sounds are in a position between voiceless sounds and modally-voiced sounds.)
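The threshold description in the parenthetical above can be sketched as a small classifier. This is only an illustration in Python: the function name, the arbitrary tension units, and the `tolerance` band that decides what counts as "approaching" the optimal tension are all assumptions, not anything measured or taken from the literature.

```python
def phonation_type(tension, minimal, optimal, maximal, tolerance=0.1):
    """Classify phonation from laryngeal tension at a fixed rate of airflow.

    All values are in arbitrary, hypothetical units. `tolerance` is an
    assumed band around the optimal tension that is treated as modal voice.
    """
    if not minimal < optimal < maximal:
        raise ValueError("expected minimal < optimal < maximal")
    if tension < minimal:
        # Below the minimal tension closure cannot occur: the glottis stays open.
        return "voiceless"
    if tension > maximal:
        # Above the maximal tension opening cannot occur: the glottis stays closed.
        return "glottal stop"
    if abs(tension - optimal) <= tolerance:
        return "modal voice"
    # Between the thresholds but away from the optimum.
    return "breathy voice" if tension < optimal else "creaky voice"


if __name__ == "__main__":
    for t in (0.5, 1.5, 2.0, 2.5, 3.5):
        print(t, phonation_type(t, minimal=1.0, optimal=2.0, maximal=3.0))
```

The point of the sketch is just the ordering: breathy voice sits between voicelessness and modal voice, and creaky voice sits between modal voice and the glottal stop.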

It would be helpful to see how voiced aspirated stops behave with respect to sound change in attested languages. Unfortunately, voiced aspirated stops are rare, which limits the number of available examples. As far as I know, voiced aspirated stops are mainly found in the Indo-Aryan languages of South Asia and the Nguni languages of South Africa. In the Indo-Aryan languages the voiced aspirated stops have been inherited from PIE, or at least Vedic Sanskrit (depending on what you believe about the nature of the PIE stops), and most of these languages seem to have preserved them unchanged. Sinhala and Kashmiri have no voiced aspirated stops, but I don’t know and can’t find any information on what happened to them in these languages. So it seems that the voiced aspirated stops have been stable in Indo-Aryan on the whole. That suggests the rarity of voiced aspirated stops is probably due more to the infrequency of sound changes that would make them phonemic than to inherent instability. However, the mutual influence of these languages upon each other within the South Asian linguistic area might have helped preserve the voiced aspirated stops; the fact that the two most peripheral Indo-Aryan languages do not have them is perhaps suggestive that this has been the case. What about the Nguni languages? These are a tight-knit group, probably having a common origin within the last millennium, and their closest relatives such as Tswana have no voiced aspirated stops. So their voiced aspirated stops are of more recent vintage. Interestingly, Traill, Khumalo & Fridjhon (1987) have found that the Zulu voiced aspirated stops are actually voiceless, with the breathy voice occurring after the release on the following vowel. This seems like it could be the first step in a change of voiced aspirated stops into voiceless aspirated stops. But I don’t think any of this evidence is of much use in making the case that Grimm’s Law is weird. My case primarily rests on the idea that voiced aspirated stops are intermediate between voiceless and modally-voiced stops on the basis of how they are produced.

If the changes as described above are odd, maybe we should consider the possibility that the changes described by Grimm’s Law were of a different nature.

Perhaps a minor amendment can solve the problem. It is universally agreed that the Proto-Germanic voiced stops had voiced fricative allophones. It is not totally clear which environments the stops occurred in and which environments the fricatives occurred in, but they were all definitely stops after nasals and when geminate, and fricatives after vowels and diphthongs. There are three different ways this situation might have come to be.

  1. The PIE voiced aspirated stops might have turned into voiced unaspirated stops first and then acquired fricative allophones in certain environments.
  2. The PIE voiced aspirated stops might have turned into voiced unaspirated fricatives first and then acquired stop allophones in certain environments.
  3. The PIE voiced aspirated stops might have turned into voiced unaspirated fricatives in certain environments and voiced unaspirated stops in others.

If we suppose that number 2 is the accurate description of what happened, then it is possible that the fricativisation of the PIE voiced aspirated stops occurred before the devoicing of the PIE voiced unaspirated stops. This devoicing would then be perfectly natural because the PIE voiced unaspirated stops would be the only stops remaining in the language, so the marked [+voice] feature would be dropped from them. The voiced aspirated stops would probably have become voiced aspirated fricatives (i.e. breathy-voiced fricatives) initially and then these fricatives would have become modally-voiced since there would be no need for them to contrast with modally-voiced fricatives. Is it plausible that the voiceless unaspirated and voiced aspirated stops would have fricativised, but not the voiced unaspirated stops? What do these two kinds of stop have in common that the third kind lacks? If we think of voiced aspirated stops as half-voiced stops, we can describe the change as affecting all of the stops which were not fully voiced. The change is especially plausible, however, if we suppose that the PIE voiceless unaspirated stops had become aspirated before the changes described by Grimm’s Law took place. In that case, the change would affect the aspirated stops and not affect the unaspirated stops. Fricativisation of aspirated stops but not unaspirated stops is a very well-attested sound change; it happened in Greek, for example. The sequence of changes would be as follows:

[-continuant, -voice] > [+spread glottis]

[-continuant, +spread glottis] > [+continuant]

[-continuant, +voice] > [-voice]

[+continuant, +voice] > [-continuant]

(The last change would have occurred only in some environments; there are also conditioned exceptions to some of the other changes.)
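To check that the ordering behaves as described, the first three changes can be run mechanically over the three PIE stop series. The following is a minimal Python sketch under some loud assumptions: the labels `T`, `D` and `Dh`, the dictionary encoding of features and the function names are all illustrative, the rules are applied without any of the conditioned exceptions, and the fourth change is left out because its conditioning environments are not modelled here.

```python
# Each stop series is a bundle of binary features (True = "+", False = "-").
# The labels T, D, Dh are illustrative shorthand for the three PIE series.
PIE_STOPS = {
    "T": {"continuant": False, "voice": False, "spread_glottis": False},
    "D": {"continuant": False, "voice": True, "spread_glottis": False},
    "Dh": {"continuant": False, "voice": True, "spread_glottis": True},
}

# (condition, change) pairs for the first three changes, in the order given.
# The fourth, environment-conditioned change is deliberately omitted.
RULES = [
    ({"continuant": False, "voice": False}, {"spread_glottis": True}),
    ({"continuant": False, "spread_glottis": True}, {"continuant": True}),
    ({"continuant": False, "voice": True}, {"voice": False}),
]


def apply_rule(segment, condition, change):
    """Return a copy of the segment, changed only if it meets the condition."""
    if all(segment[name] == value for name, value in condition.items()):
        return {**segment, **change}
    return dict(segment)


def apply_rules(segment, rules=RULES):
    """Apply the rules one after another, each feeding the next."""
    for condition, change in rules:
        segment = apply_rule(segment, condition, change)
    return segment


def describe(segment):
    """Format a feature bundle in bracket notation."""
    parts = (("+" if value else "-") + name.replace("_", " ")
             for name, value in segment.items())
    return "[" + ", ".join(parts) + "]"


if __name__ == "__main__":
    for label, features in PIE_STOPS.items():
        print(label, "->", describe(apply_rules(features)))
```

Run in this order, the three series stay distinct: `T` ends up as a voiceless fricative, `Dh` as a voiced (still breathy) fricative, and `D` as a plain voiceless stop. If the devoicing rule is moved to the front instead, `D` falls together with `T` and both fricativise, which is one way of seeing why the order matters.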

Is there any other reason to think the PIE voiceless unaspirated stops might have become aspirated in Proto-Germanic before fricativising? Well, the reflexes of the Proto-Germanic voiceless stops are aspirated in the North Germanic languages and English, and have become affricates in some positions in German, which suggests that they were originally aspirated; the lack of aspiration in Dutch can probably be attributed to French influence. That suggests the Proto-Germanic voiceless stops were already aspirated. Of course, these voiceless stops are the reflexes of the PIE voiced unaspirated stops, not the PIE voiceless unaspirated stops. But perhaps the rule aspirating voiceless stops was persistent in Proto-Germanic, so that it applied both to the PIE voiceless unaspirated stops before they fricativised and to the PIE voiced unaspirated stops after they were devoiced. The rule seems to have persisted into German, because German went through its own kind of replay of Grimm’s Law in which the Proto-Germanic voiceless stops became affricates or fricatives and the Proto-Germanic voiced stops were devoiced. This second consonant shift was never fully completed in most German dialects; in Standard German, for example, Proto-Germanic *b and *g were not devoiced in word-initial position. However, *d was devoiced (cf. English daughter, German Tochter) and modern Standard German /t/ is aspirated, so, for example, Tochter is pronounced [ˈtʰɔxtɐ].

I think this is a satisfactory solution to the problem. The idea that the PIE voiced aspirated stops became fricatives first is not a new one; in fact, it is probably the favoured scenario. But I have never seen it justified in this way, and Ringe (2006) suggests that the voiced aspirates changed into both stops and fricatives depending on the environment (number 3 above), which is incompatible with the scenario I have proposed here.

Finally, I think I should mention that all of this reasoning has been done on the assumption that PIE had voiceless unaspirated, voiced unaspirated and voiced aspirated stops. If you subscribe to an alternative hypothesis about the nature of the PIE stops, such as the glottalic theory, Grimm’s Law might have to be explained in a completely different way. But despite it not being as easy as it might appear at first glance, it does seem that the standard hypothesis is capable of explaining Grimm’s Law.

Whether it can explain Verner’s Law is another matter. I have always thought it a little odd that the voiceless fricatives were voiced after unaccented syllables but not after accented syllables. It is not obvious how accent and voice can affect each other. But I’ll discuss this, perhaps, in another post.


