
Emics and etics


Many non-linguists probably don’t know that linguists use the words “phonetics” and “phonology” to refer to two quite different subjects. There is, admittedly, a considerable degree of interconnection between the two subjects, but most of the time the difference is reasonably stark. The best way to describe the respective subjects is by an example. Many of the languages of the world make use of speech sounds which are known as lateral approximants. The Latin letter L is dedicated to representing such sounds. In English, lateral approximants appear at the start of words like “laugh” and “lion” (and, indeed, “lateral”), in the middle of words like “pillow” and “bulk”, and at the end of words like “tell” and “saddle”. The exact sound of the lateral approximants in these words varies to a considerable degree from utterance to utterance, due to factors such as intonation, the chosen volume and the simple fact that people do not replicate precisely the same physical actions every time they utter a sound. It also varies from speaker to speaker—different people have different voices. The term “lateral approximant” therefore refers not to a particular acoustic signal but to an abstract category including some but not all acoustic signals¹. The way language works makes it inevitable that when we talk about speech sounds, we talk about these abstract categories of acoustic signals rather than the particular acoustic signals themselves. This point about “lateral approximant” being an abstract category is not directly relevant to the phonetics-phonology distinction, but I bring it up because it will help clarify things later.

One specific kind of variation in the sounds of speech is especially interesting to linguists. The pronunciation of a sound such as a lateral approximant can be affected by the surrounding sounds. The differences produced thus have the potential to be regular and systematic in the sense that they are reproduced from utterance to utterance: after all, the same sequence of sounds exists in each utterance. The term for this kind of variation in particular is “allophony”. A particularly stark example of allophony is exhibited by English lateral approximants² (which is why I chose to talk about this kind of sound in particular). Before other consonants and at the end of a word (such as in “bulk”, “tell” and “saddle”), they are pronounced one way; elsewhere (such as in “laugh”, “lion” and “pillow”), they are pronounced another way. When they are pronounced in the former way, English lateral approximants are referred to as “dark Ls”, and when they are pronounced in the latter way, they are referred to as “clear Ls”. The IPA has symbols for each pronunciation: dark L is [ɫ], clear L is [l]. If you don’t already know a lot about linguistics, it’s quite likely that you never noticed that this variation existed before, even though, as may be apparent to you now that I have drawn your attention to it, the difference is quite large. You never needed to notice it, because in the English language, the distinction between clear and dark L is never used to distinguish words. That is, there are no pairs of words which consist of the same sequence of speech sounds, except that one of them has a dark L in the same position that the other has a clear L. It is therefore convenient to treat clear L and dark L as the same sound, at least when we are talking about English.
We can use the simpler of the two symbols, /l/, to represent this sound, but we add slashes rather than brackets around the symbol in order to make clear that the boundary of the category of acoustic signals referred to by /l/ is determined here by the distinctions the English language (or whatever language we are talking about) makes use of in order to distinguish its words from each other. It is reasonable to suppose that the concept of /l/ does actually exist in the minds of speakers of English (and that separate concepts for clear L and dark L do not exist in their minds). But even if this were not the case, the concept of /l/ would still be useful for descriptive purposes. The name for this kind of concept is “phoneme”.
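The distributional rule described above can be sketched as a toy function. The word representations here (lists of rough segments) and the vowel set are simplifying assumptions for illustration only, not a real phonological analysis of English:

```python
# Toy model of English /l/ allophony: dark [ɫ] word-finally or before
# another consonant, clear [l] elsewhere (i.e. before a vowel).
# The vowel inventory is deliberately crude.
VOWELS = {"a", "e", "i", "o", "u", "ow", "ə"}

def l_allophone(segments, i):
    """Predict the allophone of /l/ at index i of a segment list."""
    if segments[i] != "l":
        raise ValueError("no /l/ at this position")
    # Word-final /l/ is dark.
    if i == len(segments) - 1:
        return "ɫ"
    # /l/ before another consonant is dark; before a vowel it is clear.
    return "ɫ" if segments[i + 1] not in VOWELS else "l"

print(l_allophone(["l", "a", "f"], 0))        # "laugh": clear [l]
print(l_allophone(["b", "u", "l", "k"], 2))   # "bulk": dark [ɫ]
print(l_allophone(["t", "e", "l"], 2))        # "tell": dark [ɫ]
print(l_allophone(["p", "i", "l", "ow"], 2))  # "pillow": clear [l]
```

The point of the sketch is that the choice of allophone is fully predictable from the environment, which is exactly why English never needs the distinction to tell words apart.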

There are in fact languages in which the distinction between clear L and dark L is used to distinguish words. Russian is one of them. The word мел ‘chalk’ is pronounced like “Mel”, but with a dark L. The word мель ‘shallow’ is pronounced like “Mel”, but with a clear L. For this reason, Russian is said to have an /l/ phoneme, which is spelt ль, and a /ɫ/ phoneme, which is spelt л. Note that, despite the notation, the Russian /l/ is not the same as the English /l/, any more than the Russian /ɫ/ is the same as the English /l/: the two Russian phonemes correspond to a single, more general phoneme in English.

The crucial, defining property of phonemes is that they are abstract categories of acoustic signals whose boundaries are determined by the distinctions that a particular language makes use of. They are defined in opposition to abstract categories of acoustic signals in general, whose boundaries are not necessarily determined by the distinctions a particular language makes use of; they may be determined by the distinctions a linguist finds interesting to make, for example. Such categories are referred to by the words “sound” or (in my experience, less commonly) “phone”; it has always seemed to me that “phonete” would be the most appropriate word, but nobody uses that one. In the jargon of Less Wrong, the distinction can be conveyed by saying that phonemes carve reality at the joints (for a particular language’s purposes), while sounds in general don’t necessarily do the same.

It can be helpful to shift the viewpoint a little and consider the set of all the phonemes of a particular language. This set is always finite (this is a cross-linguistic universal). One can consider the space of all conceivable acoustic signals that might be produced by a speaker of the language. The set of phonemes constitutes a particular partition of this space into a finite number of parts, and speakers of the language do not make use of any of the differences within each part when processing speech³. The parts under this partition are represented by symbols surrounded by slashes. If you choose to partition the space in a different way for some reason, you need to represent the parts by symbols surrounded by square brackets.
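The way the same phones partition differently depending on the language can be made concrete with a toy lookup. The dictionaries below cover only the two lateral phones discussed here and are assumptions for illustration, not full phonemic analyses of either language:

```python
# English lumps the phones [l] and [ɫ] into one phoneme /l/;
# Russian assigns them to two separate phonemes.
ENGLISH = {"l": "/l/", "ɫ": "/l/"}
RUSSIAN = {"l": "/l/", "ɫ": "/ɫ/"}

def phonemic(transcription, language):
    """Map a sequence of phones to the language's phonemic categories.

    Phones outside the toy lookup are passed through unchanged,
    wrapped in slashes.
    """
    return [language.get(phone, "/" + phone + "/") for phone in transcription]

# The final phone of мел 'chalk' (simplified here as [m e ɫ]):
print(phonemic(["m", "e", "ɫ"], RUSSIAN))  # ['/m/', '/e/', '/ɫ/']
print(phonemic(["m", "e", "ɫ"], ENGLISH))  # ['/m/', '/e/', '/l/']
```

The same acoustic input lands in different cells of the partition: for a Russian listener [ɫ] belongs to /ɫ/, while for an English listener it is just another realization of /l/.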

One final point which I want to stress is that both phonemes and sounds in general are abstract categories! People (including me, when I’m not thinking carefully enough) often describe the distinction as something along the lines of “phonemes are abstract categories of sounds”, and this can be interpreted in a way that makes it a true statement, more or less, but it doesn’t constitute an exhaustive definition: the things we refer to as “sounds” in practice are abstract categories of sounds too, so phonemes are a particular kind of abstract category of sounds.

Anyway, the difference between phonetics and phonology is this: phonetics is about sounds in general (“phonetes”), phonology is about phonemes. Or to put it another way, phonology specifically studies the categorizations of acoustic signals that make sense with respect to particular languages, and phonetics studies speech sounds under other categorizations. For example, investigation of how common it is for lateral approximants to appear in speech in both clear and dark forms comes under phonetics. But once you start investigating in addition how common it is for clear L and dark L to constitute separate phonemes, you’ve got into phonology.


The concept of the distinction between phonetics and phonology can be generalised. It has proved especially fruitful in the field of anthropology.

The first person to make the analogy was a man called Kenneth Pike. As you might imagine, he was both an anthropologist and a linguist. He was quite an interesting man, actually. According to Wikipedia, he was “the foremost figure in the history of SIL” (that slightly controversial organization, the Summer Institute of Linguistics). He also invented a (non-naturalistic) conlang called Kalaba-X. And he used to give what were called “monolingual demonstrations”, where he would work with a speaker of a language previously unknown to him and attempt to analyze it as far as he could, all before an audience.

Anyway, Kenneth Pike thought that it was helpful to distinguish two different approaches to studying human culture, which he called the emic and etic approaches. The emic approach is analogous to phonology. The etic approach is analogous to phonetics. The anthropologist Marvin Harris later adopted the concept and made it critical to his theory of human culture, which he called “cultural materialism”. Harris made use of the concept in a somewhat different way than Pike originally did. If you want to see Pike’s side of things, you could look at this interview with him, which contains the following amusing illustration of the extent of their differences:

[…] it took me months and months and months to try to understand Harris. Would you like to know how I got started talking with Harris? I was in Spain at the request of some philosophers and spoke there on the relationship of language to the world (Pike 1987). Afterwards they told me that Harris had been there three months previously lecturing. When they invited me, they had sent me some articles with some references to the etics and emics of Harris. That is precisely why they had invited me. Harris had said that he wished he could talk to Pike.

So later we invited Harris to Norman [Oklahoma] to lecture. I asked him to arrive at least a day early so that we could talk privately before the lecture. So we spent four hours talking prior to the lecture. Tom Headland then met him at an AAA meeting and arranged the meeting and we both agreed.

We had a difficult time trying to understand each other. We each spoke 20 minutes, with 10 minutes for reply by the other. Later, we saw each other’s materials so that before publication we could revise our own materials after having read the comments. The commentators could also revise their materials after having read the revisions of our revisions. So we had maximum time to try to understand each other. Even so, every so often I still get a little perplexed.

I have read some of Harris’s work but none of Pike’s, so my discussion is going to be informed by Harris’s conception of the emic and etic approaches in particular. Let’s begin with an illustrative example, like the one I used in part I of this post. This example is taken from Harris’s book Cultural Materialism, published in 1979.

While doing fieldwork in the southern Indian state of Kerala, Harris observed that the sex ratio among the cattle owned by farmers there was highly skewed in favour of females: for every hundred female cattle there were only sixty-seven male cattle. The farmers, when asked about this, vehemently denied having killed the excess males, as expected given the Hindu prohibition against killing cattle. They instead attributed the difference to an innate propensity towards sickness among male cattle. When they were asked why this propensity existed, some of them replied that the male cattle ate less than the females. When they were asked again why the male cattle ate less than the females, some replied that they were given less time to suck on their mother’s teats. However, there are other states in India, such as Uttar Pradesh, where the sex ratio is skewed the other way: there are more than two oxen for every cow. Moreover, these states are precisely those where the ecological and economic situation is such that there is a relatively large need for traction animals, such as oxen. Suspicious, isn’t it? What seems to be happening is that, despite the Hindu prohibition against killing cattle, the farmers of Kerala take active steps to ensure that male calves drink less milk than their sisters⁴.

By taking these actions, the farmers cause the male calves to die, when they otherwise would survive. Therefore, there is a sense in which their action can be called “killing”. But the crucial point is that if we call the action “killing”, then we are making use of a categorization which is etic rather than emic. That is, it is not a categorization which makes sense on the terms of the culture of the Keralan farmers. These farmers’ concept of killing does not include neglecting to feed male calves properly⁵. It is just the same as how the Russian /l/ covers a smaller range of acoustic signals than the English /l/. The contradiction between Hindu custom and what actually takes place must be understood in this light: it is only an apparent contradiction, because, from the emic perspective, the farmers are not doing any killing of cattle.

Note that this is not to say that the Keralan farmers would be able to get away with openly slaughtering the cattle, say, by slitting their throats with knives. The concept of “killing” is not infinitely malleable. In the same way, no language that I know of considers both [p] and [l] to be part of the same phoneme. All we are saying here is that the extent of variation in emic categorizations is constrained to some degree by the properties of the things they categorize. In describing these constraints we make use of categorizations that are chosen for their usefulness for this descriptive purpose, and not for their coincidence with categorizations that are used by a particular culture. Such categorizations are by definition etic. This means that if the extent of emic variation is sufficiently constrained, the distinction between emic and etic becomes redundant, because all cultures will essentially categorize things the same way, and this categorization can be perfectly well understood from an etic perspective. In most areas of human culture, however, there are considerable degrees of freedom in categorization and therefore the emic-etic distinction is very helpful in understanding cross-cultural variation.

The Keralan cattle sex ratio example is an especially striking one, but another example given by Harris in the same book is, I think, more illustrative of just how helpful the emic-etic distinction can be. In Brazil, Harris collected data on the number of people living in households. But doing this required a more complicated methodology than just asking people from different households, “How many people live here?” The culture of Harris’s informants was such that they did not consider their servants members of their households, even when they were permanent residents there. And for whatever purpose he was collecting the data, Harris found it more useful to consider these servants as household members. He therefore had to ask extra questions to get information about the numbers of servants, in order to make use of an etic categorization of his own that was different from the emic categorization of his Brazilian informants. It is easy to see how not heeding this kind of thing could lead to confusion: if, for example, you collected data on the number of people in households across both Brazil and some other country in which live-in servants were counted as household members, only asking, “How many people live here?”, and used that data to draw conclusions about, say, the amount of food that the average household consumed in both countries, then these conclusions could be grossly wrong, and the data would, in that sense, be meaningless.

This is connected with another important consideration. One of the things which gives the social sciences a rather different epistemic flavour from the natural sciences is the ubiquitous use of concepts which are rather slippery and vaguely defined: “status”, “role”, “social class”, “tribe”, “state”, “family”, “religion”, etc. Social scientists regularly try to make these definitions more precise (that is, to “operationalize” them), but they do this in a peculiar way: it is rare for a particular operationalization to actually become accepted as the one, true definition of the concept at this level of precision, or for two different operationalizations to be given different names so that researchers can from then on treat them as separate concepts. Indeed, I think a lot of social scientists might agree that it is more useful to leave these concepts vaguely defined and use the operationalizations appropriate to the circumstances. Why is this the case? The crucial factor may be that in the social sciences, the distinction between emics and etics comes into play. Social scientists often need to talk about “status”, “tribe”, “state”, “religion”, etc. as emic concepts; that is, as conceptualised by particular cultures. Different cultures have different ideas of what these concepts are, and hence different operationalizations are appropriate for different cultures. Having a common word for each of these different operationalizations is still useful as a way of emphasizing the similarity between them (and perhaps their common origin, in some sense). And it doesn’t cause too much confusion, because the sense of the word in a particular context can be inferred from the culture being talked about in that context. It’s only when one needs to make use of etic concepts that are similar to these emic concepts that the potential for confusion becomes large.
One thing that might be useful in the social sciences is to reserve some words for the emic approach and others for the etic approach. For example, we might reserve “caste” as the word for social strata as conceptualized by particular cultures⁶ and “class” as the word for social strata as conceptualized in other ways.

To summarize: in order to understand a culture, one must understand the concepts which the culture’s members understand their experience in terms of. Emic approaches to culture work with these concepts only. On the other hand, etic approaches to culture may work with alternative conceptual systems which clash with that of the culture being studied. The two approaches are not rivals; they lead to insights about different things and at the same time complement each other, just as phonetics and phonology are not in conflict, and are different subfields of linguistics yet at the same time are closely interconnected.

  1. ^ By using the word “category” I don’t mean to imply that membership in the category is all-or-nothing, as in a mathematical set (i.e. that every acoustic signal is either a lateral approximant or not, and there is never any need for further clarification). The category may be radial: it may be the case that one particular acoustic signal or set of acoustic signals is maximally lateral approximant-like, and acoustic signals which are less similar to these central examples are less lateral approximant-like. Or it may have some other, more complicated structure.
  2. ^ Some dialects don’t exhibit this allophony—Welsh English sometimes has clear L everywhere, and certain American English dialects have dark L everywhere. So if you can’t see this distinction in your own speech after reading the rest of the paragraph this footnote is attached to, that may be why.
  3. ^ This isn’t quite true: for example, you might notice that somebody keeps pronouncing clear L where they should pronounce dark L and conclude on that basis that they must be Welsh or foreign. You may do this subconsciously, even if you don’t know about the distinction between clear L and dark L. (The subconscious understanding of allophonic variation patterns is a large part of why people find it difficult to imitate other accents than their own: they see the problem in others, but not in themselves. Conversely, understanding phonetics and phonology is the secret to being able to imitate accents like a boss.)
  4. ^ Harris does not go into very much detail about this example. There are some things I’d like to know more about, such as why this stark difference in demand for traction animals exists between different Indian states, and how exactly the farmers are supposed to ensure that the male cattle are fed less. If anyone reading this knows of some resources that would be helpful, I encourage you to point me to them.
  5. ^ Of course, there may be a certain level or style of neglect for which it would be regarded as killing; but the means by which the differential sex ratio is produced is certainly not considered to be killing.
  6. ^ Or subcultures, of course. Basically everything that is being said here about the analysis of cultures can also be applied to more finely-grained divisions within cultures.

An example of metathesis of features

Metathesis is generally understood as a sound change involving the switching in position of two segments, or sequences of segments. For example, the non-standard English word ax ‘ask’ is related to the standard form by metathesis. But there are also some arguable cases where metathesis has involved the switching of individual features of segments, rather than the segments themselves.

For example, consider the Tocharian (Toch.) words for ‘tongue’: käntu in Toch. A, kantwo in Toch. B. From these two words we can reconstruct Proto-Tocharian (PToch.) *kəntwó; note that, following the convention of Ringe 1996, ə denotes a high central vowel, not a mid central one as it does in the IPA. Now, the Proto-Indo-European word for ‘tongue’ is reconstructed as *dn̥ǵʰwáh₂¹. The development of *-n̥- into *-ən- and *-wáh₂ into *-wo in PToch. is regular. However, the regular development of *d- in PToch. would be *ts-, and the regular development of *-ǵʰ- in PToch. would be *-k-. In other words, the expected PToch. form is *tsənkwó, not *kəntwó.

How can we explain this outcome? The first thing one might notice about the two forms is that where the PIE form has a coronal stop, the PToch. form has a dorsal stop, and where the PIE form has a dorsal stop, the PToch. form has a coronal stop. One might therefore suggest that the PToch. form comes from a metathesized version of the PIE form, with the coronal stop *d and the dorsal stop *ǵʰ having changed places: *ǵʰn̥dwáh₂. If *kəntwó is the expected outcome of PIE *ǵʰn̥dwáh₂ in PToch., then this hypothesis explains the outcome in the sense that it makes its irregularity no longer surprising; changes of metathesis are well-known exceptions to the general rule that sound change is regular.

Unfortunately, there’s a problem with this hypothesis: the regular outcome of PIE *ǵʰn̥dwáh₂ in PToch. is *kənwó, not *kəntwó, because PIE *d is regularly elided in PToch. before consonants. In fact there are no circumstances under which PIE *d becomes PToch. *t; if, by some exceptional circumstance, *d failed to be elided in *ǵʰn̥dwáh₂, it would probably become *ts, rather than *t, resulting in PToch. *kəntswó.

The solution proposed by Ringe (1996: 45-6) is to suppose that what was metathesized was not the segments *d and *ǵʰ themselves, but rather their place of articulation features. So *d became [-coronal] and [+dorsal] (like *ǵʰ), while *ǵʰ became [+coronal] and [-dorsal] (like *d). But the laryngeal features of the two segments were unchanged: *d remained [-spread glottis], and *ǵʰ remained [+spread glottis]. Therefore, the outcomes of the metathesis were *ǵ and *dʰ, respectively. And *kəntwó is, indeed, the expected outcome in PToch. of PIE *ǵn̥dʰwáh₂, because PIE *dʰ becomes *t in PToch. (There’s the interesting question of why *d becomes an affricate *ts, but its aspirated counterpart *dʰ is unaffected—but let’s not get into that.)
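Ringe’s proposal can be modelled as a tiny operation on feature bundles. The feature names and the four-segment lookup below are simplifying assumptions (a real analysis would use a full feature matrix), but the swap itself is exactly the one described above: place features exchange, laryngeal features stay put:

```python
# Map (place, spread_glottis) feature bundles to the four segments
# involved in the Tocharian 'tongue' etymology.
FEATURES = {
    ("coronal", False): "d",    # plain voiced coronal
    ("dorsal", False): "ǵ",     # plain voiced dorsal
    ("coronal", True): "dʰ",    # breathy-voiced coronal
    ("dorsal", True): "ǵʰ",     # breathy-voiced dorsal
}
SEGMENTS = {seg: feats for feats, seg in FEATURES.items()}

def metathesize_place(a, b):
    """Swap the place features of two segments; keep laryngeal features."""
    place_a, glottis_a = SEGMENTS[a]
    place_b, glottis_b = SEGMENTS[b]
    return FEATURES[(place_b, glottis_a)], FEATURES[(place_a, glottis_b)]

# *d ... *ǵʰ in *dn̥ǵʰwáh₂ become *ǵ ... *dʰ, giving *ǵn̥dʰwáh₂.
print(metathesize_place("d", "ǵʰ"))  # ('ǵ', 'dʰ')
```

Note that ordinary segment metathesis would simply return the two inputs in reversed order; the feature version produces two segments that never occurred in the original word at all, which is what makes the Tocharian case interesting.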

I did a search of the literature using Google Scholar, but I couldn’t find any other explanations of the development of PIE *dn̥ǵʰwáh₂ into PToch. *kəntwó. And I can’t think of any myself. Still, the scenario posited above is perhaps too speculative to allow us to say that metathesis of features is definitely possible. It would be better to have an example of metathesis of features which is still taking place, or which occurred recently enough that we can be very sure that a metathesis of features took place. Ringe & Eska (2013: 110-111) give a couple of other examples, but both are from the development of Proto-Indo-European, and therefore not much less speculative than the scenario above. (It might be of interest that one of their examples is Oscan fangva, a cognate of PToch. *kəntwó; PIE *dʰ- becomes Oscan f-, so what seems to have happened here is the same kind of metathesis as in PToch., but with the laryngeal features switching places, rather than the place of articulation features.) Ringe & Eska do also mention that one of their daughters, at the age of 2, pronounced the word grape as [breɪk], thus exhibiting the same kind of metathesis as hypothesized for pre-PToch., i.e. with the place of articulation features being switched with each other but with the laryngeal features remaining in place.


  1. ^ Normally I would cite other reflexes of the proto-form in IE, but the reflexes of *dn̥ǵʰwáh₂ exhibit an amazing variety of irregularities, so that to do so would probably break the flow of the text too much. It has been proposed that *dn̥ǵʰwáh₂ might have been susceptible to taboo deformation, although it’s hard to imagine why the word ‘tongue’, in particular, would have been tabooed; then again, the fact that only a single IE branch (Germanic) appears to preserve the regular reflex of the root does cry out for explanation. I’m not sure how secure the reconstruction of *dn̥ǵʰwáh₂ (given by Ringe & Eska) is, although I don’t recall seeing any alternative reconstructions. The main basis for this reconstruction seems to be Gothic tuggō (which has become an n-stem, cf. gen. sg. tuggōns, but is otherwise unchanged) and Latin lingva (which has the irregular d- to l- change observed in a few other Latin words). But Old Irish tengae seems to reflect *t- rather than *d- (this is without precedent in Celtic as far as I know, but I don’t know much about Celtic), and Old Prussian insuwis seems to have lost the initial consonant entirely. And as for Sanskrit jihvā́, the second syllable of this word is the perfectly regular outcome of PIE *-wáh₂, but the first syllable is either completely unrelated to PIE *dn̥ǵʰ- or has undergone more than one irregular development.


Ringe, D. A. (1996). On the Chronology of Sound Changes in Tocharian: From Proto-Indo-European to Proto-Tocharian (Vol. 1). Eisenbrauns.

Ringe, D., & Eska, J. F. (2013). Historical Linguistics: Toward a Twenty-First Century Reintegration. Cambridge University Press.

The relative chronology of Grimm’s Law and Verner’s Law, part 1: aspiration in the Germanic languages.

Grimm’s Law and Verner’s Law are possibly the two most famous sound laws in historical linguistics. Despite this, there are some aspects of these two laws which we know little about. One of these is the question of the relative chronology of the sound changes described by these laws. That is: which came first? The sound changes described by Grimm’s Law, or the sound changes described by Verner’s Law? Handbooks, such as Ringe (2006), tend to subscribe to the view that those described by Grimm’s Law came first, and those described by Verner’s Law came second. But as I’m going to attempt to show, this is not a completely well-established fact.

Now, strictly speaking, Grimm’s Law and Verner’s Law describe correspondences between the sounds of Proto-Indo-European (PIE) and Proto-Germanic (PGmc); the actual sound changes that have resulted in these correspondences are another matter. The correspondences are very well-established; there is little disagreement over them. So one might well say that the question posed here is uninteresting, because we know which PGmc sounds reflect which PIE sounds in which positions, and that’s all we need to know. This is true to some extent, but I do think it is interesting in its own right to know more about the relative chronology of the sound changes that turned PIE into PGmc. Besides, our understanding of what a sound change must have been, in phonetic terms, can be affected by our understanding of its relative chronology, and this understanding may help us to understand the nature of other sound changes, or of the phonology of the language at an earlier or later date. More knowledge is usually a good thing, after all. (But it doesn’t surprise me that I can’t find much literature dealing with this issue specifically.)

With that said, let’s begin by reminding ourselves of the correspondences described by Grimm’s Law, which are listed in the table below.

Proto-Indo-European Proto-Germanic Example
*p *f PIE *pl̥h₁nós ‘full’ (cf. Skt pūrṇás, Lith. pìlnas) ↣ PGmc *fullaz (with the -az ending generalised from thematic nominals without stress on the ending) (cf. Goth. fulls, OE full [> NE full])
*t *þ PIE *tréyes ‘three’ (cf. Skt trayaḥ, Gk treîs) > PGmc *þrīz (cf. Goth. þreis, OE þrī [↣ NE three])
*ḱ *h PIE *swéḱuros ‘father-in-law’ (cf. Skt śvaśuraḥ, OCS svekrŭ) > PGmc *swehuraz (cf. OE swēor, OHG swehur)
*k *h PIE *kóryos ‘army’ (cf. dialectal Lith. kãrias ‘army’, OIr. cuire ‘troop’) > PGmc *harjaz (cf. Goth. harjis, OE here)
*kʷ *hʷ PIE *ákʷah₂ ‘running water’ (cf. Lat. aqua ‘water’) > PGmc *ahʷō ‘river’ (cf. Goth. aƕa, OE ēa)
*b *p post-PIE *gʰreyb- ‘grab’ (cf. dialectal Lith. greĩbti [infinitive in -ti]) ↣ PGmc *grīpaną (infinitive in -aną) (cf. Goth. greipan, OE grīpan [> NE grip])
*d *t PIE *dóru ‘tree’ (cf. Skt dā́ru, Gk dóru ‘wood’), gen. sg. *dréws (cf. Skt drós) ↣ PGmc *trewą (with the neuter a-stem ending -ą) (cf. OE trēow [> NE tree])
*ǵ *k PIE *h₂áǵros ‘pasture’ (cf. Skt ájras ‘field’, Lat. ager) > PGmc *akraz (cf. Goth. akrs ‘field’, OE æcer ‘field’)
*g *k PIE *yugóm ‘yoke’ (cf. Skt yugám, Lat. iugum) > PGmc *juką (cf. Goth. juk, OE ġeoc)
*gʷ *kʷ PIE *gʷih₃wós ‘alive’ (cf. Skt jīváḥ, Gk zōós) > *kʷikʷaz (cf. ON kvikr, OE cwic)
*bʰ *b PIE *bʰéreti ‘(s)he is carrying’ (cf. Skt bhárati, Lat. fert) > PGmc *beraną (infinitive in -aną) (cf. Goth. baíran, OE beran)
*dʰ *d PIE *dʰédʰēm ‘I was putting’ (cf. Skt ádadhām [with the augment á-]) > PGmc *dedǭ ‘(s)he did’ (cf. OS deda, OHG teta)
*ǵʰ *g PIE *ǵʰáns ‘goose’ (cf. Gk khḗn, Lith. žąsìs [with the i-stem ending -is]) > PGmc *gans
*gʰ *g PIE *gʰóstis ‘stranger’ (cf. Lat. hostis ‘enemy’, OCS gostĭ ‘guest’) > PGmc *gastiz ‘guest’ (cf. Goth. gasts, OE ġiest)
*gʷʰ *gʷ PIE *sengʷʰ- ‘chant’ (cf. collective *songʷʰáh₂ > Gk omphḗ ‘voice of the gods’) > PGmc infinitive *singʷaną ‘to sing’

Basically, the PIE voiceless unaspirated stops become fricatives, the PIE voiced unaspirated stops lose their voice, and the PIE voiced aspirated stops lose their aspiration. (But this is not quite a complete description of what happened, as we will see.)

The correspondences described by Grimm’s Law do not hold in every position. One position in which they do not hold is after a voiceless obstruent. In this position, PIE voiceless unaspirated stops do not become fricatives in PGmc, and thus end up being reflected as the same kind of sound that the PIE voiced unaspirated stops are reflected as in other positions. Here is a full list of the clusters affected by this change, with examples.

Proto-Indo-European Proto-Germanic Example
*sp *sp PIE *spŕ̥dhs ‘contest’ (cf. Skt spṛdh) > PGmc *spurdz ‘racecourse’ (cf. Goth. spaúrds)
*st *st PIE *gʰóstis ‘stranger’ (cf. Lat. hostis ‘enemy’, OCS gostĭ ‘guest’) > PGmc *gastiz ‘guest’ (cf. Goth. gasts, OE ġiest)
*sḱ *sk PIE *sḱinédsti ‘(s)he cuts (it) off’ (cf. Skt chinátti), aor. sbjv. *skéydeti ↣ PGmc infinitive skītaną ‘to defecate’ (cf. ON skíta, OE scītan)
*sk *sk PIE *skabʰeti ‘(s)he is scratching’ (cf. Lat. scabit) > PGmc *skabidi or *skabiþi (cf. Goth. skabiþ, OE scæfþ)
*skʷ *skʷ (no examples that I know of, but this outcome can be assumed on the basis of the others)
*pt *ft PIE *kh₂ptós ‘grabbed’ (cf. Lat. captus ‘caught’) > PGmc *haftaz (cf. OE hæft, OHG haft)
*ḱt *ht PIE *oḱtṓw ‘eight’ (cf. Skt aṣṭā́u, Lat. octō) > PGmc *ahtōu (cf. Goth. ahtau, OE eahta)
*kt *ht PIE *mogʰ- ‘be able to’ (cf. Skt maghám ‘possessions’ [a-stem pl. in -ám], OCS mošti ‘I can’ [infinitive in -ti]) → nominal *mógʰtis > PGmc *mahtiz ‘power’
*kʷt *ht PIE *nókʷts ‘night’ (cf. Gk núx, Lat. nox) > PGmc *nahts (cf. Goth. nahts, OHG naht)
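The blocking environment can be added to such a sketch by checking the preceding segment: a voiceless stop only shifts when it does not follow a voiceless obstruent. Again, the ASCII segment names are ad-hoc stand-ins of my own.

```python
# Toy sketch of the cluster exception: a PIE voiceless stop fails to shift
# when it directly follows a voiceless obstruent.
SHIFT = {"p": "f", "t": "th", "k": "h", "kw": "hw"}  # voiceless stop > fricative
VOICELESS_OBSTRUENTS = {"s", "p", "t", "k", "kw"}

def grimm_voiceless(segments):
    """Shift voiceless stops to fricatives except after a voiceless obstruent."""
    out = []
    for i, seg in enumerate(segments):
        blocked = i > 0 and segments[i - 1] in VOICELESS_OBSTRUENTS
        out.append(seg if blocked or seg not in SHIFT else SHIFT[seg])
    return out

print(grimm_voiceless(["s", "t"]))       # ['s', 't']  (*st stays *st)
print(grimm_voiceless(["p", "t"]))       # ['f', 't']  (*pt > *ft)
print(grimm_voiceless(["o", "k", "t"]))  # ['o', 'h', 't']  (*kt > *ht, as in ‘eight’)
```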

Now, here’s an interesting observation: in English, there is a rule that voiceless stops (which are, for the most part, directly inherited from Proto-Germanic) are aspirated except after another voiceless obstruent: hence tale is pronounced [ˈtʰejəɫ] (in my dialect, anyway) while stale is pronounced [ˈstejəɫ]. There may be other environments where there is no aspiration, depending on dialect and perhaps individual variation (for example, word-final voiceless stops can be aspirated, glottalised, unreleased or none of these things; my own dialect tends to fricativise them, although this is one of its more idiosyncratic features). Also, it is possible for there to be different degrees of aspiration, which complicates matters further. But there is definitely no aspiration after a voiceless obstruent, and there is definitely a maximal level of aspiration when a stop is word-initial or in the onset of a stressed syllable (as in attack).

The same rule is observable in most of the other Germanic languages. The only exception I know of is Dutch, in which voiceless stops are unaspirated in all positions; this may be attributable to the influence of French. The case of German is particularly interesting, because in German, the stop t does not reflect Proto-Germanic *t; that phoneme became either z (the affricate /t͡s/) or s, depending on its position, due to the High German consonant shift. German t instead reflects Proto-Germanic *d, which filled the gap in the consonant system left by the loss of *t by losing its voice. Yet German t obeys the aspiration rule just like the other plosives. It is of course possible that the aspiration rule is simply something that came into effect after the separation of the Germanic languages, after the devoicing of *d in the High German dialects. But in that case, it would have had to come into effect in all of the non-Dutch Germanic languages independently. Furthermore, the development of the PGmc voiceless stops in the High German consonant shift suggests that these voiceless stops were aspirated at the time of the shift, because as far as I know, the development of voiceless stops into affricates, when not motivated by palatalisation, tends to occur only when they are aspirated. After all, affrication under these circumstances can be explained as assimilation of the phonetic [h] that follows the release of a voiceless aspirated stop to the place of articulation of the preceding stop; I know of no reason why unaspirated stops would be expected to turn into affricates. Lenition alone cannot account for affrication, because affricates involve just as much stricture, during their initial stop articulation, as stops.

For these reasons, I think it is more likely that this aspiration rule was inherited from Proto-Germanic into all of the Germanic languages, and that it persisted in German after the High German consonant shift, applying to the new instances of t produced by this shift. It is entirely possible for phonological rules to persist in this way. For example, Sievers’ Law, the phonological rule that caused underlyingly non-syllabic PIE sonorants to become syllabic after heavy syllables, persisted into Proto-Germanic, as can be seen from the example of PIE *wr̥ǵjéti ‘(s)he is working’ (cf. Av. vərəziieiti) > *wurkijiþi > PGmc *wurkīþi (cf. Goth. waúrkeiþ, OE wyrcþ).

Now, if you accept that the aspiration rule could have persisted in applying after the High German consonant shift, it’s no stretch to suppose that the aspiration rule took effect before the sound changes described by Grimm’s Law occurred, and that it persisted in applying to the new voiceless stops produced by these changes. Why would we want to suppose this? Because it allows us to neatly explain the fact that the PIE voiceless stops did not become fricatives after voiceless obstruents. Position after voiceless obstruents is exactly the position where these voiceless stops did not become aspirated by the aspiration rule. So if the aspiration rule did take effect before the sound changes described by Grimm’s Law, those sound changes applied precisely to the aspirated voiceless stops, in all positions, and not the unaspirated voiceless stops. And fricativisation of voiceless aspirated stops but not voiceless unaspirated stops is well-attested from languages such as Greek (consider: theós = classical [tʰeós], modern [θɛˈɔs], treîs = classical [tré͡es], modern [ˈtris]).
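The proposed relative chronology can be sketched as two ordered rules: aspiration first (blocked after a voiceless obstruent), then fricativisation of aspirated stops only. This is a toy illustration under ad-hoc segment names of my own; "þ", "h" and "hw" stand in for the PGmc fricatives.

```python
# Toy sketch of the ordering: (1) aspirate voiceless stops except after a
# voiceless obstruent; (2) fricativise only the aspirated voiceless stops.
VOICELESS_STOPS = {"p", "t", "k", "kw"}
VOICELESS_OBSTRUENTS = VOICELESS_STOPS | {"s"}
TO_FRICATIVE = {"ph": "f", "th": "þ", "kh": "h", "kwh": "hw"}

def aspirate(segments):
    """Stage 1: the aspiration rule (blocked after a voiceless obstruent)."""
    return [s + "h" if s in VOICELESS_STOPS
            and not (i > 0 and segments[i - 1] in VOICELESS_OBSTRUENTS)
            else s
            for i, s in enumerate(segments)]

def fricativise(segments):
    """Stage 2: Grimm's Law now targets only the aspirated voiceless stops."""
    return [TO_FRICATIVE.get(s, s) for s in segments]

print(fricativise(aspirate(["t", "e"])))  # ['þ', 'e']  (*t > *þ word-initially)
print(fricativise(aspirate(["s", "t"])))  # ['s', 't']  (*st is never aspirated)
```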

Readers (if I have any?) might remember that I already proposed this scenario in an earlier post. But I don’t have any formal qualifications in linguistics (yet!), so I can’t be regarded as a reliable source. However, I did find a reassuring paper by Iverson & Salmons (1995) which proposes the same scenario. What’s more, they also provide convincing phonetic motivations for why it was the voiceless aspirated stops that became fricatives, rather than the voiceless unaspirated stops or both kinds of stops, and for why voiceless stops after voiceless obstruents failed to become aspirated in the first place.

In phonetic terms, voiceless aspirated stops are distinguished from voiceless unaspirated stops by the fact that the open state of the glottis which is required in order to produce a voiceless sound persists for a short period after the release of a voiceless aspirated stop (this might be achieved by closing the glottis more slowly, beginning with a wider glottal opening in the first place, or a combination of the two). This results in the production of a phonetic [h] sound ([h] being the sound obtained when air passes through the open glottis and out of the mouth without being obstructed in the oral tract), although this [h] sound is considered part of the aspirated stop, in phonological terms. (Languages which have a /h/ phoneme as well as voiceless aspirated stops may distinguish phonemic /h/ by its longer duration; compare the English near-minimal pairs deckhand and decad.) Hence voiceless aspirated stops endure for some time after their release. Voiceless unaspirated stops, on the other hand, do not; after their release, the glottis shifts almost immediately to the state required for the production of the next sound (or comes to rest, if a pause follows). Now, if we assume that there is a tendency for stop phonemes to have similar durations, it follows that we should expect voiceless aspirated stops to have a shorter duration up to the release (that is, a shorter period of obstruction) than voiceless unaspirated stops. This has been backed up by empirical observation. Because the period of obstruction is shorter in voiceless aspirated stops, there is a greater tendency for the obstruction to be weakened, for whatever reason (e.g. a natural tendency towards weakening of shorter sounds, or assimilation to neighbouring sounds whose production involves less obstruction in the oral tract). Hence the complete closure required for a stop tends to be weakened to mere close approximation, which results in a fricative sound.

As for the question of why the PIE voiceless unaspirated stops did not become aspirated after voiceless obstruents in pre-PGmc, Iverson & Salmons answer this by proposing that the [+spread glottis] feature (i.e. the feature of extending the period of glottal opening by closing the glottis more slowly, or beginning with a wider glottal opening in the first place, or a combination of both of these things) is shared between the constituent consonants in a cluster of two consonants in which the first is an obstruent. That means that the extended period of voicelessness, which normally manifests as the phonetic [h] sound that follows an aspirated stop, is absorbed by the second constituent consonant in the cluster. Clusters like /st/ start off being pronounced with a glottis which is as widely spread as it is at the start of an aspirated stop, and over the course of the cluster the glottis closes just as slowly; by the time the end of the cluster is reached, the glottis is closed enough that there is no discernible [h] sound at the end.

This is not the only phenomenon observable in the Germanic languages that can be explained by this proposal that the [+spread glottis] feature is shared in biconsonantal obstruent-initial clusters. In English, for example, sonorant consonants after tautosyllabic voiceless obstruents are, generally, devoiced. But they are not devoiced after tautosyllabic /s/ + voiceless stop clusters (e.g. in /spl/ and /spr/). If this devoicing is just a matter of perseverative assimilation, this is difficult to explain. But if the devoicing is the effect of the extended period of voicelessness following a voiceless aspirated stop, it is exactly what we would expect. Iverson & Salmons don’t mention whether the same pattern is found in other Germanic languages, but we would expect it to be found in all of them except Dutch.

So, that’s the first exception to Grimm’s Law. The second exception is the one described by Verner’s Law. But this seems like a good point to pause for now; I’ll cover Verner’s Law, and its relative chronology, in another post. (This post hasn’t been wholly unrelated to that topic; the observation that PIE voiceless unaspirated stops probably became aspirated in most positions before the sound changes described by Grimm’s Law is going to be relevant.)

Reconstructing the sound of the Proto-Indo-European laryngeals

What were the phonetic values of the Proto-Indo-European (PIE) “laryngeals”? The natural first place to look is in the Anatolian languages, because they are the only Indo-European languages that preserve direct reflexes of the laryngeals. According to Melchert (1994), Proto-Anatolian (PA) had two laryngeals, *H and *h. *H was the regular reflex of PIE *h₂, while *h was a reflex of both *h₃, in word-initial position, and *h₂, in the positions where it was affected by Anatolian lenition (more on this below). PIE *h₁, and *h₃ outside of word-initial position, were elided everywhere in PA, as in the other Indo-European languages.

There is disagreement over the outcome of *h₃ in word-initial position; some other authors propose that it was lost in that position too, just like *h₁. The problem is that in all of the non-Anatolian Indo-European languages, PIE *h₁o, *h₂o and *h₃e have identical outcomes. Hence there are cognate sets like Hittite arta and Greek ō̂rto ‘stands’, for which the pro-word-initial-*h₃-loss advocates reconstruct PIE *h₃érto and the anti-word-initial-*h₃-loss advocates reconstruct PIE *h₁órto, and cognate sets like Hittite ḫawis and Latin ovis ‘sheep’, for which the pro-word-initial-*h₃-loss advocates reconstruct PIE *h₂ówis and the anti-word-initial-*h₃-loss advocates reconstruct PIE *h₃éwis. Hence, to find out what happened we need to look at examples where *h₃ is in word-initial position before a consonant. But such examples are few in number. Melchert (1987) gives the following two examples:

  • PIE *h₃reǵ- > Hittite ḫarganāu- ‘palm, sole’ (with a *-nṓw suffix), Greek orégō ‘I stretch out [esp. hands or feet]’ (with an *-oh₂ suffix).
  • PIE *h₃pus- > Hittite ḫapuš- ‘shaft of an arrow, stalk of a reed, penis’, Greek opuíō ‘marry’ (with a *-yoh₂ suffix).

Both examples are to some extent semantically problematic, especially the second one. In fact, Kloekhorst (2005), in an entire paper on the Hittite word ḫapuš-, argues that its actual stem is ḫāpūšašš- and that its meaning when it refers to a body part is ‘shin’, not ‘penis’ (the word only appears in a text describing a ritual where a dead ram’s body parts are placed on the corresponding body parts of a sick person in order to heal them, and its meaning is deduced based on the assumption that the body parts are arranged in a logical order). I therefore find the second example unconvincing. The first example still stands, and I would accept that *h₃ is retained in word-initial position on that basis, but the conclusion is obviously very tentative.

I mentioned Anatolian lenition above. An explanation of this change will be helpful. PA had two series of stops, which I will refer to as the fortis series and lenis series. The term “fortis” just means “strong” and the term “lenis” just means “weak”; I use these terms because there is disagreement over what the nature of the contrast between these two series was (I have an opinion on what it was, but it’s not particularly relevant for this post). In general, the PIE voiceless stops became fortis stops in PA, while the PIE voiced stops and voiced aspirated stops became lenis stops in PA. However, the change known as Anatolian lenition resulted in the PIE voiceless stops becoming lenis stops in certain positions, namely: word-finally, after accented long vowels, after accented diphthongs (i.e. accented vowel + *y or *w sequences) and between unaccented vowels.

The interesting thing about Anatolian lenition is that in the exact same set of environments, PIE *h₂ was reflected as *h rather than *H. So Anatolian lenition seems to have applied to *h₂ as well as the voiceless stops. The conclusion we can draw from this is that the contrast between PA *H and PA *h was of the same nature as the contrast between the PA fortis stops and the PA lenis stops. That is, *H was fortis and *h was lenis. Since *H is the regular outcome of PIE *h₂, this indicates that PIE *h₂ was voiceless. If it was a voiced consonant, it would probably have behaved like the PIE voiced stops, rather than the PIE voiceless stops, so it would have been unaffected by Anatolian lenition.

If PIE *h₃ was preserved in word-initial position as PA *h, it is tempting to identify *h with *h₃ and say that we can also conclude that *h₃ was voiced. But things are not so straightforward with *h₃, as there is no direct evidence that the outcome of word-initial PIE *h₃ and the outcome of PIE *h₂ when affected by Anatolian lenition were ever the same phoneme. In the Anatolian languages that were written in a cuneiform syllabary (Hittite, Palaic and Luwian), the outcomes of word-initial PA *H and *h were both indicated by the same syllabograms in writing; these outcomes in this position are conventionally transcribed ḫ. In word-medial position, the two outcomes are distinguished: the outcome of PA *h is written the same way it is in word-initial position, but the outcome of PA *H is written with syllabograms indicating ḫ at the end of the syllable before syllabograms indicating ḫ at the start of the syllable. In other words, the outcome of PA *H is written as a double consonant. Naturally, it was impossible to indicate the difference between *h and *H in the same way in word-initial position, so even if *h and *H were distinguished in word-initial position we should not expect the distinction to be made in the script. By the way, the stops are written in the exact same way, as single consonants (when lenis) or double consonants (when fortis), but always as single consonants in word-initial position. Anyway, the point is, it is entirely possible that PIE *h₃ became PA *H in word-initial position, without at any point being identified with *h, if you go by the evidence from the cuneiform languages alone.

So, the idea that PIE *h₃ became *h, specifically, rather than *H or something else entirely in word-initial position is based on evidence from the much more sparsely-attested Late Anatolian languages: Lydian, Carian, Lycian, Sidetic and Pisidian. In Lydian, the little evidence we have suggests that the PA laryngeals were both elided in all positions, so this language is of no help. Melchert’s basis for supposing that PIE *h₃ became *h is Lycian. In Lycian, it appears that PA *h was lost in word-initial position (cf. Lycian epirije- ‘sell’ = Hittite ḫappariya- ‘deliver’ < PIE *h₃ep- ‘work’ [with a suffix], cf. Latin ops ‘ability to help’ [with a *-s suffix]), but PA *H was retained and written with the Greek letter chi, transcribed as x (cf. Lycian xñtawati = cuneiform Luwian ḫandawati < PIE *h₂ent- ‘front’ [with a suffix], cf. English end). If this is correct, it does suggest that the outcome of word-initial *h₃ was “weaker” than the outcome of word-initial *h₂. Having established that the evidence isn’t overwhelmingly convincing, I would nevertheless tentatively assume that *h₃ was voiced in PIE based on the Anatolian evidence.

Now that we have some idea about the nature of the contrast between *h₂ and *h₃, what can we say about their actual realisations? The best evidence we have is the use of the ḫ-containing syllabograms for indicating laryngeals in Hittite. In Akkadian, ḫ continues Proto-Semitic (PS) *ḫ, which corresponds to Arabic /χ/ and Hebrew /ħ/. There is also a PS phoneme *ḥ which corresponds to /ħ/ in both Arabic and Hebrew, while being elided in Akkadian; hence, it seems that PS *ḫ should be reconstructed as /χ/ and PS *ḥ should be reconstructed as /ħ/. The simplest hypothesis on the Akkadian pronunciation of ḫ is that it is unchanged from Proto-Semitic and hence pronounced as /χ/ (or perhaps some other voiceless dorsal fricative).

Lycian also preserves direct outcomes of the laryngeals, but unfortunately it is hard to interpret the Lycian orthography. Lycian is written using an alphabet adapted from the early Greek alphabet. The outcome of PA *H appears in Lycian as three different phonemes, according to Melchert. One of them is written with a letter corresponding to Greek chi or psi (these two letters were graphic variants of each other in early Greek). This one is transcribed x. Another is written with a letter of unknown origin (at least to me; it doesn’t look like any Greek letter I know of). This one is transcribed q. (There is a Lycian block in Unicode but virtually no fonts support it, so I’ve included a picture of q below.) The third is written with a letter corresponding to Greek kappa, and is transcribed k. The conditioning of the split between these three phonemes is quite unclear, but Melchert proposes that PA *H becomes k between front vowels, q between a word boundary, consonant or back vowel and a front vowel, and x elsewhere. If this is the case, the three consonants can probably be arranged in the order k, q, x, from most palatalised to least palatalised. What the manner of articulation of these consonants was is a puzzle. In Greek, kappa represents a voiceless unaspirated stop /k/, while chi represents a voiceless aspirated stop /kʰ/; chi would be the natural choice to represent /x/ if it was a phoneme in Lycian. But in Lycian the two letters contrast in palatalisation, rather than aspiration. I think there are too many uncertainties with regard to Lycian for it to allow us to conclude anything.

Three sticks, two diagonal and one vertical, each meeting at a single point in the middle.

The Lycian letter q.

So, my best guess for the values of the laryngeals in PIE based on the Anatolian evidence alone is that *h₂ and *h₃ were voiceless and voiced uvular fricatives, respectively (they might have been velar rather than uvular). Since *h₁ was elided in Anatolian, we do not have any direct evidence for what its realisation was, but we can deduce that it was probably “weaker” than *h₂ and *h₃. This would suggest that it was a true laryngeal: either /ʔ/ or /h/.

What about the evidence from outside of Anatolian? It turns out that this evidence is difficult to reconcile with the Anatolian evidence. In all of the Indo-European languages, we see the effects of a rule known as laryngeal colouring, by which short PIE *e becomes *a adjacent to *h₂ and short PIE *e becomes *o adjacent to *h₃. The rule applies in Anatolian too, so it was probably already in effect in PIE; roots like *h₃er- ‘eagle’ should really be transcribed *h₃or- (but transcribing them with *e makes it easier to describe PIE morphology). Now, there is no reason why voiceless and voiced variants of a phoneme with the same place and manner of articulation should cause adjacent *e to turn into different vowels. If *h₂ and *h₃ were really counterparts of each other differing only in voicing, we would expect *e either to become *a adjacent to both or to become *o adjacent to both.
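The colouring rule itself is easy to state mechanically. Here is a toy sketch, with "h2" and "h3" as ASCII stand-ins for the laryngeals:

```python
# Toy sketch of laryngeal colouring: short *e > *a next to *h2, and
# short *e > *o next to *h3.
COLOUR = {"h2": "a", "h3": "o"}

def colour(segments):
    """Recolour each *e that is adjacent to a colouring laryngeal."""
    out = list(segments)
    for i, seg in enumerate(out):
        if seg == "e":
            for j in (i - 1, i + 1):
                if 0 <= j < len(out) and out[j] in COLOUR:
                    out[i] = COLOUR[out[j]]
    return out

print(colour(["h3", "e", "r"]))   # ['h3', 'o', 'r']  (*h3er- > *h3or- ‘eagle’)
print(colour(["h2", "e", "nt"]))  # ['h2', 'a', 'nt']  (cf. *h2ent- ‘front’)
```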

It’s tempting to suppose that *h₃ was labialised, to account for the colouring of adjacent *e to *o. But in PIE, labialised and non-labialised velars did not contrast adjacent to *w and *u: adding the suffix *-us to the root *h₁lengʷʰ- ‘light [in weight]’ yields *h₁léngʰus (cf. Greek elakhús ‘little’). Therefore, if *h₃ was the labialised counterpart of *h₂ we would expect to see the same neutralisation of the contrast in this environment. As far as I can tell, this neutralisation does not occur: we have PIE *gʷih₃wós ‘alive’ (cf. Latin vīvus, English quick), which definitely has *h₃, because it is related to PIE *gʷíh₃woh₂ ‘I live’ > *gʷyṓwō (not *gʷyā́wō) > Greek zṓō ‘I live’, but also *h₂wéh₁mi ‘I blow’ > *áwēmi (not *ówēmi) > Greek áēmi ‘I blow’. The contrast would not have been neutralised if *h₂ and *h₃ differed in voicing as well as labialisation, but that would be a weird combination of contrasts, and until I see a language that uses the same combination I wouldn’t consider it a possibility.

There is more that could be said about the evidence from outside of Anatolian, but this post has got long enough already. I might post more about this later, but for now all I can say about the phonetic values of the laryngeals is that they are still a mystery, although hopefully you now have some understanding of why they are such a mystery 🙂

The phonetic motivation for Grimm’s Law

…is not as clear as I had thought.

According to the standard reconstruction of Proto-Indo-European (PIE), the language had three series of stops. One of the series is thought to have consisted of voiceless unaspirated stops: *p, *t, *ḱ, *k, and *kʷ. Another is thought to have consisted of voiced unaspirated stops: *b, *d, *ǵ, *g and *gʷ. And the other is thought to have consisted of voiced aspirated stops: *bʰ, *dʰ, *ǵʰ, *gʰ and *gʷʰ. These series were preserved in this form in Sanskrit, although Sanskrit also innovated a fourth series of voiceless aspirated stops out of clusters consisting of voiceless stops followed by laryngeals. In Proto-Germanic, however, the situation is different. The PIE voiceless unaspirated stops have become voiceless fricatives; cf. Proto-Germanic *þū (> English thou) and Sanskrit tvám ‘you (singular)’. The PIE voiced unaspirated stops have become voiceless unaspirated stops; cf. Proto-Germanic *twō (> English two) and Sanskrit dvā́ ‘two’. And the PIE voiced aspirated stops have become voiced unaspirated stops; cf. Proto-Germanic *meduz (> English mead) and Sanskrit mádhu ‘honey’.

I had always assumed that the change went something like this. First, the voiceless unaspirated stops fricativised, retaining their lack of voice and aspiration and becoming voiceless fricatives. Changes of stops into fricatives are common and unremarkable; phonologists disagree on whether this is due to a natural tendency towards lenition (weakening) or due to assimilation to neighbouring phonemes which are more sonorous, but there is no dispute that such a change can be phonetically motivated. The change can be written formally in terms of distinctive features as follows.

[-continuant, -voice] > [+continuant]

Second, the voiced unaspirated stops devoiced, retaining their lack of frication and aspiration and becoming voiceless unaspirated stops. This change would be unusual if it occurred on its own. However, the previous change had left the language with no voiceless unaspirated stops, only voiced unaspirated stops and voiced aspirated stops. The [±voice] feature which had been used to distinguish the three original series was now redundant. For obstruents the unmarked value of this feature is [-voice] (voicelessness); that is, obstruents tend to be voiceless unless something forces them to be voiced. Therefore, it was natural for the voiced unaspirated stops to be devoiced. The change can be written formally in terms of distinctive features as follows.

[-continuant, +voice, -spread glottis] > [-voice]

Third, the voiced aspirated stops deaspirated, retaining their voicing and lack of frication and becoming voiced unaspirated stops. This change was made more likely by the fact that the previous change had left the language with two series of stops, one of which was voiceless unaspirated and one of which was voiced aspirated; the two features [±voice] and [±spread glottis] were therefore redundant against each other. [-spread glottis] is the unmarked value of the [±spread glottis] feature on stops, so it was natural to resolve this by deaspirating the voiced aspirated stops (although devoicing the voiced aspirated stops would have worked just as well). The change can be written formally in terms of distinctive features as follows.

[-continuant, +spread glottis] > [-spread glottis]
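Taken together, the three feature rewrites can be sketched as ordered rules applying to feature bundles. The dict representation and names below are my own, not standard notation; note that because fricativisation is ordered before devoicing, the new voiceless stops created by devoicing do not themselves fricativise:

```python
# Toy sketch of the three changes as ordered feature rewrites.
def apply_rule(segment, conditions, changes):
    """Rewrite the segment's features if it matches all the conditions."""
    if all(segment.get(f) == v for f, v in conditions.items()):
        return {**segment, **changes}
    return segment

RULES = [
    # [-continuant, -voice] > [+continuant]
    ({"continuant": False, "voice": False}, {"continuant": True}),
    # [-continuant, +voice, -spread glottis] > [-voice]
    ({"continuant": False, "voice": True, "spread_glottis": False},
     {"voice": False}),
    # [-continuant, +spread glottis] > [-spread glottis]
    ({"continuant": False, "spread_glottis": True}, {"spread_glottis": False}),
]

def shift(segment):
    """Apply the three rules in order; earlier outputs feed later rules."""
    for conditions, changes in RULES:
        segment = apply_rule(segment, conditions, changes)
    return segment

# PIE *b ([-continuant, +voice, -spread glottis]) devoices but does not
# fricativise, because the fricativisation rule has already applied:
print(shift({"continuant": False, "voice": True, "spread_glottis": False}))
# {'continuant': False, 'voice': False, 'spread_glottis': False}
```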

But there are two questions I have about this account.

  1. Why did the second change involve the voiced unaspirated stops devoicing, rather than the voiced aspirated ones? The redundancy of the [±voice] feature could have been resolved either way. In fact, why didn’t both kinds of stop devoice? Since [-voice] is unmarked for obstruents there is nothing stopping this from happening.
  2. Why did the third change involve the voiced aspirated stops deaspirating rather than devoicing? Since the [±voice] and [±spread glottis] features were redundant against each other devoicing would have worked just as well as a means of resolving the redundancy.

Now, sound change is not a deterministic process, so perhaps the answers to these questions are just that out of all of the different ways the redundancies in question could be resolved, these were the ways that were chosen, more or less at random. I am satisfied with this as the answer to question 2. In fact, with respect to question 2 it seems like deaspiration would be a more likely occurrence than devoicing, because it is much more common for languages to distinguish stops using the [±voice] feature than it is for them to distinguish stops using the [±spread glottis] feature; contrasts of voice are therefore probably favoured over contrasts of aspiration (although this is only a tendency, and there are plenty of languages like Mandarin Chinese where [±spread glottis] is distinctive but [±voice] is not).

But I am less satisfied with this as an answer to question 1. As I mentioned above, the redundancy of the [±voice] feature could have been resolved in three different ways:

  1. devoicing of the voiced unaspirated stops, resulting in a contrast between voiceless unaspirated stops and voiced aspirated stops.
  2. devoicing of the voiced aspirated stops, resulting in a contrast between voiced unaspirated stops and voiceless aspirated stops.
  3. devoicing of both kinds of stop, resulting in a contrast between voiceless unaspirated stops and voiceless aspirated stops.

There are languages with a contrast between voiced unaspirated stops and voiceless aspirated stops, as would result from option 2. English is such a language. There are also languages with a contrast between voiceless unaspirated stops and voiceless aspirated stops, as would result from option 3. Mandarin Chinese is such a language. But I know of no language which has a contrast between voiceless unaspirated stops and voiced aspirated stops, as would result from option 1. Yet option 1 seems to have been the option that was taken. This is odd.

I think there are phonetic reasons why we would expect options 2 or 3 to be favoured over option 1. If you examine the articulatory mechanisms which are used to produce voiced aspirated stops, you can see them as half-voiced stops, closer to voiceless stops than voiced unaspirated stops (but still voiced). If you think about voiced aspirated stops in this way, option 1 is weird, because it involves change of the voiced unaspirated (i.e. fully voiced) stops directly into voiceless unaspirated stops without passing through the intermediate stage where they would be voiced aspirated (i.e. half-voiced) and end up merging with the voiced aspirated stops. If the characterisation of voiced aspirated stops as half-voiced already makes sense to you, you can skip the next few paragraphs, because I’m now going to try and explain why this is an accurate characterisation.

The first thing that I want to explain is what voiced aspirated stops are. In terms of distinctive features, they are parallel to voiceless aspirated stops. Voiced aspirated stops are [+voice] and [+spread glottis], voiceless aspirated stops are [-voice] and [+spread glottis]. But the meaning of [+spread glottis] is different in the two cases. As a feature of voiceless stops, [+spread glottis] corresponds to increased duration of the period during which the vocal folds are prevented from vibrating (normally by keeping the vocal folds apart from each other, hence the name of the feature, although reducing the airflow is also an option). The time between the release of a stop and the beginning of the vocal fold vibration that voices the following phoneme is called the voice onset time (VOT). For voiceless unaspirated stops, the VOT is close to 0, while for voiceless aspirated stops the VOT is larger, so that there is an audible period after the stop has been released where air flows through the glottis but the vocal folds do not vibrate. This results in a sound being produced during this period which is in fact exactly [h], the voiceless glottal continuant (although speakers of languages which have aspirated stops don’t usually perceive the [h], instead perceiving it as part of the preceding stop).

During the production of voiced stops, the vocal folds are already vibrating (that’s what it means for a stop to be voiced). So it is impossible for voiced stops to be aspirated if aspiration is defined as having a positive VOT1. Instead, [+spread glottis] as a feature of voiced stops corresponds to the vocal folds being held further apart than is normal for voiced stops, roughly speaking. The vocal folds are still close enough that they vibrate during the production of voiced aspirated stops, so such stops are not completely voiceless, but they are closer to voiceless than voiced unaspirated stops. The kind of voice that accompanies voiced aspirated stops is called breathy voice, as opposed to the modal voice that accompanies voiced unaspirated stops. It might help to look at the following diagram, which illustrates the relationship between the degree of closure of the glottis and different kinds of voicing. The diagram is adapted from Gordon & Ladefoged (2001).

Voiceless sounds have the least glottal closure. The glottal stop has the most glottal closure (complete closure). Modally-voiced sounds have a degree of glottal closure midway between these two extremes. Breathy-voiced sounds have a degree of glottal closure between that of voiceless sounds and voiced sounds. Creaky-voiced sounds have a degree of glottal closure between that of voiced sounds and the glottal stop.

(I should note that talking about the degree of closure of the glottis as if this was a scalar variable is an oversimplification. When the vocal folds vibrate, what happens is that the glottis alternates between a state where it is more or less fully open (as when a voiceless sound is being produced) and a state where it is more or less fully closed (as when a glottal stop is being produced). Closure occurs due to tension from the laryngeal muscles and opening occurs due to pressure from the flow of air through the trachea; closure results in buildup of air below the glottis, resulting in increased pressure, while opening allows air to flow through a greater area, resulting in decreased pressure, and this is why the alternation occurs. For a given rate of flow of air, there is a maximal tension above which opening cannot occur and a minimal tension below which closure cannot occur, and in between these two extremes there is an optimal tension which results in maximal vibration; this tension is approached during the production of modally-voiced sounds. If the tension is below the optimal tension but above the minimal tension, the result is a breathy-voiced sound. If the tension is above the optimal tension but below the maximal tension, the result is a creaky-voiced sound. Alternatively, creaky-voiced sounds can be produced by having the glottis completely closed at one end, with modal voice at the other end, and breathy-voiced sounds can be produced by having the glottis open so that the vocal folds do not vibrate at one end, with modal voice at the other end. But regardless of how these sounds are produced, they sound the same, so the distinction is not important. Either way, it is still accurate to say that breathy-voiced sounds are in a position between voiceless sounds and modally-voiced sounds.)

It would be helpful to see how voiced aspirated stops behave with respect to sound change in attested languages. Unfortunately, voiced aspirated stops are rare, which limits the number of available examples. As far as I know, voiced aspirated stops are mainly found in the Indo-Aryan languages of South Asia and the Nguni languages of South Africa. In the Indo-Aryan languages the voiced aspirated stops have been inherited from PIE, or at least Vedic Sanskrit (depending on what you believe about the nature of the PIE stops), and most of these languages seem to have preserved them unchanged. Sinhala and Kashmiri have no voiced aspirated stops, but I don’t know and can’t find any information on what happened to them in these languages. So it seems that the voiced aspirated stops have been stable in these languages, which suggests that the rarity of voiced aspirated stops is probably due more to the infrequency of sound changes that would make them phonemic than to inherent instability. However, the mutual influence of these languages upon each other within the South Asian linguistic area might have helped preserve the voiced aspirated stops; the fact that the two most peripheral Indo-Aryan languages lack them is perhaps suggestive that this has been the case. What about the Nguni languages? These are a tight-knit group, probably having a common origin within the last millennium, and their closest relatives such as Tswana have no voiced aspirated stops. So their voiced aspirated stops are of more recent vintage. Interestingly, Traill, Khumalo & Fridjhon (1987) found that the Zulu voiced aspirated stops are actually voiceless, with the breathy voice occurring after the release, on the following vowel. This seems like it could be the first step in a change of voiced aspirated stops into voiceless aspirated stops. But I don’t think any of this evidence is of much use in making the case that Grimm’s Law is weird.
My case primarily rests on the idea that voiced aspirated stops are intermediate between voiceless and modally-voiced stops on the basis of how they are produced.

If the changes as described above are odd, maybe we should consider the possibility that the changes described by Grimm’s Law were of a different nature.

Perhaps a minor amendment can solve the problem. It is universally agreed that the Proto-Germanic voiced stops had voiced fricative allophones. It is not totally clear which environments the stops occurred in and which environments the fricatives occurred in, but they were definitely stops after nasals and when geminate, and fricatives after vowels and diphthongs. There are three different ways this situation might have come to be.

  1. The PIE voiced aspirated stops might have turned into voiced unaspirated stops first and then acquired fricative allophones in certain environments.
  2. The PIE voiced aspirated stops might have turned into voiced unaspirated fricatives first and then acquired stop allophones in certain environments.
  3. The PIE voiced aspirated stops might have turned into voiced unaspirated fricatives in certain environments and voiced unaspirated stops in others.
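The stop/fricative distribution just described can be sketched as a toy allophone-selection function. This is purely illustrative: the function name, the symbol choices, and the default behaviour in environments the text leaves open are my own assumptions, not an established reconstruction.

```python
# Sketch of the Proto-Germanic allophone distribution described above:
# voiced obstruents surface as stops after nasals and when geminate,
# and as fricatives after vowels and diphthongs. All names and symbols
# here are illustrative.

STOP_TO_FRICATIVE = {"b": "β", "d": "ð", "g": "ɣ"}

def allophone(segment: str, preceding: str, geminate: bool = False) -> str:
    """Return the surface allophone of a Proto-Germanic voiced obstruent."""
    if segment not in STOP_TO_FRICATIVE:
        return segment
    nasals = {"m", "n", "ŋ"}
    vowels = set("aeiouāēīōū")
    if geminate or preceding in nasals:
        return segment                      # stop allophone
    if preceding and preceding[-1] in vowels:
        return STOP_TO_FRICATIVE[segment]   # fricative allophone
    return segment                          # environments the text leaves open: default to stop

print(allophone("d", "n"))   # after a nasal: stays a stop, "d"
print(allophone("d", "a"))   # after a vowel: fricative, "ð"
```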

If we suppose that number 2 is the accurate description of what happened, then it is possible that the fricativisation of the PIE voiced aspirated stops occurred before the devoicing of the PIE voiced unaspirated stops. This devoicing would then be perfectly natural because the PIE voiced unaspirated stops would be the only stops remaining in the language, so the marked [+voice] feature would be dropped from them. The voiced aspirated stops would probably have become voiced aspirated fricatives (i.e. breathy-voiced fricatives) initially, and then these fricatives would have become modally-voiced since there would be no need for them to contrast with modally-voiced fricatives. Is it plausible that the voiceless unaspirated and voiced aspirated stops would have devoiced, but not the voiced unaspirated stops? What do these two kinds of stop have in common that the third kind lacks? If we think of voiced aspirated stops as half-voiced stops, we can describe the change as affecting all of the stops which were not fully voiced. The change is especially plausible, however, if we suppose that the PIE voiceless unaspirated stops had become aspirated before the changes described by Grimm’s Law took place. In that case, the change would affect the aspirated stops and not affect the unaspirated stops. Fricativisation of aspirated stops but not unaspirated stops is a very well-attested sound change; it happened in Greek, for example. The sequence of changes would be as follows:

[-continuant, -voice] > [+spread glottis]

[-continuant, +spread glottis] > [+continuant]

[-continuant, +voice] > [-voice]

[+continuant, +voice] > [-continuant]

(The last change would have occurred only in some environments; there are also conditioned exceptions to some of the other changes.)
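The ordered changes above can be checked mechanically by applying them, in sequence, to feature bundles representing the three PIE stop series. This is a minimal sketch under the assumptions in the text: the conditioned environments of the last change, and the other conditioned exceptions, are ignored, and all names are illustrative.

```python
# Apply the ordered changes above to a segment represented as a dict of
# binary features. The rules mirror the sequence in the text; the last,
# environment-conditioned change is deliberately omitted.

def apply_rules(segment: dict) -> dict:
    s = dict(segment)
    # 1. [-continuant, -voice] > [+spread glottis]
    if not s["continuant"] and not s["voice"]:
        s["spread_glottis"] = True
    # 2. [-continuant, +spread glottis] > [+continuant]
    if not s["continuant"] and s["spread_glottis"]:
        s["continuant"] = True
    # 3. [-continuant, +voice] > [-voice]
    if not s["continuant"] and s["voice"]:
        s["voice"] = False
    # 4. [+continuant, +voice] > [-continuant] applied only in some
    #    environments, so it is not modelled here.
    return s

pie_series = {
    "*t  (voiceless)":       {"continuant": False, "voice": False, "spread_glottis": False},
    "*d  (voiced)":          {"continuant": False, "voice": True,  "spread_glottis": False},
    "*dh (voiced aspirate)": {"continuant": False, "voice": True,  "spread_glottis": True},
}

for name, feats in pie_series.items():
    print(name, "->", apply_rules(feats))
```

Running this gives the expected Grimm’s Law outcomes under scenario 2: *t ends up a voiceless fricative, *d a voiceless stop, and *dh a voiced fricative.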

Is there any other reason to think the PIE voiceless unaspirated stops might have become aspirated in Proto-Germanic before fricativising? Well, the reflexes of the Proto-Germanic voiceless stops are aspirated in the North Germanic languages and English, and have become affricates in some positions in German, which suggests that they were originally aspirated; the lack of aspiration in Dutch can probably be attributed to French influence. That suggests the Proto-Germanic voiceless stops were already aspirated. Of course, these voiceless stops are the reflexes of the PIE voiced unaspirated stops, not the PIE voiceless unaspirated stops. But perhaps the rule aspirating voiceless stops was persistent in Proto-Germanic, so that it applied both to the PIE voiceless unaspirated stops before they fricativised and to the PIE voiced unaspirated stops after they were devoiced. The rule seems to have persisted into German, because German went through its own kind of replay of Grimm’s Law, in which the Proto-Germanic voiceless stops became affricates or fricatives and the Proto-Germanic voiced stops were devoiced. This second consonant shift was never fully completed in most German dialects; in Standard German, for example, Proto-Germanic *b and *g were not devoiced in word-initial position. However, *d was devoiced (cf. English daughter, German Tochter), and modern Standard German /t/ is aspirated, so, for example, Tochter is pronounced [ˈtʰɔxtɐ].

I think this is a satisfactory solution to the problem. The idea that the PIE voiced aspirated stops became fricatives first is not a new one; in fact, it is probably the favoured scenario. But I have never seen it justified in this way, and Ringe (2006) suggests that the voiced aspirates changed into both stops and fricatives depending on the environment (number 3 above), which is incompatible with the scenario I have proposed here.

Finally, I think I should mention that all of this reasoning has been done on the assumption that PIE had voiceless unaspirated, voiced unaspirated and voiced aspirated stops. If you subscribe to an alternative hypothesis about the nature of the PIE stops, such as the glottalic theory, Grimm’s Law might have to be explained in a completely different way. But despite it not being as easy as it might appear at first glance, it does seem that the standard hypothesis is capable of explaining Grimm’s Law.

Whether it can explain Verner’s Law is another matter. I have always thought it a little odd that the voiceless fricatives were voiced after unaccented syllables but not after accented syllables. It is not obvious how accent and voice can affect each other. But I’ll discuss this, perhaps, in another post.


Gordon, M., & Ladefoged, P. (2001). Phonation types: a cross-linguistic overview. Journal of Phonetics, 29(4), 383-406.

Ringe, D. (2006). From Proto-Indo-European to Proto-Germanic: A Linguistic History of English: Volume I. Oxford University Press.

Traill, A., Khumalo, J. S., & Fridjhon, P. (1987). Depressing facts about Zulu. African Studies, 46(2), 255-274.

/a/ in foreign language pronunciation teaching

Pronunciation guides for English speakers learning another language will often instruct you to pronounce that language’s ‘a’ sound like the ‘a’ in ‘father’. Now, the problem with pronunciation instructions like this is that, in English especially, people have different dialects and will pronounce vowels differently. There are lots of examples I could give to show how this complicates things, but I’ll just focus on the case of ‘father’.

Sounds conventionally transcribed with ‘a’ in the Latin alphabet are usually open vowels, but may differ in frontness or backness. The IPA provides three symbols for ‘a’ sounds: æ, a and ɑ (plus ɐ, which is not completely open, so I’m not considering it in this discussion).

æ is prototypically the symbol for a front vowel with some slight closing. The symbol reflects this, in that it is the ‘a’ symbol closest to an e. ɑ is the back counterpart, although it doesn’t imply any closing. It gets a bit more confusing with a, which doesn’t really have any specific prototype: it can be used to refer to any open vowel in between.

Within the range of sounds covered by a, there are two useful points. One is like æ, but without the slight closing. The other is the most common vowel sound in any language: a central, open vowel. To disambiguate, the centralisation diacritic is sometimes used for the second sound, giving [ä]. But this level of precision is rarely used, and when you see that a language has a phoneme pronounced [a] you can’t be certain whether this is a front open vowel or a central open vowel.

Now, in English, we have the full range of different ‘a’ sounds. Quite universally in American and Australian English, and traditionally in British English, an ‘a’ as in ‘cat’ is pronounced [æ]. And this is the symbol used to transcribe the phoneme when talking about all English dialects.

Most English dialects have another ‘a’-like sound: the one found in ‘father’. This is always, as far as I know, further back than the ‘a’ in ‘cat’. In most North American English, this is merged with the ‘o’ sound of ‘cot’. Both are transcribed with [ɑ].

So, from a North American point of view, the instruction to pronounce ‘a’ in, say, Japanese (which has the usual central [ä]) as the ‘a’ in ‘father’ is pretty sensible, if not perfect–[ɑ] is closer to the centre than [æ] is. Especially since in many American dialects, these two phonemes are undergoing a shift where [æ] gets even closer and fronter to approach [e], and [ɑ] moves to the front and becomes pronounced more like [ä]–for these speakers ‘father’ is the perfect example.

But for non-North American dialects, it’s not perfect. For example, I, being from England, pronounce ‘father’ as [fɑːðə]–with a long [ɑ]. If someone told me to pronounce the ‘a’ in Japanese like I do in ‘father’, I might end up pronouncing ‘katakana’ as [kɑːtɑːkɑːnɑː]. Which would sound comically wrong, and take much longer to say than the actual pronunciation of [kätäkänä] (I assume so anyway, I don’t know the details of Japanese phonology).

For people like me who have an [ɑ] that’s always long, which includes most speakers from England, Australia and New Zealand, [æ] would probably be a closer approximation to the [ä] sound. Although at some point, we’d just have to accept that this is a sound with no proper equivalent in most varieties of English.

Note ‘most’. Because it actually gets worse–in Northern England, Wales, Ireland and Scotland*, the ‘a’ in cat is pronounced as [a], with varying degrees of backness–for all speakers it has none of the slight closing of [æ], and for some it’s a fully central [ä]. Plus, a full [æ] pronunciation in England is now either old-fashioned or vernacular–even in southern England, many people use the northern [a] sound. So for these speakers, there’s actually a really good approximation of [ä] in their native accent, but you’re telling them to use a different phoneme that’s much less like it!

* Although for many speakers in Scotland, the ‘a’ in ‘cat’ and ‘father’ are not distinguished, so telling them to use ‘father’ works well enough but you could just as well tell them to use the vowel in ‘cat’.

I guess the lesson to learn is: don’t rely on approximations of foreign phonemes in terms of your native language. Listen to the language’s speakers, and use the sound they use.