Tag Archives: phonology

That’s OK, but this’s not OK?

Here’s something peculiar I noticed the other day about the English language.

The word is (the third-person singular present indicative form of the verb be) can be ‘contracted’ with a preceding noun phrase, so that it is reduced to an enclitic form -‘s. This can happen after pretty much any noun phrase, no matter how syntactically complex:

(1) he’s here

/(h)iːz ˈhiːə/[1]

(2) everyone’s here

/ˈevriːwɒnz ˈhiːə/

(3) ten years ago’s a long time

/ˈtɛn ˈjiːəz əˈgəwz ə ˈlɒng ˈtajm/

However, one place where this contraction can’t happen is immediately after the proximal demonstrative this. This is strange, because it can certainly happen after the distal demonstrative that, and one wouldn’t expect these two very similar words to behave so differently:

(4) that’s funny
/ˈðats ˈfʊniː/

(5) *this’s funny

There is a complication here which I’ve kind of skirted over, though. Sure, this’s funny is unacceptable in writing. But what would it sound like, if it was said in speech? Well, the -’s enclitic form of is can actually be realized on the surface in a couple of different ways, depending on the phonological environment. You might already have noticed that it’s /-s/ in example (4), but /-z/ in examples (1)-(3). This allomorphy (variation in phonological form) is reminiscent of the allomorphy in the plural suffix: cats is /ˈkats/, dogs is /ˈdɒgz/, horses is /ˈhɔːsɪz/. In fact the distribution of the /-s/ and /-z/ realizations of -‘s is exactly the same as for the plural suffix: /-s/ appears after voiceless non-sibilant consonants and /-z/ appears after vowels and voiced non-sibilant consonants. The remaining environment, the environment after sibilants, is the environment in which the plural suffix appears as /-ɪz/. And this environment turns out to be exactly the same environment in which -’s is unacceptable in writing. Here are a couple more examples:

(6) *a good guess’s worth something (compare: the correct answer’s worth something)

(7) *The Clash’s my favourite band (compare: Pearl Jam’s my favourite band)

Now, if -‘s obeys the same rules as the plural suffix then we’d expect it to be realized as /-ɪz/ in this environment. However… this is exactly the same sequence of segments that the independent word is is realized as when it is unstressed. One might therefore suspect that in sentences like (8) below, the morpheme graphically represented as the independent word is is actually the enclitic -‘s, it just happens to be realized the same as the independent word is and therefore not distinguished from it in writing. (Or, perhaps it would be more elegant to say that the contrast between enclitic and independent word is neutralized in this environment.)

(8) The Clash is my favourite band

Well, this is (*this’s) a very neat explanation, and if you do a Google search for “this’s” that’s pretty much the explanation you’ll find given to the various other confused people who have gone to websites like English Stack Exchange to ask why this’s isn’t a word. Unfortunately, I think it can’t be right.

The problem is, there are some accents of English, including mine, which have /-əz/ rather than /-ɪz/ in the allomorph of the plural suffix that occurs after sibilants, while at the same time pronouncing unstressed is as /ɪz/ rather than /əz/. (There are minimal pairs, such as peace is upon us /ˈpiːsɪz əˈpɒn ʊz/ and pieces upon us /ˈpiːsəz əˈpɒn ʊz/.) If the enclitic form of is does occur in (8) then we’d expect it to be realized as /əz/ in these accents, just like the plural suffix would be in the same environment. This is not what happens, at least in my own accent: (8) can only have /ɪz/. Indeed, it can be distinguished from the minimally contrastive NP (9):

(9) The Clash as my favourite band

In fact this problem exists in more standard accents of English as well, because is is not the only word ending in /-z/ which can end a contraction. The third-person singular present indicative of the verb have, has, can also be contracted to -‘s, and it exhibits the expected allomorphy between voiceless and voiced realizations:

(10) it’s been a while /ɪts ˈbiːn ə ˈwajəl/

(11) somebody I used to know’s disappeared /ˈsʊmbɒdiː aj ˈjuːst tə ˈnəwz dɪsəˈpijəd/

But like is it does not contract, at least in writing, after sibilants, although it may drop the initial /h-/ whenever it’s unstressed:

(12) this has gone on long enough /ˈðɪs (h)əz gɒn ɒn lɒng əˈnʊf/

I am not a native speaker of RP, so, correct me if I’m wrong. But I would be very surprised if any native speaker of RP would ever pronounce has as /ɪz/ in sentences like (12).

What’s going on? I actually do think the answer given above—that this’s isn’t written because it sounds exactly the same as this is—is more or less correct, but it needs elaboration. Such an answer can only be accepted if we in turn accept that the plural -s, the reduced -‘s form of is and the reduced -‘s form of has do not all exhibit the same allomorph in the environment after sibilants. The reduced form of is has the allomorph /-ɪz/ in all accents, except in those such as Australian English in which unstressed /ɪ/ merges with schwa. The reduced form of has has the allomorph /-əz/ in all accents. The plural suffix has the allomorph /-ɪz/ in some accents, but /-əz/ in others, including some in which /ɪ/ is not merged completely with schwa and in particular is not merged with schwa in the unstressed pronunciation of is.

Introductory textbooks on phonology written in the English language are very fond of talking about the allomorphy of the English plural suffix. In pretty much every treatment I’ve seen, it’s assumed that /-z/ is the underlying form, and /-s/ and /-əz/ are derived by phonological rules of voicing assimilation and epenthesis respectively, with the voicing assimilation crucially coming after the epenthesis (otherwise we’d have an additional allomorph /-əs/ after voiceless sibilants, while /-əz/ would only appear after voiced sibilants). This is the best analysis when the example is taken in isolation, because positing an epenthesis rule allows the phonological rules to be assumed to be productive across the entire lexicon of English. If instead /-əz/ were taken as the underlying form, and a fully productive deletion rule were posited to derive /-z/ and /-s/ from it, then it would be impossible to account for the pronunciation of a word like Paulas (‘multiple people named Paula’) with /-əz/ on the surface, whose underlying form would be exactly the same, phonologically, as Pauls (‘multiple people named Paul’). (This example only works if your plural suffix post-sibilant allomorph is /-əz/ rather than /-ɪz/, but a similar example could probably be exhibited in the other case.) One could appeal to the differing placement of the morpheme boundary but this is unappealing.
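The ordering argument in the textbook analysis can be checked mechanically. What follows is only a minimal sketch in Python, under assumptions that are not in the textbooks themselves: words are lists of segments, the segment inventories are deliberately tiny, and the function names (`epenthesis`, `assimilation`, `pluralize`) are hypothetical.

```python
# Toy model of the textbook analysis: underlying /-z/, plus two ordered rules.
# The inventories below are illustrative assumptions, not a full description.
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}


def epenthesis(word):
    """Insert schwa between a stem-final sibilant and a sibilant suffix."""
    *stem, suffix = word
    if stem[-1] in SIBILANTS and suffix in SIBILANTS:
        return stem + ["ə", suffix]
    return list(word)


def assimilation(word):
    """Devoice the suffix /z/ immediately after a voiceless segment."""
    *rest, last = word
    if last == "z" and rest[-1] in VOICELESS:
        return rest + ["s"]
    return list(word)


def pluralize(stem, rules):
    """Attach underlying /-z/ and apply the rules in the given order."""
    word = list(stem) + ["z"]
    for rule in rules:
        word = rule(word)
    return word


TEXTBOOK_ORDER = (epenthesis, assimilation)
REVERSED_ORDER = (assimilation, epenthesis)

cat, dog, horse = ["k", "a", "t"], ["d", "ɒ", "g"], ["h", "ɔː", "s"]
print(pluralize(cat, TEXTBOOK_ORDER))    # ends in s
print(pluralize(dog, TEXTBOOK_ORDER))    # ends in z
print(pluralize(horse, TEXTBOOK_ORDER))  # ends in ə z
print(pluralize(horse, REVERSED_ORDER))  # ends in ə s: the unattested allomorph
```

With the rules reversed, the suffix is devoiced before the schwa is inserted, which is how the spurious /-əs/ after voiceless sibilants arises.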

However, the assumption that a single epenthesis rule operating between sibilants is productive across the entire English lexicon has to be given up, because ‘s < is and ‘s < has have different allomorphs after sibilants! Either they are accounted for by two different lexically-conditioned epenthesis rules (which is a very unappealing model) or the allomorphs with the vowels are actually the underlying ones, and the allomorphs without the vowels are produced by a not phonologically-conditioned but at least (sort of) morphologically-conditioned deletion rule that elides fully reduced unstressed vowels (/ə/, /ɪ/) before word-final obstruents. This rule only applies in inflectional suffixes (e.g. lettuce and orchid are immune), and even there it does not apply unconditionally because the superlative suffix -est is immune to it. But this doesn’t bother me too much. One can argue that the superlative is kind of a marginal inflectional category, when you put it in the company of the plural, the possessive and the past tense.
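To make the deletion analysis concrete, here is a rough sketch in Python. It rests on several assumptions of mine: the underlying forms are the ones for my own accent (/-əz/ for the plural and for has, /-ɪz/ for is), the vowel is taken to survive exactly after sibilants, the past tense and the superlative are left out, and the names `UNDERLYING` and `attach` are hypothetical.

```python
# Toy model of the deletion analysis: the suffixes are stored WITH their
# vowels, and the vowel is deleted unless the stem ends in a sibilant.
# Inventories and underlying forms are assumptions made for illustration.
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}

UNDERLYING = {
    "plural": ("ə", "z"),
    "is": ("ɪ", "z"),
    "has": ("ə", "z"),
}


def attach(stem, morpheme):
    """Return the surface form of stem + the given suffix or enclitic."""
    vowel, consonant = UNDERLYING[morpheme]
    if stem[-1] in SIBILANTS:
        # Deletion is blocked, so the lexically specified vowel surfaces.
        return list(stem) + [vowel, consonant]
    if stem[-1] in VOICELESS:
        # Vowel deleted; /z/ then assimilates in voicing to the stem.
        consonant = "s"
    return list(stem) + [consonant]


peace = ["p", "iː", "s"]
print(attach(peace, "is"))      # peace is: vowel ɪ
print(attach(peace, "plural"))  # pieces:   vowel ə
print(attach(["ð", "a", "t"], "is"))   # that's, with s
print(attach(["ð", "ɪ", "s"], "has"))  # this has, with ə z
```

On this model the difference between peace is and pieces falls out of the underlying forms alone, with no need for two separate epenthesis rules.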

A nice thing about the synchronic rule I’m proposing here is that it’s more or less exactly the same as the diachronic rule that produced the whole situation in the first place. The Old English nom./acc. pl., gen. sg., present and past endings were, respectively, -as, -es, -aþ and -ede. In Middle English final schwa was elided unconditionally in absolute word-final position, while in word-final unstressed syllables where it was followed by a single obstruent it was gradually eliminated by a process of lexical diffusion from inflectional suffix to inflectional suffix, although “a full coverage of the process in ME is still outstanding” (Minkova 2013: 231). Even the superlative suffix was reduced to /-st/ by many speakers for a time, but eventually the schwa-ful form of this suffix prevailed.

I don’t see this as a coincidence. My inclination, when it comes to phonology, is to see the historical phonology as essential for understanding the present-day phonology. Synchronic phonological alternations are for the most part caused by sound changes, and trying to understand them without reference to these old sound changes is… well, you may be able to make some progress but it seems like it’d be much easier to make progress more quickly by trying to understand the things that cause them—sound changes—at the same time. This is a pretty tentative paragraph, and I’m aware I’d need a lot more elaboration to make a convincing case for this stance. But this is where my inclination is headed.

[1] The transcription system is the one which I prefer to use for my own accent of English.


Minkova, D. 2013. A Historical Phonology of English. Edinburgh University Press.


A language with no word-initial consonants

I was having a look at some of the squibs in Linguistic Inquiry today, which are often fairly interesting (and have the redeeming quality that, when they’re not interesting, they’re at least short), and there was an especially interesting one in the April 1970 (second ever) issue by R. M. W. Dixon (Dixon 1970) which I’d like to write about for the benefit of those who can’t access it.

In Olgolo, a variety of Kunjen spoken on the Cape York Peninsula, there appears to have been a sound change that elided consonants in initial position. That is, not just consonants of a particular variety, but all consonants. As a result of this change, every word in the language begins with a vowel. Examples (transcriptions in IPA):

  • *báma ‘man’ > áb͡ma
  • *míɲa ‘animal’ > íɲa
  • *gúda ‘dog’ > úda
  • *gúman ‘thigh’ > úb͡man
  • *búŋa ‘sun’ > úg͡ŋa
  • *bíːɲa ‘aunt’ > íɲa
  • *gúyu ‘fish’ > úyu
  • *yúgu ‘tree, wood’ > úgu

(Being used to the conventions of Indo-Europeanists, I’m a little disturbed by the fact that Dixon doesn’t identify the linguistic proto-variety to which the proto-forms in these examples belong, nor does he cite cognates to back up his reconstruction. But I presume forms very similar to the proto-forms are found in nearby Paman languages. In fact, I know for a fact that the Uradhi word for ‘tree’ is /yúku/ because Black (1993) mentions it by way of illustrating the remarkable Uradhi phonological rule which inserts a phonetic [k] or [ŋ] after every vowel in utterance-final position. Utterance-final /yúku/ is by this means realized as [yúkuk] in Uradhi.)

(The pre-stopped nasals in some of these words [rather interesting segments in and of themselves, but fairly widely attested, see the Wikipedia article] have arisen due to a sound change occurring before the word-initial consonant elision sound change, which pre-stopped nasals immediately after word-initial syllables containing a stop or *w followed by a short vowel. This would have helped mitigate the loss of contrast resulting from the word-initial consonant elision sound change a little, but only a little, and between e.g. the words for ‘animal’ and ‘aunt’ homophony was not averted because ‘aunt’ had an originally long vowel [which was shortened in Olgolo by yet another sound change].)
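The three sound changes just described can be sketched as ordered rewrite rules and run over the proto-forms listed above. This is only an illustration in Python, with several assumptions: words are lists of segments, the function names are hypothetical, the pre-stopped forms of /n/ and /ɲ/ are filled in by analogy (only b͡m and g͡ŋ occur in the examples), and the placement of vowel shortening after pre-stopping is inferred from the ‘aunt’ example rather than stated by Dixon.

```python
import unicodedata

LENGTH = "\u02d0"  # the IPA length mark
VOWEL_LETTERS = set("aeiouɔ")
STOPS_AND_W = set("bdgɟptkcw")
# b͡m and g͡ŋ are attested in the examples; the other two are assumed by analogy.
PRESTOPPED = {"m": "b\u0361m", "n": "d\u0361n", "ɲ": "ɟ\u0361ɲ", "ŋ": "g\u0361ŋ"}


def is_vowel(seg):
    """True if the segment's base letter is a vowel (accents are ignored)."""
    return unicodedata.normalize("NFD", seg)[0] in VOWEL_LETTERS


def prestop_nasals(segs):
    """Pre-stop a nasal right after an initial (stop or w) + short vowel."""
    out = list(segs)
    if (len(out) >= 3 and out[0] in STOPS_AND_W
            and is_vowel(out[1]) and LENGTH not in out[1]
            and out[2] in PRESTOPPED):
        out[2] = PRESTOPPED[out[2]]
    return out


def shorten_vowels(segs):
    """Remove the length mark from every vowel."""
    return [s.replace(LENGTH, "") for s in segs]


def drop_initial_consonant(segs):
    """Elide a word-initial consonant of any kind."""
    return list(segs[1:]) if segs and not is_vowel(segs[0]) else list(segs)


def derive(proto):
    """Apply the three changes in their assumed chronological order."""
    segs = list(proto)
    for change in (prestop_nasals, shorten_vowels, drop_initial_consonant):
        segs = change(segs)
    return segs


man = ["b", "á", "m", "a"]
animal = ["m", "í", "ɲ", "a"]
aunt = ["b", "í" + LENGTH, "ɲ", "a"]
print("".join(derive(man)))
print(derive(animal) == derive(aunt))  # True: the homophony noted above
```

Because the long vowel of ‘aunt’ blocks pre-stopping, it ends up identical to ‘animal’ once the vowel is shortened and the initial consonant is dropped.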

Dixon says Olgolo is the only language he’s heard of in which there are no word-initial consonants, although it’s possible that more have been discovered since 1970. However, there is a caveat to this statement: there are monoconsonantal prefixes that can be optionally added to most nouns, so that they have an initial consonant on the surface. There are at least four of these prefixes, /n-/, /w-/, /y-/ and /ŋ-/; however, every noun seems to only take a single one of these prefixes, so we can regard these forms as lexically-conditioned allomorphs of a single prefix. The conditioning is in fact more precisely semantic: roughly, y- is added to nouns denoting fish, n- is added to nouns denoting other animals, and w- is added to nouns denoting various inanimates. The prefixes therefore identify ‘noun classes’ in a sense (although these are probably not noun classes in a strict sense because Dixon gives no indication that there are any agreement phenomena which involve them). The prefix ŋ- was only seen on one word, /ɔ́jɟɔba/ ~ /ŋɔ́jɟɔba/ ‘wild yam’, and might be added to all nouns denoting fruits and vegetables, given that most Australian languages with noun classes have a noun class for fruits and vegetables, but there were no other such nouns in the dataset (Dixon only noticed the semantic conditioning after he left the field, so he didn’t have a chance to elicit any others). It must be emphasized, however, that these prefixes are entirely optional, and every noun which can have a prefix added to it can also be pronounced without the prefix. In addition some nouns, those denoting kin and body parts, appear to never take a prefix, although possibly this is just a limitation of the dataset given that their taking a prefix would be expected to be optional in any case. And words other than nouns, such as verbs, don’t take these prefixes at all.

Dixon hypothesizes that the y- and n- prefixes are reduced forms of /úyu/ ‘fish’ and /íɲa/ ‘animal’ respectively, while w- may be from /úgu/ ‘tree, wood’ or just an “unmarked” initial consonant (it’s not clear what Dixon means by this). These derivations are not unquestionable (for example, how do we get from /-ɲ-/ to /n-/ in the ‘animal’ prefix?), but it’s very plausible that the prefixes do originate in this way, even if the exact antecedent words are difficult to identify, because similar origins have been identified for noun class prefixes in other Australian languages (Dixon 1968, as cited by Dixon 1970). Just intuitively, it’s easy to see how nouns might come to be ever more frequently replaced by compounds of the dependent original noun and a term denoting a superset; cf. English koala ~ koala bear, oak ~ oak tree, gem ~ gemstone. In English these compounds are head-final but in other languages (e.g. Welsh) they are often head-initial, and presumably this would have to be the case in pre-Olgolo in order for the head elements to grammaticalize into noun class prefixes. The fact that the noun class prefixes are optional certainly suggests that the system is very much incipient, and still developing, and therefore of recent origin.

It might therefore be very interesting to see how the Olgolo language has changed after a century or so; we might be able to examine a noun class system as it develops in real time, with all of our modern equipment and techniques available to record each stage. It would also be very interesting to see how quickly this supposedly anomalous state of every word beginning with a vowel (in at least one of its freely-variant forms) is eliminated, especially since work on Australian language phonology since 1970 has established many other surprising findings about Australian syllable structure, including a language where the “basic” syllable type appears to be VC rather than CV (Breen & Pensalfini 1999). Indeed, since Dixon wrote this paper 46 years ago Olgolo might have changed considerably already. Unfortunately, it might have changed in a somewhat more disappointing way. None of the citations of Dixon’s paper recorded by Google Scholar seem to examine Olgolo any further, and the documentation on Kunjen (the variety which includes Olgolo as a subvariety) recorded in the Australian Indigenous Languages Database isn’t particularly overwhelming. I can’t find a straight answer as to whether Kunjen is extinct today or not (never mind the Olgolo variety), but Dixon wasn’t optimistic about its future in 1970:

It would be instructive to study the development of Olgolo over the next few generations … Unfortunately, the language is at present spoken by only a handful of old people, and is bound to become extinct in the next decade or so.


Black, P. 1993 (post-print). Unusual syllable structure in the Kurtjar language of Australia. Retrieved from http://espace.cdu.edu.au/view/cdu:42522 on 26 September 2016.

Breen, G. & Pensalfini, R. 1999. Arrernte: A Language with No Syllable Onsets. Linguistic Inquiry 30 (1): 1-25.

Dixon, R. M. W. 1968. Noun Classes. Lingua 21: 104-125.

Dixon, R. M. W. 1970. Olgolo Syllable Structure and What They Are Doing about It. Linguistic Inquiry 1 (2): 273-276.

Vowel-initial and vowel-final roots in Proto-Indo-European

A remarkable feature of Proto-Indo-European (PIE) is the restrictiveness of the constraints on its root structure. It is generally agreed that all PIE roots were monosyllabic, containing a single underlying vowel. In fact, the vast majority of the roots are thought to have had the same underlying vowel, namely *e. (Some scholars reconstruct a small number of roots with underlying *a rather than *e; others do not, and reconstruct underlying *e in every PIE root.) It is also commonly supposed that every root had at least one consonant on either side of its vowel; in other words, that there were no roots which began or ended with the vowel (Fortson 2004: 71).

I have no dispute with the first of these constraints; though it is very unusual, it is not too difficult to understand in connection with the PIE ablaut system, and the Semitic languages are similar with their triconsonantal, vowel-less roots. However, I think the other constraint, the one against vowel-initial and vowel-final roots, is questionable. In order to talk about it with ease and clarity, it helps to have a name for it: I’m going to call it the trisegmental constraint, because it amounts to the constraint that every PIE root contains at least three segments: the vowel, a consonant before the vowel, and a consonant after the vowel.

The first thing that might make one suspicious of the trisegmental constraint is that it isn’t actually attested in any IE language, as far as I know. English has vowel-initial roots (e.g. ask) and vowel-final roots (e.g. fly); so do Latin, Greek and Sanskrit (cf. S. aj- ‘drive’, G. ἀγ- ‘lead’, L. ag- ‘do’, and L. dō-, G. δω-, S. dā-, all meaning ‘give’). And for much of the early history of IE studies, nobody suspected the constraint’s existence: the PIE roots meaning ‘drive’ and ‘give’ were reconstructed as *aǵ- and *dō-, respectively, with an initial vowel in the case of the former and a final vowel in the case of the latter.

It was only with the development of the laryngeal theory that the reconstruction of the trisegmental constraint became possible. The initial motivation for the laryngeal theory was to simplify the system of ablaut reconstructed for PIE. I won’t go into the motivation in detail here; it’s one of the most famous developments in IE studies so a lot of my readers are probably familiar with it already, and it’s not hard to find descriptions of it. The important thing to know, if you want to understand what I’m talking about here, is that the laryngeal theory posits the existence of three consonants in PIE which are called laryngeals and written *h1, *h2 and *h3, and that these laryngeals can be distinguished by their effects on adjacent vowels: *h2 turns adjacent underlying *e into *a and *h3 turns adjacent underlying *e into *o. In all of the IE languages other than the Anatolian languages (which are all extinct, and records of which were only discovered in the 20th century), the laryngeals are elided pretty much everywhere, and their presence is only discernible from their effects on adjacent segments. Note that as well as changing the quality of (“colouring”) underlying *e, they also lengthen preceding vowels. And between consonants, they are reflected as vowels, but as different vowels in different languages: in Greek *h1, *h2, *h3 become ε, α, ο respectively, in Sanskrit all three become i, and in the other languages all three generally become a.
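For readers who find rules easier to follow when they are executable, the laryngeal developments just summarized can be sketched as a small Python function. This is a deliberate simplification under stated assumptions: words are lists of segments, only the laryngeal-related changes are modelled (so the outputs are not finished Greek, Sanskrit or Latin forms), a laryngeal at a word edge next to a consonant is simply dropped, and the names `reflex`, `COLOUR`, `LONG` and `VOCALIZED` are hypothetical.

```python
LARYNGEALS = {"h1", "h2", "h3"}
COLOUR = {"h2": "a", "h3": "o"}  # h1 leaves an adjacent e unchanged
LONG = {"e": "ē", "a": "ā", "o": "ō", "i": "ī", "u": "ū"}
VOWELS = set(LONG) | set(LONG.values())
# Reflexes of a laryngeal standing between two consonants.
VOCALIZED = {
    "greek": {"h1": "e", "h2": "a", "h3": "o"},
    "sanskrit": {"h1": "i", "h2": "i", "h3": "i"},
    "other": {"h1": "a", "h2": "a", "h3": "a"},
}


def reflex(segments, language="other"):
    """Apply colouring, lengthening, vocalization and loss of laryngeals."""
    segs = list(segments)

    # 1. Colouring: an underlying e next to h2 or h3 changes quality.
    for i, seg in enumerate(segs):
        if seg == "e":
            for neighbour in segs[max(i - 1, 0):i] + segs[i + 1:i + 2]:
                if neighbour in COLOUR:
                    segs[i] = COLOUR[neighbour]
                    break

    # 2. Each laryngeal is then lengthening, silent, or vocalized.
    out = []
    for i, seg in enumerate(segs):
        if seg not in LARYNGEALS:
            out.append(seg)
            continue
        prev = segs[i - 1] if i > 0 else None
        nxt = segs[i + 1] if i + 1 < len(segs) else None
        if prev in VOWELS:
            out[-1] = LONG.get(out[-1], out[-1])  # lengthen, then drop
        elif nxt in VOWELS:
            pass                                   # drop without a trace
        elif prev is not None and nxt is not None:
            out.append(VOCALIZED[language][seg])   # between consonants
        # Otherwise (word edge next to a consonant): dropped, by assumption.
    return out


print("".join(reflex(["h2", "e", "ǵ"])))
print("".join(reflex(["d", "e", "h3"])))
print("".join(reflex(["d", "h3", "t", "o", "s"], "greek")))
```

Run on *h2eǵ- and *deh3-, the sketch gives back the shapes of the old reconstructions *aǵ- and *dō-.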

So, the laryngeal theory allowed the old reconstructions *aǵ- and *dō- to be replaced by *h2éǵ- and *deh3- respectively, which conform to the trisegmental constraint. In fact every root reconstructed with an initial or final vowel by the 19th century IEists could be reconstructed with an initial or final laryngeal instead. Concrete support for some of these new reconstructions with laryngeals came from the discovery of the Anatolian languages, which preserved some of the laryngeals in some positions as consonants. For example, the PIE word for ‘sheep’ was reconstructed as *ówis on the basis of the correspondence between L. ovis, G. ὄϊς, S. áviḥ, but the discovery of the Cuneiform Luwian cognate ḫāwīs confirmed without a doubt that the root must have originally begun with a laryngeal (although it is still unclear whether that laryngeal was *h2, preceding *o, or *h3, preceding *e).

There are also indirect ways in which the presence of a laryngeal can be evidenced. Most obviously, if a root exhibits the irregular ablaut alternations in the early IE languages which the laryngeal theory was designed to explain, then it should be reconstructed with a laryngeal in order to regularize the ablaut alternation in PIE. In the case of *h2eǵ-, for example, there is an o-grade derivative of the root, *h2oǵmos ‘drive’ (n.), which can be reconstructed on the evidence of Greek ὄγμος ‘furrow’ (Ringe 2006: 14). This shows that the underlying vowel of the root must have been *e, because (given the laryngeal theory) the PIE ablaut system did not involve alternations of *a with *o, only alternations of *e, *ō or ∅ (that is, the absence of the segment) with *o. But this underlying *e is reflected as if it was *a in all the e-grade derivatives of *h2eǵ- attested in the early IE languages (e.g. in the 3sg. present active indicative forms S. ájati, G. ἄγει, L. agit). In order to account for this “colouring” we must reconstruct *h2 next to the *e. Similar considerations allow us to be reasonably sure that *deh3- also contained a laryngeal, because the e-grade root is reflected as if it had *ō (S. dádāti, G. δίδωσι) and the zero-grade root in *dh3tós ‘given’ exhibits the characteristic reflex of interconsonantal *h3 (S. -ditáḥ, G. δοτός, L. datus).

But in many cases there does not seem to be any particular evidence for the reconstruction of the initial or final laryngeal other than the assumption that the trisegmental constraint existed. For example, *h1éḱwos ‘horse’ could just as well be reconstructed as *éḱwos, and indeed this is what Ringe (2006) does. Likewise, there is no positive evidence that the root *muH- of *muHs ‘mouse’ (cf. S. mūṣ, G. μῦς, L. mūs) contained a laryngeal: it could just as well be *mū-. Both of the roots *(h1)éḱ- and *muH/ū- are found, as far as I know, in these stems only, so there is no evidence for the existence of the laryngeal from ablaut. It is true that PIE has no roots that can be reconstructed as ending in a short vowel, and this could be seen as evidence for at least a constraint against vowel-final roots, because if all the apparent vowel-final roots actually had a vowel + laryngeal sequence, that would explain why the vowel appears to be long. But this is not the only possible explanation: there could just be a constraint against roots containing a light syllable. This seems like a very natural constraint. Although the circumstances aren’t exactly the same—because English roots appear without inflectional endings in most circumstances, while PIE roots mostly didn’t—the constraint is attested in English: short unreduced vowels like that of cat never appear in root-final (or word-final) position; only long vowels, diphthongs and schwa can appear in word-final position, and schwa does not appear in stressed syllables.

It could be argued that the trisegmental constraint simplifies the phonology of PIE, and therefore it should be assumed to exist pending the discovery of positive evidence that some root does begin or end with a vowel. It simplifies the phonology in the sense that it reduces the space of phonological forms which can conceivably be reconstructed. But I don’t think this is the sense of “simple” which we should be using to decide which hypotheses about PIE are better. I think a reconstructed language is simpler to the extent that it is synchronically not unusual, and that the existence of whatever features it has that are synchronically unusual can be justified by explanations of features in the daughter languages by natural linguistic changes (in other words, both synchronic unusualness and diachronic unusualness must be taken into account). The trisegmental constraint seems to me synchronically unusual, because I don’t know of any other languages that have something similar, although I have not made any systematic investigation. And as far as I know there are no features of the IE languages which the trisegmental constraint helps to explain.

(Perhaps a constraint against vowel-initial roots, at least, would be more natural if PIE had a phonemic glottal stop, because people, or at least English and German speakers, tend to insert subphonemic glottal stops before vowels immediately preceded by a pause. Again, I don’t know if there are any cross-linguistic studies which support this. The laryngeal *h1 is often conjectured to be a glottal stop, but it is also often conjectured to be a glottal fricative; I don’t know if there is any reason to favour either conjecture over the other.)

I think something like this disagreement over what notion of simplicity is most important in linguistic reconstruction underlies some of the other controversies in IE phonology. For example, the question of whether PIE had phonemic *a and *ā: the “Leiden school” says it didn’t, accepting the conclusions of Lubotsky (1989); most other IEists say it did. The Leiden school reconstruction certainly reduces the space of phonological forms which can be reconstructed in PIE and therefore might be better from a falsifiability perspective. Kortlandt (2003) makes this point with respect to a different (but related) issue, the sound changes affecting initial laryngeals in Anatolian:

My reconstructions … are much more constrained [than the ones proposed by Melchert and Kimball] because I do not find evidence for more than four distinct sequences (three laryngeals before *-e- and neutralization before *-o-) whereas they start from 24 possibilites (zero and three laryngeals before three vowels *e, *a, *o which may be short or long, cf. Melchert 1994: 46f., Kimball 1999: 119f.). …

Any proponent of a scientific theory should indicate the type of evidence required for its refutation. While it is difficult to see how a theory which posits *H2 for Hittite h- and a dozen other possible reconstructions for Hittite a- can be refuted, it should be easy to produce counter-evidence for a theory which allows no more than four possibilities … The fact that no such counter-evidence has been forthcoming suggests that my theory is correct.

Of course the problem with the Leiden school reconstruction is that for a language to lack phonemic low vowels is very unusual. Arapaho apparently lacks phonemic low vowels, but it’s the only attested example I’ve heard of. But … I don’t have any direct answer to Kortlandt’s concerns about non-falsifiability. My own and other linguists’ concerns about the unnaturalness of a lack of phonemic low vowels also seem valid, but I don’t know how to resolve these opposing concerns. So until I can figure out a solution to this methodological problem, I’m not going to be very sure about whether PIE had phonemic low vowels and, similarly, whether the trisegmental constraint existed.


Fortson, B., 2004. Indo-European language and culture: An introduction. Oxford University Press.

Kortlandt, F., 2003. Initial laryngeals in Anatolian. Orpheus 13-14 [Gs. Rikov] (2003-04), 9-12.

Lubotsky, A., 1989. Against a Proto-Indo-European phoneme *a. The New Sound of Indo–European. Essays in Phonological Reconstruction. Berlin–New York: Mouton de Gruyter, pp. 53–66.

Ringe, D., 2006. A Linguistic History of English: Volume I, From Proto-Indo-European to Proto-Germanic. Oxford University Press.

Emics and etics


Many non-linguists probably don’t know that linguists use the words “phonetics” and “phonology” to refer to two quite different subjects. There is, admittedly, a considerable degree of interconnection between the two subjects, but most of the time the difference is reasonably stark. The best way to describe the respective subjects is by an example. Many of the languages of the world make use of speech sounds which are known as lateral approximants. The Latin letter L is dedicated to representing such sounds. In English, lateral approximants appear at the start of words like “laugh” and “lion” (and, indeed, “lateral”), in the middle of words like “pillow” and “bulk”, and at the end of words like “tell” and “saddle”. The exact sound of the lateral approximants in these words varies to a considerable degree from utterance to utterance, due to factors such as intonation, the chosen volume and the simple fact that people do not replicate precisely the same physical actions every time they utter a sound. It also varies from speaker to speaker: different people have different voices. The term “lateral approximant” therefore refers not to a particular acoustic signal but to an abstract category including some but not all acoustic signals[1]. The way language works makes it inevitable that when we talk about speech sounds, we talk about these abstract categories of acoustic signals rather than the particular acoustic signals themselves. This point about “lateral approximant” being an abstract category is not directly relevant to the phonetics-phonology distinction, but I bring it up because it will help clarify things later.

One specific kind of variation in the sounds of speech is especially interesting to linguists. The pronunciation of a sound such as a lateral approximant can be affected by the surrounding sounds. The differences produced thus have the potential be regular and systematic in the sense that they are reproduced from utterance to utterance: after all, the same sequence of sounds exists in each utterance. The term for this kind of variation in particular is “allophony”. A particularly stark example of allophony is exhibited by English lateral approximants2 (which is why I chose to talk about this kind of sound in particular). Before other consonants and at the end of a word (such as in “bulk”, “tell” and “saddle”), they are pronounced one way; elsewhere (such as in “laugh”, “lion” and “pillow”), they are pronounced another way. When they are pronounced in the former way, English lateral approximants are referred to as “dark Ls”, and when they are pronounced in the latter way, they are referred to as “clear Ls”. The IPA has symbols for each pronunciation: dark L is [ɫ], clear L is [l]. If you don’t already know a lot about linguistics, it’s quite likely that you never noticed that this variation existed before, even though, as may be apparent to you now that I have drawn your attention to it, the difference is quite large. You never needed to notice it, because in the English language, the distinction between clear and dark L is never used to distinguish words. That is, there are no pairs of words which consist of the same sequence of speech sounds, except that one of them has a dark L in the same position that the other has a clear L. It is therefore convenient to treat clear L and dark L as the same sound, at least when we are talking about English. 
We can use the simpler of the two symbols, /l/, to represent this sound, but we add slashes rather than brackets around the symbol in order to make clear that the boundary of the category of acoustic signals referred to by /l/ is determined here by the distinctions the English language (or whatever language we are talking about) makes use of in order to distinguish its words from each other. It is reasonable to suppose that the concept of /l/ does actually exist in the minds of speakers of English (and that separate concepts for clear L and dark L do not exist in their minds). But even if this were not the case, the concept of /l/ would still be useful for descriptive purposes. The name for this kind of concept is “phoneme”.
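The clear/dark rule described above is regular enough to be written down as a tiny program. What follows is only a rough sketch: the function name `realize_l`, the vowel inventory and the broad transcriptions are my own illustrative assumptions, and each character is treated as one segment, which is a simplification.

```python
# Illustrative assumption: a small set of vowel symbols, enough for the examples.
VOWELS = set("aeiouɑɒɛɪʊʌə")


def realize_l(phonemic: str) -> str:
    """Turn a broad (phonemic) transcription into one that marks clear vs dark L.

    Rule from the text: /l/ is dark [ɫ] before another consonant or at the
    end of the word, and clear [l] elsewhere (i.e. before a vowel).
    """
    out = []
    for i, seg in enumerate(phonemic):
        if seg == "l":
            nxt = phonemic[i + 1] if i + 1 < len(phonemic) else None
            # Dark if nothing follows, or if what follows is not a vowel.
            dark = nxt is None or nxt not in VOWELS
            out.append("ɫ" if dark else "l")
        else:
            out.append(seg)
    return "".join(out)


for word in ["bʌlk", "tɛl", "sadəl", "lajən", "pɪləw"]:
    print(f"/{word}/ -> [{realize_l(word)}]")
```

The point of the sketch is that the input needs only one symbol for L: the choice between [l] and [ɫ] is fully predictable from the surrounding sounds, which is exactly why English can get away with a single /l/ phoneme.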

There are in fact languages in which the distinction between clear L and dark L is used to distinguish words. Russian is one of them. The word мел ‘chalk’ is pronounced like “Mel”, but with a dark L. The word мель ‘shallow’ is pronounced like “Mel”, but with a clear L. For this reason, Russian is said to have an /l/ phoneme, which is spelt ль, and a /ɫ/ phoneme, which is spelt л. Note that, despite the notation, the Russian /l/ is not the same as the English /l/, any more than the Russian /ɫ/ is the same as the English /l/: the two Russian phonemes correspond to a single, more general phoneme in English.

The crucial, defining property of phonemes is that they are abstract categories of acoustic signals whose boundaries are determined by the distinctions that a particular language makes use of. They are defined in opposition to abstract categories of acoustic signals in general, whose boundaries are not necessarily determined by the distinctions a particular language makes use of; they may be determined by the distinctions a linguist finds interesting to make, for example. Such categories are referred to by the words “sound” or (in my experience, less commonly) “phone”; it has always seemed to me that “phonete” would be the most appropriate word, but nobody uses that one. In the jargon of Less Wrong, the distinction can be conveyed by saying that phonemes carve reality at the joints (for a particular language’s purposes), while sounds in general don’t necessarily do the same.

It can be helpful to shift the viewpoint a little and consider the set of all the phonemes of a particular language. This set is always finite (this is a cross-linguistic universal). One can consider the space of all conceivable acoustic signals that might be produced by a speaker of the language. The set of phonemes constitutes a particular partition of this space into a finite number of parts, and speakers of the language do not make use of any of the differences within each part when processing speech3. The parts under this partition are represented by symbols surrounded by slashes. If you choose to partition the space in a different way for some reason, you need to represent the parts by symbols surrounded by square brackets.
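The idea of two languages partitioning the same space differently can be made concrete with a toy model. This is a sketch under loud assumptions: the space is cut down to a handful of symbols, the dictionaries `ENGLISH` and `RUSSIAN` and the helper names are invented for illustration, and the Russian words are transcribed as the rough approximation “Mel with a dark or clear L” rather than as careful phonetic transcriptions.

```python
# Each partition maps a sound (square-bracket level) to a phoneme (slash level).
# Sounds not listed are assumed, for illustration only, to map to themselves.
ENGLISH = {"l": "l", "ɫ": "l"}   # clear and dark L fall in the same part
RUSSIAN = {"l": "l", "ɫ": "ɫ"}   # clear and dark L fall in different parts


def phonemicize(phones, partition):
    """Map a sequence of sounds to the phoneme string a language assigns it."""
    return "/" + "".join(partition.get(p, p) for p in phones) + "/"


def contrast(a, b, partition):
    """True if the language's partition keeps the two pronunciations apart."""
    return phonemicize(a, partition) != phonemicize(b, partition)


mel_dark = ["m", "ɛ", "ɫ"]    # roughly мел 'chalk'
mel_clear = ["m", "ɛ", "l"]   # roughly мель 'shallow'

print(contrast(mel_dark, mel_clear, ENGLISH))  # False
print(contrast(mel_dark, mel_clear, RUSSIAN))  # True
```

The sounds going in are identical in both cases; only the partition differs, and that alone decides whether the two pronunciations can count as different words.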

One final point which I want to stress is that both phonemes and sounds in general are abstract categories! People (including me, when I’m not thinking carefully enough) often describe the distinction as something along the lines of “phonemes are abstract categories of sounds”, and this can be interpreted in a way that makes it a true statement, more or less, but it doesn’t constitute an exhaustive definition: the things we refer to as “sounds” in practice are abstract categories of sounds too, so phonemes are a particular kind of abstract category of sounds.

Anyway, the difference between phonetics and phonology is this: phonetics is about sounds in general (“phonetes”), phonology is about phonemes. Or to put it another way, phonology specifically studies the categorizations of acoustic signals that make sense with respect to particular languages, and phonetics studies speech sounds under other categorizations. For example, investigation of how common it is for lateral approximants to appear in speech in both clear and dark forms comes under phonetics. But once you start investigating in addition how common it is for clear L and dark L to constitute separate phonemes, you’ve got into phonology.


The concept of the distinction between phonetics and phonology can be generalised. It has proved especially fruitful in the field of anthropology.

The first person to make the analogy was a man called Kenneth Pike. As you might imagine, he was both an anthropologist and a linguist. He was quite an interesting man, actually. According to Wikipedia, he was “the foremost figure in the history of SIL” (that slightly controversial organization, the Summer Institute of Linguistics). He also invented a (non-naturalistic) conlang called Kalaba-X. And he used to give what were called “monolingual demonstrations”, where he would work with a speaker of a language unknown to him and attempt to analyze it as far as he could without having known anything about it previously, all before an audience.

Anyway, Kenneth Pike thought that it was helpful to distinguish two different approaches to studying human culture, which he called the emic and etic approaches. The emic approach is analogous to phonology. The etic approach is analogous to phonetics. The anthropologist Marvin Harris later adopted the concept and made it critical to his theory of human culture, which he called “cultural materialism”. Harris made use of the concept in a somewhat different way than Pike originally did. If you want to see Pike’s side of things, you could look at this interview with him, which contains the following amusing illustration of the extent of their differences:

[…] it took me months and months and months to try to understand Harris. Would you like to know how I got started talking with Harris? I was in Spain at the request of some philosophers and spoke there on the relationship of language to the world (Pike 1987). Afterwards they told me that Harris had been there three months previously lecturing. When they invited me, they had sent me some articles with some references to the etics and emics of Harris. That is precisely why they had invited me. Harris had said that he wished he could talk to Pike.

So later we invited Harris to Norman [Oklahoma] to lecture. I asked him to arrive at least a day early so that we could talk privately before the lecture. So we spent four hours talking prior to the lecture. Tom Headland then met him at an AAA meeting and arranged the meeting and we both agreed.

We had a difficult time trying to understand each other. We each spoke 20 minutes, with 10 minutes for reply by the other. Later, we saw each other’s materials so that before publication we could revise our own materials after having read the comments. The commentators could also revise their materials after having read the revisions of our revisions. So we had maximum time to try to understand each other. Even so, every so often I still get a little perplexed.

I have read some of Harris’s work but none of Pike’s, so my discussion is going to be informed by Harris’s conception of the emic and etic approaches in particular. Let’s begin with an illustrative example, like the one I used in part I of this post. This example is taken from Harris’s book Cultural Materialism, published in 1979.

While doing fieldwork in the southern Indian state of Kerala, Harris observed that the sex ratio among the cattle owned by farmers there was highly skewed in favour of females: for every hundred female cattle there were only sixty-seven male cattle. The farmers, when asked about this, vehemently denied having killed the excess males, as expected given the Hindu prohibition against killing cattle. They instead attributed the difference to an innate propensity towards sickness among male cattle. When they were asked why this propensity existed, some of them replied that the male cattle ate less than the females. When they were asked again why the male cattle ate less than the females, some replied that they were given less time to suck on their mother’s teats. However, there are other states in India, such as Uttar Pradesh, where the sex ratio is skewed the other way: there are more than two oxen for every cow. Moreover, these states are precisely those where the ecological and economic situation is such that there is a relatively large need for traction animals, such as oxen. Suspicious, isn’t it? What seems to be happening is that, despite the Hindu prohibition against killing cattle, the farmers of Kerala take active steps to ensure that male calves drink less milk than their sisters4.

By taking these actions, the farmers cause the male calves to die, when they otherwise would survive. Therefore, there is a sense in which their action can be called “killing”. But the crucial point is that if we call the action “killing”, then we are making use of a categorization which is etic rather than emic. That is, it is not a categorization which makes sense on the terms of the culture of the Keralan farmers. These farmers’ concept of killing does not include neglecting to feed male calves properly5. It is just the same as how the Russian /l/ covers a smaller range of acoustic signals than the English /l/. The contradiction between Hindu custom and what actually takes place must be understood in this light: it is only an apparent contradiction, because, from the emic perspective, the farmers are not doing any killing of cattle.

Note that this is not to say that the Keralan farmers would be able to get away with openly slaughtering the cattle, say, by slitting their throats with knives. The concept of “killing” is not infinitely malleable. In the same way, no language that I know of considers both [p] and [l] to be part of the same phoneme. All we are saying here is that the extent of variation in emic categorizations is constrained to some degree by the properties of the things they categorize. In describing these constraints we make use of categorizations that are chosen for their usefulness for this descriptive purpose, and not for their coincidence with categorizations that are used by a particular culture. Such categorizations are by definition etic. This means that if the extent of emic variation is sufficiently constrained, the distinction between emic and etic becomes redundant, because all cultures will essentially categorize things the same way, and this categorization can be perfectly well understood from an etic perspective. In most areas of human culture, however, there are considerable degrees of freedom in categorization and therefore the emic-etic distinction is very helpful in understanding cross-cultural variation.

The Keralan cattle sex ratio example is an especially striking one, but another example given by Harris in the same book is, I think, more illustrative of just how helpful the emic-etic distinction can be. In Brazil, Harris collected data on the number of people living in households. But doing this required a more complicated methodology than just asking people from different households, “How many people live here?” The culture of Harris’s informants was such that they did not consider their servants members of their households, even when they were permanent residents there. And for whatever purpose he was collecting the data, Harris found it more useful to consider these servants as household members. He therefore had to ask extra questions to get information about the numbers of servants, in order to make use of an etic categorization of his own that was different from the emic categorization of his Brazilian informants. It is easy to see how not heeding this kind of thing could lead to confusion: if, for example, you collected data on the number of people in households across both Brazil and some other country in which live-in servants were counted as household members, only asking, “How many people live here?”, and used that data to draw conclusions about, say, the amount of food that the average household consumed in both countries, then these conclusions could be grossly wrong, and the data would be meaningless in that sense.

This is connected with another important consideration. One of the things which gives the social sciences a rather different epistemic flavour from the natural sciences is the ubiquitous use of concepts which are rather slippery and vaguely defined: “status”, “role”, “social class”, “tribe”, “state”, “family”, “religion”, etc. Social scientists regularly try to make these definitions more precise (that is, to “operationalize” them), but they do this in a peculiar way: it is rare for a particular operationalization to actually become accepted as the one, true definition of the concept at this level of precision, or for two different operationalizations to be given different names so that researchers can from then on treat them as separate concepts. Indeed, I think a lot of social scientists might agree that it is more useful to leave these concepts vaguely defined and use the operationalizations appropriate to the circumstances. Why is this the case? The crucial factor may be that in the social sciences, the distinction between emics and etics comes into play. Social scientists often need to talk about “status”, “tribe”, “state”, “religion”, etc. as emic concepts; that is, as conceptualised by particular cultures. Different cultures have different ideas of what these concepts are, and hence different operationalizations are appropriate for different cultures. Having a common word for each of these different operationalizations is still useful as a way of emphasizing the similarity between them (and perhaps their common origin, in some sense). And it doesn’t cause too much confusion, because the sense of the word in a particular context can be inferred from the culture being talked about in that context. It’s only when one needs to make use of etic concepts that are similar to these emic concepts that the potential for confusion becomes large.
One thing that might be useful in the social sciences is to reserve some words for the emic approach and others for the etic approach. For example, we might reserve “caste” as the word for social strata as conceptualized by particular cultures6 and “class” as the word for social strata as conceptualized in other ways.

To summarize: in order to understand a culture, one must understand the concepts which the culture’s members understand their experience in terms of. Emic approaches to culture work with these concepts only. On the other hand, etic approaches to culture may work with alternative conceptual systems which clash with that of the culture being studied. The two approaches are not rivals; they lead to insights about different things and at the same time complement each other, just as phonetics and phonology are not in conflict, and are different subfields of linguistics yet at the same time are closely interconnected.

  1. ^ By using the word “category” I don’t mean to imply that membership in the category is categorical, as in a mathematical set (i.e. that every acoustic signal is either a lateral approximant or not, and there is never any need for further clarification). The category may be radial: it may be the case that one particular acoustic signal or set of acoustic signals is maximally lateral approximant-like, and acoustic signals which are less similar to these central examples are less lateral approximant-like. Or it may have some other, more complicated structure.
  2. ^ Some dialects don’t exhibit this allophony—Welsh English sometimes has clear L everywhere, and certain American English dialects have dark L everywhere. So if you can’t see this distinction in your own speech after reading the rest of the paragraph this footnote is attached to, that may be why.
  3. ^ This isn’t quite true: for example, you might notice that somebody keeps pronouncing clear L where they should pronounce dark L and conclude on that basis that they must be Welsh or foreign. You may do this subconsciously, even if you don’t know about the distinction between clear L and dark L. (The subconscious understanding of allophonic variation patterns is a large part of why people find it difficult to imitate other accents than their own: they see the problem in others, but not in themselves. Conversely, understanding phonetics and phonology is the secret to being able to imitate accents like a boss.)
  4. ^ Harris does not go into very much detail about this example. There are some things I’d like to know more about, such as why this stark difference in demand for traction animals exists between different Indian states, and how exactly the farmers are supposed to ensure that the male cattle are fed less. If anyone reading this knows of some resources that would be helpful, I encourage you to point me to them.
  5. ^ Of course, there may be a certain level or style of neglect for which it would be regarded as killing; but the means by which the differential sex ratio is produced is certainly not considered to be killing.
  6. ^ Or subcultures, of course. Basically everything that is being said here about the analysis of cultures can also be applied to more finely-grained divisions within cultures.

An example of metathesis of features

Metathesis is generally understood as sound change involving the switching in position of two segments, or sequences of segments. For example, the non-standard English word ax ‘ask’ is related to the standard form by metathesis. But there are also some arguable cases where metathesis has involved the switching of individual features of segments, rather than the segments themselves.

For example, consider the Tocharian (Toch.) words for ‘tongue’: käntu in Toch. A, kantwo in Toch. B. From these two words we can reconstruct Proto-Tocharian (PToch.) *kəntwó; note that, following the convention of Ringe 1996, ə denotes a high central vowel, not a mid central one as it does in the IPA. Now, the Proto-Indo-European word for ‘tongue’ is reconstructed as *dn̥ǵʰwáh₂[1]. The development of *-n̥- into *-ən- and *-wáh₂ into *-wo in PToch. is regular. However, the regular development of *d- in PToch. would be *ts-, and the regular development of *-ǵʰ- in PToch. would be *-k-. In other words, the expected PToch. form is *tsənkwó, not *kəntwó.

How can we explain this outcome? The first thing one might notice about the two forms is that where the PIE form has a coronal stop, the PToch. form has a dorsal stop, and where the PIE form has a dorsal stop, the PToch. form has a coronal stop. One might therefore suggest that the PToch. form comes from a metathesized version of the PIE form, with the coronal stop *d and the dorsal stop *ǵʰ having changed places: *ǵʰn̥dwáh₂. If *kəntwó is the expected outcome of PIE *ǵʰn̥dwáh₂ in PToch., then this hypothesis explains the outcome in the sense that it makes its irregularity no longer surprising; changes of metathesis are well-known exceptions to the general rule that sound change is regular.

Unfortunately, there’s a problem with this hypothesis: the regular outcome of PIE *ǵʰn̥dwáh₂ in PToch. is *kənwó, not *kəntwó, because PIE *d is regularly elided in PToch. before consonants. In fact there are no circumstances under which PIE *d becomes PToch. *t; if, by some exceptional circumstance, *d failed to be elided in *ǵʰn̥dwáh₂, it would probably become *ts, rather than *t, resulting in PToch. *kəntswó.

The solution proposed by Ringe (1996: 45-6) is to suppose that what was metathesized was not the segments *d and *ǵʰ themselves, but rather their place of articulation features. So *d became [-coronal] and [+dorsal] (like *ǵʰ), while *ǵʰ became [+coronal] and [-dorsal] (like *d). But the laryngeal features of the two segments were unchanged: *d remained [-spread glottis], and *ǵʰ remained [+spread glottis]. Therefore, the outcomes of the metathesis were *ǵ and *dʰ, respectively. And *kəntwó is, indeed, the expected outcome in PToch. of PIE *ǵn̥dʰwáh₂, because PIE *dʰ becomes *t in PToch. (There’s the interesting question of why *d becomes an affricate *ts, but its aspirated counterpart *dʰ is unaffected—but let’s not get into that.)
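To make the difference between swapping segments and swapping features concrete, here is a toy model of the derivation. It is only a sketch: the `Stop` class, the two-feature representation and the function names are my own assumptions, the reflex table contains nothing beyond the correspondences mentioned in this post, and the elision of *d before consonants is deliberately left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stop:
    place: str            # "coronal" or "dorsal"
    spread_glottis: bool  # True for the aspirated series


D = Stop("coronal", False)   # PIE *d
GH = Stop("dorsal", True)    # PIE *ǵʰ

# PIE names, for display only.
PIE_NAME = {
    ("coronal", False): "d", ("coronal", True): "dʰ",
    ("dorsal", False): "ǵ", ("dorsal", True): "ǵʰ",
}

# PToch. reflexes as stated in the text: *d > *ts, *dʰ > *t, *ǵ and *ǵʰ > *k.
PTOCH_REFLEX = {
    ("coronal", False): "ts", ("coronal", True): "t",
    ("dorsal", False): "k", ("dorsal", True): "k",
}


def key(stop):
    return (stop.place, stop.spread_glottis)


def swap_place(a, b):
    """Metathesis of place features only; laryngeal features stay put."""
    return replace(a, place=b.place), replace(b, place=a.place)


def ptoch_tongue(first, second):
    """*C1-n̥-C2-wáh₂ > PToch. C1-ən-C2-wó, using the reflex table above."""
    return PTOCH_REFLEX[key(first)] + "ən" + PTOCH_REFLEX[key(second)] + "wó"


print(ptoch_tongue(D, GH))            # no metathesis
print(ptoch_tongue(GH, D))            # whole segments swapped
new_first, new_second = swap_place(D, GH)
print(PIE_NAME[key(new_first)], PIE_NAME[key(new_second)])
print(ptoch_tongue(new_first, new_second))   # place features swapped
```

Only the third derivation yields the attested shape: swapping whole segments still leaves a *d to turn into *ts (or to be elided), whereas swapping place features alone produces *dʰ, whose regular reflex is *t.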

I did a search of the literature using Google Scholar, but I couldn’t find any other explanations of the development of PIE *dn̥ǵʰwáh₂ into PToch. *kəntwó. And I can’t think of any myself. Still, the scenario posited above is perhaps too speculative to allow us to say that metathesis of features is definitely possible. It would be better to have an example of metathesis of features which is still taking place, or which occurred recently enough that we can be very sure that a metathesis of features took place. Ringe & Eska (2013: 110-111) give a couple of other examples, but both are from the development of Proto-Indo-European, and therefore not much less speculative than the scenario above. (It might be of interest that one of their examples is Oscan fangva, a cognate of PToch. *kəntwó; PIE *dʰ- becomes Oscan f-, so what seems to have happened here is the same kind of metathesis as in PToch., but with the laryngeal features switching places, rather than the place of articulation features.) Ringe & Eska do also mention that one of their daughters, at the age of 2, pronounced the word grape as [breɪk], thus exhibiting the same kind of metathesis as hypothesized for pre-PToch., i.e. with the place of articulation features being switched with each other but with the laryngeal features remaining in place.


  1. ^ Normally I would cite other reflexes of the proto-form in IE, but the reflexes of *dn̥ǵʰwáh₂ exhibit an amazing variety of irregularities, so that to do so would probably break the flow of the text too much. It has been proposed that *dn̥ǵʰwáh₂ might have been susceptible to taboo deformation, although it’s hard to imagine why the word ‘tongue’, in particular, would have been tabooed; then again, the fact that only a single IE branch (Germanic) appears to preserve the regular reflex of the root does cry out for explanation. I’m not sure how secure the reconstruction of *dn̥ǵʰwáh₂ (given by Ringe & Eska) is, although I don’t recall seeing any alternative reconstructions. The main basis for this reconstruction seems to be Gothic tuggō (which has become an n-stem, cf. gen. sg. tuggōns, but is otherwise unchanged) and Latin lingva (which has the irregular d- to l- change observed in a few other Latin words). But Old Irish tengae seems to reflect *t- rather than *d- (this is without precedent in Celtic as far as I know, but I don’t know much about Celtic), and Old Prussian insuwis seems to have lost the initial consonant entirely. And as for Sanskrit jihvā́, the second syllable of this word is the perfectly regular outcome of PIE *-wáh₂, but the first syllable is either completely unrelated to PIE *dn̥ǵʰ- or has undergone more than one irregular development.


Ringe, D. A. (1996). On the Chronology of Sound Changes in Tocharian: From Proto-Indo-European to Proto-Tocharian (Vol. 1). Eisenbrauns.

Ringe, D., & Eska, J. F. (2013). Historical linguistics: toward a twenty-first century reintegration. Cambridge University Press.