The evolutionary approach to syntax


I’ve been reading an interesting book lately called Evolutionary Syntax, which is by Ljiljana Progovac. I picked it up because, judging by the title, I thought it might be something in a similar vein to Juliette Blevins’ Evolutionary Phonology. However, it’s not quite the same thing. I’ve not read Blevins’ book yet, but as I understand it, it’s about how patterns in sound change can be used to explain patterns in the phonology of the world’s languages. So it examines the evolution of phonology as it has taken place since the development of the mature human language faculty, and it’s more concerned with identifying the general tendencies of phonological evolution than describing the particular changes that languages have gone through. Progovac’s book, on the other hand, is about the evolution of syntax during the maturation of the human language faculty. This was a case of directed evolution: it’s reasonable to assume that there was a gradual trend towards more and more complex syntactic structures, and that’s the view Progovac takes. It’s also something which it makes sense to talk about as operating on human language as a whole, rather than on specific languages. So, roughly, while Blevins’ question is how phonology can evolve, Progovac’s question is how syntax did evolve.

Another important difference between these two evolutionary processes is the source of the mutations that allow them to occur. For Blevins it’s the errors that learners of a new language make in reproducing the language’s grammar from the speech they hear, while for Progovac it’s straightforward genetic variation: she thinks the evolution of syntax was driven by natural selection.

Anyway, it wasn’t a disappointment to me that the book wasn’t about exactly what I thought it would be; I still found it very interesting. (A more direct counterpart of Evolutionary Phonology for syntax would be interesting as well, though.) It was a little surprising to me that I did, because syntax has never been one of my favourite subfields of linguistics. Another book I’m trying to get through at the moment, because I need to for one of my classes, is David Adger’s Core Syntax, which does the standard thing of attempting to analyse syntax from a purely synchronic perspective, and I’m finding that one pretty dull and hard to get through. Part of the reason for this is that I have a particular interest in evolutionary processes; I like to see everything from an evolutionary perspective if it can possibly be seen that way. But another part of it is that I suspect theoretical approaches in linguistics that don’t make a lot of use of a diachronic perspective aren’t going to have much luck. Language is something that is enormously variable over time, space, social context, and other things, and when language is considered as removed from the context of variation, it seems likely that it will no longer be possible (or at least it may be less possible) to make sense of a lot of its characteristics.

I don’t by any means have a great deal of familiarity with theoretical linguistics, so don’t take my opinion here too seriously. But I do think my opinion is backed up by one part of Evolutionary Syntax. In chapter 5, Progovac outlines how the phenomenon of islandhood can be analysed using her evolutionary framework. Her analysis is rather different from the mainstream approach that I’ve been learning about at university, but the explanations it yields are somewhat more convincing to me. I’ll try to explain her analysis in the rest of this post. First, though, I should explain what islandhood is, and what about it needs explaining.


Consider the following two sentences:

(1) John loves Mary.

(2) Who does John love?

The meanings of these two sentences are similar. Both of them involve a proposition of the form ‘John loves x’. In (1), the x is Mary, and the sentence is an assertion of the proposition’s truth. In (2), the x is a dummy variable, and the sentence is a question asking for a description of some x such that the proposition ‘John loves x’ is true. So which word in (2) (if any) represents the dummy variable x? The most natural assumption is that it is the word who. But the position of who in (2) is very different from the position of Mary in (1), even though the meaning ‘Mary’ and the dummy variable x are in the same semantic “position” in both sentences.

There are some languages, like French, in which it would seem that a straightforward analysis of the translation of who (qui) as the dummy variable x is entirely possible. Cf. sentence (3) below, in which qui comes after the verb.

(3) Jean aime qui?

Perhaps, then, the who in (2) does represent the dummy variable, but for some reason the word is “moved” from its expected position after loves to the front of the sentence. The use of the verb “move” here reflects a particular conceptualization of how sentences are formed, where there is more than one level of structure—there’s an underlying structure where the word is not moved, and a surface structure where it is. There are syntactic theories that make this intuitive notion of “movement” more precise, but others analyze (2) in a way that does not involve anything the theory’s proponents like to call “movement”. I’m going to call the thing that explains the difference between (1) and (2) “movement” here, just so it has a name, without implying any particular analysis.

Now, islandhood is the phenomenon of movement being disallowed, for some reason, in certain syntactic environments. Consider, for example, the following examples from Progovac’s book (p. 133). (I’ve added answers to each question in order to help you parse the “expected” meaning.)

(4) *What did Bill reject the accusation that John stole? (cf. Bill rejected the accusation that John stole the jewellery.)

(5) *Which book did Bill visit the store that had in stock? (cf. Bill visited the store that had Crime and Punishment in stock.)

The stars at the start of these sentences indicate that they’re ungrammatical: that is, they violate the syntactic constraints of English. They’re supposed to give you the same instinctive “this is wrong” feeling as sentences like I speaks English correctly or I speak correctly English or I speak the English correctly. Note also that the problem isn’t in the meaning of the words in these sentences. Some sentences, like Colourless green ideas sleep furiously (Chomsky’s famous example) feel wrong for this reason, but it’s quite easy to see what meanings these ungrammatical sentences “should” have. That’s how you know the problem is syntactic, rather than semantic.

The specific problem with sentences (4) and (5) is that the wh-phrases in each of them have been moved out of subordinate clauses that are attached to the object of the main clause. For some reason, movement out of this environment is forbidden. In fact, it is forbidden out of all subordinate clauses that are attached to nouns. Note, however, that movement is permitted out of a subordinate clause if the subordinate clause is not attached to any noun, and is the object of the main clause. In fact, it’s permitted out of an arbitrarily deeply nested sequence of clauses as long as each clause is the object of the previous one, as illustrated by sentences (6), (7) and (8).

(6) Who does John think Mary loves? (cf. John thinks Mary loves Bill.)

(7) Who does John think Mary thinks Bill loves? (cf. John thinks Mary thinks Bill loves Susan.)

(8) Who does John think Mary thinks Bill thinks Susan loves? (cf. John thinks Mary thinks Bill thinks Susan loves Paul.)
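The pattern in (6), (7) and (8) is regular enough that it can be generated mechanically, which is one way of seeing that there is no principled upper limit on the depth of embedding. Here is a minimal sketch in Python; the function name `nested_question` is just something I made up for illustration, and it only covers this one sentence frame.

```python
def nested_question(subjects):
    """Build a wh-question with one 'think' clause per subject before the last.

    subjects[0] is the subject of the main clause, subjects[-1] is the
    subject of the most deeply embedded clause (the one containing 'loves').
    """
    if len(subjects) < 2:
        raise ValueError("need at least two subjects")
    first, *middle, last = subjects
    parts = [f"Who does {first} think"]
    # Every intermediate subject adds one more level of embedding.
    parts += [f"{name} thinks" for name in middle]
    parts.append(f"{last} loves?")
    return " ".join(parts)


names = ["John", "Mary", "Bill", "Susan"]
for depth in range(2, 5):
    # Prints sentences (6), (7) and (8) in turn.
    print(nested_question(names[:depth]))
```

Passing a longer list of names gives a longer sentence of the same shape, however hard it may become to follow.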

Environments out of which movement is forbidden are called wh-islands, because wh-phrases are “stranded” within them. Islandhood is the phenomenon of the existence of wh-islands.

It may be the case you don’t consider both of the sentences (4) and (5) ungrammatical. This is not particularly unusual—judgements of islandhood seem to vary quite a lot between individual speakers of a language (and, sometimes, between different times for the same individual). My friend Darcey brought up a good example of this a while ago: sentence (9) below.

(9) *What do you wonder who fixed? (cf. I wonder who fixed the computer.)

She thought this was a perfectly comprehensible and grammatical sentence, but the vast majority of people, including me, would not agree. The problem with it is that the interrogative DP what is being moved out of a subordinate clause which also contains an interrogative DP (who) that has been moved to the front of the clause. (These clauses are known as indirect questions.) Movement out of this environment seems to be generally forbidden for most English speakers. The contrast between sentences (10) and (11) below illustrates this.

(10) What do you know John fixed? (cf. I know John fixed the computer.)

(11) *What do you know who fixed? (cf. I know who fixed the computer.)

There is another way to see the effects of islandhood, in case you find it hard to judge the grammaticality of sentences. Sometimes, the only way to explain why a sentence is not ambiguous is by appealing to islandhood. Consider sentence (12) below.

(12) When did you wonder who fixed the computer? (cf. I wondered, last night, who fixed the computer.)

I have phrased the answer here carefully, because the sentence I wondered who fixed the computer last night is ambiguous. In this sentence, the adverbial phrase last night could also refer to the time the fixing took place, rather than the time the wondering took place. To put it in terms of phrase structure, last night could be contained within the subordinate clause beginning with who fixed the computer, rather than being outside of it. Perhaps the best way to help you see the ambiguity, if you can’t already see it, is to put some guiding brackets in the sentence:

(13) I wondered [who fixed the computer] last night.

(14) I wondered [who fixed the computer last night].

Now, here’s the funny thing. If we take sentence (14), replace last night by when and move the when to the front we get (12), right? But that suggests (14) is a possible answer to (12). And for me, that isn’t true: the ambiguity isn’t present in (12) at all. For me, (12) can only be interpreted as asking about the time the wondering took place, not the time the fixing took place. You might disagree. In particular, I suspect Darcey might disagree, given that she thought (9) was grammatical. But I know at least one other person agrees because my Introduction to Syntax lecturer, when I took that course last year, gave us the problem of explaining the unambiguity of the sentence When did you wonder whether he disappeared? (which is structurally parallel to (12)) as an exercise.

(If there were any students who did consider that sentence ambiguous, they must have found that problem really confusing.)

For people like me who find (9) ungrammatical, there’s an obvious explanation for this situation in terms of islandhood. Going from (14) to (12) involves the movement of when out of a subordinate clause which begins with an interrogative NP, so it is forbidden since indirect questions are wh-islands. But going from (13) to (12) involves no such thing, since last night is not part of the subordinate clause beginning with who in that sentence. I don’t know how else the unambiguity of (12) could be explained; that’s why I think that Darcey and others with different intuitions on the grammaticality of (9) might have different intuitions on the ambiguity of (12).

Now that was a bit of an aside, but I thought you might find it interesting to see a different way in which the phenomenon of islandhood manifests, and maybe it helps a little if you are finding these grammaticality judgements too subjective.

Anyway, the really intriguing question about islandhood is, why is islandhood a thing? Or, similarly, what distinguishes wh-islands from non-wh-islands? Why is the relatively simple six-word sentence (9) ungrammatical, but the horrendously complex sentence (8), involving movement out of a triply-embedded clause, is perfectly fine?


Before I talk about possible answers to this question, first, I should mention the other island environments in English. We’ve already seen that subordinate clauses which are attached to nouns or which begin with interrogative DPs are islands. A famous PhD dissertation by Ross (1967), which was the first comprehensive investigation of islandhood in English, identified the following additional islands:

  1. Sentential subjects (i.e., subordinate clauses in subject position)
  2. DP specifiers (i.e., noun phrases that are attached to nouns via the possessive clitic ’s)
  3. Coordinated noun phrases (i.e., noun phrases that are attached to noun phrases via coordinating conjunctions)

These are illustrated by the example sentences below.

(15) *What does that he is denying make it worse? (cf. That he is denying his mistake makes it worse.)

(16) *Whose does John love daughter? (cf. John loves Mary’s daughter.)

(17) *Who does John love Mary and? (cf. John loves Mary and Alice.)

The question is: why are the wh-islands the members of this particular set of environments, rather than some other set?

The traditional, Chomskyan answer relies on a fundamental syntactic principle called Subjacency, which was proposed just in order to answer this question. There’s a nice exposition of this approach in chapter 12 of this online syntax textbook by Santorini & Kroch (2007), which I encourage you to read if you’re interested in the details. But I’ll try to give a brief explanation here. Roughly, the idea is that certain phrases comprise barriers to movement, and that if a phrase is moved from one position to another in a sentence, it can cross the boundaries of at most one barrier to movement. It’s fine if it crosses the boundary of just one barrier to movement, but any more than that, and the sentence becomes ungrammatical. So, for example, the ungrammaticality of (9), and other sentences involving movement out of an indirect question, is a result of IPs (inflectional phrases—these roughly correspond to clauses) being barriers to movement. Sentence (2) (Who does John love?) is grammatical because the word who in this sentence only has to cross the boundary of one IP. But in (9), what has to cross the boundary of two IPs, so (9) is ungrammatical. Another kind of phrase which is a barrier to movement is the DP (determiner phrase, roughly the noun phrase). This accounts for the ungrammaticality of sentences (4) and (5), in which the movement is out of subordinate clauses that are attached to objects in the main clause.

But wait, doesn’t that mean sentences (6), (7) and (8) are ungrammatical too? After all, they involve movement across two, three and four IP boundaries, respectively. Well, the crucial thing to understand here is that it’s possible for the same phrase to move multiple times. The principle of Subjacency only forbids crossing multiple barriers to movement in a single movement—crossing multiple barriers in multiple movements is fine. Here’s a diagram showing the phrase structure of sentence (7).


[Figure: phrase structure tree for sentence (7). Caption: “Whoever said syntax was complicated?”]

There’s obviously a lot of stuff going on here, but you can see that the DP who starts in the position labelled 4 (in the complement position of the VP loves), moves to the position labelled 1 (in the specifier position of the CP (that) Mary loves), crossing only one IP boundary in the process, then moves to its final position (in the specifier position of the CP containing the whole sentence), again crossing only one IP boundary in the process. In order for this two-step movement to be possible it is crucial that there is an empty node in the phrase structure tree, such as 1 in the one above, to act as a “landing site” for the moving interrogative DP. How do we know that this empty node exists? I can’t give a full justification of the underlying assumptions here, but I can give you a good reason to think that the word “that” goes in the node labelled C, rather than occupying the position within the CP but outside and to the left of the C’ (which is called the specifier position), which is the position moved interrogative DPs occupy. Consider the first couplet of the Canterbury Tales:

(18) Whan that Aprill, with his shoures soote / The droghte of March hath perced to the roote

Here we have a CP (Whan that Aprill…) which begins with an interrogative phrase followed by “that”, and the only way we can fit both of them in is to suppose that the interrogative phrase goes in the specifier position of the CP and “that” goes in the C node. It’s not unreasonable to assume that clauses introduced by interrogative NPs in modern English are structured in the same way, only with the “that” absent.

So we can get around Subjacency if there are landing sites that can be used to perform the movement in sizable, non-multiple-barrier-crossing chunks. The reason this doesn’t apply in (9) is that the who in the subordinate clause has already moved into the landing site where the what would otherwise go, and there is no other possible landing site. (If what moved before who then it could use the landing site, but generally it is assumed that movement that is constrained to deeper levels of the phrase structure tree happens first.) What about (4) and (5)? In these sentences it is possible to move the interrogative NP into the specifier position of the CP which contains the subordinate clause, because only one IP boundary needs to be crossed to do that. But moving it again into the specifier position of the CP which contains the whole sentence would involve crossing both an IP and a DP, and as we said above, DPs are barriers to movement as well as IPs, so this would violate Subjacency.
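If it helps, the Subjacency check described above can be written out as a very small program. This is only a sketch under simplifying assumptions of my own: each movement step is represented as a list of the phrase labels whose boundaries it crosses, the lists are hand-written approximations rather than the output of a parser, and the names `BARRIERS`, `barriers_crossed` and `obeys_subjacency` are hypothetical.

```python
# Phrase types treated as barriers to movement in the account above.
BARRIERS = {"IP", "DP"}


def barriers_crossed(step):
    """Count the barrier boundaries crossed in a single movement step."""
    return sum(1 for label in step if label in BARRIERS)


def obeys_subjacency(derivation):
    """A derivation is fine if no single step crosses more than one barrier."""
    return all(barriers_crossed(step) <= 1 for step in derivation)


# Hand-written, simplified derivations for some of the numbered sentences.
derivations = {
    # (2): one step, out of the VP and across one IP.
    "(2)": [["VP", "IP"]],
    # (6): two steps via the embedded CP's landing site, one IP each.
    "(6)": [["VP", "IP"], ["CP", "VP", "IP"]],
    # (9): the landing site is taken by 'who', so one step crosses two IPs.
    "(9)": [["VP", "IP", "CP", "VP", "IP"]],
    # (4): the second step has to cross both a DP and an IP.
    "(4)": [["VP", "IP"], ["CP", "DP", "VP", "IP"]],
}

for label, derivation in derivations.items():
    verdict = "ok" if obeys_subjacency(derivation) else "violates Subjacency"
    print(label, verdict)
```

The point of writing it this way is that the total number of barriers crossed is irrelevant; only the number crossed per step matters, which is why (6) passes and (9) fails even though both cross two IPs in total.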

Now, this analysis is reasonably successful, but it’s not without problems. For example, assuming that DPs are barriers to movement as well as IPs predicts the ungrammaticality of some sentences that strike most people as grammatical, such as (19) below.

(19) Who did you take a picture of?

There are ways of dealing with this, and you can read about them in the textbook chapter linked to above. It’s always possible to propose ever more complex principles in order to capture the islandhood conditions more precisely. But the more complex the principles are, the less satisfying the explanation is as an improvement over simply listing the wh-island environments in an unsophisticated way, as we noted that we could do at the start of this section.

Even if Subjacency was able to account for everything, there would still be something of a mystery here. A question remains: why does Subjacency exist in the form that it does? That’s something which the Chomskyan approach doesn’t really attempt to answer.


It’s a question we’d like to have an answer to, though. Subjacency is a rather complex, specific principle. If Subjacency blocked movement across barriers in general, I’d be happier with it. But blocking movement across two barriers but not one? That’s just weird, and it seems like something that needs an explanation.

Note that Subjacency is supposed to be a universal principle. Even though I’ve only been talking about English here, many other languages have much the same set of island environments, and often where there are apparent differences, Chomskyans would argue that this is due to the misidentification of structures in different languages that are actually not identical. Also, the fact that most children successfully acquire the same grammaticality judgements with regard to islandhood suggests that the principle is part of the innate language faculty—it’s hard to imagine the necessary sentences being uttered often enough to enable a child to learn by example alone. (See Baker 2010 for more on the assumption of universality.) But it’s hard to see how Subjacency, if it’s part of the innate language faculty, could have evolved by natural selection. As Lightfoot (1991), quoted by Progovac, says:

Subjacency has many virtues, but […] it could not have increased the chances of having fruitful sex.

However, if we start from the conception of syntax as an evolved system, a different approach to the whole problem naturally presents itself.

As we said above, one of the basic operations of syntax is movement. In the jargon of syntax this operation is often referred to as Move. It’s called an “operation” because the idea is that when a sentence is formed in the mind, it first consists of an unordered set of words, and then syntactic operations are applied to that set of words in order to give it structure, so that the words can eventually be arranged in a linear order and uttered in that order. The other important syntactic operation is Merge, which combines pairs of words into units called constituents, and also combines pairs of sub-constituents into super-constituents, thus organizing the words into a binary tree. Move applies afterwards, moving words from one node in the tree to another under certain conditions. If Subjacency exists, then it blocks the application of Move in certain conditions.
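To make the two operations a bit more concrete, here is a toy sketch in Python. It is an illustration under my own simplifying assumptions, not a claim about how any particular syntactic theory formalizes things: trees are nested pairs, `merge`, `move`, `replace` and `leaves` are hypothetical helper names, the moved word leaves behind a placeholder "t", and complications like do-support and verb agreement are ignored entirely.

```python
def merge(left, right):
    """Merge: combine two words or constituents into one binary constituent."""
    return (left, right)


def leaves(tree):
    """Read the words off a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def replace(tree, target, trace):
    """Return a copy of the tree with every occurrence of target replaced."""
    if isinstance(tree, str):
        return trace if tree == target else tree
    left, right = tree
    return (replace(left, target, trace), replace(right, target, trace))


def move(tree, target, trace="t"):
    """Move: front the target word, leaving a trace in its original position."""
    return merge(target, replace(tree, target, trace))


# Merge builds the underlying structure of 'John loves who'.
vp = merge("loves", "who")
clause = merge("John", vp)
print(leaves(clause))

# Move then fronts 'who', roughly as in sentence (2).
question = move(clause, "who")
print(question)
print(leaves(question))
```

In these terms, a constraint like Subjacency would be an extra condition inside `move` that refuses to apply when the target is too deeply buried behind barriers.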

Now, it’s not too difficult to see why natural selection would enable the Move operation to evolve. Language with Move is more expressive than language without Move. But who’s to say the whole of Move appeared all at once? It seems more likely that it would have evolved gradually, being first applicable only in a particular environment or set of environments, and having its applicability gradually expanded over time by analogy. And it could well be the case that there are some environments that it never became applicable in. These would be exactly the wh-islands.

Let’s not get ahead of ourselves: so far, this does not constitute any sort of answer to our question. What we want to know is how to distinguish wh-islands from non-wh-island environments, and all we’ve done in the above paragraph is shown that we can reformulate the question as “Why didn’t Move get generalized to be possible out of the wh-island environments?” It could still end up being the case that the wh-island environments can be characterised by a condition such as Subjacency, in which case that condition would still be a useful thing to talk about. But framing the question this way makes it clear that we probably shouldn’t expect this to be the case. Impossibility of movement is the original, hence default state. Possibility of movement is the innovative state. And analogical generalization goes from like to like. The second construction on which Move was able to operate would have been a construction very similar to the first; the third would have been similar to the second; and so on, throughout the whole set of non-wh-islands. Hence it’s the non-wh-islands that should be expected to form a natural class, not the wh-islands.

So there’s one way in which the evolutionary perspective has been helpful: it’s allowed us to get a better idea of the kind of answer we should be looking for. But what would be even better is if we could actually find an answer, using this approach.

Progovac doesn’t have a complete answer here, but she does have a fairly promising sketch of one. The key insight it relies on is the idea that some syntactic constructions are more archaic than others. She identifies four approximate stages of syntactic evolution, which are listed below.

1. The one-word or holophrastic stage, in which all utterances consist of a single word, with no internal structure whatsoever. Multiple words may be uttered in succession, but there is no higher-level structure, only a string of isolated words. The words convey a set of concepts to the listener and the listener has to rely solely on pragmatics to work out how the concepts compose to form a statement about the world. Nim Chimpsky, a chimpanzee who researchers tried to teach (signed) language to, never got past this stage: an example utterance of his was “Give orange me give eat orange me eat orange give me eat orange give me you”.

One-word utterances are still possible in modern languages: “Fire!” Children also often pass through a one-word stage when they are learning to speak, although it’s not clear how far this can be attributed to linguistic constraints, as opposed to physical or general cognitive ones.

2. The two-word or paratactic stage. In this stage utterances consist of at most two words, which are linked by an operation called Conjoin. The two words are of equal status within the resulting constituent; there are no heads or complements. Conjoined constituents cannot themselves be Conjoined, so there is no recursion. Within a string of utterances, the separate utterances (each corresponding to a single Conjoined constituent) are identifiable via prosodic cues such as pitch rise-fall patterns. The interpretation of each constituent is less dependent on pragmatics. Simple, two-word intransitive sentences could already be uttered at this stage, and there might have already been a rudimentary noun-verb distinction. However, in order to convey more complex relations between concepts, inference from pragmatics would still be necessary.

Paratactic constructions still exist in mature human language, but, apart from simple intransitive clauses, they are marginal. Some notable examples are agentive verb-noun compounds (pickpocket, scarecrow), orders (Everybody out!), and the construction which involves a non-case-marked noun followed by a verb and is uttered with an exaggerated rise in pitch on each word, used to convey incredulity at the idea that the statement could be true: Me, a liar?! Adjunction (roughly, the addition of “optional” phrases that add extra information such as adjectives and adverbs) is also a little parataxis-like. When an adjunct combines with a phrase, the resulting phrase is of exactly the same type, in syntactic terms, as the original phrase—one can substitute “black dog” into more or less every sentence that contains “dog”, for example, and preserve grammaticality. This is in contrast to the combination of heads with complements, which results in entirely new phrases with different syntactic distributions.

3. The three-word or coordinate stage. This is similar to the previous stage, except that function words emerge for the first time, in the role of “linkers”: they come in between two Conjoined words in order to mark the conjunction. This makes it easier to identify constituent boundaries, backing up the prosodic cues which are relied on in the paratactic stage. Different linkers may have come to be used in order to add different shades of meaning (cf. and and but in English), but otherwise there is little change in expressive power.

4. The categorical / hierarchical stage. At this stage, different categories of phrases become identifiable for the first time, on the basis of the content words and linkers they contain. For example, John eat might be identifiable as a verb phrase (John is eating) and John dog might be identifiable as a noun phrase (John’s dog). This facilitates the substitution of words for phrases, which allows hierarchical, recursive structure to emerge for the first time (John dog eat = John’s dog is eating), and at this point language reaches its full expressive power.

The identification of these stages is not meant to imply that they were strictly separated; they would have blended into each other to a considerable degree. At the start of each stage the new structural possibilities would have been made use of in relatively few constructions, and as the stage progressed the new kind of structure would have become more and more dominant, just as in the scenario for the evolution of Move described above. So there would always be some constructions which were more “integrated” into the current stage than others. And this applies in just the same way to the categorical / hierarchical stage, which is the stage human language is at today. Progovac thinks that many modern syntactic constructions are “fossils” which have not been fully integrated into the categorical / hierarchical stage. Now, the Move operation is a feature of the categorical / hierarchical stage, and its successful application probably requires the presence of certain structural characteristics which are often not present in these fossil constructions. Hence, islandhood. The wh-island environments should be precisely those environments that involve archaic structure, for a certain level of archaicness.

The prohibition on movement out of coordinate structures has an obvious explanation, using this approach. Coordinate structures are relics of the coordinate stage and have not been fully integrated into the categorical / hierarchical stage. (There is little reason, for example, to believe that a phrase like ham and cheese is structured as either ham [and cheese] or [ham and] cheese; it may be best analyzed as a phrase with three direct sub-phrases which each contain a single word.) The same goes for the prohibition on movement out of subordinate clauses attached to nouns. These subordinate clauses are adjuncts, and hence not integrated into the categorical / hierarchical stage to the same extent as subordinate clauses that are verb complements, for the reasons alluded to above.

Of course, there are other wh-island environments: indirect questions, sentential subjects, and DP specifiers. These environments do not involve particularly archaic structure; in fact, all of these environments involve recursive, hierarchical embedding of structures, which is characteristic of the categorical / hierarchical stage and impossible at lower stages. But perhaps by identifying finer degrees of integratedness, it would be possible to explain why these environments are wh-islands as well. For example, specifiers might be less integrated, in some sense, than complements, which would account for both the sentential subject and DP specifier island constraints. (According to the theory I was taught, verb subjects occupy VP specifiers in the underlying structure and move into IP specifiers on the surface.) But a more in-depth treatment of the subject is needed here than is given in Evolutionary Syntax. Further investigation into exactly how the Move operation might have developed might yield helpful insights here.

The evolutionary approach may also prove helpful in understanding variation in the set of environments that are wh-islands. As described above, the categorical / hierarchical stage probably did not arise suddenly but rather in a gradual manner, with constructions becoming more and more integrated over time at varying rates. This trend towards greater integration could well be continuing to this day; after all, a language that allows movement out of, say, indirect questions has slightly more expressive power than one that doesn’t. It would be interesting to see whether certain kinds of wh-islands are more likely to be non-wh-islands in a minority of individuals’ grammars than others, and whether this likelihood correlates with the extent to which the wh-island environment has a typical categorical / hierarchical phrase structure. To me, indirect questions seem like the most integrated wh-islands that we’ve examined in this post, and hence the most difficult to explain under Progovac’s approach—and they’re the ones for which we’ve seen that there is some variation between individual speakers.

So Progovac’s approach is definitely in need of a lot of further elaboration. But it does explain some of the wh-islands fairly well, and it seems like it might be the right approach to take in explaining the others. By the way, the ideas it makes use of—the different stages of syntactic evolution, and the existence of fossil constructions and varying degrees of integratedness—aren’t just used in Chapter 5, the one that deals with the phenomenon of islandhood, but throughout the whole book. They’re its central ideas. So if you found these ideas interesting, I suggest you check Evolutionary Syntax out.


Emics and etics


Many non-linguists probably don’t know that linguists use the words “phonetics” and “phonology” to refer to two quite different subjects. There is, admittedly, a considerable degree of interconnection between the two subjects, but most of the time the difference is reasonably stark. The best way to describe the respective subjects is by an example. Many of the languages of the world make use of speech sounds which are known as lateral approximants. The Latin letter L is dedicated to representing such sounds. In English, lateral approximants appear at the start of words like “laugh” and “lion” (and, indeed, “lateral”), in the middle of words like “pillow” and “bulk”, and at the end of words like “tell” and “saddle”. The exact sound of the lateral approximants in these words varies to a considerable degree from utterance to utterance, due to factors such as intonation, the chosen volume and the simple fact that people do not replicate precisely the same physical actions every time they utter a sound. It also varies from speaker to speaker—different people have different voices. The term “lateral approximant” therefore refers not to a particular acoustic signal but to an abstract category including some but not all acoustic signals1. The way language works makes it inevitable that when we talk about speech sounds, we talk about these abstract categories of acoustic signals rather than the particular acoustic signals themselves. This point about “lateral approximant” being an abstract category is not directly relevant to the phonetics-phonology distinction, but I bring it up because it will help clarify things later.

One specific kind of variation in the sounds of speech is especially interesting to linguists. The pronunciation of a sound such as a lateral approximant can be affected by the surrounding sounds. The differences produced thus have the potential to be regular and systematic in the sense that they are reproduced from utterance to utterance: after all, the same sequence of sounds exists in each utterance. The term for this kind of variation in particular is “allophony”. A particularly stark example of allophony is exhibited by English lateral approximants2 (which is why I chose to talk about this kind of sound in particular). Before other consonants and at the end of a word (such as in “bulk”, “tell” and “saddle”), they are pronounced one way; elsewhere (such as in “laugh”, “lion” and “pillow”), they are pronounced another way. When they are pronounced in the former way, English lateral approximants are referred to as “dark Ls”, and when they are pronounced in the latter way, they are referred to as “clear Ls”. The IPA has symbols for each pronunciation: dark L is [ɫ], clear L is [l]. If you don’t already know a lot about linguistics, it’s quite likely that you never noticed that this variation existed before, even though, as may be apparent to you now that I have drawn your attention to it, the difference is quite large. You never needed to notice it, because in the English language, the distinction between clear and dark L is never used to distinguish words. That is, there are no pairs of words which consist of the same sequence of speech sounds, except that one of them has a dark L in the same position that the other has a clear L. It is therefore convenient to treat clear L and dark L as the same sound, at least when we are talking about English. We can use the simpler of the two symbols, /l/, to represent this sound, but we add slashes rather than brackets around the symbol in order to make clear that the boundary of the category of acoustic signals referred to by /l/ is determined here by the distinctions the English language (or whatever language we are talking about) makes use of in order to distinguish its words from each other. It is reasonable to suppose that the concept of /l/ does actually exist in the minds of speakers of English (and that separate concepts for clear L and dark L do not exist in their minds). But even if this were not the case, the concept of /l/ would still be useful for descriptive purposes. The name for this kind of concept is “phoneme”.
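For readers who like to see a rule written out mechanically, the clear/dark alternation described above can be sketched as a small Python function. This is only a toy model: the vowel set, the function name `realize_l` and the rough transcriptions are my own assumptions for illustration, not a serious analysis of English.

```python
# Toy model of English /l/ allophony (illustrative assumptions throughout).

# Hypothetical, deliberately small set of vowel symbols used in the
# rough transcriptions below; anything else is treated as a consonant.
VOWELS = set("aeiouəɪʊɛæʌɔɑ")


def realize_l(phonemes: str) -> str:
    """Map a phonemic string to a phonetic one by picking the allophone of /l/.

    /l/ is realized as dark [ɫ] before a consonant or at the end of the
    word, and as clear [l] everywhere else.
    """
    out = []
    for i, seg in enumerate(phonemes):
        if seg == "l":
            at_word_end = i == len(phonemes) - 1
            before_consonant = not at_word_end and phonemes[i + 1] not in VOWELS
            out.append("ɫ" if at_word_end or before_consonant else "l")
        else:
            out.append(seg)
    return "".join(out)


# Rough phonemic transcriptions of "bulk", "tell", "lion" and "pillow".
for word in ["bʌlk", "tɛl", "laɪən", "pɪloʊ"]:
    print(word, "->", realize_l(word))
```

The point of the sketch is that the choice between [l] and [ɫ] is fully predictable from the surrounding sounds, so the phonemic string on the left never needs to record it.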

There are in fact languages in which the distinction between clear L and dark L is used to distinguish words. Russian is one of them. The word мел ‘chalk’ is pronounced like “Mel”, but with a dark L. The word мель ‘shallow’ is pronounced like “Mel”, but with a clear L. For this reason, Russian is said to have an /l/ phoneme, which is spelt ль, and a /ɫ/ phoneme, which is spelt л. Note that, despite the notation, the Russian /l/ is not the same as the English /l/, any more than the Russian /ɫ/ is the same as the English /l/: the two Russian phonemes correspond to a single, more general phoneme in English.

The crucial, defining property of phonemes is that they are abstract categories of acoustic signals whose boundaries are determined by the distinctions that a particular language makes use of. They are defined in opposition to abstract categories of acoustic signals in general, whose boundaries are not necessarily determined by the distinctions a particular language makes use of; they may be determined by the distinctions a linguist finds interesting to make, for example. Such categories are referred to by the words “sound” or (in my experience, less commonly) “phone”; it has always seemed to me that “phonete” would be the most appropriate word, but nobody uses that one. In the jargon of Less Wrong, the distinction can be conveyed by saying that phonemes carve reality at the joints (for a particular language’s purposes), while sounds in general don’t necessarily do the same.

It can be helpful to shift the viewpoint a little and consider the set of all the phonemes of a particular language. This set is always finite (this is a cross-linguistic universal). One can consider the space of all conceivable acoustic signals that might be produced by a speaker of the language. The set of phonemes constitutes a particular partition of this space into a finite number of parts, and speakers of the language do not make use of any of the differences within each part when processing speech3. The parts under this partition are represented by symbols surrounded by slashes. If you choose to partition the space in a different way for some reason, you need to represent the parts by symbols surrounded by square brackets.
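The partition idea can also be put in code. The following is a minimal sketch, assuming a hypothetical two-sound fragment of each language's inventory and following this post's simplified notation for the Russian sounds; the names `PHONEME_OF` and `phonemic_form` are made up for the example.

```python
# Each language partitions the same space of sounds in its own way.
# These tables are tiny hypothetical fragments, for illustration only.
PHONEME_OF = {
    # English: clear and dark L fall into the same part of the partition.
    "English": {"l": "l", "ɫ": "l"},
    # Russian: clear and dark L fall into different parts.
    "Russian": {"l": "l", "ɫ": "ɫ"},
}


def phonemic_form(language: str, phones: str) -> str:
    """Replace each sound by the phoneme it belongs to in the given language.

    Sounds missing from the table are passed through unchanged.
    """
    table = PHONEME_OF[language]
    return "".join(table.get(p, p) for p in phones)


# The same two pronunciations, seen through each language's partition.
for language in PHONEME_OF:
    dark, clear = phonemic_form(language, "meɫ"), phonemic_form(language, "mel")
    verdict = "same word form" if dark == clear else "different word forms"
    print(f"{language}: /{dark}/ vs /{clear}/ -> {verdict}")
```

Under the English table the two pronunciations collapse into one phonemic form, while under the Russian table they stay distinct, which is the situation of мел and мель described above.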

One final point which I want to stress is that both phonemes and sounds in general are abstract categories! People (including me, when I’m not thinking carefully enough) often describe the distinction as something along the lines of “phonemes are abstract categories of sounds”, and this can be interpreted in a way that makes it a true statement, more or less, but it doesn’t constitute an exhaustive definition: the things we refer to as “sounds” in practice are abstract categories of sounds too, so phonemes are a particular kind of abstract category of sounds.

Anyway, the difference between phonetics and phonology is this: phonetics is about sounds in general (“phonetes”), phonology is about phonemes. Or to put it another way, phonology specifically studies the categorizations of acoustic signals that make sense with respect to particular languages, and phonetics studies speech sounds under other categorizations. For example, investigation of how common it is for lateral approximants to appear in speech in both clear and dark forms comes under phonetics. But once you start investigating in addition how common it is for clear L and dark L to constitute separate phonemes, you’ve got into phonology.


The concept of the distinction between phonetics and phonology can be generalised. It has proved especially fruitful in the field of anthropology.

The first person to make the analogy was a man called Kenneth Pike. As you might imagine, he was both an anthropologist and a linguist. He was quite an interesting man, actually. According to Wikipedia, he was “the foremost figure in the history of SIL” (that slightly controversial organization, the Summer Institute of Linguistics). He also invented a (non-naturalistic) conlang called Kalaba-X. And he used to give what were called “monolingual demonstrations”, where he would work with a speaker of a language unknown to him and attempt to analyze it as far as he could without having known anything about it previously, all before an audience.

Anyway, Kenneth Pike thought that it was helpful to distinguish two different approaches to studying human culture, which he called the emic and etic approaches. The emic approach is analogous to phonology. The etic approach is analogous to phonetics. The anthropologist Marvin Harris later adopted the concept and made it critical to his theory of human culture, which he called “cultural materialism”. Harris made use of the concept in a somewhat different way than Pike originally did. If you want to see Pike’s side of things, you could look at this interview with him, which contains the following amusing illustration of the extent of their differences:

[…] it took me months and months and months to try to understand Harris. Would you like to know how I got started talking with Harris? I was in Spain at the request of some philosophers and spoke there on the relationship of language to the world (Pike 1987). Afterwards they told me that Harris had been there three months previously lecturing. When they invited me, they had sent me some articles with some references to the etics and emics of Harris. That is precisely why they had invited me. Harris had said that he wished he could talk to Pike.

So later we invited Harris to Norman [Oklahoma] to lecture. I asked him to arrive at least a day early so that we could talk privately before the lecture. So we spent four hours talking prior to the lecture. Tom Headland then met him at an AAA meeting and arranged the meeting and we both agreed.

We had a difficult time trying to understand each other. We each spoke 20 minutes, with 10 minutes for reply by the other. Later, we saw each other’s materials so that before publication we could revise our own materials after having read the comments. The commentators could also revise their materials after having read the revisions of our revisions. So we had maximum time to try to understand each other. Even so, every so often I still get a little perplexed.

I have read some of Harris’s work but none of Pike’s, so my discussion is going to be informed by Harris’s conception of the emic and etic approaches in particular. Let’s begin with an illustrative example, like the one I used in part I of this post. This example is taken from Harris’s book Cultural Materialism, published in 1979.

While doing fieldwork in the southern Indian state of Kerala, Harris observed that the sex ratio among the cattle owned by farmers there was highly skewed in favour of females: for every hundred female cattle there were only sixty-seven male cattle. The farmers, when asked about this, vehemently denied having killed the excess males, as expected given the Hindu prohibition against killing cattle. They instead attributed the difference to an innate propensity towards sickness among male cattle. When they were asked why this propensity existed, some of them replied that the male cattle ate less than the females. When they were asked again why the male cattle ate less than the females, some replied that they were given less time to suck on their mother’s teats. However, there are other states in India, such as Uttar Pradesh, where the sex ratio is skewed the other way: there are more than two oxen for every cow. Moreover, these states are precisely those where the ecological and economic situation is such that there is a relatively large need for traction animals, such as oxen. Suspicious, isn’t it? What seems to be happening is that, despite the Hindu prohibition against killing cattle, the farmers of Kerala take active steps to ensure that male calves drink less milk than their sisters4.

By taking these actions, the farmers cause the male calves to die, when they otherwise would survive. Therefore, there is a sense in which their action can be called “killing”. But the crucial point is that if we call the action “killing”, then we are making use of a categorization which is etic rather than emic. That is, it is not a categorization which makes sense on the terms of the culture of the Keralan farmers. These farmers’ concept of killing does not include neglecting to feed male calves properly5. It is just the same as how the Russian /l/ covers a smaller range of acoustic signals than the English /l/. The contradiction between Hindu custom and what actually takes place must be understood in this light: it is only an apparent contradiction, because, from the emic perspective, the farmers are not doing any killing of cattle.

Note that this is not to say that the Keralan farmers would be able to get away with openly slaughtering the cattle, say, by slitting their throats with knives. The concept of “killing” is not infinitely malleable. In the same way, no language that I know of considers both [p] and [l] to be part of the same phoneme. All we are saying here is that the extent of variation in emic categorizations is constrained to some degree by the properties of the things they categorize. In describing these constraints we make use of categorizations that are chosen for their usefulness for this descriptive purpose, and not for their coincidence with categorizations that are used by a particular culture. Such categorizations are by definition etic. This means that if the extent of emic variation is sufficiently constrained, the distinction between emic and etic becomes redundant, because all cultures will essentially categorize things the same way, and this categorization can be perfectly well understood from an etic perspective. In most areas of human culture, however, there are considerable degrees of freedom in categorization and therefore the emic-etic distinction is very helpful in understanding cross-cultural variation.

The Keralan cattle sex ratio example is an especially striking one, but another example given by Harris in the same book is, I think, more illustrative of just how helpful the emic-etic distinction can be. In Brazil, Harris collected data on the number of people living in households. But doing this required a more complicated methodology than just asking people from different households, “How many people live here?” The culture of Harris’s informants was such that they did not consider their servants members of their households, even when they were permanent residents there. And for whatever purpose he was collecting the data, Harris found it more useful to consider these servants as household members. He therefore had to ask extra questions to get information about the numbers of servants, in order to make use of an etic categorization of his own that was different from the emic categorization of his Brazilian informants. It is easy to see how not heeding this kind of thing could lead to confusion: if, for example, you collected data on the number of people in households across both Brazil and some other country in which live-in servants were counted as household members, only asking, “How many people live here?”, and used that data to draw conclusions about, say, the amount of food that the average household consumed in both countries, then these conclusions could be grossly wrong, and the data would be meaningless in that sense.
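The household problem can be made concrete with a few lines of Python. This is a sketch under invented assumptions: the `Household` class, the function names and the numbers are all hypothetical and are not taken from Harris’s data.

```python
from dataclasses import dataclass


@dataclass
class Household:
    family_members: int    # people the informant counts as household members
    live_in_servants: int  # permanent residents who may or may not be counted


def emic_size(h: Household, servants_count_as_members: bool) -> int:
    """Answer to "How many people live here?" under the informant's own categories."""
    return h.family_members + (h.live_in_servants if servants_count_as_members else 0)


def etic_size(h: Household) -> int:
    """Researcher's count: everyone who permanently resides in the house."""
    return h.family_members + h.live_in_servants


# Two households with identical composition (invented numbers), located in
# cultures that categorize live-in servants differently.
h = Household(family_members=4, live_in_servants=2)
print("survey answer, servants not counted:", emic_size(h, servants_count_as_members=False))
print("survey answer, servants counted:    ", emic_size(h, servants_count_as_members=True))
print("researcher's count in both cases:   ", etic_size(h))
```

The two survey answers differ even though the households are the same, so comparing raw answers across the two cultures would be comparing different quantities. Asking the extra question about servants is what makes the researcher's count computable.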

This is connected with another important consideration. One of the things which gives the social sciences a rather different epistemic flavour from the natural sciences is the ubiquitous use of concepts which are rather slippery and vaguely defined: “status”, “role”, “social class”, “tribe”, “state”, “family”, “religion”, etc. Social scientists regularly try to make these definitions more precise (that is, to “operationalize” them), but they do this in a peculiar way: it is rare for a particular operationalization to actually become accepted as the one, true definition of the concept at this level of precision, or for two different operationalizations to be given different names so that researchers can from then on treat them as separate concepts. Indeed, I think a lot of social scientists might agree that it is more useful to leave these concepts vaguely defined and use the operationalizations appropriate to the circumstances. Why is this the case? The crucial factor may be that in the social sciences, the distinction between emics and etics comes into play. Social scientists often need to talk about “status”, “tribe”, “state”, “religion”, etc. as emic concepts; that is, as conceptualised by particular cultures. Different cultures have different ideas of what these concepts are, and hence different operationalizations are appropriate for different cultures. Having a common word for each of these different operationalizations is still useful as a way of emphasizing the similarity between them (and perhaps their common origin, in some sense). And it doesn’t cause too much confusion, because the sense of the word in a particular context can be inferred from the culture being talked about in that context. It’s only when one needs to make use of etic concepts that are similar to these emic concepts that the potential for confusion becomes large. One thing that might be useful in the social sciences is to reserve some words for the emic approach and others for the etic approach. For example, we might reserve “caste” as the word for social strata as conceptualized by particular cultures6 and “class” as the word for social strata as conceptualized in other ways.

To summarize: in order to understand a culture, one must understand the concepts in terms of which the culture’s members understand their experience. Emic approaches to culture work with these concepts only. On the other hand, etic approaches to culture may work with alternative conceptual systems which clash with that of the culture being studied. The two approaches are not rivals; they lead to insights about different things and at the same time complement each other, just as phonetics and phonology are not in conflict, and are different subfields of linguistics yet at the same time are closely interconnected.

  1. ^ By using the word “category” I don’t mean to imply that membership in the category is categorical, as in a mathematical set (i.e. that every acoustic signal is either a lateral approximant or not, and there is never any need for further clarification). The category may be radial: it may be the case that one particular acoustic signal or set of acoustic signals is maximally lateral approximant-like, and acoustic signals which are less similar to these central examples are less lateral approximant-like. Or it may have some other, more complicated structure.
  2. ^ Some dialects don’t exhibit this allophony—Welsh English sometimes has clear L everywhere, and certain American English dialects have dark L everywhere. So if you can’t see this distinction in your own speech after reading the rest of the paragraph this footnote is attached to, that may be why.
  3. ^ This isn’t quite true: for example, you might notice that somebody keeps pronouncing clear L where they should pronounce dark L and conclude on that basis that they must be Welsh or foreign. You may do this subconsciously, even if you don’t know about the distinction between clear L and dark L. (The subconscious understanding of allophonic variation patterns is a large part of why people find it difficult to imitate other accents than their own: they see the problem in others, but not in themselves. Conversely, understanding phonetics and phonology is the secret to being able to imitate accents like a boss.)
  4. ^ Harris does not go into very much detail about this example. There are some things I’d like to know more about, such as why this stark difference in demand for traction animals exists between different Indian states, and how exactly the farmers are supposed to ensure that the male cattle are fed less. If anyone reading this knows of some resources that would be helpful, I encourage you to point me to them.
  5. ^ Of course, there may be a certain level or style of neglect for which it would be regarded as killing; but the means by which the differential sex ratio is produced is certainly not considered to be killing.
  6. ^ Or subcultures, of course. Basically everything that is being said here about the analysis of cultures can also be applied to more finely-grained divisions within cultures.