The Kra-Dai languages of Hainan

One of my favourite blogs on the Internet is Martin Lewis’s GeoCurrents, a consistently high-quality and information-dense blog about geography, especially geopolitics, cultural geography and economic geography. As a student in linguistics I’m especially interested in the posts about linguistic geography (which comes under cultural geography), but almost every GeoCurrents post is interesting, and tells me lots of things I didn’t already know. As an example post for interested readers which touches on cultural, linguistic and ethnic geography and the history of agriculture, I recommend The Lost World of the Sago Eaters. Unfortunately, Martin Lewis recently announced that he was going to have to stop making any more posts until, at least, next June. So I thought it might be a good idea to try and do some posts in the style of GeoCurrents on this blog—introducing the reader to some region of the world and telling them whatever interesting things I can find out about this region and the people who live there.

For this particular post, I’ve decided to write about the island of Hainan, and in particular the Kra-Dai languages spoken there, which are in my opinion pretty interesting for several reasons. Hainan is an island off the southern coast of China, in the South China Sea. If you look along the southern Chinese coast, you can see Taiwan off the southeast, and then, further towards the west, not far from the Indochinese peninsula, and just across a strait from a little peninsula jutting out to the south, there’s another island of similar size: that’s Hainan. Politically, it’s part of the People’s Republic of China, and for over two thousand years it has generally been a possession of whichever Chinese state existed at the time. That makes it far more Chinese in terms of age than Taiwan, which was not settled by Han Chinese until the 17th century, was actually claimed by the Netherlands and Spain before China, and was under Japanese rule for most of the first half of the 20th century. However, Hainan has always been on the periphery of China culturally and economically, as well as geographically. Interested readers are referred to Michalk (1986) for an overview of the island’s history.

Below I’ve included a map of the languages of Hainan, based mainly on Steven Huffman’s language maps.[1] One remarkable feature of the island’s linguistic geography which you can see from this map is that languages of four out of the five language families of Southeast Asia are spoken on it: Sino-Tibetan, Hmong-Mien, Kra-Dai and Austronesian. Only Austro-Asiatic is absent, although an Austro-Asiatic language (Vietnamese) is spoken on the nearby island of Bạch Long Vĩ, which is politically part of Vietnam. That’s quite impressive for an island not much larger than Sicily.


All of these languages are interesting and worthy of discussion, but to avoid giving myself too much to write about, I’m going to focus in this post on those belonging to the Kra-Dai family: Be, Li, Cunhua and Jiamao. Because it’s relevant, I will also discuss the Austronesian language, Huihui, a little. These Kra-Dai languages are, most likely, the “indigenous” languages of the island, in the sense that they, or direct ancestors of them, were spoken on the island before the others. The Chinese language was obviously brought to Hainan by the Chinese settlers arriving mostly in the second millennium AD; the Mun language is closely related to (and sometimes considered the same as) the Kim Mun language spoken by some people of the Yao ethnicity in the mainland Chinese provinces of Guangxi and Hunan, and therefore these Mun-speakers are probably recent arrivals as well.

The most widespread and probably the best-known language spoken on Hainan other than Chinese is the Li language, which is spoken in the mountainous interior of the island. Chinese sources use the name Li 黎 “black” to refer to the frequently-rebellious indigenous people of Hainan as early as the time of the Song Dynasty (960–1279). Of course, it can’t be assumed that this name refers to exactly the same group of people as the modern name Li does. But the geographic location of the Li-speakers (in the most inaccessible parts of the island, with Chinese settlers occupying the more habitable coastal lowlands) and their language’s phylogenetic position within Kra-Dai (we’ll talk about this more below) do strongly suggest that their language was the main language spoken on Hainan before Chinese settlement. Linguists sometimes use the name Hlai instead, which is presumably based on a native self-appellation. They also sometimes speak of the “Hlai languages” rather than the “Hlai language”, because, much like Chinese, the various Hlai “dialects” are actually highly divergent and often mutually unintelligible. This again suggests that the Li language has been on Hainan for a long time: there must have been plenty of time for these dialects to differentiate from one another. Norquest (2007) has attempted a reconstruction of Proto-Hlai; you can look at his dissertation to get an idea of how different these dialects are from each other.

Two of the Li dialects, Cunhua and Nadouhua, have a special status. They are not much more distinct from neighbouring Li dialects than any other Li dialects are from the dialects neighbouring them. However, the speakers of these dialects are classified by the Chinese government as members of the Han Chinese ethnicity rather than the Li ethnicity, and they themselves identify more with the Han Chinese than the Li. Speakers of other Li dialects also refer to Han Chinese and Cunhua or Nadouhua speakers by the same name, Moi. Cunhua and Nadouhua have far more borrowings from Chinese than the other Li dialects do, but according to Norquest (2007) their basic vocabulary is mostly of Li origin, which indicates that they should be regarded as Li dialects heavily influenced by Chinese, rather than Chinese dialects influenced by Li or mixed languages. The influence from Chinese is probably due to the fact that the speakers of these dialects live in the coastal lowlands, where contact with Chinese settlers is greater, rather than in the mountains like the speakers of the other Li dialects. It is also likely that the speakers have significant Han Chinese ancestry as well as Li ancestry, but I don’t know if any genetic studies have been done. In any case, because of their different ethnic status Cunhua and Nadouhua are often regarded as comprising a separate language from Li, usually referred to as Cunhua or Cun after the better-known of the two dialects (Cunhua has many times more speakers than Nadouhua). This is reflected on the map above.

Another Li “dialect” is special because it is in the opposite situation to Cunhua and Nadouhua: its speakers do not have a separate ethnic identity from the Li, but the language is clearly divergent and may not even be genetically a Li language at all. This is Jiamao, which is also shown as a distinct language on the map above. Less than half of its lexicon appears to be of Li origin—that is, more than half of its words cannot be identified as similar to words in other Li dialects. Moreover—and more significantly—linguists have been unable to establish regular sound correspondences between the Jiamao words that do look similar to those in other Li dialects, and those Li dialects. In the words of Thurgood (1992a):

The Jiamao tones do not correspond with the tones of Proto-Hlai at all. The Jiamao initials and finals correspond, but with a pervasive, unsystematic irregularity that raised more questions than it answered. The Jiamao initials often have two relatively-frequent unconditioned reflexes, with other less-frequent reflexes thrown in apparently randomly. The more comparative work that was done, the more obvious it became that a comparative approach was not going to explain the “extreme (and apparently unsystematic) aberrancy” of Jiamao.

Some information given to Thurgood by a Chinese linguist, Ni Dabai (it’s not clear where Ni Dabai got the information from), gave him an idea as to why this might be the case. Ni Dabai said that the Jiamao were originally Muslims, and they arrived in two waves, the first in 986 AD and 988 AD and the second in 1486. Thurgood concluded from this that the Jiamao were originally speakers of an Austro-Asiatic language, who migrated to Hainan and thus ended up in close contact with Li speakers. The Jiamao ignored the tone of the Li words they borrowed, and instead decided which tone to pronounce them with based on their initial consonants; this explains the apparently random tone correspondences. And they borrowed words in several strata; this explains the one-to-many correspondences among the non-tonal segments.

I’m not entirely sure how Thurgood gets straight to “they must have been Austro-Asiatic speakers” from “they were originally Muslims,” though. Unfortunately the copy of Thurgood’s paper that I can access online is inexplicably cut off after the fourth page, so I don’t know if he elaborates on the scenario later on in the paper. I’m not aware of any Austro-Asiatic-speaking ethnic group whose members are mostly Muslim. My understanding is that most of the Muslims in Southeast Asia are the Malays, and their close relatives, the Chams, who speak Austronesian languages. To my uninformed, non-Southeast Asian expert, not-having-access-to-the-full-Thurgood-paper self, the Chams seem like the obvious candidates.

The Cham kingdom (Champa), situated in what is now southern Vietnam, was for a millennium and a half an integral part of the political landscape of continental Southeast Asia. Its history is one of constant conflict with the Vietnamese kingdom to its north, in which it tended to be something of the underdog. The Vietnamese sacked the Cham capital in 982, 1044, 1068, 1069 (clearly, the 11th century wasn’t a good time for Champa), 1252, 1446, and 1471; after the last and most catastrophic sacking in 1471, the Vietnamese emperor finally annexed the capital and reduced Champa to a rump state occupying only what were originally just its southern regions. Then these regions, too, were chipped away over the next few centuries, and Champa finally vanished from the map in 1832. Some Cham still live in these regions, but they are no longer the dominant ethnic group there, having mostly either been massacred or fled, mostly to Cambodia in the west, but also, in relatively small numbers, to Hainan in the east. This is how the Austronesian language you can see on the map, Huihui, ended up being spoken in Hainan. Huihui is simply an old-fashioned Chinese word for “Muslim”[2], and the speakers of Huihui are indeed Muslims.

The Huihui call themselves and their language Tsat (which is cognate to Cham). According to Thurgood (1992b), the Tsat came to Hainan after the sacking of 982, and were mostly merchants who had established connections in the area, which explains their Muslim faith (most Cham at the time were Hindu, but much of the merchant class was Muslim; the Cham only became majority-Muslim during the 15th century, which is about the same time that the Malays converted). More Chams might have migrated to join the Tsat after the subsequent sackings.

Now, the dates Ni Dabai gave for the waves of Jiamao settlement (986 AD, 988 AD and 1486) fall shortly after the sackings of 982 AD and 1471 AD (four, six and fifteen years later, respectively), and that suggests to me that Jiamao, like Huihui, may have a Cham origin. But whereas the Cham origin of Huihui explains almost everything about it, there are still a lot of unanswered questions with respect to Jiamao even if we accept that it has a Cham origin. Most obviously, what would have led them to take up residence in the highlands of the southeast, rather than the southern coast where Cham traders would have established the most contacts, and to assimilate so much into the Li culture that they gave up Islam (they are now animists like the Li and Be) and extensively relexified their language with Li loanwords?
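To make the date comparison concrete, here is a minimal Python sketch that pairs each of Ni Dabai’s settlement dates with the most recent earlier sacking from the list given above. The function and variable names are my own, chosen purely for illustration, and the sketch shows only how close the dates are; it says nothing about whether the sackings actually caused the migrations.

```python
from bisect import bisect_right

# Years in which, according to the list above, the Cham capital was sacked.
SACKINGS = [982, 1044, 1068, 1069, 1252, 1446, 1471]
# Settlement dates for the Jiamao as reported by Ni Dabai.
ARRIVALS = [986, 988, 1486]


def last_sacking_before(arrival, sackings=SACKINGS):
    """Return (sacking_year, gap_in_years) for the latest sacking at or before `arrival`.

    Returns None if no sacking precedes the arrival. `sackings` must be sorted.
    """
    i = bisect_right(sackings, arrival)
    if i == 0:
        return None
    last = sackings[i - 1]
    return last, arrival - last


gaps = {arrival: last_sacking_before(arrival) for arrival in ARRIVALS}
for arrival, (sacking, gap) in gaps.items():
    print(f"{arrival}: {gap} years after the sacking of {sacking}")
```

Note that the other sackings (1044, 1068, 1069, 1252, 1446) have no matching Jiamao arrival date in Ni Dabai’s account, so the match is suggestive rather than systematic.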

Then there’s the problem of the actual linguistic evidence. Norquest in his dissertation examined the Jiamao lexicon and found a grand total of… 2 possible words of Austronesian origin (ɓaŋ˥ ɓɯa˩ ‘butterfly’ and pəj˦ ‘pig’; cf. Proto-Austronesian *qari-baŋbaŋ and *babuy), and none of Austro-Asiatic or of any other identifiable origin, apart from Li. He therefore regards the language as a provisional language isolate. Now, I don’t know how well Norquest knows Austronesian and Austro-Asiatic. He doesn’t explicitly rule out a connection with either of those families; he’s more concerned with simply listing the non-Li Jiamao vocabulary than identifying its origin. So it’s not impossible that Jiamao’s non-Li vocabulary is from one of the main Southeast Asian families, but this is certainly something on which more research needs to be done. I have included below some of the Jiamao and Proto-Hlai words for various body parts, to illustrate the difference; this data is taken from Norquest’s dissertation.

Proto-Hlai  Jiamao   Sense
*dʱəŋ       pʰan1    ‘face’
*ʋaːɦ       vet10    ‘shoulder’
*kʰiːn      tɯːn1    ‘arm’
*ɦaːŋ       tsʰɔːŋ1  ‘chin’

In any case, I assume Thurgood had a good reason for proposing the Austro-Asiatic connection (I just can’t figure out by myself what that reason would be). Another caveat to bear in mind here is that Ni Dabai’s information might be incorrect—even if the story of Jiamao being descended from Muslim immigrants arriving in 986 AD, 988 AD and 1486 isn’t completely false, it could be wrong in some details: perhaps they were Hindus rather than Muslims, and perhaps the dates are inaccurate. In short, it’s a mystery. But an interesting one, don’t you think? It’s just a shame that there has been so little investigation into it, so far—Thurgood’s not-wholly-accessible paper and Norquest’s dissertation are the only two papers I can find which go into any detail about Jiamao.


Moving on… there is one other Kra-Dai language spoken on Hainan, which is completely different, both linguistically and ethnically, from Li. The Be language constitutes a branch of Kra-Dai of its own, and it does not appear to be much more closely related to the Li languages than it is to other Kra-Dai languages. The subgrouping of the branches of the Kra-Dai family is not particularly certain (as usual for language families; subgrouping is a hard problem in linguistics); Wikipedia gives a nice overview, and I’ve included below a tree adapted from Blench (2013) (which appears to be just the Edmondson and Solnit classification mentioned in the Wikipedia article). As you can see, Be is often considered the closest relative of the Tai branch (the one that contains the one Kra-Dai language most people have heard of, Thai, the official language of Thailand). In fact, Norquest in his dissertation mentions that it shows the greatest lexical similarity with the Northern Tai subgroup specifically, meaning it might actually be a Tai language; unfortunately, this cannot be verified until more comparative work on Kra-Dai languages is done (no full reconstruction of Proto-Tai or Proto-Northern Tai is yet available).

This suggests that Be is a more recent arrival on Hainan than Li, because it must have arrived after or close to the time that the Tai subgroup separated from the other Kra-Dai languages, whereas Li could have split off straight from Proto-Kra-Dai. Shintani (1991) has some phonological evidence which he says supports this: the Hainanese dialect of Chinese has undergone a sound change s > t (that is, s in other Chinese dialects corresponds to Hainanese t), and the Be language reflects this sound change in borrowings from Chinese such as tuan “garlic” (cf. Mandarin suan). That means it must have borrowed these words from Hainanese, and Shintani takes this as indicating that Be speakers arrived on Hainan after Chinese settlers were established on the island (that would be no earlier than the time of the Song Dynasty of 960–1279). But I don’t quite follow this inference: couldn’t the Be have arrived first, and borrowed these words only after the Chinese arrived?

That a Tai-speaking group might have migrated to Hainan in the historical period is not implausible, however. Although the political prominence of Thai in modern times might lead you to think otherwise, the Tai languages originated in southern China (more precisely, in the area of the modern provinces of Guizhou and Guangxi, probably extending into adjacent regions of Yunnan and Vietnam as well) and were restricted to that region for much of the historical period. Around 1000 AD, some of them began to migrate to the southwest, perhaps to escape Chinese political domination, although this doesn’t seem like a complete explanation: though the Chinese population in the area has surely been growing over time, they had held the political power since long before 1000 AD. (Also, plenty of Tai-speaking peoples remained in their homeland; in fact, the Tai-speaking Zhuang people still comprise over a quarter of the population of Guangxi.) These migrations continued for the next couple of centuries, and by the 13th century the familiar Tai kingdoms of the historical record were being established (Sukhothai in the central part of modern Thailand; Lanna in the northern part of modern Thailand; the Shan states in the eastern part of modern Burma; and Ahom way over in the Brahmaputra valley just northeast of modern Bangladesh). The Lao people of Laos established their kingdom, known then as Lan Xang “[land of the] million elephants”, in the following century. Over the centuries these evolved into the modern Tai states of Thailand and Laos. Now, if the Tai migrated to the southwest because they wished to leave southern China (rather than being attracted by some particular feature of the southwest), we could well expect some of them to take the alternative route to the direct south and end up on Hainan. Perhaps this, then, is the origin of the Be.

There is an alternative scenario I can think of which is probably less plausible, but a bit more exciting. Maybe the Be have always been on Hainan—or at least, they have been there as long as the Li have. Be being part of or most closely related to the Tai branch isn’t incompatible with this hypothesis. There’s a useful heuristic in linguistics that a region where a language family is most diverse is likely to be its place of origin, because the longer the presence of a speech variety in a given area, the more time it has to diversify into divergent but genetically related daughters. It’s a heuristic, not a rule, so exceptions are possible, and in fact one of the obvious ways an exception could arise is if external pressure repeatedly pushes speakers of languages in the family into a particular small cul-de-sac region (a “refugium”), which is what would have happened in Hainan in the scenario described in the above paragraph. And of course, the diversity of Kra-Dai in Hainan, with just two independent branches represented, isn’t that much greater than anywhere else (there are four independent branches in Guangxi, namely Kra, Lakkia, Kam-Sui and Tai, and by including an adjacent region of Guangdong the remaining Biao branch can be included as well; of course, Guangxi is a lot bigger than Hainan, and depending on how deep you imagine some of the proposed subgroups are, your perception of each region’s diversity might be altered). But I don’t think it’s ludicrous to think that the Kra-Dai languages, or at least a sub-clade of them excluding Kra, might have originated on Hainan. They might have differentiated first into a southern variety (pre-Li) and a northern variety; a first wave of migration onto the mainland, by the speakers of the northern variety, would have brought about the split between Proto-Lakkia-Biao-Kam-Sui and Proto-Be-Tai; and a second wave would have brought about the split between Proto-Tai and Be.
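As a rough illustration of the diversity heuristic, the following Python sketch simply counts the independent Kra-Dai branches per region, using only the branch lists given in the paragraph above. The names are my own and the ranking is deliberately naive: it ignores the size of each region and the depth of the proposed subgroups, both of which matter, as already noted.

```python
# Independent Kra-Dai branches per region, as listed in the paragraph above.
BRANCHES_BY_REGION = {
    "Hainan": {"Li", "Be"},
    "Guangxi": {"Kra", "Lakkia", "Kam-Sui", "Tai"},
    "Guangxi plus adjacent Guangdong": {"Kra", "Lakkia", "Kam-Sui", "Tai", "Biao"},
}


def rank_by_branch_count(branches_by_region):
    """Return (region, branch_count) pairs, most diverse region first."""
    counts = [(region, len(branches)) for region, branches in branches_by_region.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_branch_count(BRANCHES_BY_REGION)
for region, count in ranking:
    print(f"{region}: {count} independent branches")
```

On a raw count, then, the heuristic points to the mainland rather than Hainan, which is why the Hainan scenario needs some further argument to get off the ground.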

This is especially interesting to consider in the light of the Austro-Tai hypothesis, one of the most plausible macrofamily proposals floating around. Essentially it proposes a genetic relationship between the Kra-Dai languages and the Austronesian languages, although opinions among proponents differ as to whether Kra-Dai is coordinate to Austronesian (that is, Proto-Kra-Dai and Proto-Austronesian share a common ancestor, but neither is the ancestor of the other) or subordinate to Austronesian (that is, Proto-Austronesian is the ancestor of Proto-Kra-Dai). Sagart (2004) is of the opinion that it is subordinate. If Kra-Dai is subordinate to Austronesian then the possibility arises that Austronesians migrated to Hainan, just as they migrated to essentially all of the islands in Southeast Asia and Oceania (plus Madagascar!). Unfortunately, the facts do not seem friendly to this neat hypothesis: nobody, so far as I know, goes so far as to say that Kra-Dai is subordinate to Malayo-Polynesian (the subgroup of Austronesian which includes all of the Austronesian languages outside of Taiwan), and the Austronesians probably hadn’t developed their island-hopping habits so extensively at the point where they were still in Taiwan. The more likely scenario, if the Austro-Tai hypothesis is correct, is that Proto-Kra-Dai was the result of a migration from Taiwan onto mainland China; and in order to reconcile this with the Hainan homeland hypothesis we’d have to propose a migration onto Hainan and then multiple migrations back out again, which is kind of untidy. So, for various reasons, I don’t really think the Hainan homeland hypothesis is likely to be correct. I’d say it’s more likely that the homeland of the Kra-Dai languages is on the mainland, somewhere in Guangxi. But it’s not impossible.


  1. Huffman’s maps do not always make it clear which language is spoken within a given boundary; in order to identify the languages spoken in scattered pockets in the northern part of the Li-speaking area and to the north and east of that area, I had to refer to the wonderful but not entirely reliable map at Muturzikin. Unfortunately the boundaries on Muturzikin’s map are not entirely the same as those on Huffman’s, and even on Muturzikin’s map, it is sometimes not entirely clear what language is spoken within a particular boundary, so I have had to make some guesses in identifying all of these pockets as Mun-speaking.
  2. The modern Chinese word for “Muslim” is Musilin, but the unreduplicated word Hui, which strictly speaking refers only to Chinese Muslims, is often colloquially used to refer to Muslims of any nationality.


Blench, R., 2013. The prehistory of the Daic (Tai-Kadai) speaking peoples and the hypothesis of an Austronesian connection. In Unearthing Southeast Asia’s past: Selected Papers from the 12th International Conference of the European Association of Southeast Asian Archaeologists (Vol. 1, pp. 3-15).

Michalk, D.L., 1986. Hainan Island: A brief historical sketch. Journal of the Hong Kong Branch of the Royal Asiatic Society, pp.115-143.

Norquest, P.K., 2007. A phonological reconstruction of Proto-Hlai. ProQuest.

Sagart, L., 2004. The higher phylogeny of Austronesian and the position of Tai-Kadai. Oceanic Linguistics, 43(2), pp.411-444.

Shintani, T., 1991. Preglottalized consonants in the languages of Hainan Island, China. Journal of Asian and African Studies, (41), pp.1-10.

Thurgood, G., 1992a. The aberrancy of the Jiamao dialect of Hlai: speculation on its origins and history. Southeast Asian Linguistics Society I, pp.417-433.

Thurgood, G., 1992b. From Atonal to Tonal in Utsat (A Chamic Language of Hainan). In Proceedings of the Eighteenth Annual Meeting of the Berkeley Linguistics Society: Special Session on the Typology of Tone Languages (pp. 145-146).