Similarity Across Families and the Limits of Genetic Classification
by dchph
Basic word lists often reveal striking similarities across languages. Yet resemblance alone is not proof of genetic kinship. Vietnamese vocabulary demonstrates how contact, substratum, and cultural embedding produce overlaps that mislead classification.
Vietnamese core lexicon reflects a layered history of contact and inheritance. Beneath the later Sino‑Vietnamese overlay lies a substratum of indigenous and Mon‑Khmer elements, themselves intertwined with Yue and proto‑Vietic speech. These strata reveal that Vietnamese is not a simple offshoot of Mon‑Khmer, but a hybrid language shaped by multiple converging traditions in the Red River basin.
In several other articles, the author has shown how deeply Vietnamese is rooted in the Yue–Sinitic continuum, with disyllabicity and inversion serving as diagnostic tools for etymology. Yet this is only one side of the story. Vietnamese also bears the imprint of Mon‑Khmer migrations, whose vocabulary and cultural practices penetrated the Red River basin and left a lasting substratum in the language. Chapter 8 turns to this Mon‑Khmer association, examining how these layers interacted with the Yue foundation and the later Sinitic overlay to produce the complex etymological mosaic we recognize today.
Vietnamese emerges not as a late‑tonalized Mon‑Khmer language, but as a language forged at the crossroads of Yue, Chinese, and Austroasiatic influences. Its tonal system and basic vocabulary testify to a long history of interaction and parallel development with Chinese, demanding a reassessment of entrenched theories about its genetic affiliation.
The Austroasiatic Mon‑Khmer theory of Vietnamese‑Khmer affiliation has gained recognition for drawing attention to genuine lexical parallels between Vietnamese and southern Mon‑Khmer languages. By emphasizing shared vocabulary items, particularly in the semantic domains of agriculture, kinship, and daily life, the framework has underscored the importance of Austroasiatic influence in shaping the Vietnamese lexicon. This perspective has also provided a counterweight to Sinitic‑centered narratives, reminding scholars that Vietnam's linguistic heritage is layered and cannot be reduced solely to Chinese contact. In this sense, the Mon‑Khmer hypothesis has played a valuable role in broadening the scope of comparative research and situating Vietnamese within the wider Austroasiatic family.
At the same time, the Mon‑Khmer framework rests on a relatively narrow foundation, relying heavily on lexical comparisons drawn from southern vocabulary while giving less attention to phonological systems, morphosyntactic structures, and historical migration patterns. Such a selective approach risks overstating Khmer‑Vietnamese affinity while underrepresenting the influence of Yue substrata and Sinitic superstrata. For this reason, the theory warrants closer scrutiny and reevaluation within a broader historical and comparative context. A more comprehensive account of Vietnamese origins must integrate Austroasiatic evidence with Sinitic, Tai‑Kadai, and Yue elements, recognizing the complex interplay of contact, convergence, and inheritance that has shaped the language over millennia.
I) Similarity across language families does not prove genetic relation
Superficial resemblance across languages has long tempted scholars to infer genetic kinship. Vietnamese, with its overlaps in Mon–Khmer, Tai, and even Indo‑European wordlists, is a case in point. Yet similarity alone is not evidence of inheritance. Many of these parallels are better explained as loanwords, substratal survivals, or cultural diffusion. Without systematic sound correspondences and historical context, resemblance remains a mirage, a crossroads where contact and coincidence masquerade as lineage.
- Wordlists show parallels between Vietnamese and Mon‑Khmer, Tai, and even Indo‑European. These parallels are often coincidental or due to borrowing.
- Methodological caution: similarity must be tested against systematic sound change and historical context.
A central question in Vietnamese historical linguistics is whether the language should be regarded as a hybrid, layered through centuries of contact with neighboring peoples, or as a direct descendant of a single ancestral stock. One hypothesis posits that an ancestral root, which we may call Taic, gave rise to both the Yue and Daic linguistic families. These spread across southern China, including the northeastern Red River Basin of northern Vietnam, where aboriginal communities long cultivated irrigated rice.
Over time, large numbers of Mon‑Khmer speakers from what is now northern Cambodia and southern Laos resettled in the fertile delta. Their subsistence practices, hunting and shifting cultivation on dry fields, contrasted with the wet‑rice agriculture of the Yue. With them came Mon‑Khmer vocabulary, which penetrated the native Viet‑Muong speech and accounts for the many basic Mon‑Khmer words preserved in Vietnamese today. This ancient Viet‑Muong speech was likely spoken by the people of the Phùng Nguyên culture and by the legendary subjects of the Hùng kings some three millennia ago, and it would not have sounded like modern Vietnamese (1).
The arrival of the Han in 111 B.C. profoundly altered this linguistic landscape. Old Chinese reshaped the Yue speech of the majority, leading to the split of the Viet‑Muong continuum into Mường and Vietic (early Annamese), both heavily infused with Chinese elements. From this entanglement of Yue, Mon‑Khmer, and Chinese influences, a distinct Sinitic‑based Vietnamese language began to crystallize by the 10th century.
Linguists who support the Austroasiatic hypothesis argue that Vietnamese descends from the Mon‑Khmer branch of the larger Austroasiatic family. They point to fossilized remnants in Mon‑Khmer that appear as substratal layers in Vietnamese, including a set of basic words that may have remained stable for as long as 15,000 years (Zachary Stieber, Ancient Languages Have Words in Common). (2)
II) Vietnamese at the crossroads
Vietnamese vocabulary often resembles Mon-Khmer forms, yet many of these items are attested in early Chinese glossaries as Yue loans. The following grid illustrates how resemblance can mask deeper strata of contact and convergence:
A. Agriculture & kinship
| Vietnamese | Mon-Khmer Parallel | Chinese Cognate | Gloss |
|---|---|---|---|
| sông | Khmer srê 'stream' | 江 jiāng | 'river' |
| chuối | Khmer čluəy | 蕉 jiāo | 'banana' |
| gạo | Khmer srɔʔ 'paddy' | 稻 dào | 'rice' |
| cháu | Old Khmer cau | 侄兒 zhír | 'nephew' |
| năm | Old Khmer cnam | 年 nián | 'year' |
| bọt | Khmer babuh | 泡 pào | 'bubble' |
B. Anatomy
| Vietnamese | Mon-Khmer Parallel | Chinese Cognate | Gloss |
|---|---|---|---|
| bụng | Khmer pɔŋ | 腹 fù | 'abdomen' |
| tay | Khmer day | 手 shǒu | 'hand' |
| mắt | Khmer mat | 目 mù | 'eye' |
| răng | Khmer dɑŋ | 齒 chǐ | 'tooth' |
C. Household & utensils
| Vietnamese | Mon-Khmer Parallel | Chinese Cognate | Gloss |
|---|---|---|---|
| cửa | Khmer ko | 戶 hù | 'door' |
| nhà | Khmer nha | 家 jiā | 'home' |
| nồi | Khmer no | 鍋 guō | 'pot' |
| đèn | Khmer dɛn | 燈 dēng | 'lamp' |
Notes:
- Diffusion vs. inheritance: These items show cultural diffusion (agriculture, kinship, daily life) rather than genetic descent.
- Chinese dimension: Each Vietnamese form aligns with a Chinese cognate, underscoring the Yue substratum and Sino‑Vietnamese layering.
- Methodological caution: Without the Chinese forms, the Mon-Khmer parallels appear stronger than they truly are. The items above are Vietnamese words that resemble Mon‑Khmer forms but are attested in Chinese glossaries as Yue loans.
- As a side note, polysyllabic grouping, as noted throughout the author's other articles, reveals structural consistency with Sino‑Vietnamese, not Mon‑Khmer.
The position advanced here, however, is that both Vietnamese and Mon‑Khmer may ultimately derive from a Yue‑related ancestral language, Taic. This proto‑language could have given rise not only to Tai‑Kadai (Ding Bangxin, 1977) (3) and Sinitic within the Sino‑Tibetan family, but also to Austronesian and Austroasiatic branches. (4)
The debate over the Austroasiatic origin of Vietnamese has persisted for more than a century. The prevailing view still classifies Vietnamese as a Mon‑Khmer descendant, citing numerous basic words scattered across Mon‑Khmer languages and even some cognates with Munda. Yet the deeper picture suggests a more complex genealogy, rooted in the linguistic mosaic of southern China and the Red River basin. (5)
With respect to those Austroasiatic languages, Norman (1988) noted that they "are spoken over a vast geographic range: the Munda languages in north Western India, Khasi in Assam, Palaung-Wa and Mon in Burma, the Mon-Khmer languages in Indo-China, Vietnamese and Muong in Vietnam [...] and were once spoken much more widely in China." (pp. 7-8)
Figure 1 - Visual overview of the kinship links between Vietnamese and other major linguistic families and their substrata
| Sino-Tibetan | Proto-Taic | ||||||
| Proto-Tibetan | Proto-Chinese | Yue | Austroasiatic | ||||
| Tibetan | Archaic Chinese | Proto-Vietic | Proto-Daic | Mon-Khmer | |||
| Old Chinese | Vietic | Proto-Muong | Tai-Kadai | Zhuang | Yao | ||
| Ancient Chinese | Proto-Vietmuong | Muong, Chac, Arem, Ruc, etc. | Daic | Dong, Miao | Mon-Khmer | ||
| Annamese | |||||||
| Middle Chinese | Vietnamese | Thai | Shui, etc. | Khmu, Riang, etc. | |||
| modern Chinese dialects | Laotian, etc. | etc. | Bahnar | Hrê, etc. | |||
Before going further, it is worth mentioning that in the early twentieth century there was a now long‑gone trend among linguists, associated with the Prague School, toward analyzing phonemic systems and producing phonological descriptions of languages. The approach appealed for the simplicity of its methods and procedures and because it did not require learning the language; the focus on such practice was taken as evidence that the methodology was scientific. The renowned linguist Leonard Bloomfield, for example, was able to describe and analyze Tagalog solely on the basis of information provided by a single informant (Indo-Pacific, Part II, Descriptive Linguistics, or Lingua 15, 1963, p. 515).
It is therefore unsurprising that many early proponents of the Austroasiatic Mon‑Khmer hypothesis worked within this framework, often without first‑hand command of the languages they studied. Their analyses relied heavily on data gathered from local informants who themselves lacked linguistic training. By the 1960s, a generation of what the author calls "summer‑camp linguists" – researchers funded by short‑term grants from institutions such as the U.S. National Endowment for the Arts – conducted brief field trips in South Vietnam. Few of them achieved real mastery of the languages under investigation. The deficiencies of this approach are visible in their published work, which is marred by orthographic inconsistencies, misspellings, typographical errors, and mismatched cognate pairings.
Even as late as 1991, when Parkin classified Vietnamese (of the Viet‑Muong branch) as Austroasiatic, he acknowledged that "considerable controversy has surrounded the problem of the affiliation of Vietnamese" (Parkin, 1991, p. 89). His acceptance of Haudricourt's and Shorto's position formed the basis of his classification. In effect, the Austroasiatic view of Vietnamese origins, grounded in a relatively small set of presumed Mon‑Khmer ~ Viet‑Muong cognates, became entrenched among leading scholars. Their students, in turn, built upon this inherited foundation, treating it as a springboard for further hypotheses rather than re‑examining its premises.
Readers will have noticed that many of the etyma cited in this paper demand not only solid linguistic training but also a kind of "linguistic feeling", an intuitive sense that comes only with first‑hand experience in the target language. Such sensitivity is essential for engaging in the necessary exercise of "guesswork" (W) which allows us to appreciate how and where words have evolved. As King observed, "this procedure [guesswork] is not guaranteed to lead infallibly to the correct form of an innovation. But progress in historical reconstruction has always come from making guesses–not wild and unsupported guesses but those credible by considerations of simplicity and naturalness. In any case, the historical linguist usually has very little to lose and much to gain from pressing his reconstruction to the utmost in the directions of simplicity and naturalness" (1969: 164).
On the one hand, when a theory first proposed by a few prominent scholars gained enough traction to appear convincing, it was soon repeated by later entrants to the field. Many of these newcomers were not specialists in Vietnamese linguistics but simply adopted the prevailing view and echoed what others were already saying. In the early twentieth century, this meant embracing the newly theorized Austroasiatic Mon‑Khmer affiliation of Vietnamese with languages of Southeast Asia.
On the other hand, unlike the immutable laws of physics or astronomy, findings in fields such as anthropology, history, and historical linguistics are always subject to revision. Newcomers entering the study of Vietnamese linguistics should therefore resist the temptation to follow the same well‑trodden path and should instead chart new directions. The approach advanced here, which explores Sinitic‑Vietnamese etymology with disyllabicity as its central focus, opens a fresh realm of inquiry. Sound changes in disyllabic formations cannot be reduced to one‑to‑one correspondences between isolated syllables, and this recognition provides a more accurate framework for reconstruction.
Yet most newcomers in Vietnamese historical linguistics have begun from the Austroasiatic Mon‑Khmer baseline, a non‑historical theory shaped by misconceptions about monosyllabicity versus disyllabicity. This reliance stemmed from misperception, misinterpretation, lack of proficiency in the target languages, and an uncritical acceptance of early research simply because it was produced by renowned specialists at the dawn of Vietnamese linguistics. Their work became the foundation for subsequent studies, all moving in the same direction. In the mid‑twentieth century, it became fashionable to debate tonogenesis, a discussion initiated by Henri Maspero and André Haudricourt. Unsurprisingly, their views were revisited in the latter half of the 20th century by scholars such as Barker, Parkin, and Thomas, whose Mon‑Khmer lexical data continue to be cited.
The challenge for new scholars is to resist the dogmatism that discourages deviation from established premises. True progress requires decisiveness and the spirit of novelty. The message here is clear: newcomers should not simply follow "pre‑set premises" that have grown stale and unproductive. The pioneering works that once seemed innovative, linking Vietnamese to Austroasiatic or Mon‑Khmer, gained popularity because they were new at the time. But they no longer offer fresh insight.
Ultimately, the Austroasiatic Mon‑Khmer theory benefited from this cycle of repetition. Parkin (1991) paraphrased Maspero's argument against a Mon‑Khmer affiliation: Mon‑Khmer languages lack tones, whereas Vietnamese contains Thai vocabulary whose tonal words Maspero treated as cognates. Maspero also pointed to other peculiarities (p. 89), even while accepting Haudricourt's proposal of a Mon‑Khmer substratum. Haudricourt, however, directly challenged Maspero's key claims. As Thomas summarized, "Maspero's examples of Thai‑Vietnamese cognates [were reinterpreted as] general Southeast Asian vocabulary, with correspondences between Vietnamese tones and Mon‑Khmer final consonants." Thus, "Maspero's key argument, that tones cannot be acquired by a language previously lacking them, is rejected" (p. 90). Haudricourt's view remains the one generally accepted today (see Haudricourt's theory of tonal development in the next section).
It often does not occur to novices in the field that what circulates online is not necessarily reliable scholarship. More often than not, it is only a summary of what has already been repeated elsewhere, not original academic work by serious researchers. This is precisely why books and peer‑reviewed publications in print continue to matter: they provide a stable, verifiable record that cannot be so easily altered or diluted by the churn of the internet.
III) Implications for comparative linguistics
- Similarity is necessary but not sufficient for proving genetic relation.
- Vietnamese demonstrates how substratum and superstratum complicate classification.
- Comparative grids must include Chinese forms to avoid false positives.
The Austroasiatic position rests largely on the cognateness of basic vocabulary shared between Mon‑Khmer and Vietnamese. The question of tonality remains relevant here: although the Mon‑Khmer equivalents under examination are toneless, in many cases they correspond to Vietnamese etyma that also align with Chinese cognates within a tonal framework. From the author's perspective, certain Mon‑Khmer items, identified by Maspero as of Thai origin, may in fact be loanwords from Vietnamese, re‑packaged with a tonal substitute such as a glottal stop [ʔ] after the original tone was lost.
Interestingly, the same items in Mường subdialects can appear with tones. More broadly, the Mường language has retained its tonal system despite prolonged contact with neighboring Mon‑Khmer groups. This persistence suggests that tonality in Mường, and by extension in Vietnamese, cannot be dismissed as a superficial borrowing but reflects a deeper structural feature. It is worth recalling that Mường is classified within the same family as Vietnamese.
There are relatively few true cognates between Vietnamese and Mon‑Khmer basic vocabulary, and many of the Mon‑Khmer items cited as such rest on dubious etymological foundations. Beyond the vocabulary already listed in this section, even the names of the twelve zodiac animals–chuột, trâu, cọp (hùm), mèo, rồng, rắn, ngựa, dê, khỉ (vượn), gà, chó, heo–illustrate the problem. A handful of correspondences can be identified with more certainty, for example: Old Khmer /cnam/ ~ VS 'năm' (year) 年 nián; Old Khmer /cau/ ~ VS 'cháu' (nephew) 侄兒 zhír; Khmer /babuh/ ~ VS 'bọt' (bubble) 泡 pào.
At the same time, there are numerous fundamental Vietnamese words for which Mon‑Khmer provides no cognates at all. Examples include 蓮藕 lián'ǒu ~ VS 'ngósen' (lotus stem); 'đồng' 田 tián (paddy field) ~ VS 'ruộng'; and 'đồng' 銅 tóng (bronze) ~ VS 'thau'. These gaps underscore the limitations of the Mon‑Khmer hypothesis when applied to the Vietnamese core lexicon.
The possibility that many so‑called basic cognates in Mon‑Khmer are in fact Vietnamese loanwords supports a reverse logic to Maspero's claim about the non‑inheritance of tones–namely, that tones could not be acquired naturally or intuitively by speakers of non‑tonal languages. A parallel phenomenon can be observed in Japanese and Korean, where Chinese loanwords appear without tones, even though we know from historical records that they were borrowed directly from Chinese during the Tang Dynasty (618-907 A.D.).
With this in mind, let us examine the Thai, Mon‑Khmer, and Vietnamese basic vocabulary that undermines the two postulations advanced by Maspero and Haudricourt: Maspero's thesis of a Thai origin and Haudricourt's theory of tonogenesis. In what follows, the author elaborates on each etymon, grouping them under a Sino‑Vietnamese label that accompanies each cited item. To begin, we may enumerate several Vietnamese words from Maspero's own examples (Études sur la Phonétique Historique de la Langue Annamite, 1912), which he classified as having a Mon‑Khmer substratum and Thai cognates. In each case, however, the author finds that they also display clear Chinese and Sino‑Tibetan correspondences:
A) Mon‑Khmer (items Maspero accepted as substratum in Vietnamese, following Haudricourt)
1. rừng 林 lín 'forest' (SV lâm)
- Derivation: M 林 lín < MC lim < OC ɡ·rɯm. Cf. OC srɯm (SV sâm, VS rậm). Cantonese /lam4/.
- Pattern: /l‑ ~ r‑/ parallels include 龍 lóng (SV long) ~ VS rồng 'dragon'; 蘢 lóng (SV long) ~ VS rậm 'dense'; 壟 lóng (SV long) ~ VS rẫy 'farming ridge'.
- Cognates: Burmese rum 'dense'; Kachin diŋgram2 'forest'; Lushei ram 'forest' (Starostin). Shafer: Sino‑Tibetan Luśei ram (p. 67); Central Branch: Kukis r2am, Ngente, Haka ram (p. 230).
- Mon‑Khmer parallels: Old Mon /grīp/, modern /gruip/; Danaw /pʿrɑ2bo4/; Riang White /priʔ/; Riang Black /prɪʔ/; Palaung /bréɪ2/; Wa /brɑʔ3/; Old Khmer /vraɪ/; Sakai /brɪ/; Besisi /ʾmbri/; Semang /těpɪʾ/; Srê /brɪ/; T'eng /brɪ/; K'mu /mprɪ/; Khasi /brɪ/; Mundari /bɪr/.
- Wiktionary: Etymologically from Proto‑Sino‑Tibetan rəm 'jungle, forest, country, field' (STEDT ram). Cognate with 森 (OC srɯm 'forest'), Mizo ram 'forest, country', Karbi ram 'jungle'. Alternatively, an areal word (Schuessler 2007), shared with Khmer រាម riəm 'jungle along a stream', Old Khmer rām 'inundated forest', Mon ရာံ rèm 'copse'.
2. áo 衣 yī 'shirt' (SV y)
- Derivation: M 衣 yī, yì < MC ʔiəi, ʔɨj < OC *qɯl, qɯls.
- Notes: Starostin: 'clothes, garment, gown'. As a verb, also ʔjəj‑s, MC ʔyj (FQ 於既), Pek. yì 'to wear'. Sometimes conflated with 依 ʔjə.
- Related form: 襖 ào (SV áo) 'coat'. Attested late (earliest in Shuowen Jiezi), possibly Austroasiatic in origin. Compare Proto‑Mon‑Khmer ʔaawʔ 'upper garment', whence VS áo, Mường ảo, Bahnar ao, Khmer អាវ ʼaaw, Pacoh ao.
3. chim 禽 qín 'bird' (SV cầm)
- Derivation: M 禽 (擒) qín < MC gim < OC ɡrɯm. Tang reconstruction: ghyim.
- Dialects: Cantonese kam4; Hẹ kim2; Tc ʑin12; Ôc ʑiaŋ12; Shuangfeng ʑin12.
- Classical sources: Shuowen defines 禽 as 'two‑footed creatures with feathers'; Kangxi cites multiple glosses, including 'bird and beast collectively'.
- Guangyun: 禽 琴 巨金 羣 侵B, MC gi̯əm.
- Starostin: Since Late Zhou, 禽 often used for 'wild bird(s)' ('something caught'), while 擒 is used for 'to catch, capture'.
4. lúa 來 lái 'unhusked rice' (SV lai)
- Derivation: M 來 lái, lài, lāi < MC ləj < OC mrɯːɡ. Tang reconstruction: ləi.
- Dialects: Cantonese lai4, loi4, loi6; Hẹ loi2.
- Shuowen: associates 來 with 麰 'barley/wheat'.
- Starostin: Shijing OC rjəs. MinNan forms: Jianou lej2, Jianyang le2, Shaowu li2.
- Wiktionary: 來 originally a pictogram of wheat, later borrowed for 'to come'. Related to 麥 (OC *mrɯːɡ 'wheat'). Cognate with Burmese လာ (la 'come'), Proto‑Vietic laːjʔ.
Notes: Vietnamese lúa is an archaic loan; regular Sino‑Vietnamese is đạo 稻. The irregular tonal development suggests a complex borrowing history, possibly involving an intermediate stage with ‑k > ‑ʔ.
5. ngày 日 rì 'day' (SV nhật)
- Derivation: M 日 rì, mì < MC ȵit < OC njiɡ.
- Dialects: Min forms–Xiamen tɕit8, lit8; Chaozhou zik8; Fuzhou nik8; Jianou ni8; Cantonese /jat8/, /jit8/.
- Sino‑Tibetan parallels: OB nyi‑ (nyin); Dwags nyen‑te; Old Kukish k‑ni; Luśei, Meithlei ni; Burmish ńi‑; Loloic ńi; Akha nẵ¯; Ulu nie. Baric: Bodo ‑ni, Dimasa ‑nai, Atong ‑ni, etc.
- Mon‑Khmer parallels (Luce): Old Mon /tŋey/, modern /tŋai/; Danaw /tsʿɪ1/; Riang White /sʿɤŋyiʔ/; Palaung /săŋɑ'i2/; Wa /ʃɪ4ŋɑiʔ3/; Old Khmer /tŋaɪ/; Sakai /těŋŋɪ/; Srê /ŋái/; K'mu /simyi/; Khasi /sngi/; War /juŋai/; Gadaba /sĩi/.
- Vietnamese variants: VS giời 'sun' < trời 'heaven, sky'.
B) Thai (Vietnamese words of Thai origin as posited in Maspero's list)
1. gà 雞 jī 'chicken' (SV kê)
Derivation: M 鷄 jī < MC kiej < OC *ke:
Phonetic pattern: /j- ~ g-/.
Examples: gàmái: 雞母 jīmǔ 'hen'; gàtrống: 雞公 jīgōng 'cock' (Cantonese, Minnan, including Hai.). Also gàmẹ: 母雞 mǔjī 'hen'; gàcồ: 公雞 gōngjī 'cock'.
Related correspondences: cf. jìn 近 (SV cận: gần), jì 記 (SV ký: ghi), jì 寄 (SV ký: gởi), jí 急 (SV cấp: gấp).
2. vịt 鵯 bēi (SV phi, thiết)
Derivation: M 鴄 pī, pǐ (phất, tiết) < MC pjie < OC *pʰid, now considered obsolete in Sinitic and Sino-Xenic; the common word for "duck" in modern Sinitic is 鴨 (OC *qraːb).
Wiktionary: Tai-Kadai: Proto-Tai *pitᴰ ('duck') > Thai เป็ด (bpèt), Lao ເປັດ (pet), Zhuang bit;
Proto-Vietic *viːt ('duck') > Vietnamese vịt, for which Alves (2015) proposes a Tai origin;
Sino-Tibetan: Miju kɹɑi³⁵ pit⁵⁵ ('duck'); Pela pjɛ̱t⁵⁵ ('duck'), Zaiwa pje̱t⁵⁵ ('duck'); Proto-Lolo-Burmese *baj¹/² ('duck') > Burmese ဘဲ (bhai:).
Dialects: Cantonese 鵯 /bei1/, 鴄 /pat4/.
3. gạo 稻 dào 'paddy', 'rice' (SV đạo)
Derivation: M 稻 dào < MC daw < OC *l'uːʔ
Etymology: Areal word (rice culture originated in the south). Often compared with Proto-Hmong-Mien *mbləu (“rice plant/paddy”), whence White Hmong nplej (Bodman, 1980). The relationship with similar-looking Mon-Khmer words is ambiguous (Schuessler, 2007). Ferlus (2010) proposes a connection to Proto-Austroasiatic *srɔ(ː)ʔ (“paddy”) (Sidwell's 2024 reconstruction; revised from Shorto's 2006 *sruʔ).
Compare also Proto-Austroasiatic *sroʔ (“taro”) (Sidwell's 2024 reconstruction; revised from Shorto's 2006 *t₂rawʔ), since the two plants share the same farming niche.
Viet. lúa is an archaic loanword; regular Sino-Viet. is đạo. Protoform: *ly:wH (~ l^-), Meaning: rice, grain, Chinese: 稻 *lhu:? (~L^h-) rice, paddy, Burmese: luh sp. of grain, Panicum paspalum, Kachin: c^@khrau1 paddy ready for husking. Kiranti: *lV 'millet'
Alternative reconstructions: Sagart (2011) derives this word from 舀 (OC *lowʔ, *lu, *lo, “to scoop (hulled grain) from a mortar”). If so, since the Hmong-Mien comparandum only has the derived sense of “rice”, it would be borrowed from Chinese rather than the other way around. The native Min word 粙 may be a variant (Schuessler, 2007, apud Norman, p.c.); Schuessler: MC dâu < OC *gləwʔ or *mləwʔ. Starostin posits 稻 dào (SV 'đạo') as the source of 'lúa', as cited above.
4. cam 甘 gān 'sweet' (SV cam)
Vietnamese equivalent: 'ngọt' @ '𩜌 yuē (SV ngạt)'.
Derivation: M 甘 gān < MC kam < OC *ka:m; *OC 甘 甘 談 甘 kaːm; FQ 古三.
Phonetic pattern: /g- ~ ng-/.
Classical sources: Shuowen: 美也。从口含一。一,道也。凡甘之屬皆从甘。古三切 ('delicious; composed of 口 "mouth" holding 一'); Kangxi glosses include 'beautiful, sweet; one of the five tastes', fruit name (俗作柑 'cam'), herbs, and idiomatic uses.
Examples: 甘心 gānxīn (camtâm), 甘苦 gānkǔ (camkhổ), 甘泉 gānquán (camtuyền), 食不甘味 shí bù gān wèi ('ăn không thấy ngon', 'to find no taste in one's food'), 甘草 gāncǎo (camthảo).
Note: Maspero related the "cam" doublets to Daic languages such as White Tai (Thai Blanc), Thai, Laotian, Ahom, Shan, etc.
5. cam 柑 gān 'orange' (SV cam)
Derivation: M 柑 gān < MC kam < OC *ka:m.
Gloss: Orange, Citrus nobilis (Han).
Etymology: 甘 (OC *kaːm, “sweet”) (Wang, 1982); in light of the citrus fruit's southern origin, possibly connected with Austroasiatic; compare Proto-Austroasiatic *ŋaːm (Schuessler, 2007).
6. cam 疳 gān 'infantile disease' (SV cam)
Derivation: M 疳 gān (historically linked to M 甘 gān) < MC kam < OC *ka:m.
Dialectal note: Hakka gam1.
Classical sources: Kangxi: 疳 as pediatric disease from eating sweet things; detailed traditional medical descriptions.
Example: 疳積 (gānjī, SV camtích, 'infantile disease').
7. cả 價 jià 'price' (SV giá)
Compound usage: 'giácả' 價格 (jiàgé, SV giácác, 'price').
Derivation: 價 jià, jiè, jie < MC ka < OC *krajʔs; related 賈 jià, jiă, gǔ (giá, giả, cổ).
Classical sources: Shuowen and Kangxi gloss 價 as 'value, price', with historical borrowing interplay between 價 and 賈; Guangyun gives 駕 古訝 for related readings.
Note: Maspero did not associate 'cả' with 價 or the disyllabic 價格, and thus posited a Daic origin; however, 'giácả' is of Chinese origin in formation.
C) Old Chinese (Vietnamese words of Thai origin by Maspero)
Maspero listed a number of Vietnamese words he believed to be of Thai origin. Haudricourt (1961: 51–52), however, argued that many of these are better understood as Old Chinese loans into both Vietnamese and Thai.
1. chèo 掉 diáo 'to row' (SV trạo)
Derivation: M 棹 (桌, 櫂) zhào, zhuō, zhuó (trạo, trác) < MC ɖaɨw < OC *rdeːwɢs
Starostin: originally written 櫂 (Late Zhou), reconstructable as ɬ(h)e:kʷ‑s. After Han, the reading shifted to d.(h)ie:\w (retroflex development in lateral hsieh‑sheng series), hence the later form 櫂 (attested since Jin).
Later Han reading: ɬ(h)e:kʷ, MC ḍạuk, Mand. zhuo 'a kind of bowl, vessel'.
Notes: VS chèo is colloquial; regular Sino‑Vietnamese is trạo.
Austric: Thai ʔcɛ:w.A 'to row', Khmer ce:w 'row, oar', Mon tasu 'paddle': phonology suggests a very late (post-MC) borrowing from Chinese for all these forms.
2. bè 筏 fá 'raft' (SV phiệt, VS phà)
Derivation: M 筏 fá < MC bʷiɐt, pwat < OC *pa:d, *bad. Cf. 'bắc' 艊 (舶) bó (SV bạc) < MC baɨjk < OC *bra:g; e.g. 船舶 chuánbó (thuyềnbè) 'ships'.
Note: 舶 bó (VS 'bắc') 'large oceangoing ship'; cf. Japanese びゃく (byaku) 'large oceangoing ship'.
3. bánh 餅 bǐng 'bread, cake' (SV bính)
Derivation: M 餅 bǐng < MC pjɛŋ < OC *peŋʔ
Examples: 白餅 báibǐng (VS bánhdày); cf. 包餅 bāobǐng (from Teochew, literally 'bánhbao'; SV bòbía, 'lạpxưởng tapioca spring roll'); also 'bánhpía' ('bean cake').
Notes: Early Nôm attestations (Chỉ Nam Ngọc Âm, 16th c.) show alternation /baj2 ~ jaj2/.
Descendants:
- → Khmer: បាញ់ (bañ, 'cake, pastry')
- → Lao: ແປ້ງ (pǣng, 'flour; starch; powder')
- → Thai: แป้ง (bpɛ̂ɛng, 'powder; flour; starch')
- → Vietnamese: bánh ('pastry', 'cake', 'bread'), 'bánhpía' ('Suzhou-style mooncake')
4. tiếng 聲 shēng 'sound, voice, word, speech, language' (SV thanh)
Derivation: M 聲 shēng < MC ɕiajŋ < OC qʰjeŋ
Dialects: Cant. ʃieŋ21; Hainanese tje1; Amoy sɨŋ11 (lit.), siã11; Chaozhou siã11; Fukienese siŋ11 (lit.)
Classical sources: Shuowen defines 聲 as 'sound'; Kangxi cites multiple glosses including 'music', 'resonance', 'speech'.
Examples: 聲張 shēngzhāng (VS lêntiếng, 'to voice'), 聲名 shēngmíng (VS danhtiếng, 'renown').
Notes: VS tiếng reflects a colloquial development from the same root.
5. đũa 箸 zhú 'chopstick' (SV trợ, chừ, trừ)
Derivation: 箸 zhù, zhú, zhuó, zhuò < MC ɖɨə̆ < OC *tas, *das
Dialects: Hainanese /du2/.
Cultural note: Likely a Yue loan into Chinese. Chopsticks are tied to rice culture, which originated in the South (Hunan region); Northern Chinese, who did not cultivate rice early on, adopted the term later. To avoid the taboo 倒 dào (SV đảo, VS đổ 'overturn') in boat‑based cultures, southerners coined 筷, homophonous with 快 kuài (VS mau, 'fast'). Others attribute the taboo to the homophony of 箸 with 住 zhù ('to stop') in boatmen's speech. 箸 is still used in almost all Min dialects and sporadically in other topolects, such as the Southern Wu topolects, including Wenzhounese.
6. nàng 娘 niáng 'miss', 'girl', 'she', 'mother' (SV nương)
VS variants: ná, nạ, nường.
Derivation: M 嬢 (娘) niáng < MC ɳɨaŋ < OC *naŋ
Dialects: Fukienese nuəŋ12; ZYYY niaŋ12; Amoy nĩu12; Chaozhou niẽ12; Shanghai niã32.
Related: 妳 nǐ (SV nhĩ). In Beijing colloquial speech, 娘兒 niár means 'mom'. 娘 has been suggested to be a loan from Old Turkic anaŋ (“your mother”), from Proto-Turkic *ana ~ *eńe (“mother”) (whence Turkish ana and Uyghur ئانا (ana)) and *-iŋ (“second person singular possessive suffix”) (Vovin and McCraw, 2011).
Notes: 'ná' is an archaic form of address equivalent to 'má' ('mom'); cf. 'Phậtthuyết': 'Chẳng biếtơn áng ná.' VS nạ preserves the older sense 'mother'.
Descendants:
- → Khmer: នាង (niəng, 'young woman; girl')
- → Lao: ນາງ (nāng, 'woman; girl; lady; Mrs.')
- → Thai: นาง (naang, 'woman; wife; female lover')
- → Vietnamese: nàng ('lady; young woman; she')
7. mèo 貓 māo 'cat' (SV miêu)
Derivation: M 貓 (猫) māo, máo < MC miaw, maɨw < OC *mrew, *mreːw
Related: 卯 mǎo (SV mão, VS mẹo).
Example: 卯年 mǎonián ~ VS nămmèo or nămmão ('Year of the Cat').
Note: In the Vietnamese zodiac, 卯 corresponds to the cat, not the rabbit (兔年 tùnián, SV Thốniên, VS nămThỏ). 卯年 (Mǎo year) is interpreted as the “Year of the Cat,” whereas in China it became the “Year of the Rabbit.” The confusion stems from the phonetic similarity between 貓 (māo, 'cat') and 卯 (mǎo), with 卯 functioning as a phonetic substitute. Because cats were considered inauspicious in Chinese belief, 'Year of the Cat' (貓歲 māosuì, SV miêutuế) was reinterpreted and misread as “Year of the Rabbit” (卯兔 mǎotù, SV mãothố).
D) Additional items (Haudricourt's claims of Austroasiatic loans in Thai)
In addition to Maspero's cited examples, Haudricourt (1961) identified several more Vietnamese words that he described as Austroasiatic loans into Thai. Amusingly, each of these also shows clear cognacy with Chinese forms:
1. bụng 腹 fù 'abdomen' (SV phục)
Derivation: M 腹 fù < MC puwk < OC *pug
Phonetic shifts: OC p‑ > VS b‑; M f‑ > VS b‑.
Comparative data: Tibetan (W) ze‑a~bug 'maw, fourth stomach of ruminants'; Burmese pjəuk 'belly', 'stomach'; Lushei KC puk; Lepcha ta‑fuk, ta‑bak 'abdomen'; Kiranti ʔpo/k. Also Sho puk; Kham phu 'belly'; Gyarung tepok.
Sino‑Tibetan: From Proto-Sino-Tibetan *d-puːk ('belly; vitals; hollow object; cave'); cognate with 𥨍 ('cave'), Tibetan ཕུགས (phugs, 'innermost parts'), Burmese ဗိုက် (buik, 'belly'; 'pregnancy'), အပေါက် (a.pauk, 'hole'), Chepang तुक् ('belly'; 'stomach'), Proto-Bodo-Garo *bi(ʔ)-buk ('guts'), Cogtse Situ /tə-pōk/ ('belly'), Brag-bar Situ /tə-vōk/ ('belly'), and Proto-Tani *puk ('heart') (STEDT; Schuessler, 2007; Zhang, Jacques, and Lai, 2019).
Also compare Austroasiatic words: Proto-Mon-Khmer *bo()k ('belly'), Khmer ពោះ (pŭəh, 'belly'), Vietnamese bụng ('belly') (Shorto, 2006; Schuessler, 2007).
2. nghe 聽 tīng 'hear' (SV thính)
Derivation: M 聽 (听) tìng, tīng < MC tʰɛjŋ < OC *l̥ʰeːŋ, *l̥ʰeːŋs
Dialects: Hainanese /k'ɛ1/; Amoy thiɛŋ11, thiã11; Chaozhou thiã11.
Sound correspondences: /t‑, d‑ ~ ng‑/, e.g. 停 tíng (SV đình) ~ VS ngừng ('pause'); 短 duǎn (SV đoản) ~ VS ngắn ('short').
Notes: 聞 wén (VS nghe, 'hear') may underlie VS ngửi 'smell' as a later semantic development. Cf. 門 mén ~ VS ngõ ('gate')
Example: 聽話 tīnghuà: nghelời ('obey'), 聽說 tīngshuō: nghenói ('hearsay'), 凝聽 níngtīng: nghengóng ('listening'), 聆聽 língtīng: lắngnghe ('listen attentively'), etc.
3. cổ 胡 hú 'neck, dewlap' (SV hồ, SV cổ, cồ)
Derivation: M 胡 hú < MC ɦɔ < OC *ga:
Dialects: Cant. wu4; Hakka fu2. Tang reconstruction: /ho/, Proto-Tai *ɣo:ᴬ.
Classical sources: Shuowen defines 胡 as 'dewlap of cattle'; Kangxi glosses include 'throat', 'neck', 'dewlap', 'longevity'.
Proto-Vietic *koh ('throat'; 'neck'), from Proto-Austroasiatic *kɔːʔ ('neck') (Sidwell, 2024). Cognate with Tho (Cuối Chăm) kɔː⁵, Khmer ក (kɑɑ), Bahnar hơko, Mon ကံ.
Comparative: Tibetan kru‑kru 'windpipe'; Kachin z^jəkhro1 'throat', 'gullet'.
Notes: VS cổhọng ~ cuốnghọng 胡嚨 húlóng ~ 喉嚨 hóulóng ('throat'). Modern M 脖子 bózi corresponds to VS cáicổ. Vietnamese compounds like cổchân 'ankle' (lit. 'neck of the foot') reflect the same semantic extension.
4. cằm 頷 hàn, 'chin', 'jowl' (SV hàm, VS cằm, ngậm)
Derivation: M 頷 hàn, ǎn, hán < MC ɦəm < OC *ɡɯːm, ɡɯːmʔ
Etymology: From Proto-Sino-Tibetan *mV-qəm ('jaw'; 'chin'; 'molar') (STEDT under *gam). Bodman (1980) considers it to be the endoactive of 含 (OC *ɡɯːm, 'to hold in the mouth'), literally 'the thing that holds something in the mouth'. Starostin: glossed as 'chin', 'lower jaw' (Late Zhou). Within Chinese, cognate with 函 (OC *ɡuːm, *ɡruːm, 'to contain; box; letter') (Schuessler, 2007). 銜 (OC *ɡraːm, 'to carry in the mouth; horse's bit') is probably related.
Dialects: Amoy, Teochew am4.
Notes: Modern M 下巴 xiàbā (SV hạba) is the standard Mandarin word for 'chin'. Vietnamese cằm may derive from a disyllabic MC form /xaba/ > /χamba/ > /kamba/ > /kamɓ/ > /kăm/ through epenthesis and labial conditioning.
5. cà 茄 qié, 'eggplant' (SV già, VS 'cà')
Derivation: M 茄 qié < MC kaɨ, gɨa < OC *ga, *gal, *kra:l
Dialects: Cant. khe12; Amoy khe11, kio12; Chaozhou kie12; Fuzhou kia11; Shanghai ka32.
Etymology: Attested very rarely and late, the earliest instance being the 59 BCE 'Slave's Contract' (《僮約》) by Wang Bao (王褒) (Wang et al., 2008): '二月春分,……別茄披蔥.' ('In the second month of the year, the Spring Equinox […] separate and transplant seedlings of eggplant and scallion.') Alves (2022) relates this to Proto-Vietic *gaː (whence Vietnamese cà), which he considers an early Chinese loanword. Per Starostin, the earliest meaning was 'lotus stalk' (OC kra:j, MC ka); the sense 'eggplant' is attested from the Jin.
Notes: The MC reading ga is exceptional and may be dialectal. Vietnamese cà is colloquial; the regular SV reading is già. Likely a Yue loan into Chinese, since eggplant was not native to northern China. Compare 西紅柿 xīhóngshì and 番茄 fānqié ('foreign eggplant') for later introductions like the tomato.
Despite the extensive examples laid out above, both Maspero and Haudricourt overlooked the possibility that nearly all the cited items may trace back to Chinese cognates. Whatever the dichotomy between their respective views, and regardless of what kind of relationship the etyma might suggest, the core question remains unchanged: were these words borrowed from Chinese into Vietnamese, from Vietnamese into Chinese, or do they stem from a shared ancestral source? This ambiguity persists across cases like lúa ~ gạo 稻 dào 'paddy (rice)' and cà 茄 qié 'eggplant', especially when viewed alongside other items such as đường 糖 táng 'sugar', voi 為 wéi 'elephant', chuối 蕉 jiāo 'banana', dừa 椰 yé 'coconut', chó 狗 gǒu 'dog', and sông 江 jiāng 'river'. These and dozens of other foundational words consistently point to a Yue substrate, and many of them also show cognacy with Austroasiatic and Austronesian forms, as noted by other scholars (see the Mon-Khmer and Vietnamese Basic Words list).
In the case of Chinese and Vietnamese, whenever correspondences appear in their vocabularies, the likelihood is strong that they are related to one another rather than to any outside language. Their contact history stretches back more than 2,250 years, to at least the pre-Han period. Whether the direction of borrowing was from ancient Chinese into Vietnamese or the reverse, the relationship is evident in shared items such as the names of the twelve zodiac animals, which correspond to the Earthly Branches.
The Chinese characters that represent these words today are later developments. Each is built on the structural pattern {radical + phonetic}, where the radical functions as the semantic indicator. They are not the original ideographs of the earliest stage, such as 火, 日, 刀. This fact opens the possibility that some basic nominals in Vietnamese may predate the script and reflect Yue or southern sources. As a matter of fact, the Annamese did not need to wait until the twelfth century to know how to pronounce intimate, everyday words with tones. On the contrary, the evidence suggests that many fundamental items may have been Yue loanwords into Chinese. Examples include:
-
豆 (dòu, nồi, 'pot') [ the base meaning is a vessel; the graph was borrowed as a phonetic loan for 'bean', for which the form 荳 still exists. ]
-
弩 (nǔ, ná, 'crossbow') [ > VS nỏ. M 弩 nǔ < MC nuo < OC *naːʔ. According to Starostin, Viet. ná is an archaic loanword; a later borrowing from the same source is Viet. nỏ. The standard Sino-Viet. reading is nỗ. In Chinese, 弩 is attested since Late Zhou (Zhouli). The Shujing already has 砮 *n(h)āʔ, *n(h)ā, MC nó, no, Mand. nǔ, Viet. nỗ 'flint arrowhead', likely the same root. For *nh- cf. Xiamen lɔ6, Jianou noŋ8. ]
-
舟 (zhōu, ghe, 'boat') [ M 舟 zhōu (chiêu, châu, chu) < MC tɕɨu < OC *tjɯw, also compare 舠 dāo 'boat' and 刀 dāo 'knife'. The southern Jiangnan people were renowned for water navigation. ]
-
舠 (dāo, tàu, 'boat') [ M 舠 dāo < MC taw < OC *ta:w. Cf. 刀 dāo 'knife'. According to Schuessler (2007), a loan from Proto-Mon-Khmer *ɗuuk ~ ɗuk 'boat, canoe', whence Khmer ទូក (tuuk) and Vietnamese nốc (< Proto-Vietic ɗoːk 'boat'). Possibly cognate with 輈 (OC tɯw 'trunk, pole'). Yang Xiong's Fangyan notes 舟 (OC tjɯw) was common in central and eastern China, while 船 (OC ɦljon) was used in the west. ]
-
船 (chuán, thuyền, 'ship') [ > VS xuồng 'small boat'. Note 駕船 (jiàchuán, láithuyền, 'steer a boat', a cognate of chèothuyền), where the signific 馬 mǎ 'horse' reflects the nomadic north. In the south, however, water-savvy natives coined words with 掉 diào 'to row' (SV trạo, VS chèo; cf. 櫂 zhào, VS chèo 'oar'). ]
-
井 (jǐng, giếng, 'well') [ M 井 jǐng < MC tsiajŋ < OC *skeŋʔ | Note: Wells may have been difficult to dig in the northwest, where proto-Chinese first arose. ]
-
耕 (gēng, cày, 'plow') [ M 耕 (畊) gēng < MC kəɨjŋ < OC *kre:ŋ | cf. SV canh. Southern peoples excelled in wet-rice cultivation. ]
-
種 (zhòng, trồng / zhǒng, giống, 'plant, seed, breed') [ M 種 zhǒng, zhòng, chóng (chủng, chúng, chùng) < MC tɕiowŋ < OC *tjoŋʔ, *tjoŋʔs | Cf. SV chủng. An Chi (2016, vol. II) even boldly suggested a link with trứng 'egg', though the usual Chinese word for 'egg' is 蛋 dàn (SV đản). ]
-
銅 (tóng, thau, 'bronze') [ M 銅 tóng < MC dəwŋ < OC *do:ŋ | Cf. SV đồng 'copper'. The Yue were famed for bronze drums and advanced metallurgy. ]
-
鋤 (jǔ, cuốc, 'hoe') [ M 鋤 chú, zhù, jǔ < MC dʐɨə̆ < OC *zra | Note: Advanced bronze work likely led to iron extraction and metallurgy as well. ]
-
鋸 (jū, cưa, 'saw') [ M 鋸 jù, jū (cứ, cư) < MC kɨə̆ < OC *kas | Note: Attested in early Chinese texts (e.g. Shuowen, Hanshu), often paired with 刀 'knife' as 刀鋸. ]
These examples, among others, suggest that many of the most basic Vietnamese words have deep roots in the Yue substratum, layered with borrowings and convergences across Chinese, Austroasiatic, and Austronesian spheres.
The genetic affiliation between Chinese and Vietnamese basic words is further supported by the hypothesis that the ancient Yue language contributed significantly to proto-Chinese. As the nomadic ancestors of the Chinese expanded east and south, they plausibly borrowed many words from the Yue, whom they regarded as southerners outside their cultural sphere. Around 5,000 years ago, when the so-called pre-Chinese were still nomads on horseback, before the founding of the Xia Dynasty, the Yue had already mastered wet-rice cultivation, river navigation, and seafaring. Their influence likely extended through the southward dispersal of the Yue and related Austronesian peoples, with many words originating in 'South Chin', so to speak.
Early Yue tribesmen (百越 BaiYue, SV BáchViệt, 'Bod') cultivated the fertile lands along both banks of the Yangtze River, where the states of Shu 蜀, Chu 楚, Wu 吳, and Yue 越 later flourished. As populations expanded across regions before the Qin (秦, SV Tần, 'Chin') unified them into what became 'China', Yue loanwords naturally slipped into the speech of many communities. This influence is especially visible in the adoption of the Yue zodiac system of twelve animals, paired with the Earthly Branches: '子 zǐ, 丑 chǒu, 寅 yín, 卯 mǎo, 辰 chén, 巳 sì, 午 wǔ, 未 wèi, 申 shēn, 酉 yǒu, 戌 xū, 亥 hài'. These correspond to Vietnamese basic words for the same animals: chuột, trâu, cọp, mèo, rồng, rắn, gà, chó, heo, and others.
One may ask why the pre-Chinese or ancient Vietnamese, who already possessed their own words for these animals, would borrow them from another source. A likely explanation is that such borrowings served spiritual or ritual purposes, whether for the pre-Qin Chinese or for later Vietnamese. In fact, the entire set was reintroduced into ancient Vietic as the Sino-Vietnamese forms tý, sửu, dần, mẹo, thìn, tỵ, ngọ, mùi, thân, dậu, tuất, hợi, respectively, through Early Middle Chinese.
These forms also sounded more elevated or scholarly to the masses, much as modern Vietnamese speakers still borrow Sino-Vietnamese terms for the Western zodiac signs, e.g. Bạchdương (白羊 Báiyáng) for 'Aries' and Kimngưu (金牛 Jīnniú) for 'Taurus'. Such names carry an academic aura precisely because they are less transparent to everyday speakers.
For the early pre-Chinese, however, alternate pronunciations of the zodiac animals may not have sounded very different from their own words. Otherwise, they would not have needed to substitute 'cat' 卯 (VS mèo) and 'goat' 未 (VS dê /je1/) with 'rabbit' 兔 tù (VS thỏ) and 'sheep' 羊 yáng (VS dê). These substitutions likely reflected cultural sentiment: the Chinese were superstitious about cats, while their northern culture centered on sheep-herding, in contrast to the southern reliance on water buffalo (丑 chǒu, VS trâu) and pigs (亥 hài, VS heo). The point remains that the twelve zodiac animals were cognates across both traditions in antiquity.
The Sino-Vietnamese zodiac set, which made a round trip back into Vietnamese, illustrates the coexistence of at least two layers of nominals. This supports the hypothesis that many other basic Chinese words may have evolved from what Norman (1988:17) called "an already extinct foreign source," apart from the common etyma shared with Tibetan. That foreign source may have been the Yue substratum, which also shaped the Vietic language. It was from this base that the Yue (百越, 'Bod'), Chu (楚國), and Zhou (周朝) emerged some 3,000 years ago, possibly with contributions from Taic elements. The term "proto-Chinese," as used here, refers to the racially mixed groups who had not yet blended with all the indigenous peoples before their southern expansion.
Regular lexical interchange is another indicator of affiliation. Many core words are cognate not only between Chinese and Vietnamese but also across Sino-Tibetan; the basic vocabulary in question is not confined to Mon-Khmer. For instance, 娘 (niáng, SV nương) corresponds to nàng ('girl') and nạ ('mother'), while 爹 (diē, SV giả) corresponds to both tía ('daddy') and cha ('father'). Such parallels raise the question: is Vietnamese truly a Mon-Khmer language?
The Vietnamese words shared with Mon-Khmer are fewer than those shared with Chinese, and their similarity may reflect cultural influence rather than genetic inheritance. The Khmer Kingdom was once a dominant power in Southeast Asia, and influence often flows from stronger to weaker states. Later, as the southern state of ĐạiViệt expanded, both Champa and Khmer territories were absorbed, and their linguistic elements blended into Vietnamese. This reflects a broader anthropological pattern: the dominant polity shapes the linguistic landscape. With annexed territories came new populations and speech forms, which merged with Vietnamese and evolved into a new entity (6).
After the decline of Cambodia's ancient Khmer Empire, the Annamese realm, by contrast, grew in size, ambition, and aggressiveness. Over the following millennium of sovereignty, Annam not only eliminated the Kingdom of Champa on its southern border but also absorbed much of the eastern flank of Cambodia's former territory.
In today's Vietnam, as one travels farther south, one encounters placenames such as Phanrang, Phanrí, Sóctrăng, and others that stand in contrast to the ancient Vietnamese toponyms of the far north. There, deeply rooted Sino-Vietnamese etyma have long been embedded in local names. For example, the prefix Kẻ- ('market', 'city') appears in Kẻchèm, interchangeable with today's SV Từliêm 慈廉 Cílián; in Kẻchợ ~ 市街 Shìjiē; in Kẻbảng ~ 棒街 Bàngjiē; and in Kẻon ~ 峴港 Xiàngǎng. Similarly, Chằm- ('marsh') corresponds to 澤 zé (SV trạch), as in Chằm Dạtrạch ~ 夜澤 Yèzé or Chằmdơi ~ 蝠澤 Fúzé. These names reflect an older stratum of settlement and linguistic layering.
In terms of racial composition, later migrants who resettled in the south inevitably intermarried with local populations, producing mixed descendants. This process mirrored earlier developments in the north, where the growth of both southern China and ancient Vietnam was marked by continual blending of peoples.
As will be seen in later chapters, Vietnamese and Chinese share most of their basic vocabulary through Sino-Tibetan etymologies. By contrast, only a few dozen cognates overlap with Mon-Khmer, forming a small subset of a much larger union that includes possible Chinese affiliation. Many of the proposed Khmer-Vietnamese cognates may in fact derive from the same roots that also gave rise to ancient Chinese. With so many items in Vietnamese and Chinese demonstrably cognate, the real question is whether they reflect genetic affiliation within the same linguistic family or simply straightforward borrowing. Consider critically basic items such as 頭 (tóu, đầu, 'head'), 胡 (hú, cổ, 'neck'), 目 (mù, mắt, 'eye'), 翁 (wēng, ông, 'grandfather'), 婆 (pó, bà, 'grandmother'), 父 (fù, bố, 'father'), 母 (mǔ, mẹ, 'mother'), 兄 (xiōng, anh, 'older brother'), 姊 (zǐ, chị, 'older sister'), 妹 (mèi, em, 'younger sister'), 家 (jiā, nhà, 'home'), 戶 (hù, cửa, 'door'), and others. If these were merely Chinese loanwords, the ancient Annamese language would once have had to exist without them, which is scarcely conceivable. The language would then have to be considered a case of pidginization or even creolization, arising to meet the communicative needs of Chinese immigrants who followed in the wake of the Han conquest.
It is more likely, however, that genetic affiliation was the true case. From the dawn of humanity, nothing is closer than kinship. In the deepest lexical stratum, we find a small number of words of mixed origin, including Austroasiatic Mon-Khmer and Sino-Tibetan stocks, or more precisely, cognates of roots yet to be fully identified. Given the spread of language contact across space and time, whether in wave-like or ripple-like patterns, the etyma listed above appear to have originated either within the Sino-Tibetan family or from common Taic-descendant forms, such as Yue languages (Cantonese, Hokkien, etc.) that emerged after the break-up of Taic into Tai-Kadai and Yue branches.
This postulation suggests that Austroasiatic peoples themselves may have diverged from Taic aboriginals in southern China. Later, when new waves of mixed northern resettlers, such as Yue-mixed Han Chinese, moved south into ancient northern Vietnam, they displaced Muong and other indigenous groups, pushing them closer to Mon-Khmer speakers who had migrated from the southwest centuries earlier (Nguyễn Ngọc San 1993). Through such contact, basic words could have entered Vietnamese, especially since Muong minorities maintained constant interaction with Kinh lowlanders in trade and social life. Indeed, King Lê Lợi, who expelled the Ming occupiers after twenty years of harsh rule in the fifteenth century, was himself likely of Muong origin.
Linguistically, this proposition cannot be dismissed. Many basic words appear in one Mon-Khmer language but are absent in others, while the same words are found in both Vietnamese and Chinese, traceable to earlier historical periods. The reverse scenario, deriving Vietnamese from Mon-Khmer alone, does not hold when considering the time frame of Khmer-Vietnamese cognates. The persistence of Mon-Khmer words in Vietnamese, after filtering out all Chinese-Vietnamese commonalities, suggests that what remains may stem from a mixed stock of indigenous and proto-Viet-Muong lexical seedlings. These remnants, preserved in Muong, reflect the shared heritage of Viet and Muong before their linguistic split, just as their speakers diverged biologically, some mixing with Han, others with Mon-Khmer. It is also possible that Viet-Muong words re-entered Mon-Khmer languages, since their speakers may have originally migrated into the Red River Delta from the southwest (Nguyễn Ngọc San 1993).
The similarities between Chinese and Vietnamese are thus parallel, concurrent, and plausible, even without a detailed discussion of shared features such as tonality and phonology. We can continue tracing beyond what Maspero and Haudricourt (1954) provided, through Old Chinese reconstructions and tonogenesis based on Annamese, as further evidence emerges. As Shafer's Sino-Tibetan etymologies will show in the next chapter, many more Vietnamese words can be related to Chinese, often surfacing spontaneously in the mind of the researcher and confirming the depth of their historical connection.
Figure 3 – View of the hypothesis of lexical interpolation of respective languages
Upper tier: Tibetan | Unknown extinct foreign elements before the Chinese | Mon-Khmer
Lower tier: Chinese | Zhuang, Miao, Yao, etc. | Vietnamese | Mường | Khmer
Conclusion
Similarity across linguistic families is a crossroads, not a proof of kinship. Vietnamese basic vocabulary demonstrates that resemblance alone cannot establish genetic relation; only systematic sound change and historical context reveal true inheritance. This cautionary principle extends beyond Vietnam, reminding comparative linguists that substratum, contact, and coincidence often masquerade as lineage.
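As a toy illustration of the regularity principle stated in the conclusion, the sketch below tallies word-initial correspondences between Mandarin and Vietnamese forms, using only pairs already cited in this chapter. It is a minimal sketch under loudly stated assumptions: spelling (pinyin and Vietnamese orthography) stands in for reconstructed phonology, only onsets are compared, and the helper names `onset`, `tally`, and `recurrent`, the onset inventories, and the threshold of two are hypothetical choices for illustration, not an established method. A correspondence that recurs across several word pairs is the kind of evidence the comparative method weighs; a single match is not.

```python
from collections import Counter

# Hypothetical onset inventories for illustration only. Orthography stands
# in for phonology here, which a real comparative study would not do.
PINYIN_ONSETS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
                 "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]
VIET_ONSETS = ["ngh", "ng", "nh", "ch", "tr", "th", "kh", "ph", "gh", "gi",
               "qu", "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "r",
               "s", "t", "v", "x"]
# Vietnamese spellings that write the same onset in two ways.
VIET_NORMALIZE = {"ngh": "ng", "gh": "g"}

# (Mandarin pinyin, Vietnamese) pairs cited in this chapter.
PAIRS = [
    ("tíng", "ngừng"),  # 停 'pause'
    ("duǎn", "ngắn"),   # 短 'short'
    ("tīng", "nghe"),   # 聽 'hear'
    ("fù", "bụng"),     # 腹 'belly'
    ("yǔ", "mưa"),      # 雨 'rain'
]


def onset(word, inventory):
    """Return the longest onset from `inventory` that begins `word`, else '∅'."""
    for candidate in sorted(inventory, key=len, reverse=True):
        if word.startswith(candidate):
            return candidate
    return "∅"


def tally(pairs):
    """Count how often each (Mandarin onset, Vietnamese onset) pair occurs."""
    counts = Counter()
    for mandarin, vietnamese in pairs:
        m = onset(mandarin, PINYIN_ONSETS)
        v = onset(vietnamese, VIET_ONSETS)
        counts[(m, VIET_NORMALIZE.get(v, v))] += 1
    return counts


def recurrent(counts, threshold=2):
    """Keep only correspondences attested at least `threshold` times."""
    return {pair: n for pair, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    counts = tally(PAIRS)
    print(recurrent(counts))  # {('t', 'ng'): 2}
```

With these five pairs, only t ~ ng recurs; grouping t and d together as dental stops, as the /t‑, d‑ ~ ng‑/ correspondence cited for 聽 does, would raise that class to three instances. A serious test would compare reconstructed Old or Middle Chinese onsets against Proto-Vietic ones over a much larger word list.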
References
Foundational works
-
Terrien de Lacouperie. The Languages of China Before the Chinese: Researches on the Languages Spoken by the Pre‑Chinese Races of China Proper Previously to the Chinese Occupation. London: D. Nutt, 1887; Taiwan reprint, 1966.
-
Swadesh, Morris. Lexico‑Statistical Dating of Prehistoric Ethnic Contacts. Proceedings of the American Philosophical Society 96(4), 1952.
Austroasiatic / Mon–Khmer
-
Thomas, David D. Basic Vocabulary in Some Mon‑Khmer Languages. Mon‑Khmer Studies (1960).
-
Sidwell, Paul. Austroasiatic Dataset for Phylogenetic Analysis: 2015 Version. Mon‑Khmer Studies 44. Mahidol University / SIL International.
-
Alves, Mark J. An Updated Overview of the Austroasiatic Components of Vietnamese. Languages 9(12), 377 (2024).
Vietnamese origins
-
Alves, Mark J. Linguistic Research on the Origins of the Vietnamese Language: An Overview. Journal of Vietnamese Studies 1(1–2), 2006.
-
Ha, Le Thanh. Code Mixing and Loan Words in the Vietnamese Vocabulary. Eurasian Journal of Applied Linguistics 8(1), 2022.
Sino–Vietnamese layering
-
Alves, Mark J. Sino‑Vietnamese Grammatical Vocabulary and Sociolinguistic Conditions for Borrowing. SEALS 17 Proceedings.
-
Sa, Quoc Hoang. Study on the Understanding and Use of Sino‑Vietnamese Words: Perspectives from Secondary School Students in Ho Chi Minh City. Sprin Journal of Arts, Humanities and Social Sciences 4(5), 2025.
-
Sino-Vietnamese Vocabulary. Wikipedia (overview of Sino‑Vietnamese morphemes and their role in Middle Chinese reconstruction).
Comparative methodology
-
Campbell, Lyle. Historical Linguistics: An Introduction. Edinburgh University Press, 2013.
-
Haspelmath, Martin. Comparative Linguistics and the Problem of Spurious Similarities. Linguistic Typology 9(1), 2005.
FOOTNOTES
(1)^ "The Phùng Nguyên culture of Vietnam (c. 2,000-1,500 B.C.) is a name given to a culture of the Bronze Age in Vietnam during the Hong Bang Dynasty which takes its name from an archeological site in Phùng Nguyên, 18 km (11 mi) east of Việt Trì discovered in 1958. It was during this period that rice cultivation was introduced into the Red River region from southern China. The most typical artifacts are pediform adzes of polished stone." Source (as of March 2018): https://en.wikipedia.org/wiki/Phùng_Nguyên_culture
(2)^ List of the 23 identified fundamental basic words for which we could fill in the Vietnamese and Chinese cognates without much difficulty. Let's save this for worksheet practice at the end, and wait and see what the Austroasiatic Mon-Khmer camp comes up with.
- Thou:_____________
- Not:______________
- To give:___________
- Man/male:_________
- Mother:___________
- Bark:_____________
- Black:____________
- I:________________
- That:_____________
- We:______________
- Who:_____________
- This:_____________
- What:____________
- Ye:______________
- Old:_____________
- To hear:__________
- Hand:____________
- Fire:_____________
- To pull:___________
- To flow:__________
- Ashes:____________
- To spit:___________
- Worm:___________
See "Ancient Languages Have Words in Common" by Zachary Stieber, Epoch Times (May 6, 2013). Source (as of Jan. 2017): http://www.theepochtimes.com/n3/42284-ancient-languages-have-common-words-in-common/
(3)^ As previously discussed, the Cantonese and Hokkien subdialects of the common ancestral Yue language are officially classified as belonging to the Sino-Tibetan language family.
[Meanwhile,] the Tai–Kadai languages, also known as Daic, Kadai, Kradai, or Kra–Dai, are a language family of highly tonal languages found in southern China and Southeast Asia. They include Thai and Lao, the national languages of Thailand and Laos respectively. There are nearly 100 million speakers of these languages in the world. Ethnologue lists 95 languages in this family, with 62 of these being in the Tai branch.
The diversity of the Tai–Kadai languages in southeastern China, especially in Guizhou and Hainan, suggests that this is close to their homeland. The Tai branch moved south into Southeast Asia only about a thousand years ago, founding the nations that later became Thailand and Laos in what had been Austroasiatic territory.
[...] The Tai–Kadai languages were formerly considered to be part of the Sino-Tibetan family, but outside China they are now classified as an independent family. They contain large numbers of words that are similar in Sino-Tibetan languages. However, these are seldom found in all branches of the family, and do not include basic vocabulary, indicating that they are old loan words.
Several Western scholars have presented suggestive evidence that Tai–Kadai is related to or a branch of the Austronesian language family. There are a number of possible cognates in the core vocabulary. Among proponents, there is yet no agreement as to whether they are a sister group to Austronesian in a family called Austro-Tai, a backmigration from Taiwan to the mainland, or a later migration from the Philippines to Hainan during the Austronesian expansion.
The Austric proposal suggests a link between Austronesian and the Austroasiatic languages. Echoing part of Benedict's conception of Austric, who added Tai–Kadai and Hmong–Mien to the proposal, Kosaka (2002) argued specifically for a Miao–Dai family.
In China, they are called Zhuang–Dong languages and are generally considered to be related to Sino-Tibetan languages along with the Miao–Yao languages. It is still a matter of discussion among Chinese scholars whether Kra languages such as Gelao, Qabiao, and Lachi can be included in Zhuang–Dong, since they lack the Sino-Tibetan similarities that are used to include other Zhuang–Dong languages in Sino-Tibetan.
[...]
Tai–Kadai consists of five well established branches, Hlai, Kra, Kam–Sui, Tai, and the Ong Be (Bê) language:
- Ong Be (Hainan; Lin'gao (臨高) in Chinese)
- Kra (called Kadai in Ethnologue and Gēyāng (仡央) in Chinese)
- Kam–Sui (mainland China; Dong–Shui (侗水) in Chinese)
- Hlai (Hainan; Li (黎) in Chinese)
- Tai (southern China and Southeast Asia)
(Source (as of Jan. 2017): https://en.wikipedia.org/wiki/Tai%E2%80%93Kadai_languages)
(4)^ Within the modern broad grouping of the Austroasiatic family, the term Mon-Khmer is also used in a narrower sense for a sub-family that includes only the Mon-Khmer languages proper. Separately, Vietnamese and its Vietic siblings (e.g. Muong, Tha, Vung, Ruc) all originated from ancestral speech forms descended from a proposed ancient proto-Taic language, which "were once spoken much more widely in China" (Norman, ibid.). Their variants have been explicitly described as remotely diverged from the Taic forms that gave birth to the Yue languages, which in turn gave rise to contemporary varieties classed within the Sino-Tibetan family, such as the Cantonese and Hokkien dialects. That is how, nominally, the Yue languages have come to fit into a much larger picture. Note that Norman (ibid.) specifically does not group "Vietnamese" and "Muong" with the Mon-Khmer languages, which indicates that he too was aware of the problems in classifying them.
(5)^ Without the near-native "linguistic feel" for the target language that a specialist needs, and lacking first-hand experience of modern Chinese, both standard and colloquial, they would never know the roots of many Vietnamese words such as:
- 'đầunậu' (ringleader) 頭腦 tóunǎo (SV đầunão),
- 'dàydạn' (experienced) 經驗 jīngyàn (SV kinhnghiệm),
- 'láibuôn' (merchant) 大販 dăipán (Cant. /tai2pan3/),
- 'lẻtẻ' (trivial) 零星 língxīng (SV linhtinh, 'miscellaneous'),
- 'ănnhậu' (social engagement) 應酬 yìngchóu (i.e., 'eat and drink'),
- 'cụngly' (raise glasses and cheers) 碰盃 pèngbēi,
- 'đừnghòng' (don't you ever) 甭想 béngxiǎng,
- 'luônluôn' (always) 牢牢 láoláo,
- 'lạcloài' (solitude) 落落 luòluò [ ~ 失落 shīluò (SV thấtlạc) ],
- 'đượclắm' (pretty good) 得來 délái,
- 'đượclòng' (pretty good) 心得 xīndé,
- 'giờgiấc' (time) 時間 shíjiān [ while 'thuở (thủa)' (a period of time) is a sandhi contraction of 時候 shíhòu (SV thờihậu) ],
all of which match the usage and meanings of their Chinese counterparts exactly, not to mention the in-depth knowledge of Chinese historical phonology required to appreciate the roots of basic lexical items such as
- 'chỉ' 線 xiàn (thread) and 'chỉ' 錢 (ancient unit of weight, approximately a tenth of the Chinese 兩 tael) [ cf. 錢 qián (SV tiền) 'money' ],
- 'đường' 唐 táng (road), as opposed to 途 tú (SV đồ) and 道 dào (SV đạo),
- 'lá' 葉 yè (leaf) [ the pattern /j-/ ~ */l-/ is very common in Chinese. ],
- 'lúa' 來 lái (paddy, as opposed to 稻 dào 'gạo' rice) [ cf. 麥 mài (SV mạch) ],
- 'cá' 魚 yú (fish) [ /ke-/ and /ca-/ in English 'ketchup' and 'catsup' are cognate with V 'cá' ],
- 'sông' 江 jiāng (river) as opposed to 川 chuān (SV xuyên) [ cf. 水 shuǐ (SV thuỷ) 'water', another word for 'river' ],
- 'mây' 霧 wù (cloud), as opposed to 雲 yún (SV vân),
- 'mưa' 雨 yǔ (rain) [ the pattern /y-/ ~ /m-/ is very common in Chinese ~ Vietnamese. ],
- 'nắng' 陽 yáng (sunshine) [ Who says there is no Chinese word for 'sunshine'? ],
- 'cóng' 寒 hán (chilly) [ Hai. /kwɔ5/ ],
- 'biển' 海 hǎi (sea), as opposed to both VS 'bể' and 'khơi' [ SV 'hải'; for 'khơi', cf. Cant. /hoj3/; it is not hard to associate the two related sounds, e.g. 海外 hǎiwài, V 'hảingoại' (overseas) vs. VS 'ngoàikhơi' (out at sea) ],
- 'bữa' 飯 fàn (Hainanese /buj2/) 'meal', as opposed to SV 'buổi' (period of the day),
- 'ăn' 唵 ǎn (eat) [ cf. 吃 chī (cf. 乙 yǐ, SV ất), as opposed to 'xơi' 食 shí (SV thực) ],
- 'uống' 飲 yǐn (drink) as opposed to 'hớp' 喝 hè (SV hát) 'sip',
- 'đi' 去 qù (go) as opposed to 走 zǒu (SV tẩu) 'run', i.e. 'chạy',
- 'đứng' 站 zhàn (stand),
- 'ỉa' 屙 ē (to poo), 'đái' 尿 niào (to pee; connotatively the same as VS 'tiểu', 'urinate'; cf. 尿尿 niàoniào 'điđái'),
- 'ngủ' 臥 wò (lie down to rest, hence 'sleep', as opposed to 睡 shuì, connotatively 'somnus'),
- 'đụ' 嫖 piáo (fuck; a derivative of VS 'đéo'; colloquially 他媽 tāmā, 'your mother's fucker'),
- 'đẻ' 生 shēng (Hainanese /te1/) 'give birth to', in addition to 'tái' (Hai. /ta5/) 'uncooked',
- 'việc' 活 huó (work) as opposed to 務 wù (SV vụ) and 役 yì (SV dịch),
and a great number of other words cited in this paper.
For the same reason, lacking first-hand experience of modern Vietnamese, the same authors would never recognize disyllabic words such as
- 'đốivới' (with respect to) 至於 zhìyú, giving rise to 'đếnnỗi' (to such a degree that), as opposed to 對於 duìyú,
- 'vòmtrời' 重圓 chóngyuán (SV trùngviên 'sky vault') instead of 宇宙 yǔzhòu (SV vũtrụ 'universe'),
- 'gỏi' 膾 (鱠) kuài (SV khoái 'minced meat (fish) salad') instead of 'chopped meat or fish',
- 'quà' 饋 kuì (SV quỹ 'gift') instead of 禮物 lǐwù,
- 'cảirỗ' 菜蘭 càilán (Chinese broccoli) instead of 'cảilàn' or 'cảilan',
- 'dưahấu' 塊瓜 kuàiguā (SV khốiqua 'watermelon') instead of 西瓜 xīguā,
- 'ănmày' 要飯 yàofàn (beggar) instead of 乞丐 qǐgài,
- 'thầymô' 巫師 wūshī as opposed to 'phùthuỷ' (shaman), etc.,
all of which are cognates.
(6)^ All such events, however, occurred much later and had less cross-cultural impact than what may have come from another indigenous kingdom, Nanzhao 南詔 (Namchiếu), which flourished between 649 and 902 during the Tang Dynasty and to which the western half of today's northern Vietnam once belonged.