Reassessing Vietnamese Core Lexicon Beyond the Mon‑Khmer Hypothesis
by dchph
Vietnamese basic vocabulary cannot be explained solely by Mon‑Khmer affiliation. The evidence points to a layered genealogy in which Yue and Sino‑Tibetan roots play a central role, with Mon‑Khmer influence as a significant but secondary substratum. This reassessment underscores the need to move beyond entrenched Austroasiatic premises and toward a broader comparative framework.
A fresh look at basic vocabulary shows that many items long attributed to Mon‑Khmer substrata also display clear cognacy with Chinese and Sino‑Tibetan forms. Words such as chó ‘dog’, gà ‘chicken’, and lúa ‘paddy’ demonstrate that Vietnamese shares fundamental etyma with Chinese, often predating Middle Chinese. These correspondences challenge the Austroasiatic hypothesis and point instead to a deeper, shared ancestry within the Sino‑Tibetan–Yue continuum.
I) The Austroasiatic hypothesis: revisited correspondences
Western scientific methodology, by its very nature, is expected to yield correct theories most of the time; otherwise, it ceases to be scientific. Yet theories inevitably change and are eventually replaced, especially in a field as dynamic as linguistics. In the Vietnamese case, as the preceding discussion has shown, many early authors, though undeniably pioneers, often took shortcuts. They relied on the limited data available to them at the time, while avoiding the more demanding path that required rigorous study of both Chinese and Vietnamese in their historical and phonological dimensions (see Ding Bangxin, ibid., 1977, p. 263). (See more on the Vietnamese tonegenesis paper by Graham Thurgood, http://www.csuchico.edu/~gthurgood/Papers/Vietnamese_tonegenesis.pdf - as of Jan. 2017)
Haudricourt's theory of tonegenesis provided a convenient framework for Austroasiatic Mon‑Khmer theorists, who frequently cited his work. Yet, given the limitations of their time, and in light of the advances in Old Chinese and Sino‑Tibetan studies over the past sixty years, their conclusions now require serious re‑evaluation, if not outright revision. When addressing the genetic affiliation of Vietnamese with other Mon‑Khmer languages, their insufficient mastery of both Chinese and Vietnamese historical phonology led them to overlook, or fail to recognize, the deeper connections between Vietnamese and Chinese.
For instance, among the words Haudricourt used in his illustrative examples, the case of chó 'dog' (Norman 1988) is revealing: it derives from Proto‑Miao‑Yao and is cognate with Chinese 狗 gǒu, demonstrating that Vietnamese and Chinese share basic vocabulary at the most fundamental stratum. Other parallels, such as 雞 jī ~ gà ('chicken'), 來 lái ~ lúa ('paddy'), 為 wéi ~ voi ('elephant'), and 熊 xióng ~ gấu ('bear'), further attest to this early relationship in the core lexicon.
Table 1 - Comparative grid of basic vocabulary correspondences
| Vietnamese (VS) | Sino‑Vietnamese (SV) | Old Chinese (OC) | Mon–Khmer parallels | Gloss |
|---|---|---|---|---|
| chó | cẩu 狗 gǒu | koʔ | Khmu choʔ | ‘dog’ |
| gà | kê 雞 jī | ka | Khmer moən | ‘chicken’ |
| lúa | lai 來 lái | ləʔ / lɒp | Riang luah | ‘paddy’ |
| cá | ngư 魚 yú | ŋa | Khmer trey | ‘fish’ |
| lá | diệp 葉 yè | leb / hljeb | Khmu laʔ | ‘leaf’ |
| ngày | nhật 日 rì | ŋit | Khmer thngai | ‘day’ |
| mẹ | mẫu 母 mǔ | mʷaʔ | Khmer mae | ‘mother’ |
Notes:
- Vietnamese ↔ Old Chinese: Shows deep cognacy, often with tonal correspondences.
- Sino‑Vietnamese layer: Reflects later borrowings that reinforce the OC connection.
- Mon–Khmer parallels: Present, but often divergent in form, suggesting substratum influence rather than primary inheritance.
- Takeaway: Vietnamese basic vocabulary is layered, with Yue/Sinitic roots at the core and Mon–Khmer parallels as secondary.
Haudricourt's argument about tonal development in Vietnamese also rested on the etymology of many such basic words, which is crucial for understanding the Sino‑Vietnamese lexical layer. As shown in the cases above, these words often have clear Chinese cognates. I will examine these issues in greater detail below, and expand the discussion in the following chapter on Sino‑Tibetan etymologies.
To begin, let us revisit Haudricourt's basic word lists, focusing first on his examples from Khmu and Riang, two Mon‑Khmer languages, where words ending in a glottal stop [ʔ] correspond to Vietnamese words bearing the sắc or nặng tones (Norman 1988, pp. 55–56; 1991, p. 206), as listed in the following table:
Table 2 - Comparanda on basic word correspondences
| Việt | Khmu | Riang | Notes on Chinese correspondences |
|---|---|---|---|
| lá ('leaf') | hlaʔ | laʔ | 葉 yè (leaf) (SV diệp) [ M 葉 yè, dié, shè, xiè < MC jiap, ɕiap < OC *leb, *hljeb | Note: The pattern OC /*l-/ ~ MC /j-/ is very common, surfacing in Mandarin as /j-/. Most of the Tibetan languages carry a sound near lá. For example, Tibetan: ldeb 'leaf, sheet'; Burmese: ɑhlap 'floral petal'; Kachin: lap2 'leaf'; Lushei: le:p 'bud'; Lepcha: lop 'leaf'; Rawang ʂɑ lap 'leaf (used to wrap rice pastry)'; Trung ljəp1 'leaf'; Bahing lab. (Shafer p.138; Benedict, p. 70.) Per Starostin, Proto-Austro-Asiatic: *la, Proto-Katuic: *la, Proto-Bahnaric: *la, Khmer: sla:, Proto-Pearic: *laʔ.N, Proto-Vietic: *laʔ, s-, Proto-Monic: *la:ʔ, Proto-Palaungic: *laʔ, Proto-Khmu: *laʔ, Khasi: sla-diŋ, Proto-Aslian: *sǝlaʔ, Proto-Viet-Muong: *laʔ, ʔ-, Thomon: la.343ʔ, Tum: la.212 ] |
| gạo ('rice') | rənkoʔ | koʔ | 稻 dào (SV đạo) [ Starostin posited this etymon as "lúa" (paddy) in Vietnamese. See also etymology of "gạo" and "lúa" in previous sections. ] |
| cá ('fish') | kaʔ | -- | 魚 yú (SV ngư) [ M 魚 yú < MC ŋɨə̆ < OC *ŋa | According to Starostin, ST fish. For *ŋh- cf. Xiamen hi2, Chaozhou hy2. | Protoform: *ŋ(j)a. Meaning: fish. Chinese: 魚 *ŋha fish. Tibetan: ɳa fish. Burmese: ŋah fish, LB *ŋhax. Kachin: ŋa3 fish. Lushei: ŋha fish, KC *ŋhɑ. Kiranti: *ŋjə. Comments: PG *tàrŋa; BG: Garo năk, Bodo ŋa ~ na, Dimasa na; Chepang ŋa ~ nya; Tsangla ŋa; Moshang ŋa'; Namsangia ŋa; Kham ŋa:ɬ; Kaike ŋa:; Trung ŋa1-plăʔ1. Simon 13; Sh. 36, 123, 407, 429; Ben. 47; Mat. 192; Luce 2. | OC *ŋh- ~ k- (ca-) || See What Makes Chinese So Vietnamese?, Appendix M, on the case of "ketchup" or "catsup", where "ke-" or "ca-" is "cá" ('fish'), while "-tsup" or "-tchup" is 汁 zhī ('sauce'), etymologically. ] |
| chó ('dog') | soʔ | soʔ | 狗 gǒu (SV cẩu) [ ~ VS 'cầy' | M 狗 gǒu < MC kjəw < OC *ko:ʔ | Note: Chinese 狗 gǒu (Proto-Viet **kro, Mon-Khmer *klu) might be a loanword from the Yue. cf. 犬 quǎn (SV khuyển) ~ VS 'cún' ('puppy'), which could be cognate with 狗 gǒu if both forms descended from the same source, either of the Yue or Sino-Tibetan languages. ] |
| chí ('louse') | -- | siʔ | 虱 shī (SV siết, sắt) [ M 虱 (蝨) shī < MC ʂit < OC *srit || Note: Etymologically, from Proto-Sino-Tibetan *srik ('louse'). The case of /-ʔ/ ~ "sắc" is similar to "lá": 葉 (yè, SV diệp) ] |
Observation:
The shift from final /‑ʔ/ to /‑k/, /‑t/, or even to the sắc tone is hardly remarkable. In fact, in many central and southern Vietnamese dialects, mắt 目 mù 'eye' is still pronounced as mắc (cf. SV mục). Similarly, mắt (目 mù) did not yield the Mandarin reading /mu5/, nor did cắt 割 (SV cát 'cut') become Mandarin gē, and there are hundreds of comparable cases. Thus, the much‑discussed Mon‑Khmer–Vietnamese correspondences involving final glottals are far less exceptional than sometimes claimed.
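The pattern claimed for Table 2 can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it assumes the Table 2 forms exactly as transcribed above, and the names TABLE_2, TONE_MARKS, tone_of, and check_rows are hypothetical. It reads the Vietnamese tone off the orthographic tone mark and tests whether every attested Khmu or Riang form ends in a glottal stop.

```python
import unicodedata

# Rows copied from Table 2: (Vietnamese, Khmu, Riang); "--" marks a missing form.
TABLE_2 = [
    ("lá", "hlaʔ", "laʔ"),
    ("gạo", "rənkoʔ", "koʔ"),
    ("cá", "kaʔ", "--"),
    ("chó", "soʔ", "soʔ"),
    ("chí", "--", "siʔ"),
]

# Combining marks that spell the two tones in Vietnamese orthography.
TONE_MARKS = {
    "\u0301": "sắc",   # combining acute accent
    "\u0323": "nặng",  # combining dot below
}


def tone_of(word):
    """Return the tone name if the word carries a sắc or nặng mark, else None."""
    # NFD splits each accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return None


def check_rows(rows):
    """For each row, report the Vietnamese tone and whether all attested
    Khmu/Riang forms end in a glottal stop."""
    results = []
    for viet, khmu, riang in rows:
        attested = [form for form in (khmu, riang) if form != "--"]
        glottal = all(form.endswith("ʔ") for form in attested)
        results.append((viet, tone_of(viet), glottal))
    return results


for viet, tone, glottal in check_rows(TABLE_2):
    print(viet, tone, glottal)
```

A check of this kind only confirms that the five cited rows are internally consistent; it says nothing about the direction of the change or about which language family the words belong to.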
Etymologically, several points are clear:
- Lá 葉 yè 'leaf': Vietnamese lá corresponds to Chinese 葉 yè, traced through AC *lhap < OC *lap < PC **lɒp. Cognates across Tibeto‑Burman languages preserve initial l‑ with minor semantic variation. In Khmu and Riang, which lack tones, the Vietnamese sắc tone was replaced by a final glottal stop /‑ʔ/.
- Gạo 稻 'rice': The Chinese word 稻 is thought to be borrowed from a Yue‑type language, likely Austroasiatic, spoken by ancestors of present‑day minorities in southern China, of which the early Vietnamese were a part. As with lá, the Vietnamese nặng tone here corresponds to a glottal stop in Khmu and Riang. That is not the case, however, with lúa ('paddy'), which Starostin posited as the Vietnamese reflex of this etymon and which is cognate with 稐 lǔn and 來 lái.
- Cá 魚 yú 'fish': Vietnamese cá is plausibly cognate with Old Chinese *ŋa. The shift from OC *ŋh‑ to Vietnamese /k‑/ is not difficult to explain, and the case parallels that of lá 葉 yè (SV diệp).
- Chó 狗 gǒu 'dog': According to Norman (1988), Chinese 狗 gǒu is an early loan from Proto‑Miao‑Yao klu (cf. Mon kle, written Mon kluiw). Vietnamese chó is also known as cầy. This, too, follows the same pattern as lá 葉 yè (SV diệp).
As Tsu‑lin Mei and others have noted, these correspondences point not to isolated borrowings but to a deeper stratum of shared vocabulary linking Vietnamese with Chinese and neighboring language families.
The Shuo-wen says 南越名犬#### “Nan-yüeh calls 'dog' *nôg **g.” This explanation occurs under the entry for ## which implies that the meaning “dog” is attached to this character. The first character of the compound probably represents a pre-syllable of some kind. Tuan Yü-ts'ai mentioned in his Commentary to the Shuo-wen that this word was still used in Kiangsu and Chekiang, but did not give any further detail.
Karlgren gives **g as the OC value for ## (GSR 109 7h). At the time of the Shuo-wen (121 A.D.), -g had probably already disappeared; in Eastern Han poetry, MC open syllables (OC -b, -d, -g) seldom rhyme with stopped syllables (OC -p, -t, -k); in old Chinese loan words in Tai (specifically, the names for the twelve earthly branches 地支 ti-chih), probably reflecting Han dynasty pronunciation, Proto-Tai -t corresponds to OC -d, but no trace can be found for -g. The proper value for our purpose is therefore **ô.
This is the AA [Austroasiatic] word for “dog,” as the following list shows: “dog”: VN chó; Palaung shɔ:; Khmu, Wa soʔ, Riang s'oʔ; Kat, Suk, Aak, Niahon, Lave có; Boloben, Sedang có; Curu, Crau ʃŏ; Huei, Sue, Hin, Cor sor; Sakai cho; Semang cû, co; Kharia sɔ'lɔʔ; Ju solok; Gutob, Pareng, Remo guso; Khasi ksew; Mon klüw; Old Mon clüw; Khmer chkɛ.
The forms after VN represent almost all the major groups spoken in the Indo-China and Malay Peninsulas, as well as the Palaung-Wa, Khmer, and Mal groups. The proto-form for these languages appears to be soʔ or coʔ, preceded perhaps by k- (cf. Khasi, Gutob, etc.). On the basis of Mon, Haudricourt suggested that VN ch- < kl-. But there is another possibility, namely, VN ch- < kc-; cf. “to die” *kcət, VN chết, Kuy kacet, Kaseng sit. And even if VN ch- did come from kl-, this change must have occurred quite early, since in all the AA languages except Mon, the initial is either a sibilant fricative or affricate.
II) Toward a Yue–Sinitic framework
The essential point is that Vietnamese chó and Chinese 狗 gǒu stand in direct correspondence, traceable to more than 2,250 years BP, when the indigenous Yue peoples were already in contact with early Chinese. At the same time, Chinese 犬 quǎn (SV khuyển) served as the native term for 'dog.' [M 犬 quǎn < MC kʰʷen < OC *kʰʷeːnʔ. Note that 犬 quǎn and 狗 gǒu may be cognates or doublets, as noted by Tsu‑lin Mei following Tuan Yü‑ts'ai's commentary on the Shuowen, which records 犬 still in use in Jiangsu and Zhejiang.] Both words coexisted, with 狗 gǒu eventually becoming the more frequent form. This helps reconcile the evidence that many other basic words in Vietnamese and Chinese share common roots, whether from Old Chinese, Yue, or Austroasiatic sources, encompassing modern Dai, Zhuang, Miao, Yao, and related languages.
Similarly, other citations concern cases where final /‑s/ or /‑h/ corresponds to the Vietnamese hỏi or ngã tones. The Chinese–Vietnamese correspondences for these items, listed below, are less transparent and require closer scrutiny:
Table 3 - Questionable correspondences
| Việt | Mon | Mnong | Chinese correspondences by dchph |
|---|---|---|---|
| mũi ('nose') | muh | mǔh | 鼻 bí (SV tỵ) [ M 鼻 bí (tị, tỵ) < MC biɪ < OC *blids | Note: Based on other Chinese ~ Vietnamese solid cognates of human body parts, for this item, we can posit the pattern ¶ /b- ~ m-/ (See footnotes below.) According to Pulleyblank, the Yuan and modern Mandarin readings as well as many other modern dialects, e.g., Taiyuan /piə'/, Amoy literary /pit/, imply E. /bjit, L. pɦjit./ | Etymology: The word derives from Proto‑Sino‑Tibetan bi 'nose' (cf. Nuosu ꅳꁖ hnap bbit 'nose; snot'). An alternative derivation traces it to Proto‑Sino‑Tibetan s‑brit 'sneeze; nose; swallow,' which is reflected in Tibetan སྦྲིད (sbrid 'sneeze'), though Chinese shows no trace of r in this root (Schuessler 2007). In several modern lects (including Mandarin, Gan, Jin, Wu, Xiang, and even the literary layer of some Min dialects), the word points to a form with final ‑t. Thus in Standard Mandarin it is pronounced bí, suggesting an old entering‑tone reflex, rather than bì, which would be expected from the Middle Chinese departing tone. This irregularity is explained either as an early northwestern loss of ‑s in the ‑ts cluster before final simplification (Baxter 1992), or as a dialectal shift from ‑s to ‑t (Pulleyblank 1998). Originally, 自 denoted 'nose' but later shifted to mean 'self', leaving 鼻 (OC blids) to carry the sense of 'nose'. Some scholars interpret 鼻 as depicting a nose (自) together with two lungs (畀), though oracle‑bone evidence shows 畀 representing an arrow rather than lungs. ] |
| rễ ('root') | rɜh | ries | 蒂 dì (SV đế) [ M 蒂 (蔕) dì, dài, zhài < MC tei < OC *te:ds | ¶ /d- ~ r-/ ] |
| bảy ('seven') | tpah | poh | 七 qī (SV thất) [ M 七 (柒) qī < MC tsʰit < OC *sn̥ʰid | Note: all dialects, like M, have longer retain the final /-t/ | Starostin's reconstruction: Protoform nit (s‑) 'seven'. Chinese: chit < snhitʔ, Burmese: khu‑natɕ, Kachin: sjənit², Lushei: KC s‑Nis, Limbu: nu‑si, Proto‑Garo: ɲi(s), Garo: sni; Dimasa: sini, Rawang: sanit; Trung: sjə³‑ɲit¹, Kanauri: stiʂ, Mantshati: nyiz/‑i, Rgyarung: ʂnis, ʂnes, Namsangia: iŋit, Andro: sini. (Refs: Sh. 123, 134, 411, 429; Ben. 16; Mat. 203) (For further elaboration on this etymology, see What Makes Chinese So Vietnamese? - Chapter Ten on Sino‑Tibetan etymologies.) ] |
- 鼻 bí for VS "mũi"?: The Ancient Chinese sound of 鼻 bí for VS "mũi" has been reconstructed variously as biuzj (MC) < OC *bjiwer (Chou 1973), b'ji- (MC) < OC *b'òcd (Karlgren 1957), bi (MC) < OC *bjidh (Li 1971), bi (MC) < OC *bjcs (Schuessler 1987), and phjì (MC) < OC *bjis (Pulleyblank 1991). Chou's MC /biuzj/ is the closest to VS /muj4/, by way of /b-/ > /ʔɓ-/ > /m-/, but any of the proposed sound changes above could have given rise to similar sounds in other Chinese dialects, for example bei6 (Cantonese, Wenzhou dialects), pó (Xiamen and Chaozhou dialects), and p'ei6 (Fuzhou dialect). In SV, amusingly, it became tị [tej6] (conditioned by -j-), in line with other irregular Sino-Vietnamese patterns ¶ /b-, p- ~ t-, th-/ for which no similar Fanqie spellings exist in the Kangxi dictionary. However, if it could become /bei6/, it could also be nasalized (fronted due to the original labial /b-/) to become /mej6/, giving rise to /mwoj6/ and then /mwoj4/ (fronted due to a rounding effect of the glide -w-). Compare the pattern of /-ej/ ~ /-uj/ as follows.
- 酸梅 suānméi 'salted dried plum' (VS xímuội ~ mechua, SV toanmai) [ cf. 梅 méi (SV mwoj6, mai), Chaozhou /bhuê5/ ¶ /m- ~ b-/ ],
- 每 měi 'each' (VS mỗi, SV mỗi) [ M 每 měi < MC mɔj < OC *mjə:ʔ | Dialects: Cant. mui22, Amoy muĩ2, Chaozhou mue21, Fuzhou muei2. cf. 母 mǔ (SV mẫu, VS mẹ), Chaozhou /bho2/. ],
- 妹 mèi 'younger sister' (VS em, SV muội) [ VS 'em' /ēim/ (contraction) <~ 妹妹 mèimei | M 妹 mèi < MC moj < OC *mhjə:ts < PC *mjət | According to Starostin, Burmese: mat husband's younger brother, younger sister's husband. Comments: Kham mama mother's younger brother. For *mh- cf. Xiamen be6, Chaozhou mue6, Fuzhou muoi5, Jianou mue ],
- 魅 mèi 'obscure' (VS mờ, SV muội) [ M 魅 mèi < MC mɔj < OC *mjə:ts ],
- 味 wèi 'smell' (VS mùi, SV vị) [ M 味 wèi < MC mʊj < OC *mjəts | FQ 無沸 | According to Starostin: Standard Sino-Viet. is vị. Since the Chinese word also means (in later times) 'interest', Viet. muồi 'interesting' may be traced back to the same source. For *m- cf. Xiamen, Chaozhou bi6, Fuzhou muoi6, Jianou mi6. | cf. 未 wèi (SV 'mùi'), 'mìchính' 味精 wèijīng (SV vịtinh) 'MSG' ],
and the correspondence ¶ /b- ~ m-/ between Middle Chinese ~ Sino-Vietnamese and Mandarin ~ Vietnamese can be found in cases such as
- 疲 pí 'tired' (VS mệt, SV bì) [ M 疲 pí < MC be < OC *bhaj | ¶ b- ~ m- ],
- 肥 féi 'fat' (VS mập, mỡ, phệ, phị, SV phì) [ M 肥 féi < MC bwyj < OC *bjəj | According to Starostin, ST be fat, rich. Viet. phệ is a colloquial reading (cf. also reduplicated: phềphệ); standard Sino-Viet. is phì (reduplicated: phìphị). ],
- 秘 mì 'secret' (SV bí /bei5/) [ M 秘 bì < MC pi < OC *prits ],
- 忙 máng 'busy' (VS bận, SV mang) [ M 忙 máng < MC mjəŋ < OC *ma:ŋ | Dialects: Amoy boŋ12 (lit.), baŋ12; Chaozhou maŋ12; Fuzhou mouŋ12; Shanghai mã32 ], and
- 悶 mèn 'sad' (VS buồn, SV muộn) [ M 悶 mèn < MC mɔn < OC *mjə:ns | Dialects: Amoy bun32, Chaozhou buŋ32. According to Starostin, 悶 mèn means 'melancholy, sorrow', absent from Schuessler's dictionary, although attested already in Yijing. The character is also used (since L.Zhou) for *mjə:n, MC mon, Mand. mén 'to be stuffy, stifling, close, airless' (both readings may be actually related). cf. Viet. 'ngộp' (stuffy) | ¶ m- ~ ŋ- ]
- The appearance of { 蒂 dì ~ SV đế ~ VS rễ } corresponds to the patterns of
- 婿 xù 'son-in-law' (SV tế ~ VS rể) [ M 婿 xù < MC siej < OC *sas. Also: *sēs (Zhou zyxlj p.256), Karlgren: OC *srir, TB *krwy | cf MC *sa 胥. MC siej could be from OC *sēs. MC description 解開四去 ],
- 鬚 xū 'beard' (SV tu ~ VS râu) [ M 鬚 xū < MC suə̆ < OC *so | ¶ /x-, s- ~ r-/ : Ex. 蛇 shé (SV xà ~ VS rắn) 'snake', 縮 suō (SV thúc ~ VS rút) 'shrink' ],
- 縮 suō 'shrink' (VS rút ~ SV thu) [ Also, VS 'co', 'thụt' | M 縮 suō < MC ʂʊk < OC *sruk ],
- 菜 cài 'vegetable' (VS rau ~ SV thái) [ Also, SV 'thể', VS 'cải' | M 菜 cài < MC chɤj < OC *shjə:ʔs ],
- 愁 chóu 'sad' (SV sầu ~ VS rầu) [ cf. 秋 qiū VS thu | M 愁 chóu < MC ʐjəw < OC *dhu | Dialects: Suzhou zoy12; Wenzhou zau12; Changsha cou12; Nanchang chɜu12; Cant. sʌu12 ],
- (及)速 (jí)sù 'hasty' (VS (gấp)rút ~ SV (cấp)tốc ) [ M 速 sù < MC suk < OC *so:k ]
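The initial correspondences proposed in the two lists above can be tallied in the same spirit. The sketch below is illustrative only and rests on stated assumptions: it compares the modern Mandarin pinyin spelling with the vernacular Vietnamese spelling (not the Middle Chinese or Old Chinese reconstructions, so MC b- in 疲 and 肥 surfaces here as pinyin p- and f-), it takes the initial to be the letters before the first vowel letter, and the names PAIRS, initial, and tally are hypothetical.

```python
import unicodedata
from collections import Counter

# (Mandarin pinyin, vernacular Vietnamese) pairs taken from the lists above.
PAIRS = [
    ("pí", "mệt"), ("féi", "mập"), ("máng", "bận"), ("mèn", "buồn"),
    ("xù", "rể"), ("xū", "râu"), ("suō", "rút"), ("cài", "rau"),
    ("chóu", "rầu"), ("sù", "rút"),
]

VOWELS = set("aeiouy")


def initial(syllable):
    """Return the consonant letters that precede the first vowel letter."""
    # Decompose accented letters, then drop the combining marks (tone marks,
    # circumflex, breve, horn) so only base letters remain.
    base = "".join(
        ch
        for ch in unicodedata.normalize("NFD", syllable.lower())
        if not unicodedata.combining(ch)
    )
    for i, ch in enumerate(base):
        if ch in VOWELS:
            return base[:i]
    return base


def tally(pairs):
    """Count how often each (Mandarin initial, Vietnamese initial) pair occurs."""
    return Counter((initial(m), initial(v)) for m, v in pairs)


for (mandarin, viet), count in tally(PAIRS).most_common():
    print(f"{mandarin}- ~ {viet}-: {count}")
```

Orthographic initials are a rough proxy: digraphs such as Vietnamese gi- and qu- would need special handling, and a tally over ten hand-picked items shows only how the cited examples pattern, not how regular the correspondence is across the lexicon.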
3. Etymology
Khmer lacks a native morpheme for 'seven.' The Vietnamese form may derive from Proto‑Vietic *pəs, ultimately traceable to Proto‑Mon‑Khmer *d₁puulh, *d₁puəlh, *d₁pəlh. Cognates include Bahnar tơpơh and Bolyu pei⁵⁵.
- Shafer's hypothesis: To account for Old Bodish bdun 'seven' (in contrast to s‑Nis in most Tibeto‑Burman languages and nwi in Karenic), Shafer proposed an original form sibdunis. With accentual variation (sibdúnis), this yielded O.B. bdun. Since Old Bodish disallowed clusters such as sbd‑, the initial consonant was dropped, as in other examples:
- Sino‑Tibetan m‑lt'ei 'tongue' → O.B. ltśe
- Sino‑Tibetan p‑l‑ŋa → O.B. lŋa
- Siamese: tśěţ₃
- With accent sibdunís, the development may have proceeded as sibunís > siwunís > sinwis (Karenic nwi), while sibdunís > sunís > s‑Nis in most Tibeto‑Burman languages. Metathesis frequently preserved consonants that would otherwise have been lost, particularly in Bodish dialects, and a similar process may explain the Karenic forms.
Comparative forms
- Western Bodish (Sbalti): bdun
- Burig: ŕdun
- Kharao: tśă‑ri
- Vietnamese:
- Sino‑Vietnamese: thất /t'ɐt7/, as in đệ thất 'seventh'
- Vernacular Vietnamese: thứ bảy 'Saturday; seventh'
Some of the Sinitic‑Vietnamese examples above may suggest that their sound changes were derived from Sino‑Vietnamese, itself a development from Middle Chinese. Yet the reverse scenario is at least as plausible: basic vocabulary is more likely to preserve deeper connections with Old Chinese, or even with proto‑Chinese, than with the later strata of Middle Chinese.
Conclusion
The evidence reviewed in this chapter makes clear that Vietnamese cannot be reduced to a late‑tonalized offshoot of Mon‑Khmer. The substratum of basic vocabulary, the tonal correspondences with Old and Middle Chinese, and the persistence of Sino‑Vietnamese etyma across centuries all point to a deeper and more complex history. Haudricourt’s theory of tonegenesis, while pioneering in its time, inverted the actual process: rather than Vietnamese acquiring tones belatedly, it is more likely that certain Mon‑Khmer languages lost them, while Vietnamese developed in parallel with Chinese within the broader Sino‑Tibetan–Yue continuum.
The correspondences in basic vocabulary – chó ~ 狗 gǒu, gà ~ 雞 jī, lúa ~ 來 lái, voi ~ 為 wéi, gấu ~ 熊 xióng – demonstrate that Vietnamese shares fundamental etyma with Chinese and related languages at the deepest lexical stratum. These parallels cannot be explained away as late borrowings; they reflect long‑standing contact and shared inheritance.
From the Han conquest in 111 B.C. through nearly a millennium of colonization, Vietnamese evolved within the Chinese linguistic sphere, absorbing, adapting, and elaborating tonal categories that remain central to its identity today. The tonal system of Vietnamese is thus not an isolated innovation but part of a regional continuum that includes Old Chinese, Middle Chinese, and the Yue‑Daic languages of southern China.
In sum, the Vietnamese language emerges as a product of layered interaction: indigenous Vietic roots, Austroasiatic affiliations, and sustained Sinitic influence. Its tonal system and basic vocabulary testify to a history of convergence rather than divergence, demanding a reassessment of long‑standing assumptions about its genetic affiliation. Vietnamese is best understood not as a peripheral Mon‑Khmer language, but as a central participant in the Sino‑Tibetan–Yue nexus that shaped the linguistic landscape of East and Southeast Asia.
References
Foundational Works
Maspero, Henri. Études sur la phonétique historique de la langue annamite. Les initiales. Paris: Imprimerie Nationale, 1912.
Haudricourt, André‑Georges. L’origine des tons en vietnamien. Journal Asiatique 242, 1954, pp. 69–82.
Haudricourt, André‑Georges. Problèmes de phonétique diachronique: la nasalisation vocalique en vietnamien. Bulletin de la Société de Linguistique de Paris 49, 1954.
Tonogenesis and Comparative Studies
Mei, Tsu‑lin. Tones and Prosody in Middle Chinese and the Origin of the Rising Tone. Harvard Journal of Asiatic Studies 30(1), 1970, pp. 86–110.
Pulleyblank, Edwin G. Middle Chinese: A Study in Historical Phonology. Vancouver: University of British Columbia Press, 1984.
Chen, Shu‑Fen. Vowel Length in Middle Chinese Based on Buddhist Sanskrit Transliterations. Language and Linguistics 4(1), 2003, pp. 29–45.
Thurgood, Graham. Vietnamese and Tonogenesis: Revisiting Haudricourt. Paper, California State University, Chico, 2002.
Vietnamese and Austroasiatic Context
Norman, Jerry. Chinese. Cambridge University Press, 1988.
Starostin, Sergei. Sino‑Tibetan Etymological Dictionary and Thesaurus. Moscow, 1991.
Thomas, David D. Basic Vocabulary in Some Mon–Khmer Languages. Mon–Khmer Studies, 1960.
Sidwell, Paul. Austroasiatic Dataset for Phylogenetic Analysis: 2015 Version. Mon–Khmer Studies 44, Mahidol University / SIL International.
Alves, Mark J. An Updated Overview of the Austroasiatic Components of Vietnamese. Languages 9(12), 2024.
Sinitic–Vietnamese Layering
Sa, Quoc Hoang. Study on the Understanding and Use of Sino‑Vietnamese Words: Perspectives from Secondary School Students in Ho Chi Minh City. Sprin Journal of Arts, Humanities and Social Sciences 4(5), 2025.
Comparative Methodology
Campbell, Lyle. Historical Linguistics: An Introduction. Edinburgh University Press, 2013.
Haspelmath, Martin. Comparative Linguistics and the Problem of Spurious Similarities. Linguistic Typology 9(1), 2005.