Cultural And Polysyllabic Approaches
by dchph
Vietnamese vocabulary has often been grouped under the Austroasiatic Mon‑Khmer umbrella, but this classification risks flattening cultural distinctions and ignoring polysyllabic evidence. This article reframes the issue by distinguishing Vietnamese vocabulary through cultural embedding and polysyllabic analysis, showing how Vietnamese diverges from Mon‑Khmer while retaining substratal echoes.
A polysyllabic (or disyllabic) methodology uncovers correspondences obscured by the traditional monosyllabic lens. Vietnamese compounds and fixed expressions often reveal one-to-many relationships with Chinese etyma, demonstrating semantic flexibility and phonological adaptation beyond what Mon-Khmer parallels can explain. This approach highlights the cultural and linguistic integration of Chinese elements into Vietnamese, from classifiers and grammatical markers to idiomatic expressions.
I) The problem
Conventional comparative linguistics often treats Vietnamese as a Mon‑Khmer language, emphasizing shared roots.
Yet many Vietnamese words are culturally embedded in ways that Mon‑Khmer parallels cannot account for.
Polysyllabic forms in Vietnamese reveal unique structural and semantic patterns absent in Mon‑Khmer cognates.
Most of the Austroasiatic Khmer words cited by various authors have little to do with the Austroasiatic Mon-Khmer languages. One may play the role of both historical linguist and judge, but it is more important to provide explanations for those words that are genuine cognates across all languages under consideration.
When reaching a final judgment, cultural factors must be given high regard. In the case of the etyma "chas", "già" and "cha", the Chinese 爹 diè offers a better fit in the overall picture. The etymology of a word is not only about its phonetic shell or semantic core (for example, tía for "daddy"), but also about its cultural story. The distinction between tía ('daddy') and cha ('father') reflects different layers of meaning, even though both trace back to the same root 爹 diè ~ SV đa versus 爸 bā ~ Shaanxi ta. These doublets are basic words that all languages must have possessed independently, without the need for borrowing, as recognized in historical linguistics. Their cognacy can be explained as etyma of the same human root, belonging to the class of universally basic words, much like /mat/ with Vietnamese mắt ("eye"), which appears across many Asian languages. Yet only Chinese 目 mù (SV mục) "eye" is culturally aligned with Vietnamese, as shown in shared idiomatic usage: 盲目 mángmù ~ VS mùquáng ("blindly"), 目擊 mùjí (SV mụckích) ~ VS mắtthấy ("witness"), and so forth.
In other words, unless proven otherwise, Sino-Tibetan etymologies, which show stronger cognacy with Vietnamese, should take precedence over Mon-Khmer comparisons. This is because Chinese-Vietnamese cognates carry not only linguistic features but also embedded cultural implications, displayed syllable by syllable in both tonal languages. For example, 葉落歸根 yèluòguīgēn corresponds directly to Vietnamese lárụngvềcội ("the dying leaf falls back to the tree root," figuratively "people long to die in their birthplace"), a sentiment shared collectively by both Chinese and Vietnamese. Similarly, 衣架飯囊 yījiàfànnáng ~ VS giááotúicơm ("good-for-nothing bum") has no equivalent in Mon-Khmer.
From the perspective of historical linguistics, looking back 2,000 years and considering the millennium of Chinese rule, it is logical to assume that many modern Vietnamese words were either derived from Chinese or evolved from the same roots. This holds regardless of whether individual Mon-Khmer matches exist for words like lá ("leaf"), rụng ("fall"), về ("back"), and cội ("root"), since only in Chinese-Vietnamese correspondence do they combine into complete idiomatic expressions.
Forrest (1948, p.25) put it well when he paraphrased Karlgren's words (1) in his work:
"it is faulty method to compare [..] an isolated word in each of the languages; rather must the comparison begin with related groups of words in one and in the other language, words which, linked in both form and meaning, involve a buried phonetic element common to their group, beside which may be placed a similarly constituted group in other language."
In considering lexical grouping, cognates should be examined in clusters of words that belong to the same semantic domain. Within the anatomical class, for example, we already recognize clear Sinitic‑Vietnamese correspondences such as đầu 頭 tóu 'head', mặt 面 miàn 'face', mắt 目 mù 'eye', tim 心 xīn 'heart', and trán 顙 săng 'forehead'. These parallels are so evident that they require little elaboration here. By extension, it is highly plausible that additional anatomical terms can likewise be traced to cognate etyma. Thus, if phổi 肺 fèi 'lung', gan 肝 gān 'liver', and thận 腎 shèn 'kidney' align with their Chinese counterparts, then related lexemes such as bụng 腹 fù 'abdomen' and dạ 胃 wèi 'stomach' are equally strong candidates for cognacy.
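The cluster-based reasoning above can be sketched in a few lines of Python. This is only an illustrative sketch: the word pairs are the ones proposed in the paragraph, while the three status labels (`recognized`, `conditional`, `candidate`) and the function names `summarize` and `with_status` are assumptions introduced here to mirror how the paragraph ranks its evidence. They are not part of any established method.

```python
from collections import Counter

# (Vietnamese, Chinese, pinyin, gloss, status) for the anatomical class.
# The status labels are hypothetical tags that mirror the paragraph above:
#   recognized  = correspondences the text treats as already evident
#   conditional = pairs the text uses as the "if" premise
#   candidate   = pairs the text proposes by extension
ANATOMY = [
    ("đầu", "頭", "tóu", "head", "recognized"),
    ("mặt", "面", "miàn", "face", "recognized"),
    ("mắt", "目", "mù", "eye", "recognized"),
    ("tim", "心", "xīn", "heart", "recognized"),
    ("trán", "顙", "săng", "forehead", "recognized"),
    ("phổi", "肺", "fèi", "lung", "conditional"),
    ("gan", "肝", "gān", "liver", "conditional"),
    ("thận", "腎", "shèn", "kidney", "conditional"),
    ("bụng", "腹", "fù", "abdomen", "candidate"),
    ("dạ", "胃", "wèi", "stomach", "candidate"),
]


def summarize(entries):
    """Count how many pairs in one semantic class carry each status."""
    return Counter(status for *_, status in entries)


def with_status(entries, wanted):
    """List the Vietnamese forms of a class that carry the given status."""
    return [viet for viet, *_, status in entries if status == wanted]


if __name__ == "__main__":
    print(dict(summarize(ANATOMY)))
    print(with_status(ANATOMY, "candidate"))
```

Keeping the whole class in one table makes the argument's shape explicit: a proposed pair is weighed against the other members of its semantic class rather than as an isolated word.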
Furthermore, Chinese often preserves parallel native entities within each lexical class that have been fully Sinicized but remain distinct in usage due to their geo-historical origins. Some of these may plausibly reflect southern sources. Examples include: 'river' 江 jiāng vs. 河 hé, 'water' 水 shuǐ vs. 淂 dé, 'creek' 川 chuān vs. 泉 quán, 'dog' 犬 quǎn vs. 狗 gǒu, 'face' 面 miàn vs. 臉 liǎn, 'blood' 血 xuè vs. 衁 huāng, 'head' 首 shǒu vs. 頭 tóu, 'eye' 目 mù vs. 眼 yǎn, 'leg' 足 zú vs. 腳 jiǎo. Within each class, extended forms can often be identified from monosyllabic roots. For example, 犬坐 quǎnzuò may plausibly correspond to VS "chồmhỗm" (squat), rather than the Khmer 'chrohom' (sit in front) (Nguyen Ngoc San, ibid., p. 49).
II) Polysyllabic approaches
Grouping by nucleus: Vietnamese disyllables (xinlỗi, cảlũ, đồngloã) show systematic sound change patterns distinct from Mon‑Khmer monosyllables.
Semantic pairing: Disyllabic forms often encode relational meaning (cảlũ ‘whole group’, đồngloã ‘accomplice’), unlike Mon‑Khmer single‑root terms.
Cognate differentiation: Vietnamese ungthư vs Mon‑Khmer medical terms demonstrates divergence in polysyllabic survivals.
Our new approach emphasizes dissyllabicity, incorporating not only Forrest's "categorical principle" but also derived disyllabic formations. For example: 'dạ' 胃 wèi > 'baotử' 胃子 wèizi (stomach), and 'tâm' 心 xīn (heart) > 'tấmlòng' 心腸 xīncháng (inner heart).
As noted earlier, previous scholarship on Vietnamese etymology was generally conducted under the assumption that Vietnamese is a monosyllabic language. We will retain the valuable achievements of that tradition, such as Thomas' axiom of sound change, or more precisely, "phoneme shifts and mergers, as their imprint is indelible" (Thomas, ibid.). In comparison with monosyllabic roots, we will analyze how subsequent sound changes unfolded under different phonetic and semantic conditions, and how dissyllabic variants emerged in cultural contexts.
In other words, Vietnamese etyma should be studied within a contextual framework that embraces their full wholeness, both monosyllabic and dissyllabic forms, linguistic peculiarities, and cultural accentuation. Only then can specific words be reliably assessed for kinship, original meaning, and even approximate timelines of borrowing. Even without explicit chronological markers, linguists can often secure an etymon with confidence by analyzing lexical residues scattered across different topologies. From the examples cited below, we can identify archaic Vietic and Old Chinese etyma within a cultural framework. Except for the first five Vietnamese vocables taken from Nguyen Ngoc San's wordlists (ibid., pp. 98, 161), the etymologies of the other cases may be more complex than they appear here.
| Vietnamese | Chinese | Pinyin | Meaning | Notes |
|---|---|---|---|---|
| anhtam | 兄弟 | xiōngdì | brothers | ~> 'anhem' (possibly from 兄妹 xiōngmèi 'older brother and younger sister') |
| cổlỗ | 古老 | gǔlǎo | antiquated | 'cổ' 古 gǔ (SV cổ) 'ancient' + 'lỗ' 老 lǎo (SV lão) 'old'; Vietic /klũ/ |
| cáisọ | 骷髏 | kūlóu | skeleton | <~ 'càlồ' 骷髏 kūlóu (cổlâu) = 骨骼 gǔgé (SV cốtcách); Vietic /kro/ |
| thiêngliêng | 神靈 | shénlíng | sacred | <~ SV 'thầnlinh', Vietic /tliêng/ ~> 'thiêngliêng' ~> 'lành'; ex. đấtlành: 地靈 dìlíng (good earth) |
| chồmhỗm | 犬坐 | quǎnzuò | squat | (literally) 'squat like a dog' |
| hiềnlành | 善良 | shànliáng | kindness | #<~ SV 'lươngthiện' |
| sumvầy | 團聚 | tuánjù | reunion | <~ SV 'đoàntụ' |
| nóichuyện | 嘮嗑 | làokē | chat | (Chinese northeastern dialect), cf. 聊天 liáotiān 'talk' |
| luitới | 溜達 | liūda | stroll | (Chinese northeastern dialect) |
| xarời | 疏離 | shūlí | stay away | |
| điđám | 隨錢 | suíqián | give a monetary present | also VS 'đitiền' |
| đầunậu | 頭腦 | tóunǎo | big shot | SV 'đầunão' ('head' = VS 'đầunão' 'headquarter') = 首腦 shǒunǎo (SV thủnão) ~ VS 'đầunão' > 'sọnão' (brain) |
| khốnnạn | 混蛋 | húndàn | insult / hardship | SV 'hỗnđản'; cf. 困難 kùnnán: SV 'khốnnạn' (hardship) |
| hỗnhào | 溷肴 | húnxiáo | confused | ~> 'impolite' in Viet.; cf. 溷淆 húnxiáo, 溷肴 hùnyáo, 渾殽 hùnyáo, 混淆 hùnxiáo, all 'hỗnhào' in V. |
To adopt this new holistic approach, we must look beyond phonological resemblance and semantic equivalence, and also consider cognates within related categorical sets. Forrest's concept of "related groups of words" can be extended further: we apply it not only to words tied to a shared conceptual domain, but also to their expanded dissyllabic variants. These forms often preserve both phonological continuity and distinctive semantic nuance. This is especially important because, in Forrest's time, many specialists of Vietnamese still debated the role of dissyllabicity as a defining feature of the lexicon in languages such as Vietnamese and Chinese.
As an illustration of how dissyllabicity influences categorical sound change, we may begin with etyma for human body parts and their dissyllabic derivatives. From there, the analysis can be expanded to other fundamental domains, such as kinship terms, which are widely regarded in historical linguistics as among the most stable elements of vocabulary, resistant to change across time.
While each word has its own distinct denotation, it is also connotatively linked to other polysyllabic word-concepts within the same lexical class. For example:
- 目 mù (SV mục, VS mắt 'eye') must be considered a cognate, since it belongs to a larger lexical set within its semantic sphere and makes peculiar sense in fixed connotative expressions:
- 目光 mùguāng (SV mụcquang): VS # ánhmắt ("the look of one's eyes") [reverse order (#) with 光 guāng as ánh; M 光 guāng < MC kwɑŋ < OC kʷa:ŋ. According to Starostin: also read kʷa:ŋ-s, MC kwʌŋ, Mand. guàng "be extensive" (< kʷa:ŋʔ-s, cf. 廣). Schuessler prefers kʷa:ŋ with level tone, but Karlgren distinguishes it from kʷa:ŋ "be bright." Viet. quáng "to dazzle, blind" is a colloquial loan (reflecting *kʷa:ŋ-s > MC kwʌŋ "be bright, dazzle"); standard SV is quang.]
- 盲 máng (SV manh): VS mù ("blind") [M 盲 máng < MC maiŋ < OC mhra:ŋ].
- 盲目 mángmù (SV manhmục): VS # mùquáng ("blind, blindly, indiscriminately") [whole contextual loan with dissyllabicity, similar to 目光 mùguāng].
Other related forms include:
- 睇 dí [Cant. /t'aj3/]: VS thấy ("see") [modern Cantonese "gaze"].
- 瞅 chǒu (SV thiễu): VS xem ("look") [cf. M 瞅 chǒu ~ phonetic M 愁 chóu < MC ʐjəw < OC dhu].
- 瞧 qiáo (SV tiều): VS coi ("look") [M 瞧 qiáo < MC tsɦaw < OC dzaw].
- 眼 yăn (SV nhãn): VS nhìn ("look") [Mand. "eye," extended semantically].
- 親眼 qīnyăn: VS chínhmắt ("see with one's own eyes") [cf. 目擊 mùjí (SV mụckích) → VS chínhmắt].
- 眼光 yănguāng: VS # cáinhìn ("view, sight") [innovation; cf. 目光 mùguāng "look"].
- 眼力 yănlì (SV nhãnlực): VS # sứcnhìn ("eyesight") [conceptualization].
- 眼淚 yănlèi: VS # nướcmắt ("tear") [association].
- 眼眶 yănkuāng: VS # khoémắt ("rim of the eye") [reverse order].
- 眼屎 yănshǐ: VS ghèn ("gum in the eye") [contraction].
- 眼皮 yănpí: VS # mímắt ("eyelid") [reverse order].
- 眼鏡 yănjìng: VS # kínhmắt ("eyeglasses") [also mắtkính 目鏡 mùjìng, Hainanese /mat7keng1/].
- 眼前 yănqián: VS # trướcmắt = 目前 mùqián [hence trướcmặt "at the moment," reverse order].
- 眉目 méimù: VS manhmối ("clue, lead"), vẻmặt ("countenance") [adaptation].
- 眉毛 méimáo: VS mimắt ("eyelash") [association].
- 眉梢 méishāo: VS # chânmày ("eyebrow") [reverse order].
- 刺眼 cìyăn: VS ngứamắt ("unpleasant to the eye") [also doublet gaimắt].
- 小心眼 xiăoxīnyăn: VS nhỏnhen ("narrow-minded") [contraction].
- 老天有眼 lăotiānyǒuyăn: VS # trờicaocómắt ("Heaven is watching") [association].
- 眉來眼去 méiláiyănqù: VS # liếcmắtđưaduyên ("make eyes") [innovation].
- 耳聞目見 ěrwénmùjiàn: VS tainghemắtthấy ("seeing and hearing in person") [association].
- 耳聞不如目見 ěrwén bùrú mùjiàn: VS # trămnghe đâubằng mắtthấy = 耳聞不如眼見 ěrwén bùrú yănjiàn ("seeing for oneself is better than hearing from others") [association].
- 果報眼前 guǒbàoyănqián: VS # quảbáonhãntiền ("karmic retribution within one's lifetime") [loan and translation of 現世報 xiànshìbào].
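The "#" marker for reverse order used in the list above lends itself to a mechanical check. The following Python sketch is an illustration only. It assumes each compound has already been segmented into syllables by hand, and the morpheme-level glosses (目 and 眼 as mắt, 光 as ánh, 皮 as mí, 眶 as khoé, 親 as chính) are read off the compounds cited above rather than taken from an independent dictionary. The function name `order_relation` is hypothetical.

```python
# Each entry maps a Chinese compound to a pair:
#   (Vietnamese gloss for each Chinese morpheme, in Chinese order,
#    syllables of the attested Sinitic-Vietnamese compound).
# The glosses are assumptions inferred from the list above.
EXAMPLES = {
    "目光": (("mắt", "ánh"), ("ánh", "mắt")),      # VS # ánhmắt
    "眼皮": (("mắt", "mí"), ("mí", "mắt")),        # VS # mímắt
    "眼眶": (("mắt", "khoé"), ("khoé", "mắt")),    # VS # khoémắt
    "親眼": (("chính", "mắt"), ("chính", "mắt")),  # VS chínhmắt (no # marker)
}


def order_relation(glosses, attested):
    """Compare the attested syllable order with the Chinese morpheme order."""
    glosses, attested = tuple(glosses), tuple(attested)
    if attested == glosses:
        return "same order"
    if attested == glosses[::-1]:
        return "reverse order (#)"
    return "other"


if __name__ == "__main__":
    for hanzi, (glosses, attested) in EXAMPLES.items():
        print(hanzi, "".join(attested), order_relation(glosses, attested))
```

Entries tagged in the list as contraction, innovation, or association (for example 眼屎 ghèn) would fall under "other" in this scheme, which is why the check is only a first filter and not an etymological test.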
For the specific examples above, beyond the natural adoption of Sino-Vietnamese forms such as nhãn for 眼 yǎn, it is worth noting that the Mon-Khmer form /phnek/ may also be cognate with Vietnamese mắt ("eye"), through a historical sound change /phn-/ ~ /m-/ in either direction. This raises broader questions: how did speakers of these languages express the act of "looking" and "seeing"? How did they conceptualize "perception" through the eyes? What idioms relating to vision evolved in their languages, and how were the meanings of "eye" extended into compounds that moved beyond monosyllabicity into the polysyllabic realm?
In all such cases, Chinese and Vietnamese cognates align seamlessly, both phonologically and semantically, because they ultimately derive from the same etymological source. This shared inheritance predates the separation of Vietic and Mường, which occurred at least 2,100 years ago, and persisted despite the influx of successive Chinese dialects brought by repeated Han incursions during the millennium of Chinese rule. Even after Vietnam's independence in the 10th century, Chinese forces attempted invasions almost every decade, leaving a profound linguistic imprint.
The historical parallel is clear: just as the Spanish conquest of South America, which began in the 16th century, left much of the continent speaking Spanish long after independence, so too did centuries of Chinese domination leave Vietnamese deeply interwoven with Sinitic vocabulary and idioms.
From this perspective, we can continue to trace countless other examples where Chinese and Vietnamese forms correspond, both monosyllabically and dissyllabically, and where etyma share the same contextual associations. To illustrate further, let us now turn to additional examples drawn from the semantic field of human body parts and related concepts that appear in both languages.
- zuǐ 嘴: môi 'lip',
- zuǐbā 嘴巴: VS # 'bờmôi' (lip),
- zuǐyìng 嘴硬: VS # 'rángcải' (long-tongued),
- duōzuǐ 多嘴: VS 'giàmồm' (talk back verbosely) [ for 'già', cf. 多 duō (SV 'đa') ~ 'già' <~ 'cha' <~ 'tía' 爹 diè (SV 'đa') ],
- dòuzuǐ 鬥嘴: VS 'đấukhẩu' (quarrel) [ association of 嘴 with its synonym 口 kǒu (mouth), a common linguistic phenomenon ],
- wāizuǐ 歪嘴: VS 'méomõ' (wry mouth),
- piězuǐ 撇嘴: VS 'bĩumôi' (curl one's lips) [ also, doublet 'trềmôi' ],
- wěnzuǐ 吻嘴: VS 'hônmôi' (lip kissing) [ 吻 wěn: VS 'hôn' ~> 'hun' (kiss). Also, 吻 wěn: VS 'mồm' (mouth), doublet of 'miệng' ],
- dìng 腚: VS 'đít' (buttocks),
- tún 臀: VS 'trôn' (buttocks) [ cf. SV 'đồn' ],
- pì 屁: VS 'địt' (fart) [ cf. SV 'tí' | M 屁 pì (tí, thí, thỉ) < MC pʰi < OC *pʰis ],
- pìgǔ 屁股: VS 'phaocâu' (chicken's butt) [ modern M 'buttocks' vs. VS 'lỗđít' (anus) ],
- shǐ 屎: VS 'cứt' (feces) [ cf. SV 'thiệt'. Also, VS 'dử', 'ráy' (excrement) | M 屎 shǐ < MC shǐ < OC *ʂij < PC *kijh, ʂijh (Zhou zyxlj, p.251) | Shafer: TB *kip, Burmese: khjijh excrement, Kachin: khji3 excrement, Dimasa: khi, Garo: khi, Bodo: kí, Kham kī; Kanauri khoa, Bahing khl, Digaro: klai. Simon 19; Sh. 44; Ben. 39; Mat. 191. ],
- ēshǐ 屙屎: VS 'ỉa(cứt)' (poop),
- gǒushǐ 狗屎: VS # 'cứtchó' (dog's feces),
- ěrshǐ 耳屎: VS # 'cứtráy' (ear wax),
- ěrduo 耳朵: VS 'lỗtai' (ear) [ by association and assimilation ],
- ěrlóng 耳聾: VS # 'lãngtai' (partially deaf) [ VS 'lãng' <~ lãngtai <~ @ ®M 耳聾 ěrlóng ('deaf') | M 聾 lóng < MC ləwŋ < OC *ro:ŋ ],
- tīng 聽: VS 'nghe' (hear) [ cf. Hainanese /k'e1/ ],
- jiăo 腳: VS 'chân' (leg) [ M 腳 jiăo, jué < MC kɨak < OC *kaɡ | cf. zú 足: VS 'giò' ],
- bājiăo 巴腳: VS 'bànchân' (foot sole) [ (literally) "a 'panel' of the foot". cf. 腳板 jiăobăn (dialectal) 'sole'; cf. bàntay 手板 shǒubăn (palm) ],
- què 瘸: què 'limp' (SV cài) [ M 瘸 què, qué < MC gwa < OC *ɡʷal ].
- shǒu 手 (SV thủ) ~ zhăng 掌 (SV chưởng): tay 'hand', which gives:
- shǒubăn 手板: VS 'bàntay' (palm) [ (literally) "a 'panel' of the hand". cf. bàntay 巴掌 bāzhăng (dialectal) 'hand' ].
Like 掌 zhăng (palm) in 巴掌 bāzhăng 'hand', 手板 shǒubăn is a later development built on 手 shǒu 'tay' (hand). At the same time, 手 appears in other related compounds that have given rise to many Vietnamese words with the same structure, as in:
- kōngshǒu 空手: VS 'taykhông' (empty-handed) [ also 'bare hand'; cf. 空手道 Kōngshǒudào: SV 'Khôngthủđạo' (Karate) ],
- xiàshǒu 下手: VS 'hạthủ' (put one's hand to) [ VS 'ratay' ],
- shǒuxià 手下: VS 'thủhạ' (subordinate) [ VS 'taydưới' (underdog) ],
- chāshǒu 插手: VS 'ratay' (put hand in),
- dòngshǒu 動手: VS 'độngthủ' (put one's hand to),
- shǒuruăn 手軟: VS 'nươngtay' (lenient),
- qiăoshǒu 巧手: VS 'khéotay' (skillful),
- gāoshǒu 高手: VS 'caotay' (upper hand),
- shùnshǒu 順手: VS 'thuậntay' (handy, at one's convenience),
- qīngshǒu 輕手: VS 'nhẹtay' (light-handed),
- zhòngshǒu 重手: VS 'nặngtay' (heavy-handed),
- shǒuzú 手足: VS 'taychân' (hands and feet) [ cf. 手腳 shǒujiăo 'taychân' (in the context of 'close kinship') ],
- yīshǒu 一手: VS 'mộttay' (connoisseur) [ Also, VS 'mộtcây', which might have evolved from 一手 yīshǒu, literally 'one hand' or 'single-handed' (substituting 'cây' for shǒu, while in Chinese it means 'he himself, with his own hand, doing something'). cf. cây 樹 shù (SV thụ) 'tree' ],
and sometimes with alternations as the result of local innovation, while the core meaning remains stable, such as:
- shuǐshǒu 水手: VS 'thuỷthủ' (sailor) ~> 'taychèo' (rower),
- qiăoshǒu 巧手: VS 'khéotay' (skillful) ~> 'hoatay' (magic hand),
- xiàshǒu 下手: VS 'ratay' (act with one's hands) ~> 'xuốngtay' (put one's hand to),
- shǒuxià 手下: VS 'kẻdưới' (assistant) ~> 'dướitay' (subordinate),
- chāshǒu 插手: VS 'xíavào' (interfere),
- dòngshǒu 動手: VS 'nhúngtay' (have one's hand in),
- shǒuruăn 手軟: VS 'nhẹtay' (lenient),
- shùnshǒu 順手: VS 'sẵntay' (handy, at one's convenience),
- gēshǒu 歌手: VS 'casĩ' (singer) [ <~ Viet. @ 歌星 gēxīng; 'sĩ' 士 shì replaces shǒu 手 (SV 'thủ') or xīng 星, 'sĩ' being a common affix in building Vietnamese composite words, such as 'hoạsĩ' 畫家 huàjiā (painter), 'thisĩ' 詩人 shīrén (poet), etc. ]
We can expand further into other categories, such as family relationships or kinship terms, for instance:
- fù 父: bố 'dad',
- fùqīn 父親: VS # 'bốruột' (biological father) [ SV 'phụthân'. cf. qīnfù 親父 (SV thânphụ), with both Sino-Vietnamese compounds used interchangeably in Vietnamese while associating @ M 親 qīn < MC chjin < OC *shjən with 'ruột' (blood-related) ],
- qīndiè 親爹: VS 'charuột' (biological father),
- diè 爹: VS 'tía' (daddy) [ also, VS 'cha' is a doublet of 'tía' that evolved from 'ba' 爸 bā: Shaanxi dialect /tá/ ],
- bā 爸: VS 'ba' (father) [ Shaanxi dialect: /tá/, a doublet of 爹 diè (SV 'đa'): VS 'tía' and 'cha' (daddy) ],
- mǔ 母: VS 'mẹ' (mother) [ VS 'mệ', 'mợ', 'mạ', 'mái', 'cái'... ],
- mā 媽: VS 'má' (mother),
- mǔqīn 母親: VS # 'mẹruột' (biological mother) [ SV mẫuthân | cf. M 親母 qīnmǔ (SV thânmẫu). Note that the Sino-Vietnamese forms 'thânmẫu' or 'mẫuthân' for 'mother' and 'thânphụ' or 'phụthân' #父親 for 'father' are also in common usage; the Sinitic-Vietnamese forms are not the only ones used to address parents. ],
- niáng 娘: 'nạ' (mommy) [ archaic and dialectal usages ],
and other compounds such as VS 'bốmẹ' 父母 fùmǔ, VS 'chamẹ' 爹媽 dièmā, VS 'bamá' 爸媽 bāmā (parents), etc.
III) Comparative evidence
Vietnamese disyllabic vocabulary aligns more closely with Sino‑Vietnamese doublets than with Mon‑Khmer roots.
Cultural specificity and polysyllabic structure provide stronger explanatory power than Austroasiatic classification.
Mon‑Khmer lexical parallels fail to capture Vietnamese semantic nuance.
A. Kinship lexemes: Sino‑Vietnamese and Vietnamese parallels
We can extend the preceding discussion to other items such as anh 兄 xiōng (older brother), con 子 zǐ (child), chị 姊 zǐ, chế 姐 jiě (older sister), em 妹 mèi (younger sister), for example:
- anhtam 兄弟 xiōngdì (SV huynhđệ, 'brothers')
- concháu 子孫 zǐsūn (SV tửtôn, 'posterity')
- anhem 兄妹 xiōngmèi (SV huynhmuội, 'brother and sister')
- chịem 姊妹 zǐmèi (SV tỷmuội, 'sisters')
- achế 姐兒 jiěr (SV thưnhi, 'sister')
- emgái 阿妹 āmèi (SV amuội, 'sister')
and many additional genetically affiliated forms. Taken together, these words are interconnected within categorically grouped lexical sets.
Forrest's concept of "related groups of words" can be expanded beyond monosyllabic correspondences to include polysyllabic formations, as illustrated above. This framework can also be applied to derivatives that arise through processes of corollary, association, and analogy (as reflected in the What Makes Chinese So Vietnamese - Case study worksheet.) Derived words from the same root often appear quite different from their original sound base, to the point that their shared ancestry is obscured. Without a dissyllabicity approach to substantiate the evidence, these sound-changed variants are rarely recognized as belonging to the same etymological family. By situating them within polysyllabic groupings, however, their kinship becomes clearer, just as with the monosyllabic core-rooted words, such as
- 'tập' vs. 習 xí (practice) and its derivatives 'tậpdượt' #演習 yănxí (drill), 'họchỏi' 學習 xuéxí (learning), 'thóiquen' 習慣 xíguàn (habit), 'tậtxấu' #陋習 lòuxí (bad habit), etc. (1)
- dòu 逗: VS 'đùa' [ Also, variant doublets: 'chọc', 'trêu', 'tếu' (funny, make fun of). cf. SV 'đậu' ~ M 逗 dòu < MC dow < OC *dos, *do:s. For 'tếu', by associating 逗 dòu with 笑 xiào (SV tiếu) ],
- dòuxiào 逗笑: VS 'trêughẹo' (make fun of) [ variants 'chọcghẹo', 'đùacợt', 'chọccười' (joke), 'thọclét', 'cùlét' (tickle) (Hai. /ka1lɛt7/) ],
- dòuwán 逗玩: VS 'đùagiỡn' (play) [ variants 'chơigiỡn', 'giỡnchơi', 'đùabỡn' (VS # 'bôngđùa'), 'đùadai' (play a trick on) ],
- zhēndòu 真逗: VS 'tếulâm' [ being associated with 笑林 xiàolín (SV tiếulâm), a non-extant word in Chinese, via localization, as in "這個 人 真逗! Zhège rén zhēndòu!": 'Cái anhnày tếulâm quá!' (This person is so funny!) ],
- diăn 點 /tjen2/: 'tiếng' (hour), 'châm' (ignite), 'chấm' (dip), 'tí' (a bit), 'điểm' (point), 'đếm' (count), 'đốm' (dot), 'chọn', etc.,
- zhòngdiăn 重點: VS 'điểmchính' (SV 'trọngđiểm') (main point),
- diănmíng 點名: VS 'đọctên' (SV 'điểmdanh') (roll calling),
- diănxīn 點心: VS 'lótlòng' (SV 'điểmtâm') ("dimsum", snack, breakfast),
- kuàidiăn 快點: VS 'maulên' (hurry up), 'mauđi' (Be quick!),
- màndiăn 慢點: VS 'chậmtí' (slow down),
- diándiăn 點點: VS 'títi' (a little bit, sparingly) [ variant doublets: 'tíxíu', 'chútxíu', 'lèotèo' ], etc.
(Note: Elaborations on the etymologies of the words above have been given throughout the previous chapters.)
The key point here is that the semantics of each disyllabic item help reveal the etymology of its constituent morphemes, since each two-syllable word aligns with a broader set of related concepts within the same lexical category.
IV) Cultural approaches
Embedded practices: Terms for rituals (ăntấtniên, vuquy, sínhlễ) reflect Vietnamese cultural specificity, not Mon‑Khmer inheritance.
Festive vocabulary: Words like TânMão, TânHợi show calendrical embedding tied to Sino‑Vietnamese cosmology rather than Mon‑Khmer.
Social institutions: thànhphố, chợbúa, khaigiảng illustrate urban and educational concepts absent in Mon‑Khmer rural lexicons.
All of the items cited in this survey conform closely to Chinese phonological contours and display the same distinctive linguistic attributes, even when they shift semantically or syntactically from one category to another. To be precise, most of the etyma under investigation are indeed loanwords from Chinese. Yet they have long been accepted as indispensable and integral parts of the Vietnamese lexicon, particularly in the case of grammatical prepositions, conjunctions, and adverbials. These elements either derive directly from Chinese or evolved in parallel with Chinese function words (虛辭), for example: 於是 yúshì → VS vìthế ("hence"), 由於 yóuyú → VS bởivì ("because").
As noted earlier, the polysyllabic approach provides historical linguists with a powerful tool for uncovering further possibilities. From these findings, rules can be formulated that, much like early observations on Vietnamese and Chinese grammar, establish a baseline for deeper exploration. Such rules can serve as a springboard for advancing from one discovery to the next, extending into other lexical domains and fixed expressions that carry strong cultural resonance, particularly idiomatic etyma. This is precisely what underscores the closeness of Vietnamese and Chinese.
Beyond the cognates already identified, we also find a wealth of idiomatic sayings and fixed expressions shared by both languages, which correspond with striking precision, as illustrated below.
- 早 zăo "chào" (Hello!),
- 成 chéng "xong" (Okay!),
- 行 xíng "Vâng" (Fine!),
- 牛 niú "ngầu" (macho),
- 個啥 gèshá "cáigì" (what),
- 賴我 lài wǒ "tại tôi" (my bad),
- 罪過 zuìguò "cólỗi" (made mistake),
- 道歉 dàoqiàn #"xinlỗi" (apology),
- 倒是 dàoshì "đúngthế" (yes, it is!),
- 隨錢 suíqián "đitiền" ('monetary gift'),
- 無聊 wúliáo "vôduyên" (silly),
- 聊天 liáotiān "nóichuyện" (chat),
- 天遣 tiānqiăn "trờikhiến" (karma),
- 忙活 mánghuo "bậnviệc" (busy),
- 扣帽子 kòumàozi "chụpmũ" (brand label on somebody),
- 受不了 shòubúliăo "chịukhôngnổi" ('cannot hold it'),
- 受得了 shòudéliăo "chịuđượcnổi" ('can take it'),
- 說中了 shuōzhòngle "nóiđúngrồi" (It's correct!),
- 沒關係 méiguānxi "đâucóchi" ('it's nothing'),
- 忍不住 rěnbúzhù "nhịnkhôngđược" ('cannot stand'),
- 聊天聊地 liáotiānliáodì "nóichuyệntrờiđất" (chat),
- 什麽東西 shénmedōngxī "đồthứgìđâu!" ('what a jerk!'),
- 葉落歸根 yèluòguīgēn "lárụngvềcội" (literally: 'the dying leaf falls back to the tree root', metaphorically: 'sentimental attachment to one's root'),
- 飲水思源 yǐnshuǐsīyuán "uốngnướcnhớnguồn" ('be grateful for what one got'),
- 衣架飯囊 yījiàfànnáng "giááotúicơm" (good for nothing bum),
- 傾國傾城 qīngguóqīngchéng "nghiêngthànhđổnước" ('The beauty that would overthrow a kingdom!'),
- 含笑九泉 hánxiàojiǔquán "ngậmcườichínsuối" ('rest peacefully in Heaven'),
- 含血噴人 hánxiěpēnrén "ngậmmáuphunngười" ('to wrongly accuse'),
- 後會有期 hòuhuìyǒuqí "hẹnngàygặplại" ('so long'),
- 木已成舟 mùyǐchéngzhōu "vánđãđóngthuyền" ('the die is cast'),
- 破鏡重圓 pòjìngchóngyuán "gươngvỡlạilành" ('mend a broken relationship'),
- 井蛙之見 jǐngwāzhījiàn "ếchngồiđáygiếng" ('have a brain of a chicken'),
- 螳臂擋車 tángbìdăngchē "châuchấuđáxe" ('it's a suicidal fight'),
- 長氣短嘆 chángqìduăntàn "thanvắnthởdài" ('be depressed'),
- 結草銜環 jiécăoxiánhuán "kếtcỏngậmvành" ('be grateful even unto one's death'),
- 青天白日 qīngtiānbáirì "banngàybanmặt" ('in the broad daylight'),
- 三更半夜 sāngēngbànyè "banđêmbanhôm" ('in the depth of the night'),
- 十年樹木,百年樹人 shí nián shù mù, băinián shù rén. "Mười năm trồng cây, trăm năm trồng người." ('It takes ten years to nurture a tree, but a hundred years to cultivate a class of people.'), etc.
Our proposed polysyllabic (hence disyllabic) approach implies that isolated words are to be regarded as displaced lexical orphans, like the Mon-Khmer words that float about in the Vietnamese vocabulary, such as:
| Viet. | Khmer | unknown |
|---|---|---|
| thelè | tlec | ? |
| đùm | đum | ? |
| lu | loạlu | ? |
| dong | đong | ? |
| dàn | đal | ? |
Given that both Vietnamese and Chinese contain a substantial number of Yue elements — many of which still surface in the theorized Austroasiatic Mon-Khmer, Austronesian, Austro-Thai, and Tai-Kadai strata, all ultimately linked to the broader Taic linguistic family — it follows that these elements collectively shaped southern Chinese dialects such as Cantonese and Fukienese within the Sinitic branch of the Sino-Tibetan family. From an anthropological perspective, Vietnamese should likewise be considered part of this continuum. (2)
With the aid of our new polysyllabic, or more specifically, disyllabic approach, we can identify many Chinese etyma in Vietnamese with far greater confidence in their plausibility. By examining Vietnamese phrases and fixed expressions such as those illustrated above, it becomes clear that numerous Chinese-Vietnamese correspondences emerge from the very same etymon. These correspondences often evolve into one-to-many relationships, extending well beyond the traditional one-to-one framework that has long constrained historical linguists.
The older paradigm has imposed a kind of mental block: interchange correspondences have typically been credited only to the so‑called Pre-Sino-Vietnamese (pre-SV or Tiền-HánViệt) lexical stratum, recognized as cognate with Proto- and Old Chinese roots, and to the Sino-Vietnamese (SV or HánViệt) layer aligned with Middle Chinese (MC). Yet this view overlooks additional Sinitic-Vietnamese variants that can be uncovered through a disyllabic methodology. For example, 湯匙 tāngchí → VS thìacanh ("spoon") versus 鎖匙 suǒchí → VS chìakhoá ("key") illustrate how a single morpheme (匙 chí > VS 'sĩ') can branch into multiple phonosemantic realizations in Vietnamese.
Phonologically and semantically, a careful examination of these interchanges shows that their phonetic contours were systematically reshaped to align with Chinese counterparts, while their meanings may have shifted in accordance with established patterns of sound change and syntactical order. This process holds true regardless of the original form, underscoring the value of the disyllabic approach in revealing hidden layers of cognacy, for example:
- M 除 chú (SV trừ) ~ Pre-SV 'chừa' ~ VS 'chia' (division),
- M 嘲 cháo (SV trào) ~ Pre-SV 'chèo' ~ VS 'trêu' (laugh at),
- M 朝 cháo (SV trào) ~ Pre-SV 'triệu' ~ VS 'chầu' (attend in the imperial court),
- M 遲 chí (SV trì) ~ Pre-SV 'chầy' ~ VS 'chậm' (slow),
- M 傳 chuán (SV truyền) ~ Pre-SV 'chuyền' ~ VS 'sang' (transit),
- M 利 lì (SV lợi) ~ Pre-SV 'lời' ~ VS 'lãi',
- M 染 răn (SV nhiễm) ~ Pre-SV 'nhuộm' ~ VS 'lây' (contract),
- M 順 shùn (SV thuận) ~ Pre-SV 'suôn' ~ VS 'xuôi' (smoothly),
- M 師 shī (SV sư) ~ Pre-SV 'thầy' ~ VS 'sãi' (monk) ,
- M 時 shí (SV thì) ~ Pre-SV 'thời' ~ VS 'giờ' (time),
- M 似 sì (SV tự) ~ Pre-SV 'tựa' ~ VS 'tợ' (just like),
- M 斬 zhăn (SV trảm) ~ Pre-SV 'chém' ~ VS 'chặt' (chop),
etc.
For sound changes affecting syllabic clusters within a polysyllabic word, the transformation applies to the entire chained sequence of sounds as a unit. This differs fundamentally from the one‑to‑one correspondence model, where each element is treated in isolation at the phonemic or syllabic level, e.g.,
- 傳染 chuánrăn (SV truyềnnhiễm) ~> VS 'lâysang' (infect),
- 順利 shùnlì (SV thuậnlợi) ~> VS 'suônsẻ' (smoothly),
- 巫師 wūshī ~> VS 'thầymô', also 'phùthuỷ' (shaman),
etc.
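The contrast between syllable-by-syllable correspondence and whole-unit change can be made concrete with a small Python sketch. It is an illustration under stated assumptions: the `STRATA` table holds only the four morphemes cited in the lists above (傳, 染, 順, 利) with the SV, Pre-SV, and VS readings given there, and the function names `syllable_readings` and `classify` are hypothetical. The check is a plain string comparison, not a model of sound change.

```python
# Readings per stratum, copied from the one-to-many list above.
STRATA = {
    "傳": {"SV": "truyền", "Pre-SV": "chuyền", "VS": "sang"},
    "染": {"SV": "nhiễm", "Pre-SV": "nhuộm", "VS": "lây"},
    "順": {"SV": "thuận", "Pre-SV": "suôn", "VS": "xuôi"},
    "利": {"SV": "lợi", "Pre-SV": "lời", "VS": "lãi"},
}


def syllable_readings(compound, stratum):
    """Look up each character of a compound in a single stratum."""
    return [STRATA[ch][stratum] for ch in compound]


def classify(compound, attested):
    """Say whether an attested form is a syllable-wise composition.

    Tries each stratum in the Chinese order first, then in reverse order.
    Forms that match neither are reported as non-compositional, which is
    where the whole-unit analysis described above would have to take over.
    """
    for stratum in ("SV", "Pre-SV", "VS"):
        syllables = syllable_readings(compound, stratum)
        if "".join(syllables) == attested:
            return f"{stratum}, same order"
        if "".join(reversed(syllables)) == attested:
            return f"{stratum}, reversed order"
    return "not compositional in this table"


if __name__ == "__main__":
    for compound, form in [
        ("傳染", "truyềnnhiễm"),
        ("傳染", "lâysang"),
        ("順利", "thuậnlợi"),
        ("順利", "suônsẻ"),
    ]:
        print(compound, form, "->", classify(compound, form))
```

With this toy table, the Sino-Vietnamese forms come out as straightforward compositions, 'lâysang' appears only as a reversed composition of the VS readings, and 'suônsẻ' is not derivable syllable by syllable at all, since its second syllable has no counterpart among the readings of 利.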
On-the-spot modification often preceded, or actively overrode, what had been conveyed in the original form. This could occur through phonemic substitution with localized adaptation, through metathesis (morpho-syllabic re‑arrangement in reverse order), inversion, clipping, contraction into derivative forms, or even playful spoonerism – e.g., " '彩虹 想 總裁' 是 越南語 中 的 一個 急口令 例子. 'Cǎihóng xiǎng zǒngcái' shì Yuènányǔ zhōng de yīgè jíkǒulìng lìzi. 'Mốngchuồng đang muốnchồng' is an example of a spoonerism (nóilái) in Vietnamese." – As we have observed in numerous polysyllabic Sino‑Vietnamese examples throughout this study, disyllabic Sinitic‑Vietnamese items in particular display even greater flexibility. They may be reversed, inverted, contracted, clipped, associated, identified, diversified, differentiated, or shaped by combinations of these processes. Let us now examine a few representative cases.
- 爸爸 bāba <= '爸 bā' => Shaanxi /tá/ => 'tía', 爹 diè: SV 'đa' => 'cha' (daddy),
- 兄弟 xiōngdì => 'anhtam' => 'anhem' (brothers), 俺兄 ăn xiōng => 'ônganh' => 'anhem' (my older brother) => 'anh' (brother),
- 阿妹 āmèi => 'emgái' (younger sister) => 'em' ('younger sister' > 'miss') [ For 妹 mèi: SV 'muội' ~> VS 'bậu' ],
- 姑娘 gūniáng (SV cônương) => 'cônàng' (the girl) => 'côem' (young lady) => 'cô' (miss),
- 亮子 Cantonese /liāngzéi/ => 'xinhtrai' (handsome boy) vs. Hainanese /liānggē/ => 'xinhgái' (pretty girl) => 'trai' (boy), 'gái' (girl) [ cf. also 'trái' (fruit). Note: in Ancient Chinese, 子 zǐ had, among a dozen other meanings, that of 'child'; cf. Fukienese 仔 /kẽ/ 'con' (offspring): 古代 指 兒、女,現在 專指 兒子。 ('In ancient times it referred to sons and daughters; now it refers specifically to sons.') ]
- 家公 jiāgōng (SV giacông) => 'ôngnhà' (my man) => 'ôngxã' (my husband) => 'chồng' (husband), and the same etymon could give rise to 'ôngcụ' (father-in-law),
- 主公 zhǔgōng (SV chúacông) 'my Lord' => 'ôngchủ' (master) => 'ông' (mister, Mr.) [ cf. VS 'ôngchúa' (lord) ],
- 叔叔 shūshu (SV thúcthúc) 'my uncle' => 'chú' (uncle) => 'cácchú' (Chinamen) => 'chệt' ("my Chinese uncle"),
etc.
The form or pattern that occurs with the greatest frequency among disyllabic expressions will ultimately prevail. Phonetically, much like pre-Han loan characters (假借; see Bernard Karlgren, Loan Characters from Pre-Han Texts II, 1964), many morphemic changes within syllabic strings are no longer governed strictly by the principle of regular sound change. This is especially true for secondary final syllables, which are normally treated under the rules of monosyllabic correspondence in the scholarly reconstruction of the Sino-Vietnamese phonological system.
Regarding irregular sound changes, Haudricourt observed that "at first sight it may seem dangerous to abandon the principle of regular phonetic change, even in specified cases, but one is forced to admit that the blind application of the principle of regular correspondence leads to the proliferation of reconstructed phonemes and hence to the proliferation of purely accidental coincidences." (Haudricourt 1966: 328–29). A close examination of Mon-Khmer wordlists reveals numerous such cases, for example, Mon-Khmer chas → Vietnamese già and cha; Mon-Khmer cho → Vietnamese chaumày.
The recognition of multiple derivatives from disyllabic forms represents a decisive challenge to the long-entrenched belief that Vietnamese is inherently monosyllabic. Vietnamese polysyllabic words, most of which originated in Chinese, were themselves once mischaracterized as monosyllabic. This view is even more problematic when compared with Khmer polysyllabic forms, such as chromuh (proposed for Vietnamese lỗmũi 'nostrils') or tamboi (for muối 'salt', cf. Chinese 鹽巴 yánbā, 'table salt'). The persistence of the "monosyllabic Vietnamese" fallacy has acted as a regressive force, hindering progress in the historical study of Vietnamese etymology.
By contrast, the disyllabic approach has positively identified a vast body of Chinese-derived etyma, marking a turning point in the field. Beyond acknowledging the crucial role of Chinese cultural influence in shaping the Sinitic-Vietnamese lexicon, this approach has also revealed a new layer of fossilized disyllabic forms, stabilized expressions in which two syllables consistently occur together. Such forms demonstrate that the Vietnamese vocabulary stock should be regarded as independent of debates over genetic affiliation.
No Mon-Khmer cognates in Vietnamese exhibit the same range of multifunctionality as the word elements described above. Yet advocates of the Sino-Tibetan hypothesis still confront the long-standing issue of basic cognates of undeniable Mon-Khmer origin. While the Mon-Khmer stratum in Vietnamese must be acknowledged for historical reasons, particularly the resemblance of certain basic words, such as the numerals one through five, which align closely with Khmer, this does not in itself validate the Austroasiatic Mon-Khmer theory of Vietnamese origins. In fact, many lexical correspondences in Mon-Khmer are more closely aligned with Mường than with modern Vietnamese.
It is important to note that contemporary Vietnamese and Mường are no longer identical, having diverged from their common Viet-Mường base centuries ago. This split coincided with the admixture of early Han Chinese settlers and local populations, which gave rise to the Kinh people. The Han colonization beginning in 111 B.C. and the subsequent centuries of Sinicization profoundly altered the demographic and cultural composition of the region. Historically, the Kinh even referred to Mon-Khmer speakers with pejorative terms such as Mọi ("barbarian"), ngườimọi, tụimọi, Mán, or Mường Mán—paralleling the Chinese use of 蠻 Man to label southern minorities such as the Maonan (冒南), Nanman (南蠻), Meng (猛), Shui (水), Yao (瑤), Miao (苗), and Dong (侗).
The presence of Mon-Khmer substratum features in Vietnamese does not, however, negate the many confirmed connections between Vietnamese and Chinese. Numerous etyma are demonstrably cognate with both Chinese and Mon-Khmer, which reflects layered historical interactions.
At the same time, caution is warranted when evaluating Mon-Khmer–Vietnamese correspondences. As Forrest (1958) warned, "too close a likeness is even more suspicious than too distant a one." For instance, a loanword from Mường, long separated from the Viet-Mường subfamily, may have entered a Mon-Khmer language, spread across its subdialects, and then re-entered Vietnamese under a new guise. Such a process would inevitably produce look-alike cognates in Vietnamese and Mon-Khmer that appear to share a common root, thereby reinforcing the Austroasiatic hypothesis. This dynamic is evident in several of the basic words sampled below.
- 衁 huāng = 'máu' (blood) [ M 衁 huāng, nǜ < MC hwaŋ < OC *hmaːŋ | PNH: QĐ fong1 | cđ MC 宕合三平陽微 | Pt 武方 | Shuowen: 血也。从血亡聲。 | Kangxi: 《左傳·僖十五年》士 刲 羊,亦 無衁 也。 《韓愈詩》衁池 波風 肉陵 屯。 | Guangyun: 衁 荒 呼光 曉 唐合 唐 平聲 一等 合口 唐 宕 下平十一唐 xwɑŋ xuɑŋ xuɑŋ xuɑŋ hwɑŋ hʷɑŋ hwaŋ huang1 huang xuang 血也 || Wiktionary: Phono-semantic compound (形聲, OC *hmaːŋ): phonetic 亡 (OC *maŋ) + semantic 血 ("blood"). Etymology: Borrowed from Austroasiatic. Compare Proto-Mon-Khmer *ɟhaam ~ *ɟhiim ("blood"), whence Khmer ឈាម (chiəm, "blood"), Mon ဆီ (chim, "blood"), Proto-Bahnaric *bhaːm ("blood"), Proto-Katuic *ʔahaam ("blood"), Proto-Khmuic *maː₁m ("blood"). Chinese has final -ŋ because initial and final m are mutually exclusive (Schuessler, 2007). This word's rare occurrence in a traditional saying indicates that it is not part of the active vocabulary of OC, but a survival from a substrate language. || Note: Bodman, Nicholas C. 1980. 'Proto-Chinese and Sino-Tibetan,' (in Frans Van Coetsem et al. (eds.) Contributions to Historical Linguistics) (p. 120): 'An interesting hapax legomenon for 'blood' appears in the Dzo Zhuan which has an obvious Austroasiatic origin: Proto-Mnong *mham, Proto-North Bahnaric *maham, 衁 hmam > hmang > ɣuáng.' || chardb.iis.sinica.edu.tw/char/21663: (1) 血液。 (2) 蟹黃。 || Guoyu Cidian: 血液。 ]
Few Chinese and Vietnamese basic words are analogous to the case of 衁 /hmam/ for 'máu'. The reverse direction appears more likely, as in 'tiết' 血 xiě (SV huyết) in 'tiếtcanh' 血羹 xiěgēng (blood pudding), 'huyếtthanh' 血清 xiěqīng (serum), 'xơitái' 吃生 chīshēng (eat raw meat), and the saying 'treo đầu dê bán thịt chó' 掛 羊 頭 賣 狗肉 guà yáng tóu mài gǒu ròu ('hanging up a goat's head but selling dog meat'). Culturally grounded correspondences of this kind are not found among the Mon-Khmer–Vietnamese cognates, where each word would have to qualify as a cognate on cultural grounds as well.
The phonological closeness between Vietnamese and Chinese in several shared etyma within the basic lexical sphere is often interpreted as evidence of Chinese loanwords in Vietnamese. Yet many of these items were originally Yue loanwords into Chinese, as attested in classical sources such as the Éryá 爾雅 glossary and Yang Xiong's Fāngyán 方言 dictionary. Their resemblance remains clear in examples such as sông 江 jiāng ("river"), chuối 蕉 jiāo ("banana"), dừa 椰 yē ("coconut"), gạo 稻 dào ("rice"), and đường 糖 táng ("sugar"). These words have long since become part of the Chinese lexicon, just as many Old Chinese words have become indispensable in Vietnamese. Such plausibility in Vietnamese-Chinese cognates supports not only the theory of Vietnamese affiliation with Chinese but also broader Sino-Tibetan etymological connections.
Basic vocabulary has traditionally been the starting point for linguists of Southeast Asian languages seeking to establish genetic affinities among supposedly related groups. Yet beyond these basic correspondences, many of which appear in Mon-Khmer wordlists, it is equally important to examine a wider range of extraordinary lexical items, particularly those spoken by ethnic groups inhabiting the western highlands of present-day Vietnam.
This observation suggests that minority groups sharing Mon-Khmer linguistic features, distributed widely across southern China and Southeast Asia, were themselves descendants of the proto-Taic family, with ancestral ties to the Yue who once occupied the entire southern Chinese region before the arrival of proto-Chinese populations. In other words, the ancestors of Austroasiatic Mon-Khmer peoples were affiliated with the ancient Taic stock as well.
As for the composition of the modern Vietnamese nation, while the majority identify as Kinh, others descend from earlier Yue groups such as the Lạc Việt and Âu Việt. According to Nguyễn Ngọc San, around 4,000 years ago large populations of Mon-Khmer speakers migrated into the northern regions of present-day Vietnam, where they resettled, replaced, and intermingled with Taic-Kadai natives. Linguistically, he proposed that the languages spoken during the era of the Hùng kings constituted a proto-Viet-Mường stage. He further theorized that after the split of Viet-Mường, under the heavy influence of Ancient Chinese following the Han conquest, the direct ancestral form of Vietnamese as a distinct language only began to emerge about 1,000 years ago (Nguyễn Ngọc San, pp. 12–13).
To summarize historically: the "ancient Annamese", the early ancestors of the Kinh, were already a mixed population. This process of admixture accelerated during the Han occupation beginning in 111 B.C. and continued throughout subsequent centuries. The colonization of northern Vietnam brought waves of Chinese immigrants who integrated into an already diverse population of Taic-Kadai speakers and Yue emigrants from southern China. These included the indigenous peoples of the former Nam Việt kingdom ruled by Triệu Đà, whose territory encompassed what is now northeastern Vietnam as well as Guangxi, Hunan, Guangdong, and Fujian provinces of China (see Lacouperie 1887/1963). (3)
After nearly a millennium as a Chinese prefecture, extending beyond the Tang dynasty and into the era of the NamHan kingdom (南漢帝國), which expanded from present-day Guangdong and Guangxi into the northeastern regions of modern Vietnam, the formation of Vietnam as a nation represented a continuation of the ancient Annamese identity. This identity had been profoundly shaped, both anthropologically and linguistically, by the legacy of successive Chinese dynasties during their long domination. (4)
The process of cultural and linguistic transformation continued long after Annam achieved independence in 939 A.D., several years before the collapse of the NamHan Kingdom.
Linguistically, James Campbell, in Vietnamese Dialects, captures the issue succinctly:
"I originally included Vietnamese in this study/website because of the fact its phonological makeup is very similar to Chinese and, indeed, its tonal system matches the Chinese one. Originally I wrote at this site: 'Vietnamese is neither a Chinese language nor related to Chinese (It is an Austroasiatic > Mon-Khmer language more closely related to Khmer/Cambodian). Besides having a very similar phonological system, and due to the heavy Chinese influence on the language, it also has a tone system that matches the Chinese one.' However, after reading and conducting a bit more research, it appears that Vietnamese' affiliation with Việt-Mường, Mon-Khmer, and Austroasiatic may in fact be a faulty case."
He further observes:
"[Vietnamese] may not be considered a Sinitic language or one of the Chinese dialects, but the Kinh have a lot in common with the Chinese culture, and the language leaves little to doubt. … Vietnamese shares many traits in common with Chinese: 60–70% Sinitic vocabulary, another 20% of vocabulary is substrata of proto-Sinitic vocabulary, much of the grammar and grammatical markers share similarities with Chinese, along with classifiers. One would find it very difficult to draw similar parallels between Chinese and other Mon-Khmer languages. It seems that after considering all of this, what is left that is Mon-Khmer is actually very little, and probably acquired over time through contact with bordering nations. For example, the numbers are of distinct Mon-Khmer origin; however, in many compound words Vietnamese instead uses Chinese roots (as is common in the other Sino-Xenic languages, Japanese and Korean)."
As emphasized throughout this chapter, the purpose here is not to argue for a strict genetic affinity between Vietnamese and Chinese, but rather to highlight their extensive etymological commonalities and unique shared features, peculiarities unmatched elsewhere in the Sino-Tibetan family. What distinguishes Vietnamese from Mon-Khmer is precisely this distinctiveness: the structural and lexical features that align it more closely with the Sinitic sphere. (5)
Based on the comparative analysis of the basic wordlists presented above, many items should be regarded as loanwords exchanged with neighboring Mon-Khmer languages. Yet the overwhelming proportion of cognates with Chinese, together with the shared grammatical and semantic peculiarities, suggests that Vietnamese is best understood as belonging within the Sino-Tibetan family, or at least as a sub-branch alongside southern Sinitic varieties such as Cantonese and Fukienese.
This question of classification, and the broader Sino-Tibetan connections it implies, will be explored in greater detail in the following chapter.
x X x
Conclusion
Distinguishing Vietnamese vocabulary from Austroasiatic Mon‑Khmer requires moving beyond root‑based comparisons. Cultural embedding and polysyllabic analysis reveal Vietnamese as a language shaped by Sinitic–Yue contact and indigenous innovation, not reducible to Mon‑Khmer inheritance. This approach reframes Vietnamese origins and underscores the need for comparative methods that honor cultural specificity and structural complexity.
Taken together, these findings challenge the entrenched view of Vietnamese as a "monosyllabic Austroasiatic" language. Instead, they underscore its layered history: Yue substrata, Taic affiliations, and extensive Chinese influence converged to shape the modern language. Vietnamese should therefore be understood not as a Mon-Khmer derivative, but as a language deeply embedded in the Sinitic sphere. Its distinctive features – phonological, lexical, and cultural – set it apart from other Austroasiatic tongues and call for a reassessment of its place within the broader Sino-Tibetan context.
FOOTNOTES
(1)^ (1) tật, (2) lắp, (3) lặp, (4) lề, (5) vỗ, (6) thói, (7) tật, (8) trấc, (9) thụt, (10) sụt, (11) nếp, (12) nết, (13) xí: 習 xí (SV tập, 'do repeatedly, practice, exercise, drill, flapping wings. Also: habit, be used to, custom, behavior, good habit, good behavior'). [ M 習 xí < MC zip < OC *ljub || Example: 演習 yănxí: VS 'tậpdượt' (drill), 性習 xìngxí: VS 'tínhnết' (personality), 習習 xíxí: VS 'nếtna' (good character), 習慣 xíguàn: VS 'thóiquen' (habit), 陋習 lòuxí: VS 'tậtxấu' (bad habit) ]
The Vietnamese language has evolved with a continuous influx of Chinese loanwords, which is still going on at the present day. At the same time, disyllabic words had already formed progressively, in parallel with the same development that had occurred earlier in the language popularized by the Tang Dynasty. Disyllabism therefore ought to be taken into account as well, since basic words cannot possibly be limited to the monosyllabic Vietnamese words cited in Haudricourt's work.
For disyllabic basic words we have innumerable examples that are cognate with Chinese forms and virtually non-existent in any Mon-Khmer forms (note that some need to be seen in reverse order for their relatedness to be recognized): mặtgiời 太陽 tàiyáng (the sun), mặtgiăng 月霸 yuèbà (the moon), vìsao 星宿 xīngxiù (star), banngày 白日 báirì (daytime), bantrưa 白天 báitiān (noontime), nóngbức 炎熱 yánrè (stuffy hot), rétmướt 淒涼 qīliáng (chilly), giábuốt 淒薄 qībó (frigid), giôngtố 颱風 táifēng (typhoon), heomay 寒風 hánfēng (breeze), giómáy 風寒 fēnghán (weather elements), lạnhcóng 寒冷 hánlěng (freezing), mưarào 驟雨 zhòuyǔ (showers), sôngngòi 江川 jiāngchuān (river), đòngang 渡船 dùchuán (ferryboat), ốcđảo 塢島 wùdăo (islet), bểcả 大海 dàhăi (ocean), ngoàikhơi 海外 hăiwài (at sea), đánhcá 打魚 dăyú (net fishing), mỏác 胸骨 xiōnggǔ (sternum), chânmày 眉梢 méishāo (eyebrow), màngtang 太陽穴 tàiyángxué (temple), sóngmũi 鼻樑 bíliáng (bridge of the nose), bàntay 手板 shǒubăn (palm), bảvai 臂膊 bèibó (shoulder), cánhtay 胳臂 gēbèi (arms), cùichỏ 胳膊肘子 gēbózhǒuzi (elbow), đầugối 膝蓋 xīgài (knee), bànchân 腳板 jiăobăn (foot), đầunậu 首腦 shǒunăo (leading figure), đànbà 婦道 fùdào (woman), traitráng 壯丁 zhuàngdīng (young men), yêuđương 戀愛 liàn'ài (love), âuyếm 親熱 qīnrè (affectionate fondling), đámhỏi 訂婚 dìnghūn (marital engagement), điđám 隨錢 suíqián (give monetary presents), bàxã 媳婦 xífù (wife), ôngxã (ôngnhà) 家公 jiāgōng (husband), thôinôi 周年 zhōunián (first birthday shower), ởvậy 守寡 shǒuguă (widowed), phùthuỷ (thầymô) 巫師 wūshī (shaman), điđái 拉尿 làniào (pee), đáidầm 尿床 niàochuáng (bedwetting), táobón 便秘 biànbì (constipation), tiêuchảy 瀉肚 xièdù (diarrhea), đồngruộng 田地 tiándì (farmland), tấmcám 糝糠 sănkāng (broken rice and bran), chănnuôi 種養 zhòngyăng (raise cattle), trồngtrọt 種植 zhòngzhí (planting), vườntược 家園 jiāyuán (garden), chợbuá 市舖 shìpǔ (market), lánggiềng 鄰居 línjū (neighbor), đườngxá 街道 jiēdào (roads), đườngđi 走道 zǒudào (path), siêngnăng 勤勉 qínmiăn (industrious), ẩutả 苟且 gǒuqiě (careless), làmlụng 勞動 láodòng (laboring), lamlũ 勞碌 láolù (ragged), rữngmỡ 情趣 qíngqù (flirting), dêxồm 淫蟲 yínchóng (lecherous), ănmày 要飯 yàofàn (beggar), đánhcắp 打劫 dăjié (robbery), bắtcóc 綁架 băngjià (kidnap), etc., not to mention virtually all grammatical function words and compounds such as và 和 hé (and), đốivới 對於 duìyú (for), vìthế 於是 yúshì (therefore), etc.
(2)^ Note that both Cantonese and Fukienese (Amoy) have their own Yue substratum underneath the heavy weight of more than 2,200 years of active Sinicization and Chinese assimilation throughout the Han dominion. Hypothetically, had ancient Annam remained a prefecture of China and not gained its independence from the Middle Kingdom from the 10th century onward, there would be little doubt that the Vietnamese language would now be regarded as just another Chinese dialect. Conversely, if Canton, approximately today's Guangdong Province, had won sovereignty as Vietnam did in the same period, under Southern Han rule in the early 10th century, what would have become of that nation today? And what of Hainan and Fukien provinces, had they been outside China's control as Taiwan has been since 1949? People in those places would speak their very own 'languages', and the 'Chinese' influence would be regarded as a non-native admixture.
(3)^ As discussed in Hypothesis Of Common Yue Origin Of Vietnamese And Chinese
(4)^ Cognacy in numerals alone certainly does not make languages genetically related. For example, in Thomas' wordlist above, the numbers from one to ten in other Mon-Khmer languages are cognate with Khmer only within the set of 1 to 5 of the Khmer counting system, whence they could have been loanwords from Vietnamese. In fact, the numbers from 6 to 10 exist in some Mon-Khmer languages other than 'Cambodian', the modern Khmer used in today's Cambodia. If they are cognate at all, could they have been borrowed from Vietnamese, that is, from a tonal language into a toneless one, following the lead of the dominant group? In that case, the argument over whether the Vietnamese numbers have any connection with Chinese is irrelevant.
(5)^ In general, these are lexical building blocks with subtle semantic specificity (such as 'ănmày' 要飯 yàofàn 'beggar', 'nhàxí' 廁所 cèsuǒ 'toilet', 'đáidầm' 尿床 niàochuáng 'bedwetting', or 'táobón' 便閉 biànbì 'constipation'), similar structural make-up in morphology (e.g., the prominent CVC-structured class), matching tonal systems (e.g., 8 tone levels that fit any tone in any Chinese dialect), and even grammatical markers (e.g., virtually all classifiers, articles, prepositions, particles, etc., in both Vietnamese and Chinese being of the same origin). Mark J. Alves (2001), in his paper entitled "What is so Chinese about Vietnamese?" in Papers from the Seventh Annual Meeting of the Southeast Asian Linguistic Society, has touched on this subject, but not as deeply and elaborately as I am trying to do here.