Reassessing Vietnamese Numerals In Sino‑Tibetan Perspective Issues
by dchph
Cardinal numbers illustrate the methodological pitfalls of earlier Mon‑Khmer scholarship. Vietnamese–Mon‑Khmer correspondences are scattered across unrelated isoglosses, forming collateral rather than linear relationships. This lack of uniformity undermines the claim of a direct Mon‑Khmer affiliation.
By contrast, Vietnamese numerals show stronger and more systematic alignment with Sino‑Tibetan etyma. Many forms correspond more closely to Old Chinese or Middle Chinese than to Mon‑Khmer. This mirrors the broader lexical situation: Vietnamese shares tonal, semantic, and structural correspondences with Sino‑Tibetan, while Mon‑Khmer parallels remain inconsistent and fragmentary.
The numerical evidence thus reinforces the argument that Vietnamese belongs within the Sino‑Tibetan framework, not as an Austroasiatic outlier.
It is clear that one cannot rely solely on similarities in counting systems to draw definitive conclusions about genetic affiliation. Cross-borrowing of basic vocabulary is not uncommon, particularly in cardinal numbers. Modern Burmese provides a good example of divergence, while Chinese loans in Korean and Japanese, as well as the ordinal numbers widely used in Vietnamese, illustrate how borrowing can obscure genetic relationships. There is no linguistic principle that compels us to believe otherwise. In other words, even a complete set of cognate numerals from 1 to 10 in two languages does not, by itself, establish genetic kinship.
I) Methodological pitfalls
Earlier Mon‑Khmer specialists relied on local informants and speculative reconstructions.
Errors included misspellings, misclassifications, and conflation of Sino‑Vietnamese (SV) with Sinitic‑Vietnamese (VS).
Numerals illustrate these pitfalls clearly: Mon‑Khmer parallels are collateral, not linear.
In Vietnamese, the numbers from 1 to 5 – một, hai, ba, bốn, năm – resemble their counterparts in Mon-Khmer languages. For instance, in modern Khmer they appear as /mùəy/, /pì:(r)/, /bɤy/, /buən/, /pram/. The Khmer forms are toneless, but the phonological similarities in 1, 3, and 4 suggest cognacy, and by analogy, 2 and 5 are often included as well. Recognition of these correspondences tends to preclude attempts to relate them to Chinese numerals, which would amount to little more than speculative play.
Comparative analysis of Chinese and Vietnamese numerals, whether against major Sino-Tibetan languages or Chinese alone, reveals persistent difficulties. The two sets do not display consistent patterns of sound change across the full range of numerals, even when considering subsets such as 1 to 2 or 5 to 10.
The case of the numeral 2 is particularly instructive. In Khmer it diverges from Vietnamese hai, yet in historical linguistics it is not unusual for genetically related languages to share as few as two numerals as cognates, often consecutive ones such as 1 and 2. These are thought to have originated from the concept of "two hands" in a binary system. Interestingly, Vietnamese and other Sino-Tibetan languages also exhibit such consecutive similarities, which extend the pattern into a binary framework encompassing both 1 to 5 and 6 to 10.
In The Origin of Language: Retrospective and Prospective (pp. 6-7), Merritt Ruhlen summarizes his findings on arbitrary vocables for the number '2' in the world's languages, many of which begin with a /p-/ or /b-/ sound:
"Dixon [R. M. W. 1980. The Languages of Australia. Cambridge, Eng.] reconstructs *bula ‘2’ for Proto-Australian, and Blake (1988) shows how this number has been used to form dual pronouns in the Pama-Nyungan subgroup: *nyuN-palV ‘you-2’ and *pula ‘they-2’. Two of the extinct Tasmanian languages (considered by Dixon unrelated to Australian languages) exhibit similar forms, Southeastern boula ‘2’ and Southern pooalih ‘2.’ In the context of his Austro-Tai hypothesis Paul Benedict (1975) pointed out the similarity of the number 2 in all of the major families of Southeast Asia. Benedict reconstructs *ʔ(m)bar ‘2’ for Proto-Austroasiatic (cf. Santali bar, Jeh bal, Khmu’ bār, Old Mon ʔbar) and *(a)war ‘2’ for Proto-Miao-Yao. He also considers Daic forms like Mak wa ‘twin’ and Austronesian forms like Javanese kěmbar ‘twin’ to be cognate with the preceding. In Africa one of the pieces of evidence that Edgar Gregersen (1972) offered in support of Congo-Saharan (his proposal for joining Niger-Kordofanian and Nilo-Saharan in a single family) was forms for the number 2 that hardly differ from those we have seen so far. In Niger-Congo we have Temne (kë)bari ‘twin’, Nimbari bala ‘2’, Mano pere ‘2’, and Proto-Bantu *bàdí ‘2’; Nilo-Saharan has forms such as Nubian bar(-si) ‘twin’, Merarit warē ‘2’, and Kunama barā ‘pair.’ In Eurasia one of Illich-Svitych's Nostratic etymologies appears related to the forms discussed so far, but in these families the meaning has shifted from ‘2’ to ‘half’, ‘side’, and ‘part’. Specifically, Illich-Svitych (1967) connects Proto-Indo-European *pol ‘half, side’ (cf. Sanskrit (ka-)palam ‘half’, Albanian palë ‘side, part, pair’, Russian pol ‘half’) with Proto-Uralic *pā-lä/*pole ‘half’ (cf. Yurak Samoyed peele ‘half’, Hungarian fele ‘half, one side of two’, Vogul pāäl ‘side, half’, Votyak pal ‘side, half’) and Proto-Dravidian *pāl ‘part, portion’ (cf. Tamil pāl ‘part, portion, share’, Telugu pālu ‘share, portion’, Parji pēla ‘portion’).
Finally, cognate forms are found in Amerind languages of North and South America (cf. Wintun palo(-l) ‘2’, Wappo p’ala ‘twins’, Huave apool ‘snap in two’, Colorado palu ‘2’, Sabane paʔlin ‘2’)."
Based on the postulations outlined above, the Austroasiatic Mon-Khmer numeral for '2' may share a remote common root with Vietnamese hai. A similar phenomenon can be observed in Chinese. For example, 分 fēn (SV phân) means 'division' or 'portion', 半 bàn (SV bán) means 'half', 掰 bāi (SV bai, VS bẻ) denotes "to break apart with both hands" or "to split into two halves," and 拜 bài (SV bái, VS vái) refers to "praying with two hands pressed together." The Vietnamese /haj/ for "two" (SV nhị) could plausibly derive from the High Chinese form /nhej/ for 二 èr /ə:/.
In addition, Chinese provides a range of related concepts tied to the notion of "two": "second" 二 èr (SV nhị, VS nhì), "twin" 雙 shuāng (SV song, VS cặp), "pair" 對 duì (SV đôi), "couple" 倆 liăng (SV lưỡng, VS lứa), and "twice, again" 再 zài (SV tái, VS hai, lại). These examples illustrate the semantic breadth of the dual concept across Chinese and Vietnamese, as further noted in the following discussion.
II) Sino‑Tibetan alignments
Vietnamese numerals show stronger systematic correspondences with Old Chinese and Middle Chinese.
Examples:
một ‘one’ ↔ OC məʔ (一 yī)
hai ‘two’ ↔ OC ɡˤajʔ (二 èr)
ba ‘three’ ↔ OC pˤra (三 sān)
bốn ‘four’ ↔ OC pˤ-luʔ (四 sì)
năm ‘five’ ↔ OC ŋˤamʔ (五 wǔ)
These alignments are more consistent than Mon‑Khmer parallels, which vary across unrelated isoglosses.
For the numbers 1 to 10 across the Sino-Tibetan languages, let us review the etyma recorded for the cardinal numbers in Shafer's list:
- 1 to 10 [ OB g-tśig 1, g-nyis 2, g-sum 3, b-źi 4, l-ńa < *b-l-ńa 5, d-rug 6, b-dun 7, b-rgyad 8, d-gu 9, b-tśu 10 ] (Shafer, pp. 21-23, 29-33, 37, 41, 56)
- 1 'một' [ M yī 一 ʾit, M Bur. ʾatś, Siamese ʾět_3 || **** Note: cf. the Vietnamese ordinal number: SV 'nhất' /ɲɐt7/ vs. VS 'một' /mot8/ ]
- 2 'hai' [ M èr 二 nyi\, O Bur. *k-in-hnis, M Bur. hnatś, Luśei hniʾ, Kapwi ka-ni, Aimol ăn-ni, Purum ă-ni, Kom ǐ-hni, Anal ă-hni, Śo hni\, Yawdwin, Tśinbok hni, S. Khami ni, Maram hań-na, Kabui (Si) kă-hnai, Kabui (Mc) kă-nai, Khoirao (Mc) hań-nai, Sopvama ka-hē, Śongge a-nai, Siamese yī\1, Lao nī_ || **** Note: VS ordinal number 'nhì' /ɲhej2/ (second); SV 'nhị' /ɲhej6/ vs. VS 'hai' /haj1/ (two); also, Vietnamese Central subdialect 'huơ' /hwə1/ vs. modern M èr /ə:4/ ]
- 3 'ba' [ M sān 三 sām, O Bur. *k-in-tʿum\, S. Khami tʿuń, Ukhrul, Khoirao kʿă-tʿum, Phadang, Kupome, Khunggoi ka-tʿum, Rengma ke-śan, Tengima, Kehena se, Kwoireng sam, Chin sām-, Siamese sām/ || **** Note: VS ordinal number: SV 'tam' /tam1/ vs. VS cardinal 'ba' /ɓa1/ (cf. Hainanese /ta1/, M 仨 sā). Also, Vietnamese variation /băm-/ in tens as in "35"="bămnhăm"="bămlăm"="balăm", cf. Hainanese /ta1tap8lan2/. ]
- 4 'bốn' [ M sī 四 si\, OB bźi < *bźli, Siamese sī_1, M Bur. le\, Luśei li || Archaic West Bodish dialects Sbalti bźi, Burig zbźi (p. 78), West Himalayish languages Kanauri pö, Buman, Themor, Mantśati, Almora pi, Jangali pari (p. 134), West Central and East Himal. Dumi bʿyal, Khaling bʿal, Rai bʿalu, Thulung bli (p. 152) || Shafer: The only indication of primitive prefix b- being preserved are in the word "four" in certain dialects: Thulung bli, Tśaurasya pʿi, Dumi bʿyal, Khaling bʿal Rai bʿalu compared with OB bźi < *bźli. (p. 157) while in Northern Assam Taying kă-prei, Midu ka-pi having the ka- prefixes which are preserved from a Kukish *k- ancient prefix has been lost in other Tibeto-Burmic languages due to the following consonantal complex. (p. 186) Other N. Ass. languages: Kukish b-n-d'li\, Miśing, Abor a-pi, Yano, C. Nyising a-pli, E. Nyising a-pl, Apa Tanang pulyi (p. 193), Old Kukish Lamgang, Anal p-il-li (p. 252), Mara, Tlongsai, Sabeu -pali (p. 267), Luhupha Branch Kukish *b-n-dʿli\, Tśungli pezo, Longla pʿé-zé, Monsen 'pʿé-li, Khari pa-li, Tśangki pʿé-li, Tengsa pʿa-l4, Rong fă-li, Hlota mě-zú (p. 304), Dayang, Zumomi bi-di, Keźma pedi, Imenai pa-di (p. 305), Tśairelish, Andro pi-, Sak pri, Kadu pi- (p. 396), Melam a-bli, Khanang ă-bri, Meklam -bə-li (p. 400) | Baric Garo bri, Atong bǐ-ri, Ruga -bri, Tipora brui, Bodo broi, Metś bre, Dimasa biri, Mośang băli, Namsangia běli (p. 441) || *** Note: VS ordinal number SV tứ /tɪ5/ ~ 'tư' /tɪ1/ (fourth) vs. VS cardinal 'bốn' (four) in comparison of all the Sino-Tibetan etymologies cited above for this item "4". ]
- 5 'năm' [ M wǔ 五 *ńo/ < *ńa ~ OB lńa < Sino-Tibetan *p-l-ńa, Bahing, Tableng ńa, Burmese na\, Luśei ńa\, Dwags liańe, Anal pă-ńa, Purum, Kohlreng, Kom ră-ńa, Lamgang pă-ră-ńa, Abor pǐ-la-ńǒ, Needham p-l-ń@, Siamese hā\ || Southern Bodish Lhoskad, Śarpa ńa (p. 91), Eastern dialects Khams lńa (p. 111), Dwags liańe (p. 115), other Bod. languages Tsangla ńa (p. 117), Gurung, Murmi, Thaksya ńa (p. 123), W. Himal. lang. Bunan, Themor ńa-1, Almora ńa-ii (p.134), Minor group Dhimal na (p. 166), OK Mara -pəna¯ (p. 267) | Baric Garo, Awe bri, Abeng biri, Bodo broi, Metś -bre, Dimasa biri, Hojai -bri, Wanang bri, Atong bǐ ri, bərəi, Ruga -bri (p. 428) || *** Note: VS ordinal number 'năm' as in 'thứnăm' (fifth) vs. SV ngũ /ɲou4/ as in 'đệngũ' 第五 dìwǔ. Also, in Vietnamese there are variations in posterior position when '5' is used in tens, that is, '-nhăm', '-lăm', e.g., "25"="hămnhăm"="hămlăm"="hailăm". Cf. 廿 niàn ="hăm-" (20); Hainanese /-lan2/ ]
- 6 'sáu' [ M līu 六 luk, O Bur. *t-r1uk, M Bur. kʿ-rok, Kukish *t-r2-uk, Luśei ruk, Mara tśa-ru, Tlongsai tśa-ru (=8?), Maram să-ŕuk, Kwoireng tśă-ruk, Empeo (S) su-ruk, Tengima sǔ-ru, Kehena sě-r@, Chin. luk (the initial *r- < Ch. l-) (p. 32), Old Kukish Sabeu -tśa-ru, Miram -tsə-ŕu(ʾ)-, Lailenpi -tsəŕuʾ\, Lothu tsər(v)ị\ (p. 268), Meithlei tă-ruk (p. 280), Luhupa Branch Rengma se-ŕo, Keźma sa-ŕ, Imemai tśo-ro, Zumomi tso-ɣa, Dayang tsu-gwo, Tśakrima su-ru (p. 298), Tengima su-ru, also Zumomi so-ɣoʿ (p. 320) | Shafer: If the occlusive of *t- prefix had come into direct contact with the r in the Kukish and proto-Chinese words for 'six', as its phonetic correspondent d- does in Old Bodish drug 'six', we should have had Luśei ţuk instead of the ruk we find and perhaps Chin. t'uk instead of luk. (p.32) | Karenic Pwo tśu38, Sinhma sot, Thangthu sʿu (p. 423) || (Haudricourt) Daic *tśr@k, Siamese h@k, Lao, Shan, Tay noir, Tay Blanc, Nung hok, Tho sok, Dioi rok, Sui lyok, Mak, Bê lok (p. 504) || *** Note: VS ordinal number 'sáu' as in 'thứsáu' (sixth) vs. SV lục /luwk8/ as in 'đệlục' 第六 dìlìu. ]
- 7 'bảy' [ M qī 七 tśʿit, Kharao tśă-ri, Siamese tśěţ_3 || A W. Bod. Sbalti bdun, Burig ŕdun (p. 78) || * Note: V ordinal number: SV 'thất' /t'ɐt7/ as in 'đệthất' (seventh) vs. VS 'thứbảy' ]
- 8 'tám' [ M bā 八 pat, O Bur. *t-r1iat, Luśei rat, M Bur. hratś, Tarao ti-rit6, Langang tǐ-ret, Amal tă-rik, Tlongsai tśa-ru (=6?), S. and N. Khaimi tă-ya, Hlota ti-za, Tśungli ti10 || A W. Bod. Sbalti bgyad, Burig ŕgyad, -pgyad, -bgyad (p. 78), W. Himal. lang. Kukish t-rkyat?, Almora dźyad (p. 136), Northern Branch *tə-ryat, Matupi -Xŗēt (p. 251), OK Kukish *t-r1iat, Meithlei tă-rēt (p. 284) | Baric Garo, Abeng, Wanang tśet, Atong tśat- Ruga -tśet, Tipora tśa, Bodo źat, Mets dźat, Dimosa, Hojai dźa, Mośang tă-tśat, Sangge ta-tśat, Mulung tʿutʿ, Angwanku tat, Tśang sat (pp. 437, 438) || ** Note: V ordinal number: SV bát /ɓat7/ vs. cardinal VS 'tám' ]
- 9 'chín' [ M jǐu 九 kǔ/, O Bur. *t-kua, M Bur. kui\, Siamese ko\2, Luśei kua, Mara tśa-ki, Urkhrul tśǐ-ko, Phadang tśǐ-ku | Baric Garo sku, Wanang dźu, Atong tśiku, Ruga -sku, Tipora tśuku, Bodo sʿko, Metś sku (p. 441) || *** Note: V ordinal number: SV cửu /kɪw3/ vs. cardinal VS 'chín' ]
- 10 'mười' [ M shí 十 || A W. Bod. Sbalti pʿtśu, Burig śtśu (p. 78) || ** Note: V ordinal number: SV thập /t'ɐp/, VS 'chục' /tśuwk8/ ]
- 20 'hăm' [ Baric Muthun tśa, Angwanku ta, Tśang ha (p. 438) || ** Note: 廿 niàn VS 'hăm' (SV nhập) ]
- 100 'trăm' 'hundred' [ OB brgya, M Bur. -rya ( Bur.) || Other Bod. languages: Gurung, Thaksya bʿra (p. 123) || ** Note: VS /ʈɐm1/, cf. '一刀 草 紙' Yīdāo căo zhǐ: VS 'mộttrăm tờgiấy' (one hundred sheets of paper). ]
In addition, as a complement to the postulation for V 'bốn' (4) and 'bảy' (7), note Shafer's comments on these two numerals, which stand apart from the known Chinese forms and tend toward those of the Karenic languages:
"We may have traces of other labial prefixes in the Karenic words for 'four' and 'seven' both of which have 'infixed' w which is not found in other Sino-Tibetan languages. But a b- prefix found in both these words in Old Bodish. Consequently we may legitimately inquire whether or not there is some connection between the infixed w in these words in Karenic and the b- prefix in Old Bodish."
"From Old Bodish bźi four, Dwags pli, Gurung bʿli, vli, etc. I have tentatively reconstructed Bodish bźli and from the Kukish languages the Kukish reconstruction *b-n-d'li\. Actually I can only say that the prefix in this word was a labial which differed from *m- and *p- prefixes. It may have been *v- and *w-, and the Karenic form, let us say *vli, the prefix dropping in Pwo and Bräʾ li and through metathesis becoming lwi in Sgaw and in most of other Karenic languages."
"A more daring suggestion to account for O. B. bdun 'seven' – in most other Tibeto-Burmic languages *s-Nis, but *nwi in Karenic – is that the form for 'seven' something like *sibdunis which with an accent *sibdúnis became O. B. *bdun. The combination sbd cannot occur in Old Bodish, and when some phoneme had to give way in Old Bodish it seems to have been the first: Sino-Tibetan *m-lt'ei tongue, O. B. ltśe, Sino-Tibetan *p-l-ŋa O. B. lŋa. But when the accent was *sibdunís, we may infer the development *sibunís > *siwunís >* sinwis Karenic *nwi and the *sibdunís – *sunís > *s-Nis in the majority of Tibeto-Burmic languages. Metathesis has frequently preserved consonants that otherwise would have dropped, as is particularly clear in Bodish dialects, and we may infer a similar preservation in these words in Karenic."
For our purposes, we would certainly run into all the unsettled difficulties with the Sino-Tibetan numeral prefixes /b-/, /w-/, /m-/, etc., noted above. Moreover, the Sino-Tibetan cognates proposed for the Vietnamese numerals are challenged by the resemblance between the Vietnamese and Mon-Khmer cardinal numbers. That resemblance holds despite the fact that the Mon-Khmer numbers are, overall, based on a system of five, and that both Old and Modern Khmer contain a sizable stock of loans from the Thai counting system, namely,
- 10 dɔp (cf. SV 'thập', VS 'chục'),
- 20 mphei (cf. SV 'nhịthập', VS 'haichục'),
- 30 sa:msɤp (cf. SV 'tamthập', VS 'bachục'),
- 40 saesɤp (cf. SV 'tứthập', VS 'bốnchục'),
- 50 ha:sɤp (cf. SV 'ngũthập', VS 'nămchục'),
- 60 hoksɤp (cf. SV 'lụcthập', VS 'sáuchục'),
- 70 cɤtsɤp (cf. SV 'thấtthập', VS 'bảychục'),
- 80 paetsɤp (cf. SV 'bátthập', VS 'támchục'),
- 90 kausɤp (cf. SV 'cửuthập', VS 'chínchục'),
- 100 roy (cf. SV 'bách', VS 'trăm'),
- 1000 pean (cf. SV 'thiên', VS 'ngàn'),
- 10000 mɤ:n (cf. SV 'vạn', VS 'muôn'),
which in turn were clearly derived from Chinese, that is, Chinese > Thai > Khmer. With that whole counting system standing on one foot, one may wonder why the Vietnamese numerical system is a ten-based one.
Meanwhile, the Vietnamese ordinal numbers rely on Chinese for the concepts of 1st (nhất 一 yī ~ SV nhất), 2nd (nhì 二 èr ~ SV nhị), 3rd (ba 仨 sā ~ SV tam), 4th (tư 四 sì ~ SV tứ), and so on, all of which remain in active use. We can also take into consideration other related counting concepts, such as
- 'chục' 十 shí 'tens' [ M 十 shí (SV thập) < MC dʑip < OC *ɡjub ],
- 'mười' [ <~ 'mươi' <~ 十 shí ~ 'mươi' {/m-/ + /-wj/ <~ /m- ~ -wk/ <~ 'mộtchục' 一十 yīshí (nhấtthập)'}/. Cf. Sound interchange ¶ /ch- ~ m-/: 吵 chăo, miāo, 'VS chùachiềng' 寺廟 sìmiào (SV tựmiếu) \ 廟 miào ~ VS 'chiềng', cf. 朝 zhāo (SV chiêu), ¶ /m- ~ ch-/ || M 一 yī, yí, yì, yāo < MC ʔjit < OC *qliɡ ]
- 'trăm' 百 băi 'hundreds' [ cf. 一刀草紙 yīdāo căo zhǐ: VS 'mộttrăm tờgiấy' (one hundred sheets of paper). ],
- 'ngàn' 千 qiān 'thousands',
- 'vạn' 萬 wàn 'ten-thousands',
- 'triệu' 兆 zhào 'million' [ cf. modern Chinese 一百萬 yībăiwàn (1 million) ],
- 'ức', 'ý' 億 yì 'hundred million' [ cf. modern Chinese 一億 yīyì (100 million) ],
- 'tỷ' 秭 zǐ 'billion' [ cf. modern Chinese 十億 shíyì (1 billion) ], respectively,
- số 數 shù (numbers),
- đếm 點 diăn (count),
- tính 算 suàn (calculate),
- cộng 共 gòng (add),
- trừ 除 chú (subtract) [ In modern Chinese 'subtract' is 減 jiăn, while 除 chú actually means 'divide', which in Vietnamese is 'chia' 支 zhī; cf. 分支 fēnzhī: SV 'phânchi' ],
- nhân 乘 chéng (multiply),
- mộtvài 一切 yīqiè (a few),
- haiba 再三 zàisān (literally, twice and thrice, again and again),
- nămbalượt 三番五次 sānfānwǔcì (literally 'thrice and five times', several times),
and including the following ordinal concepts of the days of the week:
- 'Chúanhật' 主日 zhǔrì (Sunday) [ Also VS 'Chủnhật', literally 'the Day of the Lord', the same concept as in Chinese. In modern Mandarin the day is masked under the associative form 周日 zhōurì (SV châunhật); cf. Cantonese 禮拜日 lǐbàirì /lej4bai1jaht8/ (literally, VS 'ngàylễbái', or 'Day of Ceremonial Prayers') ],
- 'thứhai' 周二 zhōu'èr (Monday) [ In Vietnamese this is the second day of the week, after 'Chủnhật' (or 'Chúanhật' 主日 zhǔrì, 'Sunday'). In Chinese, by contrast, the week starts with 周一 zhōuyī for 'Monday', 周二 zhōu'èr is 'Tuesday', and so on. Note that the seven-day week is a relatively recent concept, as opposed to 'tuần' 旬 xún 'period of 10 days'; 'tuần' now means 'week' in Vietnamese ],
- 'thứba' 周三 zhōusān (Tuesday),
- 'thứtư' 周四 zhōusì (Wednesday),
- 'thứnăm' 周五 zhōuwǔ (Thursday),
- 'thứsáu' 周六 zhōulìu (Friday),
- 'thứbảy' 周七 zhōuqī (Saturday),
as well as the months, e.g.,
- 一月 yīyuè ('thángmột' or the first month of the lunar calendar),
- 二月 èryuè ('thánghai' or the second month),
- 三月 sānyuè ('thángba' or the third month),
- 四月 sìyuè ('thángtư' or the fourth month), etc.,
including peculiar names such as
- 'thánggiêng' 正月 zhēngyuè or 元月 yuányuè (the first month of the lunar calendar, or 'January'),
- 'ngàyrằmthángtám' 八月十五 bāyuèshíwǔ (full moon of the eighth month of the lunar calendar, or 'Moon Festival day'),
- 'thángchạp' 臘月 làyuè (the twelfth month of the lunar calendar, or 'December'),
- 'bamươithángchạp' 臘月三十 làyuèsānshí (the thirtieth day of the twelfth month of the lunar calendar, or 'Lunar New Year's Eve'), etc.
It is apparent that all of these words are modified loanwords from similar concepts in the Chinese language.
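The one-step offset described in the note on 'thứhai' can be laid out as a small sketch. The list above pairs the day names by numeral; aligned by calendar day instead, the Vietnamese ordinal runs one ahead of the Mandarin numeral, because Vietnamese counts Sunday as the first day. The function names and the index convention (0 = Sunday) are illustrative assumptions, and the spellings follow this article.

```python
from typing import List, Optional, Tuple

# Day names as spelled in this article; index 0 = Sunday, 1 = Monday, ...
VIETNAMESE_DAYS = ["Chủnhật", "thứhai", "thứba", "thứtư",
                   "thứnăm", "thứsáu", "thứbảy"]
MANDARIN_DAYS = ["周日", "周一", "周二", "周三", "周四", "周五", "周六"]


def vietnamese_day_number(day_index: int) -> Optional[int]:
    """Ordinal carried by the Vietnamese name; Sunday carries none."""
    return None if day_index == 0 else day_index + 1


def mandarin_day_number(day_index: int) -> Optional[int]:
    """Numeral carried by the Mandarin name; Sunday carries none."""
    return None if day_index == 0 else day_index


def same_day_pairs() -> List[Tuple[str, str]]:
    """Pair the names by calendar day rather than by numeral."""
    return list(zip(VIETNAMESE_DAYS, MANDARIN_DAYS))


if __name__ == "__main__":
    for vietnamese, mandarin in same_day_pairs():
        print(vietnamese, "=", mandarin)
```

Read this way, Monday is 'thứhai' (2) against 周一 (1), and Saturday is 'thứbảy' (7) against 周六 (6).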
Grammatically, Vietnamese numerical usage diverges sharply from the Mon-Khmer five-based system, revealing fundamental differences in numerical arrangement. In Khmer, for example, when a number functions as a classifier or lexical coefficient, it is typically placed after the modified noun, whereas in Vietnamese it must precede the noun. This syntactic contrast underscores how Mon-Khmer numerical usage departs from the supposed common ground of cognacy in the numerals 1 to 5.
It is not we who have exaggerated the etymological significance of Vietnamese numerals 1 to 5 as deriving from a Mon-Khmer stock. Such emphasis originated with scholars in the Austroasiatic camp, who sought to attach genetic importance to these correspondences in order to argue for a Mon-Khmer–Vietnamese affinity. Yet this does not oblige us to accept the Mon-Khmer theorization at face value, especially without considering the syntactic differences in numerical usage between Vietnamese and Mon-Khmer languages.
As a sidenote, the English numerical system originally developed around twelve counting numbers, with the additional forms eleven and twelve. This duodecimal tendency is also reflected in other measures, such as twelve inches equaling one foot, and in the Julian calendar, where the names of the ninth through twelfth months—September, October, November, and December—literally mean the seventh, eighth, ninth, and tenth months.
French, by contrast, preserves a different pattern shaped by the Roman numerical system. From 11 to 16, French employs a fused base-sixteen sequence: onze (11), douze (12), treize (13), quatorze (14), quinze (15), seize (16). Beyond that, the system continues analytically with dix-sept (17), dix-huit (18), dix-neuf (19), and so forth. These forms derive directly from High Latin. Yet, amusingly, French—like English—still uses septembre, octobre, novembre, and décembre to designate the same months, even though their names no longer align numerically with their positions in the calendar.
For Vietnamese speakers, the most natural framework for numbers is the ten-based, or decimal, system. This intuitive orientation coincides with and conforms to the Chinese numerical mindset. As a result, the adoption of Sino-centric ordinal numbers such as nhất, nhị, tam, tứ, ngũ, lục, thất, bát, cửu, thập (1st to 10th) has been both natural and widespread. These forms have long coexisted in mixed usage alongside the native Vietnamese counting system một, hai, ba, bốn, năm, sáu, bảy, tám, chín, mười (1 to 10). The two systems cross-reference one another, as seen in pairs such as nhì vs. nhị, tư vs. tứ, and chục vs. thập.
This dual system is not unique to Vietnamese. The same Chinese-derived numerical framework has also been widely adopted in Japanese and Korean, where it coexists with native numerals to mutual advantage. Vietnamese, in parallel, has maintained both systems productively, drawing on each according to context.
In contrast, the Mon-Khmer peoples developed a counting system fundamentally based on five digits, a framework deeply embedded in their linguistic and cultural mindset. For them, the five-counting system appeared more natural and logical than any alternative. It is therefore unlikely that they would have borrowed an additional half of the ten-digit set to extend their system, and in fact, they did not.
Such a five-based scheme, however, does not align with Vietnamese usage. The additive pattern of 5+1, 5+2, and so forth, which characterizes Khmer numeration for 6 to 9, has no parallel in Vietnamese. If Vietnamese speakers had been content with a five-digit system, as Mon-Khmer speakers were, they would not have needed to supplement their numerals with an external source. To do so would have imposed a significant cognitive burden, comparable to how we today perceive the binary system in computing: functional but requiring conversion into decimal form for intuitive comprehension.
If we accept the Mon-Khmer numeration theory, then Vietnamese speakers must initially have shared the same 1 to 5 set. Since the sub-set of 6 to 10 was absent, they would have had to borrow sáu through mười from another source, most plausibly Ancient Chinese, to complete their system. This implies that they never employed additive constructions such as 5+1, 5+2, 5+3, 5+4 for 6 through 9, as Khmer speakers still do with forms like bramuoy, brapir, brabei, and brabuon.
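The contrast between the two schemes can be made concrete with a minimal sketch. It assumes, for illustration only, that the Khmer forms quoted above split into a "bra" element for 5 plus the stem for 1 to 4; the stems and function names are hypothetical conveniences derived from those four compounds, not an independent transcription.

```python
# Stems for 1-4 are obtained by stripping "bra" from the compounds quoted
# above; treating "bra" as the element for 5 is an assumption of this sketch.
KHMER_STEMS = {1: "muoy", 2: "pir", 3: "bei", 4: "buon", 5: "bra"}

# Vietnamese 1-10 as listed in this article: ten independent words.
VIETNAMESE_1_TO_10 = ["một", "hai", "ba", "bốn", "năm",
                      "sáu", "bảy", "tám", "chín", "mười"]


def khmer_additive(n: int) -> str:
    """Build 1-9 in a five-based additive scheme (6 = 5 + 1, and so on)."""
    if not 1 <= n <= 9:
        raise ValueError("this sketch only covers 1 to 9")
    if n <= 5:
        return KHMER_STEMS[n]
    return KHMER_STEMS[5] + KHMER_STEMS[n - 5]


def vietnamese_decimal(n: int) -> str:
    """Look up 1-10 in a ten-based scheme: no numeral is composed."""
    if not 1 <= n <= 10:
        raise ValueError("this sketch only covers 1 to 10")
    return VIETNAMESE_1_TO_10[n - 1]


if __name__ == "__main__":
    for n in range(6, 10):
        print(n, khmer_additive(n), vietnamese_decimal(n))
```

The additive function needs only five stored forms, whereas the decimal lookup needs ten, which is the structural difference at issue here.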
Alternatively, if we imagine a linguistic scenario in which the Vietnamese numerals for one through five were cognate with their Chinese counterparts, then the numeration of 6 to 10 would naturally have formed part of a complete decimal system from the outset. In this light, the extant ten-based Vietnamese system suggests that ancient Vietic speakers may already have possessed the full decimal set, unlike the Mon-Khmer speakers whose system remained five-based.
This fundamental difference explains the difficulty in reconciling the cognitive frameworks of the two groups. To assume that ancient Vietic speakers began with a five-based system and later borrowed 6 to 10 to construct a decimal system would be both implausible and inconsistent with the evidence. From a linguistic standpoint, the Vietnamese system appears to have been ten-based from the beginning, making the Mon-Khmer perspective both untenable and illogical. (U)
If the ancient Annamese root had truly belonged to the same Mon-Khmer stock, both racially and linguistically, its speakers would have been able to function naturally within a five-based counting system. Otherwise, such a framework would have seemed illogical, just as it does when compared with other Mon-Khmer neighboring languages. These groups were likely the result of admixture between Proto-Vietmuong populations and earlier Mon-Khmer migrants from the southwest, in what is now the lower region of Laos (see Lacouperie [1887] 1963; Nguyen Ngoc San, 1993). Yet numerically, they employed the same decimal system as the Vietnamese (see Thomas, 1966; Luce, 1965).
This fact cannot be explained simply as the outcome of linguistic contact between Mon-Khmer speakers in the highlands and the Kinh in the lowlands, as suggested by Austroasiatic theorists. According to that hypothesis, Mon-Khmer speakers originally used only their first five cardinal numbers and later extended their system by adopting a borrowed subset of 6 to 10 from the Vietnamese decimal framework. Such a scenario, however, seems implausible, since the Cambodian-Khmer counting system remains cognitively rooted in a five-based structure, which has proven resistant to change.
Thus, we return to the starting point: at present, we are not in a position to prove this matter satisfactorily in terms of etymology. Nor, for that matter, have the Austroasiatic Mon-Khmer theorists themselves adequately accounted for the same issue. (2)
Having gone to considerable lengths to bring Sino-Tibetan etymologies into the discussion of Vietnamese basic vocabulary, the Sino-Tibetan camp is now in a stronger position to challenge the hypothesis of a Mon-Khmer origin for Vietnamese. Numerical correspondences represent only a fractional aspect of the linguistic base and cannot outweigh the broader etymological affinities between Vietnamese and Sino-Tibetan. If necessary, however, numerical evidence can still be employed to demonstrate phonemic affiliations between Vietnamese and Sino-Tibetan, including Chinese, as illustrated by Shafer’s data on the etymology of numbers 1 to 10.
To extend this argument, the following section attempts to build a numerical case through cross-reference. As noted, numeration is only a minor component of linguistic affinity. Whether or not this attempt proves persuasive, it does not alter the overall balance between Sino- and Sinitic-Vietnamese elements on the one hand and Austroasiatic Mon-Khmer elements on the other. In this respect, numbers are simply numbers: they cannot serve as decisive evidence for classifying Vietnamese as genetically Austroasiatic. Rather, the exploration of irregular sound change patterns in numerals may provide useful cues for linking Vietnamese forms to Sino-Tibetan or Chinese etymologies. Such an effort may also serve as groundwork for an analogical framework applicable to other sets of basic etyma, particularly those that form semantic chains within the same category.
From the perspective of historical phonology, if sufficient patterns of sound change can be identified across related words, typically more than six items within the same lexical category, then two possibilities arise: the etyma may share a common origin, or they may represent loans. By "origin" we mean words derived from the same root, while "loan" refers to borrowings, including those in the fundamental lexicon. Such cases, whether abstract or concrete, are well attested in the comparanda of Luce (1963) and Shafer (1970s), of which the data reveal plausible cognates across Chinese and other Sino-Tibetan languages.
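As a rough illustration of the "more than six items" criterion, the sketch below tallies initial-consonant pairs and reports those that recur often enough. The word pairs are taken from the { L- ~ S- } list given later in this section. Reducing each form to its first letter is a simplifying assumption (it would not handle digraphs such as ch- or nh-), and the function names are hypothetical. A recurrent pattern found this way does not by itself distinguish common origin from borrowing.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (Mandarin pinyin, Vietnamese form) pairs copied from this article's
# L- ~ S- list; tone marks do not affect the initial-consonant tally.
PAIRS: List[Tuple[str, str]] = [
    ("là", "sáp"), ("làng", "sóng"), ("lèng", "sửng"), ("lì", "sức"),
    ("lǐ", "sớ"), ("liàng", "sáng"), ("luó", "sò"), ("lián", "sen"),
]

THRESHOLD = 6  # "more than six items", per the paragraph above


def initial(form: str) -> str:
    """Return the first letter only; a simplification, not a phonemic parse."""
    return form[0].lower()


def tally(pairs: List[Tuple[str, str]]) -> Counter:
    """Count how often each (Mandarin initial, Vietnamese initial) pair occurs."""
    return Counter((initial(m), initial(v)) for m, v in pairs)


def recurrent(pairs: List[Tuple[str, str]],
              threshold: int = THRESHOLD) -> Dict[Tuple[str, str], int]:
    """Keep only the patterns attested in more than `threshold` items."""
    return {pattern: n for pattern, n in tally(pairs).items() if n > threshold}


if __name__ == "__main__":
    print(recurrent(PAIRS))
```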
Returning to the Sino-Tibetan hypothesis, the similarities are abundant, as Shafer’s listings demonstrate. This allows us to raise the question of the origin of the Vietnamese numerals sáu (6) through mười (10), and subsequently to revisit một (1) through ba (3). In any case, it is reasonable to suspect that Chinese numerals share lexical connections with their Vietnamese counterparts, connections that merit closer attention. (T)
Let us examine these patterns:
- 六 lìu 'six' sáu [ M 六 lìu, lù, líu < MC luwk < OC *rug | FQ 力竹 | Etymology: According to Starostin, for *rh- cf. Jianyang so8, Shaowu su7. Shafer: Old Tibetan *drug, Middle Burmese *kʿrok, Lusei ruk || **** Note: ¶ /l- ~ s-/ is a common pattern in Chinese–Vietnamese correspondences, e.g., 力 lì (SV lực): VS 'sức' (strength), 蓮 lián (SV liên): VS 'sen' (lotus), etc. The rounded labial ending /-w/ is a notable correspondence, which suggests a fairly recent timeframe, perhaps less than 1,000 years ago. || See the elaboration below and further Sino-Tibetan etymologies in Shafer's list above. ],
- 七 qī 'seven' bảy [ M 七 (柒) qī < MC tsʰit < OC *sn̥ʰid | FQ 親吉 || ** Note: Like Mandarin, most other Chinese dialects no longer retain the final /-t/. See Shafer's discussion above of O. B. bdun 'seven' for a possible path to the interchange ¶ /q-(S-) ~ b-(P-)/, a common correspondence between Vietnamese and Mandarin, e.g., 巨 jù: SV 'cự', VS 'bự' (big), 耜 sì: SV 'cử', VS 'bừa' (plow), etc. (Compare the elaboration below on 三 sān: VS ba, 'three', 四 sì: VS bốn, 'four'). For the ending interchange ¶ /-t ~ -j/, hence, /-k ~ -j/, Bernhard Karlgren in his Word Families in Chinese (1933, pp. 25-32) established some correspondences from Archaic Chinese > Ancient Chinese > Middle Chinese > Modern Chinese (Mandarin), which we can easily map to Vietnamese sounds, e.g., 死 sǐ = SV 'tử', VS 'chết' (die), 水 shuǐ = SV 'thuỷ', VS 'nước' (water), 尸 shī = SV 'thi' /t'ej1/, VS 'thây' (corpse), 屎 shǐ = SV 'thử', VS 'cức' (feces), etc. (Refer to the Sino-Tibetan etymologies in Shafer's list above and Table 1 in Chapter 8.) ],
- 八 bā 'eight' tám [ M 八 bā < MC pɯæt < OC *pre:d | FQ 博拔 || Etymology: Per Shafer, Old Tibetan *brgyad, Middle Burmese *hrats, Lusei riat, Sbalti bgyad, Burig rgyad. || ** Note: ¶ /b- ~ t-/ is a common interchange between Vietnamese and Mandarin, for example, 便 biàn ~ SV 'tiện' (convenient), 彼 bǐ ~ VS 'đó' (that), 必 bì ~ SV 'tất', 比如 bǐrú ~ SV 'tỷdụ' (example), 道 dào ~ VS 'bảo' (tell). Besides, the Shaanxi dialect pronounces 爸 bà as 'tā' (dad). (See further Sino-Tibetan etymologies in Shafer's list above.) ],
- 九 jǐu 'nine' chín [ M 九 jǐu, jīu, qíu (cửu, cưu) < MC kuw < OC *kuʔ || Note: See further Sino-Tibetan etymologies in Shafer's list above. ],
- 十 shí 'ten' mười [ Also, VS 'chục' | M 十 shí < MC dʑip < OC *ɡjub || Note: See further Sino-Tibetan etymologies in Shafer's list above. ]
Let us examine some corresponding patterns for those numbers:
1) ¶ { L- ~ S- } class correspondences – liquid and fricative interchanges – are numerous:
- 蠟 là (wax) ~ SV sáp,
- 臘 là (the 12th month in the lunar calendar) ~ SV chạp,
- 藍 lán (indigo) ~ VS chàm,
- 郎 láng (man) ~ VS chàng [ M 郎 (郞) láng, làng < MC laŋ < OC *ra:ŋ ],
- 浪 làng (wave) ~ VS sóng [ M 浪 làng, láng, lăng, làn < MC laŋ < OC *ra:ŋ, *ra:ŋs],
- 愣 lèng (stupefied) ~ VS sửng,
- 力 lì (force) ~ VS sức,
- 理 lǐ (texture) ~ VS sớ,
- 犁 lí (plow) ~ VS xới [ M 犁 (犂) lí < MC liej < OC *rəj ],
- 亮 liàng (bright, pretty) ~ VS sáng, xinh [ M 亮 liàng < MC lɨaŋ < OC *raŋs | Hainanese /siaŋ/ | Cf. 浪 *ra:ŋs, 景 *kraŋʔ (bright), 爽 *sraŋʔ (bright, dawn) which appear to be doublets. ],
- 螺 luó (clam) ~ VS sò [ M 螺 luó < MC lwa < OC *ro:l ],
- 蓮 lián (lotus) ~ VS sen [ M 蓮 lián < MC lian, len < OC *re:n, *renʔ ],
- 率 lǜ (rate) ~ SV suất,
- 羅 luó (net fishing) ~ SV chài,
- 鼻梁 bíliáng (bridge of the nose) ~ VS sóngmũi,
- 風浪 fēnglàng (stormy waves) ~ VS sónggió,
- 榴槤 líulián (durian) ~ VS sầuriêng [ Note: Both modern Chinese and Vietnamese share the same Malayan root 'durian' (duri = 'thorn') dated some time in the 16th century.],
- 綢 chóu (silk) ~ VS lụa,
- 叢 cóng (bush) ~ VS lùm [ M 叢 cóng < MC tsuŋ < OC *tsoŋ | ¶ /c- ~ l-/ ],
- 久 jǐu (long time) ~ VS lâu,
- 撿 jiăn (pick up) ~ VS lượm,
- 潛 qián (submerge, furtive) ~ VS lặn [ Also, VS 'lén', 'lẫn', 'lánh' (hide) | M 潛 qián < MC dziam < OC *zlom, *zloms | ¶ /q- ~ l-, ng-/ (OC */d- ~ l-/) | cf. 潛逃 qiántăo (SV tiềmđào) ~ VS 'lẫntrốn', # 'trốnlánh' (to hide away) ],
- 刷 shuā (rub) ~ SV loát [ VS 'chà', Ex. 印刷 yìnshuā (ấnloát) ],
- 鄉 xiāng (village) ~ SV làng,
- 翔 xiáng (glide) ~ VS lạng [ Also, VS 'lượn' ],
- 心 xīn (heart) ~ VS lòng,
- 長 zhǎng (grow) ~ VS lớn,
- 澤 zé (swamp) ~ VS lầy,
- 擇 zé (select) ~ VS lựa [ M 擇 zé, zhái < MC ɖajk < OC *ɫhak, *rla:g | ¶ /z- ~ l-/ ],
etc., and the reverse, i.e., the { S- ~ L- }. The { S- } class includes the fricatives and affricates { j-, q-, x-, sh-, c-, ch-, zh-,...}. Correspondences following this sound-change pattern are plentiful as well:
etc.
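The monosyllabic { L- ~ S- } items listed under 1) can be tallied by their Vietnamese initial to see how the examples distribute across s-, ch-, and x-. The following is only a minimal sketch: the pairs are copied from the list above, while the helper names (`viet_initial`, `tally_initials`) and the orthography-based segmentation are illustrative assumptions, not part of the original analysis. A tally of hand-picked examples summarizes the list; it does not by itself demonstrate regular sound correspondence.

```python
from collections import Counter

# (Mandarin pinyin, Vietnamese form) pairs copied from the monosyllabic
# items in list 1) above; disyllabic items are left out.
L_S_PAIRS = [
    ("là", "sáp"), ("là", "chạp"), ("lán", "chàm"), ("láng", "chàng"),
    ("làng", "sóng"), ("lèng", "sửng"), ("lì", "sức"), ("lǐ", "sớ"),
    ("lí", "xới"), ("liàng", "sáng"), ("luó", "sò"), ("lián", "sen"),
    ("lǜ", "suất"), ("luó", "chài"),
]

# Vietnamese initials spelled with more than one letter, longest first.
# Assumption: spelling is used as a rough stand-in for the phonemic initial.
MULTI_LETTER_INITIALS = (
    "ngh", "ch", "gh", "gi", "kh", "ng", "nh", "ph", "qu", "th", "tr",
)


def viet_initial(word):
    """Return the orthographic initial consonant (cluster) of a Vietnamese syllable."""
    w = word.lower()
    for cluster in MULTI_LETTER_INITIALS:
        if w.startswith(cluster):
            return cluster
    return w[0]  # single-letter initial (vowel-initial words are not handled specially)


def tally_initials(pairs):
    """Count how often each Vietnamese initial answers to the Mandarin initial."""
    return Counter(viet_initial(viet) for _, viet in pairs)


counts = tally_initials(L_S_PAIRS)
for initial, n in counts.most_common():
    print(f"l- ~ {initial}-: {n}")
```

The same helper could be pointed at the other lists in this section (for example the { B- ~ T- } items) by swapping in a different set of pairs.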
2) ¶ { Q-(zh-, ch-, c-, s-, x-, j-...) ~ B-(p-, ph-...) } (affricate, fricative, and labial interchanges) : Words with these patterns are similar to those of 三 sān for VS ba (three) and 四 sì for VS bốn (four) as speculated based on Shafer's comments regarding { OB bźi < *bźli }.
- 池 chí (pool) ~ VS bể [ M 池 chí, chè, tuó < MC da, ɖjiə̆ < OC *l'al, *l'a:l ],
- 津 jīn (river bank) ~ VS bến [ M 津 jīn < MC tsin < OC *ʔslin ],
- 七 qī (seven) ~ VS bảy,
- 三 sān (three) ~ VS ba,
- 嫂 săo (sister) ~ VS bậu [ M 嫂 (㛮) sǎo (tẩu) < MC saw < OC *saw, *suːwʔ || Note: VS 'bậu' is more likely derived from 妹 mèi: SV 'muội' (younger sister), though. The correspondence SV 'tẩu' ~ VS 'bậu' is posited because in modern Mandarin a man may address a woman as 'săo', 'asăo' 阿嫂, or 'săozi' 嫂子, similar to the English 'miss', with the same connotation as in Vietnamese. It is noted here to demonstrate the possible interchange between Mandarin 嫂 săo and VS 'bậu'. ],
- 曬 shài (sun dry) ~ VS 'phơi' ~ 'sấy' [ M 曬 (晒) shài, shī < MC ʂai, ʂaɨj < OC *srels, *sreːls ],
- 扇 shàn (fan) ~ SV phiến [ M 扇 shān, shàn (thiên, phiến, thiến) < MC ɕiɛn, ɕian < OC *hljen, *hljens | FQ 式連 || Also, SV 'thiên', 'thiến' ],
- 煽 shān (to fan) ~ SV phiến [ M 煽 shān, shàn < MC ɕian < OC *hljen, *hljens],
- 商 shāng (trade) ~ VS buôn [ M 商 shāng < MC ʂaŋ < OC *taŋ | ¶ /sh- ~ b-/ || Ex. 商人 shāngrén: VS 'conbuôn' (merchant) ],
- 筮 shì (divination) ~ SV phệ [ M 筮 shì, yì < MC tʂej < OC *dhats || Also, SV 'thệ' ],
- 四 sì (four) ~ VS bốn [ M 四 sì < MC sjɨ < OC *slhijs ],
- 耜 sì (plough) ~ VS bừa [ M 耜 sì (SV tỷ, cử) < MC zjɨ < OC *lhǝʔ ],
- 艘 sōu (large boat) ~ VS bầu [ M 艘 sōu < MC səw < OC *suːw, *sɯːw || Ex. 艘船 sōuchuán: VS 'ghebầu' (freighter) ],
- 餿 sōu (distasteful) ~ VS bựa [ 餿 sōu < MC ʂuw < OC *sru ],
- 小 xiăo (little) ~ VS bé [ M 小 xiăo < MC siaw < OC *smewʔ || cf. 微 wéi (SV vi)' \ ¶ w- ~ nh-, b- ],
- 渣 zhá (dregs) ~ VS bã, etc.
- 疤 bā (scar) ~ VS sẹo [ M 疤 bā < MC pa < OC *pra: ],
- 板 băn (floor) ~ VS sàn [ M 板 băn < MC pɑn < OC *pra:nʔ ],
- 比 bǐ (compare) ~ VS so,
- 並 bìng (parallel with) ~ VS sánh [ cf. 並肩 bìngjiān (VS sánhvai, 'side by side'), 並行 bìngxíng: VS 'songhành' \ @ 並 bìng ~ 雙 shuāng \ ¶ /p- ~ s-/ ],
- 怕 pà (afraid) ~ VS sợ [ M 怕 pà, pò, bó < MC pʰaɨ < OC *pʰraːɡ, *pʰraːɡs ],
- 派 pài (dispatch) ~ VS sai [ M 派 pài < MC phaj < OC *phre:ks | cf. 差 chāi: SV 'sai' (dispatch) ],
- 聘 pìng (betroth) ~ SV sính [ M 聘 pìng, pìn < MC phjiaŋ < OC *phjiaŋh ],
- 別 bié (do not) ~ VS chớ [ M 別 bié, biè < MC biat, piat < OC *bred, *pred || Note: 別 bié is a contraction of 不要 búyào, to be exact. ],
- 騁 chéng (gallop) (cf. 娉 pìng) ~ SV sính, VS phóng,
- 秤 chéng (steelyard) (cf. 平 píng) ~ SV bình, VS cân,
- 津 jīn (river bank) (cf. 筆 bǐ) ~ SV tân, VS bến,
- 走 zǒu (run) ~ 跑 păo (modern Mand.), VS chạy,
- 霄 xiāo (vault of sky) ~ SV tiêu; also, 霄 reads báo, bó, VS bầu, as in 'bầutrời'.
and the inverse, i.e., the labial and affricate interchanges {P- (b-...)} ~ {S- (ch-...)}:
etc., and these shifting patterns, naturally, appear internally in the Chinese language:
The same pattern also appears in dissyllabic forms:
- 并肩 bìngjiān (shoulder by shoulder) ~ VS sánhvai,
- 比方 bǐfāng (compare) ~ VS sosánh,
- 比肩 bǐjiān (side by side) ~ VS sátcánh,
- 並非 bìngfēi (do not) ~ VS chẳngphải,
- 傍晚 bángwăn (dusk) ~ VS chạngvạng,
- 分享 fēnxiăng (share) ~ VS chiasớt,
- 聘禮 pìnglǐ (betroth) ~ SV sínhlễ,
- 起源 qǐyuán (originate) ~ VS bắtnguồn,
- 起頭 qǐtóu (start) ~ VS bắtđầu,
etc.
The intermediate patterns { /s-/ ~ /t-/ } and { /q-/ ~ /th-/ } act as bridges, via { t-(th-...) ~ b-(p-, ph-) }, by which Chinese 七 qī and SV thất could have changed into 'bảy' /bej3/ (seven), as speculated through the inverse pattern { B(p)- ~ T(th)- }, which occurred uniformly in the process of sound change from Middle Chinese to Vietnamese in the 10th century. The speculated correspondences 'ba' ~ 'tam', 'bốn' ~ 'tứ', and 'bảy' ~ 'thất' are illustrated in the list below, including dissyllabic words.
- 甭 béng (do not) ~ VS đừng,
- 碰 pèng (collide) ~ VS đụng,
- 嫖 piáo (intercourse) ~ VS đéo [ Cantonese /tjew3/ ],
- 婊 biăo (whore) ~ VS đĩ,
- 笨 bèn (stupid) ~ VS đần,
- 匹 pǐ (mate) ~ SV thất [ M 匹 (疋) pǐ, pī < MC pʰit < OC *pʰid ],
- 必 bì (have to) ~ SV tất [ VS 'phải' (must, have) | M 必 bì < MC pjit < OC *plig ],
- 比 bǐ (compare) ~ SV tỉ [ Also, VS 'so' ],
- 譬 pì (compare) ~ SV thí [ ex. 譬如 pìrú: SV 'thídụ' (for instance) ],
- 頻 pín (channel) ~ SV tần,
- 幣 pì (currency) ~ SV tệ,
- 俾 bēi (inferior) ~ VS tệ [ SV tỳ | M 卑 bēi < MC pje < OC *pe ],
- 鄙 pì (vile) ~ VS tệ,
- 卑 bèi (mediocre) ~ SV tì,
- 畢 bì (finish) ~ SV tốt [ M 畢 bì < MC pjit < OC *pit ],
- 濱 bīn (river bank) ~ SV tân,
- 賓 bīn (guest) ~ SV tân,
- 髮 fā (hair) ~ VS tóc ~ SV phát, bị,
- 道 dào (tell) ~ VS bảo [ SV đạo ],
- 燙 tàng (burnt) ~ VS bỏng [ SV thang ],
- 談 tán (discuss) ~ VS bàn [ SV đàm ],
- 投 tóu (put in) ~ VS bỏ [ SV đầu | M 投 tóu < MC dəw < OC *do: | Ex. 投票 tóupiào: VS 'bỏphiếu' (cast a ballot), 投資 tóuzī: VS 'bỏtiền' (invest) ], etc.
- 劍柄 jiànbǐng (sword) ~ VS #thanhgươm,
- 奔波 bènbó (busy oneself for) ~ VS tấttả [ SV bônba | M 奔 (犇) bēn, bèn, fèn < MC puon < OC *pɯːn, *pɯːns | ¶ /-n ~ -t/ ],
- 圈套 quāntào (trap) ~ VS cạmbẫy,
- 突然 tùrán (suddenly) ~ VS bỗngdưng,
and, again, dissyllabic words,
etc., all of which loosely give us the { t(h)- ~ p(h)- } correspondence that we need to establish the phonemic correlation of 'bảy' and 'thất'.
This type of analogy is questionable, nevertheless, particularly when we consider the cases of ba (three) and bốn (four). It seems that no well-defined correlation can be established between the Chinese and Vietnamese cardinal numbers, though parallels may be found in other Sino-Tibetan languages. (See Shafer’s discussion of four in his comment on the Sino-Tibetan form Old Bodish bźi.) We have also speculated about the origin of the Vietnamese bảy (seven), since it does not appear to derive from a decimal-based system. If bảy is not of Austroasiatic Mon-Khmer origin, then other numerals, such as ba and bốn, and possibly the rest, may likewise stem from different roots.
The only tentative evidence linking Vietnamese ba and Sino-Vietnamese tam (three) 三 sān is the correspondence between Vietnamese ba /ba1/ and Hainanese /ta1/, a small but suggestive clue preserved in a deep substratum. The sound pattern {B(p)- ~ T(th)-} (hence, {S- ~ B}) was a major feature in the reduction of 41 Middle Chinese initials into the 20 Sino-Vietnamese initials during the 10th century (Nguyen Ngoc San, 1993). If Chinese /sān/ and Vietnamese /ba1/ are indeed cognate, ba may have resulted from the loss of final -m and the shift of /s-/ (or alternately /t-/) to /b-/, or it may have evolved from an ancient Yue form, as reflected in Minnan varieties such as Hainanese. If Hainanese /ta1/ is accepted as a plausible cognate of Vietnamese ba /ba1/ (cf. Mandarin 仨 sā), then other Chinese isoglosses may reveal similar patterns. For example: Mandarin 山 shān ~ Hainanese /twa1/ vs. Sino-Vietnamese /san1/ ~ Sinitic-Vietnamese /non1/ > /nui5/ (mountain), and possibly đồi /doj2/ (hill), which illustrate two major interchange patterns: /sh- ~ n-/ and /n- ~ d-/, along with the shift /-n ~ -i/.
If that is the case for ba, it is more likely that the form underwent a dissimilatory process in which a rounded final /-wm/ was transferred forward and labialized, eventually yielding /bw-/ and /ɓ-/ in later stages of Vietnamese internal development (cf. MC sam < som, Proto-C **/sawm/, Tibetan gsum, gsum-po, 'third'). In the Quảngnam sub-dialect, for example, tam1 is pronounced /towm1/ and ba1 as /bwa1/. The phenomenon of rounding transfer from a final labial to the corresponding initial is not uncommon, as noted by Baxter and later by Bodman (1980). If this line of reasoning holds, ba must be very archaic, predating even the emergence of the Kingdom of NamViệt.
The shift from Chinese labials to Vietnamese dentals has been observed and discussed by several linguists, including Maspero and Karlgren (1939), Arisaka Hideyo, Paul Nagel, Pulleyblank (1984), Nguyen Ngoc San (1993), and Nguyen Tai Can (2000). Pulleyblank summarized the process as follows: Vietnamese /t-/ derives from a chain of developments { /s-/ < /ts-/ < /psi-/ < /pci-/ }, effectively as if from /ts-/. Forrest (1958) attributed this to AC /pj-/, /bj-/ and the palatalization process that occurred before certain s-initial words were borrowed into Vietnamese. By extension, the reverse process /s-/ > /p-/ can also be deduced.
If the cases of bảy and ba are meaningful in this respect, then 四 sì (four), SV tứ [tɨj] ~ VS bốn, must have undergone a similar process. (See Shafer’s elaboration on four in his comments on the Sino-Tibetan form Old Bodish bźi.)
3) If the cases of ba, bốn, bảy are correct, tám should fit into the same corresponding pattern { /b-/ ~ /t-/ } as well.
4) The pattern { j-(z-, q-) ~ ch- } seems to justify the case by itself: Chinese 九 jǐu ~ Vietnamese 'chín' (nine). In fact, the corresponding pattern is easy to find: 煎 jiān ~ 'chiên' (fry); 走 zǒu ~ 'chạy' (run), 足 zú ~ 'chân' (foot); 焦 jiāo ~ 'cháy' (burnt), 緊 jǐn ~ 'chặc' (tight), 正 zhèng ~ 'chính' (main), etc.
5) The { S(h)- (x-, q-, z-) ~ m- } interchange is not rare if we examine the OC initials */s-/, */sh-/ that had given rise to MC /m-/ from the Western Han period. W. South Coblin (1982, pp. 126-127) noted the following while investigating the paronomastic glosses:
- 戌 *sjwet ~ 滅 *mjiät
- 杪, 眇 *mjiäu: ~ 小 *sjiäu:
"that the word 戌 may have an initial cluster **sm- was suggested by Li (1945:340) on the basis of Han-time paronomastic glosses and Old Chinese loans in the Tai languages. [..] Pulleyblank (1962:136) has suggested that in the word 少 (MC śjäu:) 'few', which belongs to the same OC phonetic series and surely cognate to 小, MC ś- derives from earlier **mh-. [..] Perhaps 小 and 少 should be reconstructed with the same initial according to a scheme such as the following: 小 **smjagwx > WH *sm- > sjiäu [ ~ ] 少 **smjiagwx(?) > WH *sm- > śjau"
For modern Mandarin the pattern { S(h)- (x-, q-, z-) ~ m- } can still be established as follows:
- xiăo 小 ~ mó 尛,
- căi 裁: VS 'may' (sew) [ SV 'tài' | M 裁 cái, zài < MC dzəj < OC *zlɯː, *zlɯːs | ¶ /c- ~ k-(c-)/, Ex. 裁衣 căiyī: VS 'mayáo' (tailoring) ],
- qìng 慶: VS 'mừng' (celebrate) [ M 慶 qìng, qiāng, qīng (khánh, khanh, khương) < MC kʰiajŋ < OC *kʰraŋ ],
- shī 失: VS 'mất' (loss) [ SV 'thất' | M 失 shī, yì (thất, dật) < MC ɕit < OC *hlig ],
- xián 鹹: VS 'mặn' (salty) [ M 鹹 xián < MC ɦəɨm < OC *ɡrɯːm | Dialects: Changsha xan12, Shuangfeng ɠã12, Nanchang han12, Meixian ham12, Cant. ha:m12, Amoy ham12 ($); kiam12 | ¶ /h- ~ m-/ < OC */grj- ~ m-/ || See elaboration on this etymology for 'mắm' (anchovy). ],
- xiě 血: VS 'máu' (blood) [ SV 'huyết', also, VS 'tiết' | M 血 xiě, xiè < MC xwiet < OC *swit | According to Starostin: Viet. also has tiết 'animal blood' - an archaic loan (with t- regularly representing OC *s-, which was already lost in MC). || cf. huāng 衁 : máu 'blood' \ ¶ MC hw-(xw-) ~ m-, phonetic 芒 māng. According to Bodman (1980. p.120): M 衁 huāng, nǜ < MC hwaŋ < OC *hmaːŋ. 'An interesting hapax legomenon for 'blood' appears in Dzo Zhuan (左傳) which has an obvious Austroasiatic origin. Proto-Mnong *mham ('blood'), Proto-North Bahnaric *mham ('blood') ],
- xiāo 硝: VS 'muối' (salt) [ SV 'tiêu' | M 硝 xiāo, qiào < MC siaw < OC *sew || cf. 硭 máng (SV mang): VS 'muối' (table salt) ],
- zuǐ 嘴: VS 'môi' (lip) [ ~ VS 'mỏ' | M 嘴 zuǐ < MC tsiə̆, tswiə̆ < OC *ʔseʔ | According to Starostin, originally written as 觜 (q.v.) and also read OC *ʔseʔ, MC tswiə̆ (FQ 即移) 'a horn-shaped curl on the head of birds and cats'. Tibetan: mtʂu lip, beak.],
- qiáng 強: VS 'mạnh' (strong),
- shèng 剩: VS 'mứa' (leftover) [ Also, VS 'thừa' ~> VS 'chứa' | M 剩 (剰) shèng < MC ʑiŋ < OC *ɦljɯŋs | ex. 剩飯 shèngfàn: VS 'bỏchứa' = 'bỏmứa' (cơmthừa) 'food leftover' ],
- xīn 新: VS 'mới' (new) [ cf. 萌 méng (new sprout) ], etc.,
- qǐng 請: VS 'mời' (invite) [ M 請 qǐng, qìng, qíng, qīng < MC tsʰiajŋ, dziajŋ < OC *zleŋ, *shleŋʔ, *zhleŋs || cf. 邀 yāo /y-/ ~ /m-/ (invite) ],
- mō 摸: VS 'sờ' (touch) [ Also, VS 'mò', 'mó' | M 摸 (摹) (mô, mạc) mō, mó, māo, mú < MC muo, mak < OC *ma:, *maːɡ ]
- míng 明: VS 'sáng' (bright) [ M 明 míng < MC maiŋ < OC *mraŋ ],
- màn 慢: VS 'chậm' (slow) [ M 慢 màn, mán, miàn < MC maɨn < OC *mroːns || cf. 遲 chí (SV trì) VS 'chậm' ~ 'trễ' (tardy) ],
- miào 廟: SV 'miếu' (temple) [ cf. the interchange with M 朝 cháo: VS 'chầu' (attend the imperial court) ].
reversely, for the pattern { m- ~ S- (q-, j-, x-...) } we have:
To put it in perspective, in the case of 'ten', shí 十 may not be 'mười', but it is certainly the etymon of 'chục' in Vietnamese as attested by its isoglossal Cantonese sound /ʃʌp8/.
All assumptions regarding Chinese and Vietnamese numerical affiliation, of course, remain speculative, and they are often countered by the presence of Mon-Khmer cognates. The intention here is simply to offer Mon-Khmer specialists some additional perspectives beyond the familiar comparison of Vietnamese and Mon-Khmer numerals 1 to 5. Readers are encouraged to keep an open mind, since multiple possibilities remain. If the Vietnamese numerals sáu (six) through chín (nine), or even chục (ten), can be shown to fit into the sound change patterns associated with Chinese, then it becomes natural to question whether the first five numerals truly derive from Mon-Khmer sources.
III) Implications
Vietnamese numerals reinforce the broader lexical pattern: systematic Sino‑Tibetan correspondences.
Mon‑Khmer parallels remain fragmentary, undermining claims of direct Austroasiatic affiliation.
The evidence situates Vietnamese firmly within a Sino‑Tibetan continuum.
Etymologically, the more fundamental a word is, the more likely it has undergone drastic phonological change over time, sometimes to the point of being unrecognizable and leaving no clear historical traces. In some cases, even basic vocabulary has been entirely replaced by later borrowings, as with many Chinese-origin terms for body parts in Vietnamese (see Paul K. Benedict, Austro-Thai Language and Culture with a Glossary of Roots, 1975). In other words, the closer the resemblance between forms, the greater the likelihood that they are loanwords, as demonstrated by numerous Sinitic-Vietnamese and Sino-Vietnamese items when compared with other Sino-Tibetan etyma, including those in the Daic languages.
This perspective challenges the non-academic assumption that basic words are inherently more stable than those in higher lexical categories. Shafer’s Sino-Tibetan etymologies, as presented in this paper, demonstrate the opposite: even fundamental vocabulary can undergo significant shifts. Complex and multisyllabic words, in particular, are especially vulnerable to change and more easily influenced by factors such as dialectal variation once they are borrowed into a recipient language, for instance,
- "cùichỏ" (胳膊)肘子 (gēbó)zhǒuzi (elbow),
- "bảvai" 臂膊 bèibó (shoulder),
- "màngtang" 太陽穴 tàiyángxué (temple),
- "mỏác" 胸骨 xiōnggǔ (sternum),
- "chânmày" 眉梢 méishāo (eyebrow),
- "đầugối" 膝蓋 xīgài (knee),
- "mắtcá" 腳踝 jiăohuái (ankle) [ SV 'cướckhoả' | M 腳 jiăo, jué < MC kɨak < OC *kaɡ || M 踝 (髁) huái, huà (hoạ, coả, hoã, khoã, hoả, khoả) < MC ɦwaɨ < OC *ɡroːlʔ || cf. 踝骨 huáigǔ: VS 'mắtcá' (ankle bone) ] , etc.,
The only exceptions are those belonging to the category of fundamental vocabulary, largely expressed through simple monosyllabic sounds. Examples include "ba" 爸 bā "dad", "má" 媽 mā "mom", "mắt" 目 mù "eye", "xơi" 食 shí "eat", "uống" 飲 yǐn "drink", and "đất" 土 tǔ "soil". These forms tend to preserve their articulation more consistently, though not all languages of the world share identical phonetic realizations. This observation does not contradict the principles of tonal development: originally, clusters of consonantal initials and finals without tone evolved into tonal systems with simplified initials, as seen today, in accordance with Haudricourt's theory of tonogenesis.
One may rationalize that many Sino-Tibetan languages – Chinese, Burmic, and Daic among them – began with the same basic words at a very early stage, only to diverge along separate paths over several millennia. This is consistent with the fact that living languages are never static but remain in constant flux, evolving from primitive to more sophisticated stages. In particular, the shift from toneless consonantal clusters to tonal systems, as in monosyllabic Old Chinese, exemplifies this dynamic change. Comparable phenomena are found in Indo-European, where drastic sound changes and semantic shifts often obscure etymological connections. For instance, the English names of the 9th through 12th months – "September" (literally "seventh"), "October" ("eighth"), "November" ("ninth"), and "December" ("tenth") – no longer align with their numerical meanings. French follows the same calendar scheme, and similar divergences appear in lexical pairs such as French "route" ~ English /ru:t/, /raut/, or French "merci" (thankfulness) ~ English "mercy" (compassion).
Some may argue that the etymological postulations for "sáu", "bảy", "tám", "chín", and "mười" are not entirely convincing. Before reviewing other recurring patterns between Chinese and Vietnamese, such as "bảy", "ba", "bốn" with the correspondence {S- ~ B-}, let us proceed to examine "một" "one" and "hai" "two". This short list, like the earlier examples, is by no means exhaustive.
- một 一 yī (one): SV nhất [ Note: Cf. Vietnamese 'mốt' as in 'hămmốt' (twenty-one) | M 一 yī, yí, yì, yāo < MC ʔjit < OC *qliɡ || According to Nguyen Ngoc San (Ibid., p. 74), all the MC initial consonants /l-/, /m-/, /n-/, /nh-/, /ng-/ had their correspondences in Sino-Vietnamese, and when they were imported into the target language, all became words of the lower register tones, i.e., /~/ ngã and /./ nặng, except for the case of "nhất", which is supposedly "nhật", hence "một". ],
- yì 溢: VS 'mứa' (spill),
- yì 蟻: VS 'mối' (termite),
- yún 雲: VS 'mây' (cloud),
- yǔ 雨: VS 'mưa' (rain),
- yăo 舀: VS 'môi' ~ 'muỗng' (scoop) [ Also, 'múc' (ladle out) | M 舀 yăo < MC jiaw < OC *jiaw || cf. 舀粥 yăozhōu: VS 'múccháo' (scoop out porridge) ],
- yóu 魷: VS 'mực' (cuttlefish) [ M 魷 yóu | Note: phonetic stem M 尤 yóu < MC jəu < OC *wjə || cf. 魷魚 yóuyú = later word 墨魚 mòyú (VS cámực) ],
- yăn 眼: (modern usage) ~ mù 目 (old usage) VS 'mắt' (eye),
- yāo 邀: VS 'mời' ~ 'vời' (invite) [ cf. M 請 qǐng (VS 'xin') ],
- yán 鹽: VS 'muối' (salt) [ SV 'diêm' | M 鹽 (塩) yán (diêm, diễm) < MC jiam < OC *ɡ·lam, *ɡ·lams | According to Starostin: Protoform: *jam (r-). Meaning: salt. Chinese: 鹽 *lam salt; 鹹 *grjə:m salt, salty. Tibetan: rgjam-chwa a k. of salt, like crystal, lgyjam-chwa a k. of rock-salt. Burmese: jamh gunpowder, saltpetre. Kachin: jam1 a k. of salt. Kiranti: *ru\m. Comments: Ben. 57; Mat. 184, Shafer quoted Haudricourt's posit of this word as 硝 xiāo for 'salt'; however, there also exists 硭 máng: SV 'mang' (rude salt) for VS 'muối', cf. 盲 máng: VS 'mù'. ],
and the reverse:
- giây [dʒjʌj]: 秒 miăo (second) [ M 秒 miăo < MC mjɜw < OC *mews ],
- dân [jʌn1]: 民 mín (citizen),
- diện [jiən6]: 面 miàn (face),
- diệu [jiəw6]: 妙 miào (miraculous),
- di [ji1]: 彌 mí (full),
- danh [jajɲ1]: 名 míng (name),
- diệuvợi [jew6vəj6]: 渺茫 miăománg (meagerly),
The pattern { /y-/ ~ /m-/ } :
- hai ~ 二 èr (two): SV nhị [ Also, VS 'nhì' as in 'thứnhì' (the second) | M 二 (弍 貳) èr < MC ȵiɪ < OC *njis | FQ 而至 || Note: the dropping of /ɲ-/: SV 'nhị' /ɲej6/ > /hei1/. Cf. 而 ér ~ SV 'nhi' /ɲej1/. Speakers of the Vietnamese subdialect of Quảngnam in Central Vietnam pronounce "hai" as "huơ" /hwə1/. Cf. 至 zhì (VS tới, 'reach') ],
- năm ~ '五' (five): SV ngũ [ Also, VS 'nhăm' ~ 'dăm' ~ 'lăm' as in 'hămlăm' (twenty-five) | M 五 wǔ, wu < MC ŋuo < OC *ŋaːʔ | cf. Hainanese /lan2/ | According to Starostin: be five. For *ŋh- cf. Xiamen ŋo|6, Chaozhou ŋou4, Fuzhou ŋo6, Jianou ŋu6, ŋu8. Other dialects: Wenzhou: ŋ22, Changsha: ŋ2; u 2 (lit.), Meixian: ŋ2, Cant.: ŋ22 || Note: For Sino-Tibetan cognates, see Shafer's list in the previous section. ]
- 偎 wèi: VS 'nể' (respect),
- 味 wèi: VS 'nếm' (taste),
- 臥 wò: VS 'nằm' (lie down), Also: VS 'ngủ' (sleep),
- 握 wò: VS 'nắm' (hold),
- 國 guó: SV 'quốc' [wʌk7], VS 'nước' (nation),
- 鍋 guò: SV 'qua' [wa1], VS 'nồi' (pot),
- 話 huà [hwa4]: VS 'nói' (talk),
- 壓 yā: VS ép (suppress),
- 爺 yě: VS 'nội' (grandfather),
- 語 yǔ: SV 'ngữ', VS 'nói' (speak), etc.,
Words showing the interchange { /w-/ ~ /n-/ } are abundant:
- 'nhất' > /jãt/ > /mât/ > 'một'
- 'nhị' > /nhej/ > /hẽj/ > 'hai'
- 'tam' > /tã/ > /ta/ > 'ba' /ɓa/
- 'tứ' > /psɨĩ/ > /bữj/ > 'bốn'
- 'ngũ' > /ngâu/ > /nẫw/ > 'năm'
- 'lục' > /lũkw/ > /sũkw/ > 'sáu'
- 'thất' > /tất/ > /bẫt/ > 'bảy'
- 'bát' > /ɓãt/ > /tãt/ > 'tám'
- 'cửu' > /kjɨũ/ > /k'jữw/ > 'chín'
- 'thập' > /chẫp/ > /mập/ > 'mười'
The articulation of Vietnamese numerals appears to have evolved as late as the period from Ancient Chinese to early Middle Chinese, roughly between the 3rd and 7th centuries. They may have taken shape through the nasalization of vocalism in ancient Vietnamese, a feature still evident in northern central dialects before their speakers crossed the 16th parallel and migrated southward to resettle in the former Champa territories beginning in the 13th century. The Huế subdialect today conservatively preserves four tones in the lower register while omitting the upper four-tone register of the past, a pattern that may reflect the vocal contours of what ancient Vietnamese once sounded like.
This phenomenon can be explained by the fact that the Kinh, living in lowland and metropolitan centers, were compelled to transact with Han colonial administrators in monetary exchange from at least 111 B.C. until 939 A.D. Numerals, therefore, must have been among the earliest lexical items to crystallize in the emerging Annamese language.
In terms of articulation, one may test the hypothesis by pronouncing the Sino-Vietnamese "nhất" and "thập" with a nasalized initial /m-/. Similarly, with Sino-Vietnamese "nhị" /nhei6/, drop /n-/ from /nh-/ and retain /h-/. Continue with Sino-Vietnamese "tam", "tứ", "thất" using an initial /b-/ (cf. Hainanese /ta1/, /tej3/, /sit5/), Sino-Vietnamese "lục" with /s-/ (cf. Mandarin lìu), and "ngũ" (cf. Cantonese /ɱ4/) with /n-/ (/nh-/, /l-/). For "bát", apply /t-/, and for "cửu", apply /ch-/. All of these forms passed through cycles of nasalization and denasalization. What goes around comes around, which explains the extant Vietnamese numerals from 6 to 10, a pattern absent in Khmer.
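The initial substitutions named in the paragraph above can be laid out as a small table and checked for internal consistency against the spelling of the native numerals. This is a bookkeeping sketch only: the rows restate the substitutions proposed in the text, and the function names are illustrative assumptions. Passing the check shows that the proposal is stated consistently; it is not evidence that the sound changes took place.

```python
from collections import defaultdict

# (Sino-Vietnamese reading, initial proposed in the text, native numeral)
PROPOSED = [
    ("nhất", "m", "một"),
    ("nhị", "h", "hai"),
    ("tam", "b", "ba"),
    ("tứ", "b", "bốn"),
    ("ngũ", "n", "năm"),
    ("lục", "s", "sáu"),
    ("thất", "b", "bảy"),
    ("bát", "t", "tám"),
    ("cửu", "ch", "chín"),
    ("thập", "m", "mười"),
]


def check_initials(rows):
    """Map each SV reading to True if the native numeral starts with the proposed initial."""
    return {sv: native.startswith(initial) for sv, initial, native in rows}


def group_by_initial(rows):
    """Group SV readings by the initial they are proposed to take."""
    groups = defaultdict(list)
    for sv, initial, _ in rows:
        groups[initial].append(sv)
    return dict(groups)


results = check_initials(PROPOSED)
groups = group_by_initial(PROPOSED)

for sv, initial, native in PROPOSED:
    status = "consistent" if results[sv] else "mismatch"
    print(f"{sv} + /{initial}-/ -> {native}: {status}")
```

Grouping by target initial makes visible that three readings ("tam", "tứ", "thất") are assigned /b-/ and two ("nhất", "thập") are assigned /m-/, which is where the hypothesis carries the most weight.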
In any case, Sino-Vietnamese numeral readings are deeply embedded in popular usage, appearing not only in ordinals but also in countless idiomatic expressions and set phrases.
Table 2 - Comparative Sino‑Vietnamese numeral idioms
| Vietnamese | Meaning | Chinese equivalent |
|---|---|---|
| nhấtnghệtinh nhấtthânvinh | "one specialized skill brings lifelong honor" | 一技之長 yījìzhīcháng |
| nhịthậptứhiếu | "twenty-four filial exemplars" | 二十四孝 èr shí sì xiào |
| bấtquátam | "never more than thrice" | 不過三 bùguòsān |
| tứđỗtường | "four addictive pleasures" | 四大嗜好 sìdàshìhào |
| mâmngũquả | "tray of five auspicious fruits" | 五果盤 wǔ guǒ pán |
| lụcsúctranhcông | "six domestic beasts vying for merit" | 六畜爭功 lìu chù zhēng gōng |
| thấttìnhlụcdục | "seven emotions and six desires" | 七情六欲 qīqínglìuyù |
| thấtđiênbátđảo | "seven mad, eight scattered (utter chaos)" | 七顛八倒 qīdiānbādǎo |
| chốncửutrùng | "nine-layered forbidden city" | 九重城 jǐuchóngchéng |
| thậpmỹthậptoàn | "tenfold beauty and perfection" | 十全十美 shíquánshíměi |
| tứhải giai huynhđệ | "everybody in every corner of the world is brother" | 四海皆兄弟 sìhǎijiēxiōngdì |
| vangdanhbốnbể | "famous all over the world" | 名揚四海 míngyángsìhǎi |
| bađầusáutay | "three heads and six arms" (extraordinary ability) | 三頭六臂 sāntóulìubì |
| tamđạiđồngđường | "three generations under one roof" | 三代同堂 sāndàitóngtáng |
| báchchiếnbáchthắng | "a hundred battles, a hundred victories" | 百戰百勝 bǎizhànbǎishèng |
| tìnhthiênthu | "love spanning a thousand autumns" | 千秋之戀 qiānqīuzhīliàn |
| vạnsựkhởiđầunan | "ten-thousand endeavors begin with hardship" | 萬事開頭難 wànshìkāitóunán |
| chíntriệuchínchínchín... đoáhoahồng | "9,999,999 roses" | 九百九十九萬九千九百九十九朵玫瑰 jǐu bǎi jǐu shí jǐu wàn jǐu qiān jǐu bǎi jǐu shí jǐu duǒ méi guī |
| hàngtỷtỷngười | "billions of people on Earth" | 數十億人 shùshíyìrén |
| mộtphầnứcgiây | "one trillionth of a second" | 萬億分之一秒 wàn yì fēn zhī yī miǎo |
It should be noted that these expressions, where they are not taken directly from Chinese idioms themselves, can be readily understood by Chinese speakers if the Vietnamese text is translated word for word. Vietnamese speakers can easily supply countless further examples of the same type, since the ones cited above were chosen at random. The essential point is that Sino‑Vietnamese numerals come to Vietnamese speakers as second nature. Vietnamese native numerals, though different in form, share the same character: both sets are deeply rooted in the collective linguistic consciousness of the Vietnamese people.
That said, the weakness of the hypothesis concerning the origin of Vietnamese cardinal numbers lies in the resemblance of "một" and "năm" to Mon‑Khmer forms. Or does it? The difficulty would vanish if we could demonstrate that the first five Mon‑Khmer numerals were in fact a subset of the full Vietnamese decimal system.
Yet who can assert with certainty that partial cognates in numerals suffice to prove a historical genetic relationship between distant languages? The direction of borrowing need not always be Mon‑Khmer into Vietnamese; the reverse is also possible. One must also recall that the ancestors of modern Vietnamese did not cross the 16th parallel until the 13th century, and that the Mon-Khmer people could have descended from the ancient Yue tribes.
Moreover, a cognate that appears too close may be more suspicious than one that is distant.
In the hypothetical case of Vietnamese numerals, including "tám" (8) and "mười" (10), the postulations for 1, 2, 3, 4, 5, 6, 7, and 9, as analyzed above, cannot be dismissed outright, even if the plausibility of cognateness remains uncertain. And even if debates about five‑based versus ten‑based systems persist, the theorization retains value. Each proposal has its own merit when compared with Mon‑Khmer forms, except for Benedict’s speculative elaboration on Austro‑Thai numerical cognates (1975, pp. 29–30). What has been attempted here is only a partial exploration, offering cues for further searches into other Sinitic‑Vietnamese words with Chinese cognates.
To place matters in perspective: the correspondence between Vietnamese and Khmer numerals extends only from 1 to 5. From "sáu" onward, Vietnamese aligns with Chinese 六 lìu and continues to share higher numerals, potentially without limit, including their idiomatic uses. The notion that a stronger culture must have imposed its numerals on a weaker one cannot be upheld, for the Khmer Kingdom itself was a dominant regional power between 802 and 1432 A.D. The overlap of numerals 1 to 5 between Khmer and Vietnamese may therefore represent remnants of a prehistoric genetic affinity, if such existed, prior to the migration of the Kinh people into the southernmost regions of Vietnam. At the same time, it is reasonable to postulate that Vietnamese numerals from 6 upward were later Chinese loanwords, introduced into early Annamese speech during the period of Han colonial rule.
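The claim that the Vietnamese and Khmer sets overlap only in the lower numerals can be made concrete with a deliberately crude comparison of the first written symbol of each form. This is a sketch under stated assumptions: the Khmer forms are the transcriptions cited in this paper, the function name is illustrative, and matching on the first symbol ignores real phonetic detail (for instance, Vietnamese written b is implosive /ɓ/).

```python
# Numerals 1-5 as cited in this paper; Khmer forms follow the
# transcriptions used in the text.
VIETNAMESE = {1: "một", 2: "hai", 3: "ba", 4: "bốn", 5: "năm"}
KHMER = {1: "mùəy", 2: "pì:(r)", 3: "bɤy", 4: "buən", 5: "pram"}


def shared_initial(numbers, lang_a, lang_b):
    """Return the numbers whose two forms begin with the same written symbol."""
    return [n for n in numbers if lang_a[n][0] == lang_b[n][0]]


matches = shared_initial(range(1, 6), VIETNAMESE, KHMER)
print("Numerals with matching initial symbol:", matches)
```

On this rough measure only 1, 3, and 4 match, which agrees with the earlier observation that 2 and 5 are usually admitted by analogy rather than on direct phonological resemblance.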
In any case, nothing alters the fact that between 90 and 95 percent of Vietnamese vocabulary is Sinitic‑Vietnamese. That remains the central point of this research.
Conclusion
Cardinal numbers, long considered diagnostic, reveal the inadequacy of the Mon‑Khmer framework. Vietnamese numerals align more closely with Sino‑Tibetan etyma, reinforcing the argument that Vietnamese is not an Austroasiatic outlier but part of a multi‑branch Sino‑Tibetan continuum.
References
Sino‑Tibetan Framework
Shafer, Robert. Introduction to Sino‑Tibetan. Wiesbaden: Otto Harrassowitz, 1966–1974.
Benedict, Paul K. Sino‑Tibetan: A Conspectus. Cambridge University Press, 1972.
Matisoff, James A. Handbook of Proto‑Tibeto‑Burman. University of California Press, 2003.
Vietnamese Historical Linguistics
Maspero, Henri. Études sur la phonétique historique de la langue annamite. Les initiales. Paris: Imprimerie Nationale, 1912.
Haudricourt, André‑Georges. L’origine des tons en vietnamien. Journal Asiatique 242, 1954, pp. 69–82.
Haudricourt, André‑Georges. Problèmes de phonétique diachronique: la nasalisation vocalique en vietnamien. Bulletin de la Société de Linguistique de Paris 49, 1954.
Numeral Systems and Comparative Studies
Starostin, Sergei. Sino‑Tibetan Numeral Systems. Moscow, 1989.
Matisoff, James A. The Numeral System of Proto‑Tibeto‑Burman. Linguistics of the Tibeto‑Burman Area 14(2), 1991.
Sagart, Laurent. The Roots of Old Chinese Numerals. Cahiers de Linguistique Asie Orientale 24(2), 1995.
Thurgood, Graham. Vietnamese Numerals and Tonogenesis. California State University, Chico, 2002.
Mon‑Khmer and Austroasiatic Context
Thomas, David D. Basic Vocabulary in Some Mon–Khmer Languages. Mon–Khmer Studies, 1960.
Sidwell, Paul. Austroasiatic Dataset for Phylogenetic Analysis: 2015 Version. Mon–Khmer Studies 44, Mahidol University / SIL International.
Alves, Mark J. An Updated Overview of the Austroasiatic Components of Vietnamese. Languages 9(12), 2024.
Comparative Methodology
Campbell, Lyle. Historical Linguistics: An Introduction. Edinburgh University Press, 2013.
Haspelmath, Martin. Comparative Linguistics and the Problem of Spurious Similarities. Linguistic Typology 9(1), 2005.
FOOTNOTES
(1)^ Another example is the Metric Conversion Act, passed by the U.S. Congress in 1975 to replace the English measurement system. Yet even after the target year of 2015 for full implementation, the effort seems to have faded into the void. The point here is metaphorical: people are, in a sense, born with numerical systems already encoded in their being. Once ingrained, such systems become second nature, resistant to deliberate change.
(2)^ Vietnamese speakers, when counting within the ten‑based system, often perceive numbers in paired relations, as if seeking balance and equilibrium at a deeper cognitive level. For instance, with 'cặp' 雙 shuāng ‘double’ and 'đôi' 對 duì ‘pair’, we find expressions such as 'cócặp' 有雙 yǒu shuāng ‘in pairs’ and 'songđôi' 雙對 shuāng duì ‘paired’. Similarly, 'lưỡng' 兩 liǎng ‘couple’ appears in the variant 'lứađôi' 咱倆 zá liǎng ‘the two of us’. Compound forms extend this logic: "đôimươi" (對 duì + 十 shí = 20), "đôitám" (對 duì + 八 bā = 16). The notion of wholeness is expressed in 'chẳn' 整 zhěng and 'sốchẳn' 整數 zhěng shù ‘even numbers’.
Beyond structural pairing, certain numbers carry auspicious associations, reflecting the same mindset as in Chinese tradition. The numbers 6, 8, 9, and 10 are considered lucky: 6 'lộc' 祿 lù ‘blessings’, 8 'phát' 發 fā ‘prosperity’, 9 'cửu' 久 jiǔ ‘everlasting’, and 10 'thập' 十 shí (VS 'chục') ‘wholeness’, as in 'thậptoàn' 十全 shí quán ‘completeness’ and 'thậpmỹ' 十美 shí měi ‘perfection’.
By contrast, 4 is viewed as inauspicious because of its phonetic resemblance to SV 'tử' 死 sǐ ‘death’ (VS 'chết'). This entire framework of symbolic numerology stands in sharp contrast to the Mon‑Khmer 'pram' five‑based system, which reflects a different cognitive orientation.
For reference, samples of modern Khmer numerals from 6 to 9 and beyond illustrate this divergence:
- 6: pram-mùəy (five plus one)
- 7: pram-pì:(r) (five plus two)
- 8: pram-bɤy (five plus three)
- 9: pram-buən (five plus four)
- 18: dɔp-pram-bɤy (ten and five plus three)
- 25: mphɯy-pram (twenty plus five)
- 56: ha:sɤp-pram-mùəy (fifty plus five plus one)
and some alternative forms, which certainly bear no relation to those in Vietnamese; nor do the Middle or Old Khmer forms.
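The five‑based composition visible in the Khmer samples above can be sketched as a short routine that builds 6 to 9 as "five plus n". This is a minimal illustration, assuming only the forms quoted in this footnote: the tens table contains just the three decades attested here (10, 20, 50), and the function names are hypothetical.

```python
# Forms taken from the samples above; nothing beyond them is assumed.
UNITS = {1: "mùəy", 2: "pì:(r)", 3: "bɤy", 4: "buən", 5: "pram"}
TENS = {10: "dɔp", 20: "mphɯy", 50: "ha:sɤp"}  # only the decades cited above


def khmer_unit(n):
    """Build a unit 1-9: 1-5 are simple, 6-9 are 'pram' (five) plus 1-4."""
    if 1 <= n <= 5:
        return UNITS[n]
    if 6 <= n <= 9:
        return f"{UNITS[5]}-{UNITS[n - 5]}"
    raise ValueError(f"not a unit: {n}")


def khmer_numeral(n):
    """Compose a numeral from a cited decade word and a five-based unit."""
    tens, unit = divmod(n, 10)
    parts = []
    if tens:
        if tens * 10 not in TENS:
            raise ValueError(f"decade {tens * 10} is not among the cited forms")
        parts.append(TENS[tens * 10])
    if unit:
        parts.append(khmer_unit(unit))
    if not parts:
        raise ValueError("zero is not covered by the cited forms")
    return "-".join(parts)


for n in (6, 7, 8, 9, 18, 25, 56):
    print(n, khmer_numeral(n))
```

Vietnamese "sáu" through "chín" cannot be generated this way from "năm" plus a lower numeral, which is the structural contrast the footnote draws.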