Saturday, November 8, 2025

Comparative and Etymological Challenges in Vietnamese

From Ancient Roots to Modern Identity

by dchph



This study synthesizes competing linguistic and historical perspectives on the origins of Vietnamese, reconciling evidence from the Yue substratum with successive Sinitic overlays. By harmonizing divergent theories, it reframes Vietnamese as a hybrid language, rooted in deep indigenous foundations yet profoundly shaped by centuries of Sinicization, offering a model for comparative inquiry into language contact and identity formation.

Bringing together rival accounts of Vietnamese linguistic history, the analysis traces the interplay of Yue foundations, Chinese influence, and modern reform. It demonstrates how ancient borrowings and indigenous survivals converge to produce a language that is at once uniquely Vietnamese and intimately connected to its regional neighbors.

I) Sinitic dominance in Vietnamese

Vietnamese emerges as a language born of encounters: Yue voices, Chinese courts, colonial scripts, and modern reform. This work reconciles these strands into a single narrative, showing how ancient roots continue to resonate in the rhythms of modern identity.

Comparative Sino‑Tibetan etymologies suggest that the diachronic evolution of modern Vietnamese mirrors a trajectory in which early Southern Yue populations established autonomous polities across southern China prior to the consolidation of Han imperial authority. Yet it lies beyond the scope of Sino‑Tibetan linguistics to classify Vietnamese as a member of that family, whether by subsuming it under the Sinitic branch or by drawing analogies to Cantonese or other Chinese lects. Such a classification requires a broader base of etymological evidence and a more rigorous comparative framework.

Lexical recycling has persisted into the modern era, as seen in the transregional circulation of terms such as cộnghoà 共和 (gònghé, 'republic') and dânchủ 民主 (mínzhǔ, 'democratic'). Originating as Japanese neologisms constructed from Chinese morphemes, these items were re‑borrowed into Chinese and eventually permeated Vietnamese usage. Their trajectory exemplifies the ongoing exchange of linguistic material across Sinitic, Japonic, and Vietic domains.

In parallel, the process of localization has transformed many Sino‑Vietnamese lexemes into fully nativized Sinitic‑Vietnamese forms. In such cases, original senses are not always preserved, a phenomenon more pronounced in Japanese Kanji than in Sino‑Vietnamese. For example, lịchsự ("polite") derives from 歷事 lìshì ("experience"), and tửtế ("kind") from 仔細 zǐxì ("meticulously"). Beyond these adaptations, Vietnamese also developed its own stock of self‑coined lexicon built from Chinese glyphs.


Table 1 - Self-coined Sino-Vietnamese compounds

Some compounds are uniquely Vietnamese and absent from Chinese usage, such as linhmục ("priest", from "soul" + "shepherd"), or giảkimthuật ("art of artificial metal"), popularly applied to "alchemy". Others, like linhcẩu ("alert dog", meaning "hyena"), are semantic innovations. Still others have fallen out of use in modern Chinese or diverged in meaning.


Definition  Chinese characters  Vietnamese
city 城鋪 thànhphố
week 旬禮 tuầnlễ
presence 現面 hiệndiện
entertainment 解智 giảitrí
lack 少寸 thiếuthốn
proud 倖面 hãnhdiện
pleasant to the eyes 玩目 ngoạnmục
orderly, proper; honest, upright 眞方 chânphương
(polite, respectful) you 貴位 quývị
traditional 古傳 cổtruyền
festival 禮會 lễhội
legend 玄話 huyềnthoại
satisfy 妥滿 thoảmãn
polite 歷事 lịchsự
important; significant 關重 quantrọng
millionaire 兆富 triệuphú
billionaire 秭富 tỷphú
thermometer 熱計 nhiệtkế
(mathematics) matrix 魔陣 matrận
biology 生學 sinhhọc
subject 門學 mônhọc
average 中平 trungbình
cosmetics 美品 mỹphẩm
surgery 剖術 phẫuthuật
allergy 異應 dịứng
hearing-impaired 欠聽 khiếmthính
bacteria; microbe; germ 微蟲 vitrùng
to update 及日 cậpnhật
data; information 與料 dữliệu
forum 演壇 diễnđàn
a smoothie (drink) 生素 sinhtố
dojo; martial art school 武堂 võđường
cemetery 義地 nghĩađịa
a surgical mask 口裝 khẩutrang
television (medium) 傳形 truyềnhình
broadcast 發聲 phátthanh
animation 活形 hoạthình
subtitles 附題 phụđề
transliterate 翻音 phiênâm
transcribe 轉字 chuyểntự
visa 視實 thịthực
nurse 醫佐 ytá
artist, musician, actor, comedian 藝士 nghệsĩ
singer 歌士 casĩ
musician, songwriter, composer 樂士 nhạcsĩ
poet 詩士 thisĩ
dentist 牙士 nhasĩ
artist, painter 畫士 hoạsĩ
parliament member, congressman 議士 nghịsĩ
prison, concentration camp 寨監 trạigiam
victim 難人 nạnnhân
special forces 特攻 đặccông
farm 莊寨 trangtrại


Contrary to modern assumptions, Vietnamese is best characterized as Sinitic‑dominant in a way that Japanese or Korean are not. It inherits a dense Middle Chinese lexicon, and many items classified as Sino‑Vietnamese overlap with Sinitic‑Vietnamese through integration into everyday speech alongside native vocabulary.

A case in point is the etymon 順 (shùn, SV thuận), which displays context‑dependent variation across compounds:

  • 順利 (shùnlì, VS suônsẻ)
  • 孝順 (xiàoshùn, VS hiếuthảo)
  • 順便 (shùnbiàn, VS sẵntiện)
  • 逆順 (nìshùn, VS ngượcxuôi)

Such examples illustrate how Sinitic forms, once literary, became embedded in colloquial Vietnamese, blurring the boundary between learned vocabulary and vernacular usage.

Table 2 - A case study of Sinitic-Vietnamese neologism formed with Chinese lexemes


The Vietnamese term 'côngcuộc'–now familiar in modern discourse as a formal compound meaning 'cause', 'process', or 'undertaking'–is a persistent source of lexical confusion and scholarly intrigue. While often misinterpreted as a Sino-Vietnamese compound mapping straight onto Chinese 公局 or 工局 (Mandarin gōngjú 'public bureau', 'work office'), its correct etymological genesis instead lies in 工作 (gōngzuò, 'task', 'work'), with the element 'cuộc' emerging not from 局 (jú) but from 作 (zuò). The fact that 'cuộc' in Vietnamese phonologically and semantically diverges from both its Sino-Vietnamese dictionary reading (tác) and its expected Mandarin reflex (zuò) reflects a network of historical sound change, sandhi assimilation, and semantic-phonetic association–processes that collectively illuminate the complex history of Chinese lexical influence in Vietnam.

The Vietnamese word 'côngcuộc' functions in modern written and spoken Vietnamese to denote a significant collective undertaking–'project', 'cause', 'the course of'–especially in governmental or historical phrasing (e.g., "côngcuộc khángchiến" 'resistance war', "côngcuộc đổimới" 'the undertaking of renovation/reform'). It is a compound of 'công' (from 工 'work; labor') and 'cuộc'.

The confusion with 公局 or 工局 is understandable, as both 公 and 工 read 'công' in Sino-Vietnamese, and 局 (SV: cục) is a common bound morpheme for official entities. However, 'côngcuộc' is a modern compound built on the model of Chinese 工作 (gōngzuò) and adapted phonetically and semantically within the Vietnamese system. While 'côngtác' is the canonical Sino-Vietnamese reading for 工作, 'côngcuộc' emerged as a neologism where 'cuộc' operates as a native or nativized reflex of 作, rather than 局.

The emergence of Sino-Vietnamese compounds such as 'côngcuộc' reflects longstanding processes of borrowing and semantic adaptation widespread across the Sinosphere, i.e., Japan, Korea, and Vietnam, collectively referred to as the 'Sino-Xenic' realm. In these contexts, new words for modern concepts were often coined using Chinese morphemes and then mapped phonologically into the target language in a regularized, but sometimes innovative, fashion.

Middle Chinese, as represented in rime dictionaries such as the Qieyun (7th century), had a richly articulated syllable template. For the character 作 (Mandarin: zuò), used in 工作 (gōngzuò), the reconstructed MC pronunciation is commonly given as */tsak/ or /tsak-s/, with the following features:

  • initial: ts- (voiceless alveolar affricate)
  • vowel and medial: /a/ as nucleus, sometimes with a palatal medial in some dialects
  • final: -k (voiceless velar stop), a classic 'entering tone' coda
  • tone: entering (rùshēng), which has phonological and tonal correlates in Sino-Vietnamese

Sino-Vietnamese readings for 作 are systematically 'tác', following the regular sound correspondences established for Chinese readings in Vietnamese. Key observations on the form 'cuộc':
  1. The initial [ts-] to [k-] shift is irregular (i.e., not predicted by the regular SV correspondence), suggesting non-Sino-Vietnamese, perhaps colloquial or nativized, development.

  2. Labiovelar final [‑əwkpʔ] is robustly preserved in 'cuộc', with the final -k and medial -w- (from /ua/ or /uə/) mapping closely to MC -ak, and aligning phonotactically with native Vietnamese coda structure.

  3. The resultant tone is nặng [˧ˀ˩ʔ], consistent with the entering (rùshēng) tone category linked to -k finals in Han-Viet transmission.
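The tone constraint in observation 3 can be illustrated with a small sketch. It assumes only the general rule that Vietnamese syllables closed by a stop (-p, -t, -c, -ch) carry either the sắc or the nặng tone, the modern reflexes of the entering category. The helper names `tone_of`, `has_stop_coda`, and `entering_compatible` are hypothetical and are not drawn from any published tool.

```python
import unicodedata

# Combining marks that encode the marked tones in Vietnamese orthography.
# A syllable with none of these marks carries the level tone (ngang).
TONE_MARKS = {
    "\u0301": "sắc",    # acute
    "\u0300": "huyền",  # grave
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}


def tone_of(syllable: str) -> str:
    """Return the tone name read off the diacritics of one syllable."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"


def has_stop_coda(syllable: str) -> bool:
    """True if the syllable is written with a final -p, -t, -c, or -ch."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.endswith(("p", "t", "c", "ch"))


def entering_compatible(syllable: str) -> bool:
    """Assumed rule: stop-final syllables take only sắc or nặng."""
    return has_stop_coda(syllable) and tone_of(syllable) in {"sắc", "nặng"}


for word in ("cuộc", "cục", "tác", "công"):
    print(word, tone_of(word), entering_compatible(word))
```

Under these assumptions 'cuộc', 'cục', and 'tác' all pass the check while 'công' does not, so the tone of 'cuộc' taken alone cannot separate a source in 作 from one in 局; the irregularity discussed above lies in the initial.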

Semantic-phonetic association: 'cuộc' vs. 'cục' and the shadow of 局, the homophony and semantic overlap

One reason for the widespread misreading of 'côngcuộc' as 公局 or 工局 is the phonological and structural near-identity between 'cuộc' and 'cục' (局):
  • 'cục' SV: cục, Mandarin jú, MC *kɨwk; used for administrative, governmental, and physical 'units' or 'offices'

  • 'cuộc', derived via the above pathway from 作, but due to similar form and function, is often reanalyzed by speakers and writers as rooted in 局, especially in compounds

The confusion is exacerbated by the convergence of rimes and finals, both 'cục' /kʊkpʔ/ and 'cuộc' /kəwkpʔ/ conforming to the [k•w•k•p̚] structure, with heavy final closure and possible central or back rounded vowels.

Semantic blending in compound formation: semantic overlap also drives this folk association

In both Sinitic and Vietnamese, compounds involving 工作 (work), 局 (office), and 作 (to do/make) are semantically related to tasks, operations, or affairs, domains where 'cuộc' has come to be used.

For example, in classical Chinese, 局 (jú) denoted 'bureaus' and 'games' and, by extension, 'affairs' or 'situations', while 作 (zuò) in compounds implied 'doing', 'working', or 'acting upon' something, matching the function of 'cuộc' in syntagms such as "côngcuộc vậnđộng" 'the campaign task'.

Consequently, the phonetic resemblance between 'cuộc' and 'cục' enables semantic-phonetic association (lexical contamination or 'folk etymology'), especially when context or classical literacy is limited.

This phenomenon is hereby called 'sandhi assimilation' or 'assimilative association'; it is recurrent in the realm of Sinitic-Vietnamese.

Conclusion - The analysis of 'côngcuộc', especially the sound change underlying 'cuộc', is a case study in the stratification, innovation, and reanalysis inherent to Sinitic-Vietnamese contact linguistics. Through the transformation of Middle Chinese *tsak to Vietnamese 'cuộc', we witness the interplay of phonological adaptation, semantic reinterpretation, and structural assimilation:

 ▪ The initial [ts-] > [k-] shift, though irregular, is emblematic of colloquial nativization and possibly dialectal borrowing 

 ▪ The preservation of labiovelar coda [-əwkpʔ] aligns with Vietnamese phonotactics, fostering both the formation of new compound morphemes and confusion with native terms like 'cục' 

 ▪ The importance of sandhi, compound formation, and semantic blending means the etymological and structural boundaries between Sinitic and native vocabulary are porous. 

Comparative evidence across Sino-Xenic languages highlights both shared roots and Vietnamese-specific pathways. While 'côngcuộc' initially traces to 工作, its contemporary form and meaning exemplify Vietnam's creative synthesis of linguistic inheritance, local adaptation, and ongoing lexical renewal.



In discussing the etymology of Sinitic‑Vietnamese words, the author restricts analysis to references locally influential within the Sinitic framework for comparative purposes. In practice, this means focusing on etyma that occur concurrently in both Chinese and Vietnamese, including foreign words that entered Vietnamese through a Chinese intermediary.

Examples illustrate this filtering:

  • Vietnamese mắt "eye", rendered as 目 mù (SV mục) in Chinese, may connect to Malay mata.
  • Vietnamese gạo "rice", represented by 稻 dào, aligns with Thai /gaw/.
  • Other foreign‑derived items include SV kỹsư (技師 jìshī "engineer"), borrowed from Japanese gishi–contrasting with the modern Chinese sense "technician"; bệnhviện (病院 bìngyuàn "hospital"), also from Japanese usage; ưumặc (幽默 yōumò "humor"); câulạcbộ (俱樂部 jùlèbù "club"); and country names such as Anh (英 Yīng "England"), Mỹ (美 Měi "America") from English, Pháp (法 Fǎ "France") from French, and Đức (德 Dé "Germany") from German Deutsche.

Sound‑change patterns observed in core vocabulary across Chinese and Vietnamese suggest the preservation of substratal residues from an earlier Yue linguistic layer. These exchanges demonstrably predate the Qin–Han expansion into southern China (221 B.C.-220 A.D.). Numerous lexical items from this substratum are securely attested in the Kangxi Dictionary 康熙字典, the Qing‑era compendium commissioned by Emperor Kangxi, underscoring their deep historical entrenchment. (1)

From a linguistic standpoint, the predominance of Sinitic features across Vietnamese etyma (including tonality, morphological structure, phonological traits, and disyllabicity) has led many scholars to infer a Chinese origin. However, as phonological and semantic convergence increases, so too does the likelihood of borrowing. This is especially evident in Tai-Kadai languages, and most prominently within the Tai-Kam-Sui subgroup, where nearly all lexical items appear to derive from Chinese sources (cf. Comparative Sino-Tibetan Etymologies).

Vietnamese gạo "hulled rice" is often compared to Thai /kao/, while nếp "sticky rice" aligns with Thai /nɛp/ and Lao /nèep/. These correspondences parallel Chinese dào (SV đạo) and nuò (SV nọ), both of which are themselves loanwords in Chinese.

By contrast, Vietnamese lúa "paddy rice" appears to be a native Yue‑Taic term, corresponding to Lao /lua/ and Zhuang /luə/, with no direct Chinese cognate. This challenges the claim of A. Starostin (1953-2005) that lúa reflects an archaic Chinese loanword derived from dào, reconstructed as [ lhu:ʔ < Protoform ly:wH ] with meanings such as "rice", "grain", and "paddy". Starostin’s broader comparative framework extends to Burmese /luh/ ("a grain species," Panicum paspalum), Kachin /c^əkhrau1/ ("paddy ready for husking"), and Kiranti lV ("millet"), which he interprets as part of a native Chinese semantic field. Yet this interpretation requires a careful distinction between inherited Yue‑Taic forms, where Vietnamese also shares syntactic word order, and later Sinitic overlays.

A similar caution applies to items traditionally assigned to the Austroasiatic Mon‑Khmer layer. Where phonological correspondences fail to match established sound‑change patterns or semantic alignments, such items are more plausibly explained as intergroup loanwords, facilitated by geographic proximity and prolonged contact.


Table 3 - Glyph origins and etymological convergence: 來 and 麥, and the case of Vietnamese "lúa" and "lại"

    The character 來, now widely interpreted as 'to come', originated as a pictogram (象形) depicting wheat. Its ancestral forms include 麥 (OC *mrɯːɡ, 'wheat') and 麳 (OC *rɯː, 'wheat'). In early script forms, the central vertical line represented the ear of wheat, flanked by upward strokes for leaves and downward strokes for stem and roots. An additional horizontal line was often added at the top, possibly to emphasize the ear. Compare 禾, which shares structural parallels.

    This glyph was borrowed for the meaning 'to come' as early as the oracle bone script. During the Western Zhou and Warring States periods, semantic components such as 止 ('foot') and 辵 ('walk') were appended to distinguish the original agricultural sense from the emerging verbal usage. These additions, however, were not retained in later script traditions. Some scholars interpret the derivative 麥, formed by adding 夊 ('to walk slowly'), as the original glyph for 'to come'. If so, the meanings of 來 and 麥 may have interchanged due to the dominant use of 來 in verbal contexts.

    Shuowen connects the semantic domains of 'wheat' and 'arrival' mythologically: 天所來也 ('it comes from the heavens'). This interpretation may be supported by archaeological evidence suggesting that wheat was not indigenous to China but was introduced from outside, most likely from the west.

    Phonologically, both 來 and 麥 have been reconstructed with initial *mr- in Old Chinese. In 來, the liquid onset /l/ is retained, while 麥 preserves the nasal /m/. Etymologically, 來 derives from Proto-Sino-Tibetan *la-j ~ *ra ('to come') (STEDT), and is cognate with:

  • 迨 (OC *l'ɯːʔ, 'reach; until')
  • 賚 (OC *rɯːs, 'bestow')
  • 蒞 (OC *rɯbs, 'arrive') – Schuessler (2007)
  • Burmese လာ (la, 'come')
  • Proto-Vietic *laːjʔ

    The Vietnamese reflex "lại" (SV lai) is possibly related to Chinese 來 (MC lʌi, ləj 'to come; to arrive').

    Baxter–Sagart (2014) note that 來 shows irregular development, possibly due to the loss of final *-k in an unstressed form that was later restressed:

來 *mə.rˤək > *mə.rˤə > *rˤə > loj > lái 'come'

    This trajectory, however, does not fully explain the irregular presence of final -ʔ (nặng tone) in Vietnamese. If we posit an intermediate stage where *-k > *-ʔ occurred and was subsequently lost, allowing for borrowing into Vietic during that window, the tone could be accounted for. Yet the Vietnamese form lacks expected traces of *-rˤ- (e.g., ‹r› or ‹s›), suggesting a late loan, after *r(ˤ) > l had already occurred. This raises further questions about tonal interpretation and phonological alignment.

    For comparative reference, Zhuang (or Nùng) /lai/ aligns with Proto-Tai *ʰlaːjᴬ ('many; much') [ cf. Vietnamese "lắm" ], itself derived from Old Chinese 多 (OC *t.lˤaj). Cognates include:

  • Thai lǎai
  • Lao lāi
  • Lü l̇aay
  • Shan lǎay
  • Bouyei laail
  • Saek หล่าย
  • Jizhao laːi²¹

    These forms suggest a broader semantic and phonological network in which 來 participates, spanning Sino-Tibetan, Vietic, and Tai-Kadai domains.


II) Phonological Exceptions and Dialectal Comparison

This principle is not universal. In some cases, a single-morpheme syllable categorized as a "word" is governed primarily by phonological alternation while showing additional features beyond tonality that do not neatly fit established patterns. For example, tỏi 蒜 suàn (SV toán, 'garlic') exhibits alternation /s- ~ t-/ and /-n ~ -i/, whereas chua 酸 suān (SV toan, 'sour') does not follow the same pattern. Nonetheless, such items are still classified as loanwords based on overall affinity.

Consider 兒 ér, which corresponds to SV nhi and yields VS nhỏ ('child'), VS nhí ('baby'), and nhínhảnh (with nhảnh as a reduplicative morphemic syllable conveying 'childish', analogous to English "-ish"), as opposed to nhỏ (SV nhụ, 'young'). We may thus conclude that the etymon nhi entered via Middle Chinese and that its cited derivatives are all Chinese loanwords.

In comparison with other southern Sinitic dialects, and contrary to common assumption, Vietnamese–beyond sharing a similarly broad tonal range (up to nine tones)–aligns more closely with Mandarin than with Cantonese, Min Nan, or Wu varieties, particularly in the lexical domain. Only a small number of indigenous Cantonese words have cognates in thuầnViệt ('basic native Vietnamese'), such as:

  • sik6 → xơi ('eat')
  • jam2 → uống ('drink')
  • gai1 → gà ('chicken')

By contrast, rarer Cantonese forms lack direct Vietnamese matches, for example:

  • fajng1kao1 ('sleep') ≠ M 卧 wò, corresponding to SV ngoạ → VS ngủ
  • pin5tow2 ('where') ≠ M 哪裏 nǎlǐ, corresponding to SV nalí → VS nơinào
  • tzuo3 ('already') ≠ M 了 liǎo, corresponding to SV liễu → VS rồi

      The Cantonese-speaking population traces its historical roots to a substantial portion of the ancient NamViệt Kingdom (204-111 BCE). Its capital, Phiênngung 番禺 (present-day Panyu district, Guangzhou), was established under the rule of the founding monarch Triệu Đà 趙佗 (Zhao Tuo) and maintained by his dynastic successors. This southern polity marked the frontier zone of early Sinitic expansion, where indigenous Yue traditions intersected with the advancing Han cultural and linguistic sphere. (2)

III) Divergence of Cantonese and Vietnamese after NamViệt

      Following the annexation of NamViệt into the Middle Kingdom (中國), the ongoing process of Sinicization intensified. This catalyzed the divergence of Cantonese and Vietnamese into distinct linguistic and cultural entities. Each followed separate historical trajectories, with only Annam ultimately achieving independence from Chinese rule in 939 CE.

      The genetic and cultural composition of modern Cantonese speakers differs markedly from that of their pre‑Han ancestors and from populations inhabiting the region up to the tenth century. It is plausible that some kin groups migrated southward into Annamese territories, a phenomenon repeated across centuries of intertwined regional histories. In China, such migrations often occurred in response to famine, repression, or political upheaval. Similarly, ancient Annamese populations moved further south to evade imperial reach.

      By the time these migrations occurred, settlers in new regions would have encountered populations not vastly different from themselves, especially under shared or adjacent statehoods. The border between China and Vietnam remained relatively permeable throughout history, facilitating such movements until its closure in 1949 under Maoist rule.

      Had Annam remained under Chinese dominion into the present, its national trajectory might have mirrored that of NamViệt (Cantonese: NamJyut6), now subsumed within Guangdong Province. Historically, Guangdong produced millions of emigrants who dispersed globally, including to Annam and other Southeast Asian polities. Conversely, had the greater Canton region achieved statehood akin to Annam's, it might have retained linguistic sovereignty. Its language, like Annamese, could have preserved distinct typological features, prompting reevaluation of its classification within the Sino‑Tibetan family. Similar speculation applies to Fukienese (Hokkien) and Hainanese.

      Modern Cantonese descendants, now fully Sinicized, can only access their pre‑Han heritage through archaeological vestiges such as the mausoleums of NamViệt kings in present‑day Guangzhou. The orthography of NamViệt may be rendered phonetically where appropriate to reflect its historical pronunciation.

      The immersive Sinicization of the Canton region profoundly shaped its linguistic identity. Cantonese, as a Sinicized Yue language, stands in contrast to Vietnamese, a distinction rooted in their respective historical paths. Cantonese remained within China from 111 BCE onward, while Vietnam extricated itself from Chinese rule in 939 CE. This divergence is foundational to Vietnam’s national identity.

      During the Ming Dynasty’s two‑decade occupation of Vietnam in the fifteenth century (1407-1427), Chinese influence left indelible marks. A particularly devastating episode occurred when Ming forces destroyed Vietnam’s entire written library (Nguyễn Tài Cẩn, 1998). Over centuries, Vietnam navigated a complex sovereignty, alternating between vassalage and independence, adapting to the shifting power dynamics of its northern neighbor. Even more than a millennium after the end of China's 1,004‑year colonial rule, this balancing act remains central to Vietnam’s historical narrative.

      Despite their shared Yue ancestry, Vietnamese speakers often express nostalgia for their Yue heritage, whereas many Cantonese speakers remain unaware of or indifferent to their Yue origins. The Cantonese model is instructive: the Sinicization of Yue subjects in NamViệt deeply influenced the ethnic and linguistic evolution of the ancient Yue. Records of Canton’s OuYue (甌越) exhibit striking parallels to Annam’s LuoYue (雒越). The Han colonization extended into the Sông Hồng Basin (Red River Delta), which became part of southwestern NamViệt following the conquest of 111 BCE.

      Han imperial policies left enduring Sinitic imprints on the emerging Yue languages, which over centuries evolved into Cantonese and Vietnamese. While these languages share notable features, they are not linguistically bonded as kin. This is evident in the limited number of newly identified Sinitic‑Vietnamese etyma with shared ancestral roots. For instance, the legend of the Magic Sword, which recounts the shared ancestry of the Zhuang and Vietnamese peoples–once self‑identified by the same ethnonym–underscores their connection to ancient Cantonese traditions. (3).

      The Chinese affiliation of Sino‑Vietnamese etyma and Sinitic vocabulary in Cantonese is unequivocal. This is attested by their shared reliance on Middle Chinese variants and phonological commonalities, including tonal systems (eight tones in Vietnamese versus nine in Cantonese) and final consonants (‑m, ‑p, ‑t, ‑k).

      Among Sino‑Vietnamese lexemes derived from Middle Chinese, one of the most debated cases involves the naming of the duodenary zodiac system. This system reveals substratal pathways in Sinitic‑Vietnamese terminology that trace back to ancient Yue, passing through Old Chinese before entering Vietnamese. These forms are conspicuously absent in Cantonese, likely due to its deeper Sinicization. Cultural elements such as the twelve‑animal cycle, shared among Chinese, southern minorities, Vietnamese, and Mon‑Khmer groups, exemplify this substratal retention.

      For example, the Year of the Horse (馬年) in 2014 was also referred to as Jiawu Year (甲午年, Jiǎwǔ Nián) or Năm GiápNgọ in Vietnamese. Here, Ngọ (午), an ancient Yue loanword for "horse" (contrasting with native ngựa), exemplifies Yue heritage. Although terms like Jiawu Year may sound foreign to modern Chinese ears, they remained current until the early twentieth century. A notable instance is the Xinhai Revolution (辛亥革命) of 1911, which overthrew the Qing Dynasty. The year 亥 (hài), signifying "pig", is another Yue loanword: in Sino‑Vietnamese it appears as hợi, while in Sinitic‑Vietnamese it is heo. Thus, 1911 is remembered as the Xinhai Year or "Year of the Boar" (Boltz 1991).

      Another case is mẹo, an older Sinitic‑Vietnamese reflex of 卯 (M mǎo), later reintroduced as Sino‑Vietnamese mão. In Vietnamese tradition, 卯 denotes the fourth zodiac position, but unlike Chinese usage, where it corresponds to 兔 (tù, SV thố, VS thỏ, "hare"), Vietnamese associates it with mèo ("cat"). Thus, while Chinese marks 兔年 (Tùnián, "Year of the Hare"), Vietnamese calls the same year 卯年 (M Mǎonián), rendered SV Mãoniên, VS nămMão, nămMẹo, or colloquially nămMèo. This divergence shows that the Vietnamese "Year of the Cat" is not a reinterpretation of the Chinese "Year of the Hare" but a retention of an older Yue association.

      A parallel case involves 未 (M wèi) and Vietnamese dê (/ze1/, "goat"). The original southern concept of 未 as "goat" was later supplanted by northern terms for "ram" or "sheep" (VS cừu 羯 jié, SV kiết; 羭 yú, SV du), even though 羊 (yáng) still denotes "goat" in many southern lects. This semantic shift reflects northern influence, where 羊 was associated with "sheep" or "lamb" (羔 gāo, VS cừu). Crucially, 未 should be understood as "goat", corresponding to SV dương (羊 yáng) and VS dê (/ze1/). This pronunciation aligns with southern Sinitic varieties such as Teochew (/jẽw1/), Amoy (/jũ1/), and Hainanese (/jew1/), all meaning "goat". The compound 山羊 (shānyáng, VS dênúi, "mountain goat") reinforces this reading.

      It is plausible that 未 descends from an ancient Yue form approximating /ze1/ or /je1/, entering Chinese through its integration into the zodiac system. In this context, 未 may have transcribed a foreign term for "goat," replacing 羊, which northern cultures more commonly associated with "sheep." The Sinitic‑Vietnamese dê (/je1/) thus preserves a substratal pronunciation diverging from Mandarin /wèi/.

      Middle Chinese pronunciations of 未 varied considerably–/mwe̯i/, /mĭwəi/, /miuəi/, /mʉi/, /mʷɨi/, /muj/–and eventually bifurcated into SV vị (/vjej6/, VS southern /zjej6/, "upcoming") and SV mùi (/mʷɨi2/, "goat"). The phonological shift from /v-/ to /j-/ or /z-/ in VS suggests a southern borrowing, possibly mediated through an intermediate /wj-/ stage. In this scenario, Mandarin wèi may represent a back‑loan from Old Chinese */mɯds/, as noted in 《說文》: 未, 味也.

      The character 未 thus bifurcates semantically and phonetically into SV vị ('not yet', 'future'), as in vợchưacưới (未婚妻 wèihūnqī, vịhônthê), and SV mùi ("goat"), as in NămẤtMùi (乙未年 YǐWèiNián, "Year of the Goat"). It is plausible that 未 was introduced by Yue‑speaking populations of NamViệt or Annam prior to the Old Chinese period. While neither ancient Chinese nor Vietnamese possessed a native /v-/ onset, southern dialects likely preserved a form closer to /jej/ or /zjej/.

      Further complicating the etymology, Vietnamese dê may also be a doublet cognate of 羊 (yáng), reflected in VS dê and SV dương (/jɨəŋ1/), and paralleled in Teochew yeo (/jẽw1/), all denoting "goat". These forms reinforce the hypothesis that Vietnamese retains a substratal lexical layer distinct from northern Sinitic developments.

      In zodiac reckoning, years such as 1955, 2015, and 2075, formally designated in Vietnamese as NămẤtMùi (乙未年 YǐWèinián), are now more commonly referred to in mainland Chinese usage as 羊年 (Yángnián, "Year of the Goat", VS nămDê). Notably, younger Chinese speakers often do not recognize the calendrical significance of 乙未年, whereas Vietnamese youth remain familiar with both NămẤtMùi and nămDê. This is reflected in expressions such as 我的生乙未年 (Wǒde shēng YǐWèinián; Tôi sanh NămẤtMùi) and 我的生肖屬羊 (Wǒde shēngxiào shǔyáng; Tôi cầmtinh conDê), or simply 我屬羊 (Wǒ shǔ yáng; Tôi tuổi Dê).
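      The stem-branch labels cited in this section follow simple modular arithmetic, which the sketch below reproduces. It assumes the common convention that 4 CE counts as a 甲子 year, and it ignores the fact that the lunar year begins in late January or February, so dates early in a Gregorian year still belong to the previous cycle year. The function name `sexagenary` is illustrative only.

```python
# Ten Heavenly Stems and twelve Earthly Branches, with Sino-Vietnamese readings.
STEMS = ["甲", "乙", "丙", "丁", "戊", "己", "庚", "辛", "壬", "癸"]
STEMS_SV = ["Giáp", "Ất", "Bính", "Đinh", "Mậu", "Kỷ", "Canh", "Tân", "Nhâm", "Quý"]
BRANCHES = ["子", "丑", "寅", "卯", "辰", "巳", "午", "未", "申", "酉", "戌", "亥"]
BRANCHES_SV = ["Tý", "Sửu", "Dần", "Mão", "Thìn", "Tỵ", "Ngọ", "Mùi",
               "Thân", "Dậu", "Tuất", "Hợi"]


def sexagenary(year: int) -> tuple[str, str]:
    """Map a Gregorian year (CE) to its stem-branch pair.

    Assumption: the cycle is anchored so that 4 CE is 甲子; the result is
    only approximate for dates before the lunar new year.
    """
    offset = (year - 4) % 60
    stem, branch = offset % 10, offset % 12
    return STEMS[stem] + BRANCHES[branch], STEMS_SV[stem] + BRANCHES_SV[branch]


for y in (1911, 1955, 2014, 2015, 2075):
    print(y, *sexagenary(y))
# 1911 辛亥 TânHợi
# 1955 乙未 ẤtMùi
# 2014 甲午 GiápNgọ
# 2015 乙未 ẤtMùi
# 2075 乙未 ẤtMùi
```

      The branch index alone (offset modulo 12) selects the zodiac animal, which is why 1955, 2015, and 2075, sixty years apart, share both stem and branch.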

      This cultural continuity supports the hypothesis that 未 originated as a Yue loanword, plausibly reconstructed as /zẽ/ or /jẽ/, distinct from 羊. The semantic and phonological interplay between and is further illustrated in 美 (měi, SV mỹ /mej4/, "beautiful"), where placed over 火 (huǒ, "fire") metaphorically conveys "beautiful taste". The etymological links between and 未, particularly through SV mùi (/mʷɨi2/, "goat"), reinforce their shared heritage and suggest that Vietnamese preserves substratal lexical and symbolic associations diverging from later northern reinterpretations. (4)

      These two zodiac cases 卯 and 未 have broader implications for Sino-Tibetan comparative work. Further analysis could examine Vietnamese cognates such as SV "ngọ" (VS "ngựa", 午 'wǔ', 'horse') and SV "sửu" (VS "trâu", 丑 'chǒu' < MC ʈʰuw < OC *n̥ʰuʔ, 'buffalo'). Additional parallels include:

      Vietnamese   Gloss   Old Tibetan   Note
      cẳng         foot    rkań          Phonological alignment
      mắt          eye     mig           Semantic stability
      sông         river   kluń          Cf. Viet-Muong */krong/
      bò           cow     ba            Lexical continuity

      Such correspondences suggest that these terms may have existed in proto-Vietic or evolved independently before later Sinitic influence. They open new avenues for exploring Vietnamese affiliations within the Sino-Tibetan family, as will be illustrated in later sections using Shafer's comparative wordlists (1966-1974). (5)

      Table 5 - The case of "the Year of the Cat"

      According to Nguyễn Cung Thông, the connection between Mão, Mẹo, and mèo is quite straightforward: these sounds all belong to the "low-pitched" tonal category and share the vowel e (as in Mẹo and mèo), which is an older form compared to the vowel a (as in Mão). Examples of VS/SV correspondences include hè/hạ, xe/xa, keo/giao, vẽ/hoạ, mè/ma, chè/trà, beo/báo, etc. The confusion between cats and rabbits in Chinese culture is evident in the case of Thốtôn (兔猻), a type of wildcat that is gradually disappearing. This animal, found in Central Asia, Siberia, Kashmir, Nepal, Qinghai, Inner Mongolia, Hebei, Sichuan, Tibet, and Xinjiang, is also known as Xálịtôn (猞猁孫) or Steppe cat in English, and it typically inhabits desert regions.
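      The e ~ a alternation described in this paragraph can be checked mechanically against a handful of the cited pairs. The sketch below is a rough illustration only: it strips every diacritic, so it also collapses vowel-quality marks (ê, ă, â) and says nothing about tone, and the names `PAIRS`, `strip_marks`, and `shows_e_a` are hypothetical.

```python
import unicodedata

# (VS form, SV form) pairs taken from the correspondences cited above.
PAIRS = [
    ("xe", "xa"),
    ("keo", "giao"),
    ("vẽ", "hoạ"),
    ("chè", "trà"),
    ("beo", "báo"),
    ("mẹo", "mão"),
]


def strip_marks(syllable: str) -> str:
    """Remove all combining marks, leaving the bare letters."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def shows_e_a(vs: str, sv: str) -> bool:
    """True if the VS form is written with e where the SV form has a."""
    vs_base, sv_base = strip_marks(vs), strip_marks(sv)
    return "e" in vs_base and "a" in sv_base and "e" not in sv_base


for vs, sv in PAIRS:
    print(f"{vs}/{sv}", shows_e_a(vs, sv))
```

      A letter-level test of this kind cannot establish a sound law; it only confirms that the listed pairs are spelled consistently with the alternation Nguyễn Cung Thông describes.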

      When the Han people expanded southward and westward, the phenomenon of "mistaking cats for rabbits" (similar to the Vietnamese idiom "mistaking a chicken for a quail") became apparent, as seen in the naming of thốtôn. This confusion partly explains why the fourth Earthly Branch (Mão/Mẹo) is associated with cats rather than rabbits in its original context. Thốtôn (兔猻) is also referred to as dươngxálị (洋猞猁), ôluân (烏倫), mãnão (瑪瑙), or mãnãolặc (瑪瑙勒). The term xálị (猞猁) refers to a type of wildcat (lynx). The Sino-Vietnamese word miêu (貓) means "cat," but in ancient Chinese, miêu referred to a type of hairless tiger rather than a domestic cat. This evidence supports the idea that Mão (卯) was a phonetic transcription of a foreign word (likely an ancient Vietnamese term) that entered the Chinese language.

      The definition of miêu in the Erya (Nhĩnhã) states: "A tiger with sparse fur is called 虦貓 (sạnmiêu)." According to the Ngọc Thiên dictionary, sạn/sàn (虦) also refers to a cat. The character  (a rare variant written as 虥) denotes a striped wildcat. Meanwhile, Thố/thỏ (鵵) in its ancient sense referred to a type of bird, and mãn (梚, a rare character) referred to a type of tree in ancient Chinese texts. In the Hakka dialect, thỏ is pronounced t'u2 (similar to thổ), which contrasts with the pronunciations of mãn (cat) and thố/thỏ.

      To understand why the Vietnamese associate cats with the Earthly Branch Mão (卯), one common explanation in Chinese sources is that the sound of Mão when adopted into Vietnamese resembled mèo or miêu (Sino-Vietnamese for "cat"). Thus, the Vietnamese used cats as the symbol for this branch instead of rabbits. If mèo sounded similar to Mão and was used as the symbolic animal for this branch, it is difficult to explain why nga (wild goose or seabird), which is closely associated with Vietnamese life (fishing, coastal living), and whose ancient pronunciation ngwa resembles Ngọ (午), was not chosen as the symbol for the Earthly Branch Ngọ. Similarly, the ancient pronunciation of Mùi (未) for the eighth branch is closer to muỗi (mosquito), yet the Vietnamese chose goats instead of mosquitoes. There are many other such phonetic parallels.

      Although the Nôm script is relatively "young" for analyzing the phonetic connections of the 12 zodiac animals, some notable points include the use of mèo (and meo) with the Sino-Vietnamese character miêu (貓), as seen in Nguyễn Bỉnh Khiêm's Bạch Vân Thi tập (1491–1585): "Lẻo lẻo doành xanh con mắt mèo" ("Bright green eyes of the cat"). Meanwhile, méo in Nôm uses the character Mão (卯), sometimes with additional diacritical marks, as in Hồng Đức Quốc Âm Thi Tập (compiled by Lê Thánh Tông, 1442–1497): "Tròn tròn méo méo in đòi thuở" ("Round and round, distorted through time"). Thus, the distinction between Mão and mèo has existed since at least the Lê dynasty, and the likelihood of confusion between Mão (Middle Chinese pronunciation, reintroduced into Vietnam during the Tang-Song period) and mèo (ancient Vietnamese pronunciation) is minimal.

      Human writing systems naturally tend to evolve from the concrete and simple toward the abstract. For example, animal names are often extended to more abstract meanings, such as "mouse face" (compared to "dragon face"), "ox-like body," "eating like a cat sniffing," or "snake-like temperament." Therefore, deriving mèo from Mão does not align with this natural tendency; rather, it is more logical for the concrete term mèo (animal) to give rise to the abstract term Mão (timekeeping system, divination). The system of naming specific animals (simple) familiar to farmers was integrated into Chinese culture and transformed into a system for recording time and divination (abstract, complex). This 12-zodiac system flourished as Chinese culture reached its peak (Qin, Han, Tang, Song dynasties) and influenced surrounding regions, including Vietnam. This phenomenon of "reverse borrowing" is often overlooked in Vietnam's case.

      In reality, Vietnamese people do not need to overanalyze the natural connection between Mão, Mẹo, and mèo, just as they do not question the links between Tý (mouse), Ngọ (horse), Hợi (pig), or Sửu (ox). Unlike Chinese culture, which uses compound terms like Mão Thố (卯兔, "Rabbit of Mão"), Tý Thử (子鼠, "Mouse of Tý"), or Sửu Ngưu (丑牛, "Ox of Sửu") to emphasize these connections, Vietnamese culture inherently recognizes the associations between Mão and mèo, Tý and chuột, or Sửu and trâu.

      Source: Nguyễn Cung Thông, "Nguồn gốc Việt (Nam) của tên 12 con giáp - Mão/Mẹo/mèo"

      Our revised hypothesis, as elaborated etymologically above, is substantiated by Vietnamese etyma that exhibit direct cognacy with Sino-Tibetan roots. These etyma appear to descend from other Sino-Tibetan languages rather than through Chinese transmission. Such correspondences are too frequent and too consistent to dismiss as coincidental. Consequently, we propose a novel linguistic classification: a distinct category termed Sinitic-Vietnamese. This classification may warrant equal footing with the Sinitic branch itself, given the historical precedence of Yue substrata over proto-Chinese, as previously discussed. Moreover, the Vietnamese fundamental words cited in Chapter 10 demonstrate clear cognate relationships with Sino-Tibetan etyma, lending further credence to this theorization.

      IV) Analytical framework and classification challenges

      Analytically, the etymological survey presented here integrates the historical perspective outlined above, examining linguistic development through both synchrony and diachrony. The methodology resembles capturing frames in a historical reel, allowing one to fast‑forward, rewind, zoom in, and zoom out to contextualize lexical evolution. Yet the chronological placement of certain etyma remains ambiguous.

      For instance, béo ("greasy") aligns with 油 yóu, as in 油膩 yóunì (VS béongậy), illustrating the /y‑ ~ b‑/ pattern in Mandarin ~ Vietnamese. Further examples include 由 (yóu, VS bởi, "because"), 柚 (yòu, VS bưởi, "pomelo"), and 游 (yóu, VS bơi, "swim"), all conforming to the Sinitic‑Vietnamese phonological contour. While such interchanges are plausible, identifying the latest sound splits depends on comparative methodologies introduced in later chapters.
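
      The y- ~ b- pattern described above can be checked mechanically on a small word list. The following is a minimal sketch, not part of the original study: the helper names are hypothetical, the pinyin is written without tone marks, and taking the first letter of a romanized syllable is only a crude stand-in for a proper phonological initial. It uses only the pairs béo, bưởi, and bơi cited in the paragraph.

```python
from collections import Counter

# Pairs cited in the paragraph above: (Mandarin pinyin without tones,
# Vietnamese form, gloss). The list is illustrative, not exhaustive.
PAIRS = [
    ("you", "béo", "greasy"),
    ("you", "bưởi", "pomelo"),
    ("you", "bơi", "swim"),
]


def initial(syllable: str) -> str:
    """Return the first letter of a romanized syllable.

    Assumption: a single letter approximates the initial consonant,
    which holds for the y- and b- forms used here but not in general.
    """
    return syllable[0].lower()


def tally_initials(pairs):
    """Count (Mandarin initial, Vietnamese initial) pairs in the list."""
    return Counter((initial(m), initial(v)) for m, v, _ in pairs)


counts = tally_initials(PAIRS)
for (mandarin, vietnamese), n in counts.items():
    print(f"{mandarin}- ~ {vietnamese}-: {n}")
```

      A tally of this kind only shows that a pattern recurs within a chosen list; it does not by itself establish cognacy or the direction of borrowing, which still depends on the comparative evidence discussed in the text.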

      Given that all Vietnamese sister languages in "China South", including regional Chinese lects, are classified under the Sino‑Tibetan family, how has Vietnamese come to be categorized instead as a member of the Austroasiatic family, specifically the Mon‑Khmer subbranch? How does this classification reconcile with the Sino‑Tibetan and ancient Yue etymological evidence presented here?

      The challenge lies not in the data but in the mindset of those committed to inherited frameworks. Reevaluating Vietnamese classification requires confronting entrenched assumptions and acknowledging the complexity of its linguistic ancestry.

      Conclusion

      The evidence assembled in this study demonstrates that Vietnamese is neither simply Austroasiatic nor straightforwardly Sino‑Tibetan, but instead a language of dual inheritance. At the same time, centuries of Sinicization layered Middle Chinese lexicon, tonal systems, and phonological traits onto Vietnamese, producing the Sino‑Vietnamese and Sinitic‑Vietnamese strata that permeate everyday speech.

      The comparative analysis shows that Vietnamese diverged from Cantonese and other southern lects not because it lacked Sinitic influence, but because it followed a distinct historical trajectory after independence in 939 CE. Cantonese remained within China and became more deeply Sinicized, while Vietnamese balanced inherited Yue elements with successive overlays of Chinese, Japanese, and later European borrowings. This dynamic produced a language that is at once uniquely Vietnamese and intimately connected to its neighbors.

      The classification challenge lies not in the data but in entrenched frameworks. Vietnamese resists reduction to a single family label. It is best understood as a Sinitic‑Vietnamese category in its own right: a hybrid system born of contact, adaptation, and creative renewal. Recognizing this dual heritage reframes Vietnamese as a language of convergence where substratal Yue voices, Middle Chinese courts, colonial scripts, and modern reforms all resonate together.

      In this sense, Vietnamese offers more than a case study in etymology. It provides a model for how languages evolve through layered histories, how identity is negotiated across centuries of contact, and how cultural memory persists in the smallest syllables. The story of Vietnamese is thus the story of survival and synthesis: a language that carries the imprint of ancient Yue, the weight of Chinese empire, and the creativity of its own people.

      FOOTNOTES



      (1) Pig Terminology in Vietnamese and Its Yue Origins: For "pig", northern Vietnamese speakers use lợn (豚 tún, SV độn), whereas in the south, it is called heo (亥 hài, SV hợi). The latter is an archaic, authentic Yue term found in both Vietnamese and Chinese zodiac systems, where 亥年 Hàinián (VS NămHợi or NămHeo) corresponds to the "Year of the Boar". Meanwhile, lợn 豚 tún (SV độn), appearing in the Kangxi Dictionary, is more accurately a doublet of 豘 tún, which carries the same meaning.

      The key point to emphasize is that Yue linguistic elements predate Chinese ones, as 亥 hài was likely transcribed from an ancient Yue term for heo, both etymologically and culturally (see APPENDIX D, E, F, G).

      (2) NanYue (Chinese: 南越; pinyin: NánYuè; Cantonese Yale: Nàahm-yuht; Vietnamese: NamViệt) was an ancient kingdom encompassing parts of present-day Guangdong, Guangxi, and Yunnan in China, as well as northern Vietnam. Today, visitors can explore the magnificent ruins of mausoleums once built by the kings of NanYue, located in Guangzhou City, Guangdong Province, China. https://en.wikipedia.org/wiki/Nanyue

      (3) Shared Folktales Between Zhuang and Vietnamese Cultures: The Zhuang folktale of the Magic Sword and the Vietnamese legend of Trọng Thuỷ and Mỵ Châu narrate strikingly similar stories, both detailing the historical transition of ÂuLạc (歐雒) into the NamViệt Kingdom (cf. Truyệncổ Dòng BáchViệt and https://vi.wikipedia.org/wiki/Mỵ_Châu).

      (4) Goat and Its Linguistic Associations: The Chinese character 未 wèi can be transliterated as both Sino-Vietnamese vị ("upcoming") and SV mùi, as seen in Năm ẤtMùi 乙未年 Yǐwèinián ("Year of the Goat"). In Sinitic-Vietnamese, dê (goat) is cognate with 羊 yáng (SV dương, VS dê), which aligns with Teochew /jẽ/, all denoting "goat." The zodiac name 羊年 Yángnián ("Year of the Goat") corresponds with Sinitic-Vietnamese NămDê.

              An important elaboration here is that 未 wèi originated as a loanword from the ancient Yue linguistic family, whereas 羊 yáng is a pictograph depicting the head of a goat or sheep. Linguistically, 未 wèi and 羊 yáng may be considered doublets, connected both semantically and phonetically. This relationship is exemplified in 美 měi (SV mỹ, "beautiful"), where 羊 yáng above 火 huǒ ("fire") metaphorically conveys "beautiful taste" or "deliciousness." Furthermore, 美 měi and 未 wèi (cf. mùi) exhibit phonetic and semantic connections.

         It is plausible that an early form of the Yue word for "goat" entered the Chinese language in dual forms for zodiac classification, possibly sounding similar to 未 (wèi) centuries before being reintroduced to the Yue populace of the NamViệt Kingdom or Annam.

      (5) A classic example of a Sinitic‑Vietnamese word is 江 jiāng (VS sông, "river"), an ancient loan from the Yue form /krong/. Similarly, 目 mù and VS mắt ("eye") may have originated from a shared ancestral root, likely tracing back to a pre‑Taic linguistic stratum in the distant prehistoric past.

      Other notable zodiacal examples illustrate the Sinitic‑Vietnamese layer:

      • 子鼠 Zǐshǔ → Týchuột ("Tý rat")
      • 丑牛 Chǒuniú → Sửutrâu ("Sửu buffalo")
      • 寅虎 Yínhǔ → Dầncọp ("Dần tiger")
      • 卯貓 Mǎomāo → Mãomẹo ("Mão cat") [ not 卯兔 Mǎotù → Mão thỏ ("Mão rabbit") ]
      • 辰龍 Chénlóng → Thìnrồng ("Thìn dragon")
      • 巳蛇 Sìshé → Tỵrắn ("Tỵ snake")
      • 午馬 Wǔmǎ → Ngọngựa ("Ngọ horse")
      • 未羊 Wèiyáng → Mùidê ("Mùi goat")
      • 申猴 Shēnhóu → Thânkhỉ ("Thân monkey")
      • 酉雞 Yǒujī → Dậugà ("Dậu chicken")
      • 戌狗 Xūgǒu → Tuấtchó ("Tuất dog")
      • 亥猪 Hàizhū → Hợitrư ("Hợi pig")

              These correspondences highlight how Vietnamese zodiac terminology preserves substratal Yue associations while simultaneously adopting Middle Chinese forms. The divergence between Mãomẹo ("Year of the Cat") and Chinese 卯兔 ("Year of the Hare") is especially significant, underscoring Vietnam’s retention of older Yue cultural symbolism.