Tuesday, April 8, 2025

Table of Contents - Chapters 1-14

 ABBREVIATIONS AND GLOSSARY

  1. Introduction
    1. Defining Sinitic-Vietnamese
    2. Historical roots and Yue influence
    3. Linguistic evolution through dynasties
    4. Comparative and etymological challenges
    5. Cultural integration and beyond
    6. Key contributions to linguistics
  2. Rainwash from the Austroasiatic sky
    1. The "Rainwash" effect
    2. Dispelling Austroasiatic Mon-Khmer misconceptions
    3. Reassessing Austroasiatic theories
    4. Archaeological evidence and Yue origins
    5. Historical and cultural context
    6. Exploring two new approaches
  3. Sinitic-Vietnamese studies
    1. The Zen of Sinitic-Vietnamese
    2. On the one-size-fits-all conspiracy
    3. On the relativity of historical phonology and the limits of reconstruction
  4. Vietnamese and Chinese commonalities
    1. Modern dialectal similarities
    2. The role of Mandarin
  5. The politics of Chinese-Vietnamese linguistic studies
  6. The Chinese connection
    1. The Chinese intruders
    2. The languages of China before the Chinese
    3. Linguistic evolution through colonial history: The case of Vietnamese
    4. Prelude on the Sinitic etyma
  7. Hypothesis of common Yue origin of Vietnamese and Chinese
    1. Historical background
    2. Core matter of Vietnamese etymology
    3. Chinese and the Vietnamese basic vocabulary stock
    4. A new disyllabic sound change approach to be explored
  8. The Mon-Khmer association
    1. Basic word lists at crossroads
    2. Comparative Mon-Khmer and Vietnamese basic words
    3. Cultural and polysyllabic approaches: Distinguishing Vietnamese vocabulary from Austroasiatic Mon-Khmer
  9. Similarity in cross-linguistic-family vocabularies proves no genetic relation
    1. The underlined stratum of basic vocabularies
    2. Haudricourt’s theory of tonal development
    3. Correspondences in basic vocabularies revisited
  10. Parallels with the Sino-Tibetan languages
    1. Sino-Tibetan etymologies
    2. Issues in cognates of cardinal numbers
    3. The unfinished work
  11. Vietnamese and Chinese cognates in basic vocabulary stratum
    1. Chinese basic words
    2. Sino-Vietnamese words
    3. Sinitic-Vietnamese words
  12. How sound changes have come about
    1. In search of sound change patterns
    2. An analogy of Vietnamese etymology
      1. A corollary approach
      2. Words of unknown origin
      3. Questionable words of Chinese origin
  13. Case study worksheet
    1. A synopsis of phonological sound changes from Chinese to Vietnamese
    2. Localization and innovation or "Vietnamized"
  14. Conclusion

BIBLIOGRAPHY AND REFERENCES

WEBSITES

APPENDICES


ABBREVIATIONS AND GLOSSARY:

  • AA = Austroasiatic linguistic family (Ngữhệ NamÁ)
  • AC = Ancient Chinese (TiếngHán Thượngcổ 上古漢語)
  • Amoy = Fukienese or Fùjiànhuà (TiếngPhúckiến hay Hạmôn 厦門方言)
  • Ancient-Vietnamese = Annamese (see AV)
  • Annamese = Ancient-Vietnamese (see AV)
  • ArC = Archaic Chinese (TiếngHán Tháithượngcổ 太古漢語)
  • associative sandhi process = a change in the sound of a word resulting from assimilation to the sound or form of a similar word in the same context.
  • Austroasiatic = Austroasiatic linguistic family (Ngữhệ NamÁ)
  • AV = Ancient-Vietnamese, also, ancient Việt-Mường (TiếngViệtcổ, TiếngViệt-Mườngcổ)
  • B, Beijing = Běijīng dialect (thổngữ Bắckinh 北京方言)
  • bound morphemes = the smallest meaningful phonological units that are bound together and usually appear in pairs to form composite words
  • C = Chinese in general (TiếngHán 漢語) (See also: tiếngTàu)
  • Cant. = Cantonese (TiếngQuảngđông 廣東方言)
  • cf., ss, or "§" = compare (sosánh)
  • character = mostly referring to a Chinese ideogram; also, a Roman letter or an ideographic symbol (chữ, tự, mẫutự 字母, 漢字)
  • Chin., C., Chinese = Chinese in general (TiếngHán 漢語) (See also: tiếngTàu)
  • Chin. dialects, Chinese dialects = 7 major Chinese dialects, including sub-dialects (phươngngữHán, TiếngTàu 漢語方言)
  • Chaozhou (Chiewchow, Teochew) = a sub-dialect of Fukienese, also known as Tchiewchow (tiếngTriều, tiếngTiều 潮州方言), also 'TiếngTàu'
  • China North = 華北 Huáběi (Hoabắc), regions north of the Yangtze River in today's northern part of mainland China
  • China South = 華南 Huánán (Hoanam), regions south of the Yangtze River in today's southern part of mainland China
  • composite word = two-syllable word that is composed of two bound morphemes of which either one of them cannot function fully as a word (từkép, từ songâmtiết)
  • compound word = two-syllable word that is composed of two words (từghép, từ songâmtiết)
  • doublet = a Chinese character of the same root that appears in a different form (từđồngnguyên 同源辭)
  • diachronic = concerning the historical development of a language through time
  • Dai = T’ai, Tai, Tày, and sometimes Thai, languages (TiếngTày 傣語)
  • ex. = example (= td. 'thídụ')
  • dissyllabicity = dissyllabics, dissyllabism
  • dissyllabics = characteristics of a language based on its dominant two-syllable words in its vocabulary (tínhsongâmtiết 雙音節性)
  • dissyllabism = dissyllabics, dissyllabicity
  • EM = Early Mandarin
  • EMC = Early Middle Chinese (TiếngHán Tiềntrungcổ 前中古漢語)
  • Fk = Fuzhou, Fukienese (Fùjiàn) or Amoy (TiếngPhúckiến hay phươngngữ Hạmôn 厦門方言)
  • FQ (or Pt) = 'fănqiè' 反切 phiênthiết (initial and syllabic conjugation, a Chinese lexical spelling system in classics)
  • Hai. = Hainanese, a sub-dialect of Fukienese or Amoy (TiếngHảinam 海南方言)
  • HN = Nôm words, same as VS, or Vietnamese words of Chinese origin (HánNôm 漢喃辭匯)
  • ideograph/ideogram = a written symbol of language writing system developed from graphic representation (chữtượnghình 形像字母)
  • "iro" (or #) = in reverse order, metathesis (nghịchđảo thứtự từ)
  • IPA = the International Phonetic Alphabet (Phiênâm Quốctế)
  • K, Kh. = Khmer or Cambodian (TiếngKhmer/TiếngCaomiên)
  • Kinh / NgườiKinh = literally "the metropolitans", or "the Kinh", meaning the Vietnamese majority ethnic group living in the coastal lowlands as opposed to "NgườiThượng" ("the Montagnards") which denotes minority ethnic groups living in remote highlands in Vietnam (京族)
  • Latinized / Latinization: same as Romanized / Romanization (Latinhhoá 羅丁拼音)
  • loangraph = A loangraph in Chinese is a homophone conveying a different meaning but using the same ideographic character (giảtá, 假借)
  • LZ = Late Zhou, L. Zhou (Cuối ÐờiChâu 周末)
  • M = Mandarin, QT (TiếngPhổthông, tiếngQuanthoại 普通話, 國語)
  • Malay = Malay linguistic affinity (Ngữchi Mãlai 馬來語支); National language of Malaysia (TiếngMãlai 馬來語)
  • Mao-Nan = Mao-Nan language, a Mon-Khmer language spoken by the Mao-Nan ethnic group in Southern China (TiếngMaonam 毛南語)
  • MC = Middle Chinese (TiếngHán Trungcổ 中古漢語)
  • MK = Mon-Khmer linguistic affinity (Ngữchi Mon-Khmer 猛高棉語支)
  • monosyllabicity = monosyllabics
  • monosyllabics = characteristics of a language based on its dominant one-syllable words in its vocabulary (tínhđơnâmtiết 單音節性)
  • Mèo = Hmong 苗
  • Môn = Mon
  • monosyllabism = monosyllabics
  • N = Original Vietnamese; also the old Chinese-based Vietnamese writing system (từ Nôm, tiếngNôm hoặc từ thuần Việt 純喃辭匯, ChữNôm "字喃")
  • Nôm = Nôm characters of the old Chinese-character-based Vietnamese writing system or, in an extended sense, Nôm words, i.e., HN (HánNôm), Vietnamese words of Chinese origin (HánNôm 漢喃辭匯)
  • Nùng = Zhuang language, same as Ðồng, Tráng (TiếngNùng 莊語, 垌語)
  • OC = Old Chinese (TiếngHán Cổ 古漢語)
  • OV = Old Vietnamese form (TiếngViệt cổ / TiếngViệtMường cổ)
  • Pt = FQ 'fănqiè' 反切 phiênthiết (initial and syllabic conjugation, a Chinese lexical spelling)
  • Pinyin = People's Republic of China's official Romanization transcription system of Pǔtōnghuà (pinyin haylà bínhâm 拼音 -- phiênâm )
  • polysyllabicity = polysyllabics
  • polysyllabics = characteristics of a language based on its dominant multi-syllable words in its vocabulary (tínhđaâmtiết 多音節性)
  • polysyllabism = polysyllabics
  • pre-SV = pre-Sino-Vietnamese (TiềnHánViệt 前漢越辭匯)
  • pro-C = proto-Chinese (TiếngHán Tiềnsử 前史漢語)
  • Putonghua, or Pǔtōnghuà = Official name of Mandarin (Tiếngphổthông haylà Quanthoại 普通話/國語)
  • PV = proto-Vietnamese, proto-Vietic (TiếngViệt Tiềnsử)
  • radical = basic Chinese ideographic root on which other characters are built (tựcăn 字根)
  • Quốcngữ = Vietnamese national orthography
  • Romanized / Romanization: same as Latinized / Latinization (Latinhhoá 羅丁拼音)
  • synonymous compound = compound word that is composed of two synonymous syllables or words (từghép đẳnglập, từkép đẳnglập, từsongâmtiết đẳnglập)
  • sandhi = change of sound of word under the influence of a preceding or following sound
  • sandhi process of assimilation / association = same as the associative sandhi process
  • synchronic = studying language as it exists at a certain point in time, without considering its historical development
  • Sinicized = influenced, characterized, and/or identified by Chinese elements (Hánhoá 漢化)
  • ss, or "§" = cf., compare (sosánh)
  • ST = Sino-Tibetan (HánTạng 漢藏語系)
  • SV = Sino-Vietnamese (HánViệt 漢越辭匯)
  • Tai, T'ai, Tày, Thái (see Dai)
  • Tchiewchow = a sub-dialect of Fukienese, also known as Chaozhou (tiếngTriều, tiếngTiều 潮州方言), with variant spellings: Chaozhou, Tchewchow, Teochoew, Teocheo, Chewchow, etc.
  • Thượng / NgườiThượng = See: Kinh/NgườiKinh
  • TiếngTàu = a colloquial term for the Chinese languages; the term "Tàu" could have originated from Tần 'Qín 秦' or tiếngTiều 潮州方言 (từ "Tàu" cóthể do "Tần" hoặc tiếngTiều 潮州方言 màra.)
  • V, Viet. = Vietnamese (TiếngViệt 越南話)
  • Vh, Vh @, Việthoá = "Vietnamized", vernacular reflex of
  • VHh, VHh @, ViệtHánhoá = "Sino-Vietnamized", folk Sino-Vietnamese
  • "Vietnamized" = Characterized by the localization of loanwords to fit into Vietnamese speech habit (Việthoá 越化), vernacular reflex of
  • VM = VietMuong or Việt-Mường form (TiếngViệtMường 越孟語)
  • VS = Sinitic-Vietnamese (HánNôm 漢喃辭匯), vernacular Vietnamese
  • Zhuang = the Zhuang language, same as Nùng, Ðồng, Tráng (TiếngNùng 莊語, 垌語)

x X x

Chapter 1 - Introduction to Sinitic-Vietnamese

 

Executive Summary

This chapter establishes the conceptual and methodological groundwork for analyzing Sinitic-Vietnamese (VS)—a foundational stratum in Vietnamese etymology and linguistic identity. VS denotes the deeply naturalized layer of Chinese-derived vocabulary, shaped by sustained contact with northern Sinitic lects and the broader Sino-Tibetan family. Through an interdisciplinary lens, this study traces the linguistic evolution of Vietnamese, foregrounding its dual inheritance: a Yue-descended substrate interwoven with Sinitic influence. The VS stratum exemplifies the cumulative integration of Sinitic elements into Vietnamese, forged through dynastic governance, cultural transmission, and vernacular adaptation across centuries of Annamese history.

I) Defining Sinitic-Vietnamese

Sinitic-Vietnamese encompasses all Chinese-derived vocabulary that has undergone localization within the Vietnamese linguistic environment. It includes subsets such as Sino-Vietnamese (SV), rooted in Middle Chinese phonology, which formed the backbone of administrative, literary, and colloquial Vietnamese during the Han and Tang periods. Sino-Vietnamese is not merely a historical residue; it is a living system of semantic and phonological adaptation.


II) Historical roots and Yue influence

The chapter traces Sinitic-Vietnamese origins to the Yue aboriginals, pre-Han inhabitants of southern China and northern Vietnam. Their linguistic contributions to proto-Vietic and Tai-Kadai languages shaped the substrate upon which Sinitic layers were later imposed. The term Việtnam itself, "Yue people of the South", encapsulates this fusion of Yue and Han cultural-linguistic heritage.

III) Linguistic evolution through dynasties

Sinitic elements entered Vietnamese during the Han colonial era (206 B.C.–24 A.D.), and were further enriched by Tang influence. These layers evolved into functional registers, literary forms, and vernacular usage, culminating in ChữNôm and later Quốcngữ, the Romanized national script. The chapter also sketches Middle Chinese tonal systems and their role in shaping Vietnamese phonology, emphasizing the position of Vietnamese as a Yue-descended yet highly Sinicized language.

IV) Comparative and etymological challenges

This chapter reexamines the etymological foundations of Vietnamese by proposing Sino-Tibetan origins for a substantial portion of its lexicon, challenging long-standing Austroasiatic Mon-Khmer classifications. Through comparative phonological and semantic analysis, it uncovers Vietnamese cognates with Old Chinese, many of which have been historically obscured or misclassified.
 
These items suggest deep geographic and etymological ties between Vietnamese and early Sinitic strata, particularly those shaped by Yue substratal influence and Han expansion. The evidence supports a reevaluation of Vietnamese's linguistic lineage, not as a peripheral Austroasiatic offshoot, but as a Yue-descended, Sinitic-integrated language with complex tonal and morphological inheritance.

V) Cultural integration and beyond

Beyond language, the chapter notes cultural remnants such as the twelve-animal zodiac system and agricultural terminologies that reflect Yue and Taic roots. It also sketches the socio-political significance of linguistic change during colonial and post-independence eras, showing how historical influences shaped Vietnamese identity.

VI) Key contributions to linguistics

This research reframes regional relationships by comparing Sinitic-Vietnamese etymology with Sino-Tibetan variants as well as with Old Chinese, Middle Chinese, Mandarin, Cantonese, Hokkien, and other lects. The comparative framework enables a more nuanced understanding of shared phonological, morphological, and semantic features across the Sinitic-Yue continuum.

x X x

This introductory section aims to provide readers with a foundational overview of the study. It introduces key concepts and engages with illustrative examples of Sinitic-Vietnamese vocabulary, particularly those whose etymologies, despite clear Sinitic or Sino-Tibetan (ST) origins, have been misclassified as Mon-Khmer (MK). These examples serve to highlight the methodological challenges and historical misattributions that have shaped the field.

The discussion also extends into a new frontier of Vietnamese historical linguistics: the identification of prominent Sino‑Tibetan (漢藏 Hàn‑Zàng) etymological evidence, as will be elaborated in Chapter 10 - Parallels with the Sino-Tibetan languages. Among the primary objectives of this study is to establish a structured methodology for investigating this discovery. The findings reopen the long‑standing debate over whether Vietnamese should be reclassified as a member of the Sino-Tibetan language family.

This chapter introduces the framework of Sinitic-Vietnamese as a comprehensive approach to analyzing Chinese-derived vocabulary in Vietnamese. Unlike the narrower category of Sino-Vietnamese, which reflects formalized Middle Chinese phonology, the Sinitic-Vietnamese domain encompasses both literary and vernacular adaptations shaped by sustained linguistic contact. Vietnamese is situated within a Yue substratum, and the chapter proposes a Sino-Tibetan affiliation based on phonological and semantic evidence, challenging the conventional Austroasiatic classification.

Polysyllabicity is introduced as a central methodological principle, enabling the identification of layered etymologies and semantic and phonetic shifts across registers.

Cultural domains including the zodiac, agricultural terminology, and literary traditions demonstrate the enduring influence of Yue-Taic heritage. Lexical and idiomatic examples such as "mẹo" (卯), "ngọ" (午), "gà" (雞), "trống" (雄), "cồ" (公), "mái" (母), and the colloquial phrase "Bấtkể ai nóigànóivịt, mình chỉ nói ngang." (不管 講雞講鴨, 我 只 講 鵝) illustrate bidirectional transfer and deep-rooted cognates.

By integrating historical periodization, comparative linguistics, and typographic precision, the chapter lays the foundation for a polysyllabic annotated lexicon and a revised linguistic historiography. It advocates for a reclassification of Vietnamese and a more nuanced understanding of its Sinitic layers, with the goal of advancing methodological clarity and scholarly accessibility.

I) Defining Sinitic-Vietnamese

In this paper, 'Sinitic-Vietnamese' designates a blend of foundational items rooted in the Yue substrate, layered with Old Chinese elements, and further enriched by the "Sino-Vietnamese" layer of Middle Chinese loanwords. It also refers to lexical items derived from, or shared with, northern Mandarin Chinese (M), introduced through processes of localization and innovation by speakers within the colonial administration of Annam, in present‑day northern Vietnam, for more than a millennium, from 111 B.C. to 939 A.D. The term also encompasses a distinct subset identified as Sino‑Vietnamese (SV), whose phonological and semantic origins trace to Middle Chinese (MC). Over those centuries, this class developed under the administrative influence of officials serving various northern Chinese imperial dynasties. Comparable to the Sinitic strata in southern Chinese lects such as Cantonese and Fukienese (Hokkien), these elements form a foundational layer of the modern Vietnamese lexicon.

Sinitic‑Vietnamese (VS) encompasses every lexical item of Chinese origin that has been localized within the Vietnamese speech environment, including:

  • Sino‑Vietnamese (SV): A codified subset rooted in Middle Chinese phonology, functioning in Vietnamese much like Greco‑Latin loanwords in English.

  • Pre‑Sino‑Vietnamese forms: Older loans from pre‑Qin and Han eras, many with Old Chinese (OC) or Taic‑Yue origins.

  • Parallel forms: Doublets where one is formal‑literary and the other colloquial‑vernacular, sometimes diverging in meaning.

The scope of Sinitic‑Vietnamese includes all mono‑ and disyllabic words of Chinese origin, including those that resemble or sound like Sino‑Vietnamese forms. The label 'Sino‑Vietnamese' is reserved for words of the kind listed in a "Hán‑Việt từđiển" (Sino‑Vietnamese dictionary).

By convention, the term Sino‑Vietnamese (SV), or Hán‑Việt (漢越), is most often used to refer to the systematic Vietnamese pronunciation of the large body of Chinese vocabulary employed in modern Vietnamese. By analogy, Sino‑Vietnamese words function much like Latin‑ or Greek‑derived terms in English. The Vietnamese pronunciation in this context reflects the consensus that Hán‑Việt words are those rendered with modern Vietnamese phonological characteristics. In reality, they represent slight variations of Middle Chinese sounds, which are believed to have been used in the spoken language of the imperial court from the early colonial period, paralleling the development of Cantonese in the same era.

Each lexical stratum carries its own developmental history. In contrast to the term 'Sinitic', the term 'Yue', written alternatively in Chinese Classics as 越, 粵, 戉, 鉞, among other forms, is used here to denote the indigenous southern substratum: the stratum of core vocabulary from which the proto‑Vietic language evolved and upon which Sinitic-Vietnamese was later imposed. Archaeological and textual records suggest Yue communities pre‑date the ethnolinguistic entity now called "Chinese" by millennia. Western labels like “Sinitic” are scholarly shorthands; while imperfect, they aid accessibility in comparative linguistics.

The use of prefixes such as 'Sino‑' or 'Sinitic‑' to denote the concept of 'Chinese' in linguistic taxonomy should be understood as a matter of scholarly convenience. These terms, frequently adopted by Sinologists, serve as shorthand for a widely recognized label. In the historical periods under discussion, however, the entity now called 'Chinese' had not yet formed; it took shape only with the Qin Dynasty. Archaeological and textual evidence shows that Yue communities predated the emergence of what would later be called 'China', along with the linguistic features that came to define it.

The term 'Chinese', nevertheless, is effective in this context because of its broad recognition, whereas 'Yue' remains comparatively unfamiliar. This usage is therefore a form of academic shorthand, employing familiar terminology to reference earlier linguistic forms in a way recognizable to the scholarly community. Such naming conventions are standard practice in historical linguistics. Substituting 'Việt' or 'Jyut6' for it in a title would likely reduce accessibility and limit broader scholarly engagement, but that is 'what makes Chinese so Vietnamese.'

Etymologically, many foundational Vietnamese words are currently classified by historical linguists within the Austroasiatic Mon‑Khmer (AA‑MK) subfamily, itself nested within the broader Austric linguistic family. However, it is hypothesized that these core terms may instead descend from a shared ancestral Yue root. This root is posited to derive from an older Taic‑Yue substratum, a proto‑language complex that predates and contributed to the formation of proto‑Vietic (the forebear of the Việt‑Mường group) as well as other Daic languages. Elements of this Taic‑Yue layer are also discernible in Chinese lects belonging to the Sino‑Tibetan family, including Cantonese and Fukienese, suggesting a deeper historical interconnection across the region.

In lexical practice, Sinitic‑Vietnamese and Sino‑Vietnamese function in tandem. They complement each other across literary registers, from classical texts to modern usage including everyday speech across diverse social contexts. This functional parity underscores the intricate ways in which Vietnamese is interwoven with 'Chinese', not only linguistically but conceptually, contributing to the distinctively Vietnamese character of Chinese‑derived vocabulary.

The scope of Sinitic‑Vietnamese sometimes extends loosely to include other strata: forms traceable to Old Chinese (OC), also referred to as Archaic Chinese (ArC) or Ancient Chinese (AC), and occasionally to Early Middle Chinese (EMC) as well. It may also encompass the class of "Tiền‑Hán‑Việt", or pre‑Sino‑Vietnamese loanwords from the pre-Qin and Han eras, along with their Vietnamese variants, some of which may date back to proto‑Chinese origins.

Such archaic forms belong to various pre‑Han linguistic stages, representing ancestral precursors to OC in the pre‑Qin era, centuries before present (B.P.). Over time, Sino‑Tibetan and Sinitic etyma circulated bidirectionally between Chinese and ancient Vietnamese lexicons, undergoing changes in both form and meaning, for example:

  1. bụt ~ Phật ~ vãi: 佛 Fó (SV Phật) [ M 佛 Fó, fú, bó, bì (Phật, bột, phất, bất) < MC but, phut < OC *bɯd || Note: Derived from 'Buddha' in Sanskrit, cf. VS 'bụt' > SV 'Phật'. Cantonese: fat42, Wenzhou 溫州: vai42. In Vietnamese, 'bụt' preceded the later equivalent of Buddha. ]: Buddha, Buddhist, Buddhist monk.

  2. bụa ~ phụ ~ vợ: 婦 fù (SV phụ) [ M 婦 (媍) fù < MC buw < OC *bɯʔ || cf. 'goábụa' 寡婦 guăfù (widow), 'vợchồng' 公母 gōngmǔ (wife and husband) ]: wife, lady, woman.

  3. chài ~ lưới ~ chàilưới: 羅 luó (SV la) [ M 羅 luó < MC la < OC *ra:l || cf. 羅 luó (SV la) + 羅 luó (VS lưới) ]: net-fishing, bird net, net.

  4. cộ ~ xe ~ xecộ ~ cỗ ~ cỗxe: 車 chē (SV xa) [ M 車 chē, jū, jù < MC cʰia, kɨə̆ < OC *kʰlja, *kla || cf. 'xe' 車 chē, 'cộ' 檋 jù (SV cục), and possible cognate Cantonese 架車 /kache/ (車) ]: carriage, car, modern automobile.

  5. ông ~ trống ~ cồ: 公 gōng (SV công) [ M 公 gōng < MC kəwŋ < OC *klo:ŋ || cf. 雞公 jīgōng 'gàcồ' ~ 'gàtrống' (rooster), 主公 zhǔgōng 'ôngchủ' (master), 公母 gōngmǔ (trốngmái, vợchồng) ]: duke, public, senior male figure, man of authority, grandfather, husband's father, rooster.

The subtitle "An Introduction to Sinitic‑Vietnamese Studies" originated as the title of the initial outline draft, first published online in 2003. At the time, it served as a foundational guide to the study of the Sinitic‑Vietnamese (VS) field, drawing upon available data compiled by various authors in related disciplines, with a primary focus on etyma. Since then, the scope of the survey has expanded significantly, fueled by new discoveries in both Vietnamese and Chinese etymologies. These findings reveal shared linguistic traits between the two languages, providing a robust springboard for advancing academic achievements in this interdisciplinary field.

The author hence finds it apt to title this paper 'What Makes Chinese So Vietnamese?' reflecting the historical reality that the Yue existed first, and it was only afterward that the Chinese emerged on what is now the Flowery Land.

The divergence between these linguistic classifications stems largely from their synchronic mode of analysis. For example, the term 'Sinitic', though historically tied to the Qin State of the 3rd century B.C., is retroactively applied to proto-Chinese formations that predate the Qin Dynasty by millennia, reaching back beyond the Shang and Xia dynasties to encompass over five thousand years of linguistic development.

Modern Vietnamese began to take shape in the 12th century, and a majority of its Sinitic-Vietnamese vocabulary can be traced across the past three millennia through Chinese historical records (Nguyễn Tài Cẩn, 1978; see Appendix I). For the prehistoric period, however, research on the Yue origin of Vietnamese requires engagement with alternative hypotheses, such as those proposed by De Lacouperie (1887) and even by scholars of the Austroasiatic Mon-Khmer school, which offer provisional frameworks for understanding deeper linguistic relationships.

In the early 20th century, Vietnamese was classified as a Sino-Tibetan language. Nevertheless, no notable research was carried out on that supposition.

To fill that gap, this research, drawing on extensive comparative analysis, isolates newly identified Vietnamese terms attested within Sino-Tibetan languages. The following cases illustrate how close their etyma are:

  • "bồng" ~ "bế" 抱 bào (SV bão): 'carry' [ N. Ass. Midźu ba (N), Taying ba (N) (p. 186), E. Nyising bü (p. 194) | (Haudricourt) Daic Siamese peek, Lao ɓɛk, Shan mɛk, Tay Noir, Tay Blanc ɓɛʔ, Tho bɛk || cf. Hainanese /boŋ2/ ]

  • "biển" ~ "bể" 海 hăi (SV hải ~ VS "khơi") [ Sino-Tibetan: M. Bur. pań-lay, Karenic *pań, Pwo pə9-lai28, Sgaw pä7-lâ7, p@7-lâ7 || cf. Cantonese /hoi2/ for VS "khơi" as in "rakhơi" @ 出海 chūhǎi (SV xuấthải, 'set sail'), "ngoàikhơi" @ 海外 hǎiwài (SV hảingoại, 'be out at sea') ]: 'the sea'

  • "bò" 牝 bì  (SV bí): 'cow' [ OB ba, OB E. *bik || A W. Bod. Burig bā (p. 83), Groma, Śarpa bo (calf), Dangdźongskad, Lhoskad ba (p. 93), Central Bodish Lagate pa-, Spiti, Gtsang, Dbus, Ãba bʿa, Mnyamslad, Dźad pa (p. 98), other Bod. languages Rgyarong (ki)-bri, -bru (p. 120), modern Bod. dialects New Mantśati (bullock), Tśamba Lahuli (ox) bań, Rangloi bań-ƫa (bullock) (p. 130) || also Chin. 牝 byi/ (Chin. cow, female of animal), OB ãbri-mo (tame female yak) (p. 59), Minor group Toţo pik-(a), Dimal pi-(a) (p. 187), Southern Branch Kukish *b@ń, Luśei b@ń, Thado boń, Vuite -b@ń- (p. 250), E. Himalayish bʿi, Khambu pi', Lohorong, Yakhha pik (p. 330) | for 'buffalo': Luśei pă-na, Khami *mă-na, Karenic *-na-, Karenni pæ2-nä2, Pwo pə1-na6, Sgaw pə2-nə8, Bwe pa-nä2 (p. 414) | (Haudricourt) Chin. ńǔ- 牛 (M níu), Siamese ŋwă, Lao, Tay Noir ńuo, Shan, Tay Blanc ńo, Tho, Nung mɔ, Sui mo, Mak pho (p. 501) ]

Other entries happen to be recorded in the Kangxi Dictionary, such as:

  • "ăn" (唵 ǎn, SV àm): 'eat' [ Also VS "ngậm" (hold in the mouth) || M àn 唵 ʿām-, Luśei *um, Siamese ʿ@m (p. 71) || Note: 唵 àn is plausibly cognate to VS 'ăn', 'eat'. As Sino-Tibetan scholars, Shafer or Haudricourt should have used this word in place of their M hán 含 ɣām-. The Kangxi Dictionary defines this entry as 'eat with the hand.' ]

  • "nước" (淂 dé, SV đắc): 'water' [ In semantic alignment with 'water' as defined in the Kangxi Dictionary as 'Guangyun - Entering Tone - 德·德': 淂 'appearance of water'. Also read with the fanqie 丁力切. 'Kangxi Dictionary - Water Section - Eight': 淂 in Guangyun, read 都則切; in Jiyun, read 的則切. Both pronounced 德. 'Yupian': means “water.” Also glossed as “appearance of water.” Additionally, Guangyun records 丁力切, pronounced 滴. The meaning is the same. || cf. Proto-Vietic *ɗaːk, Cantonese /dak1/ || cf. (Haudricourt) Daic Siamese ʾnām, Shan, Sui, Mak nam, Lao, Tho, Ahom, Tay Noir, Tay Blanc, Dioi, Mak năm, Nung ram, Bê nɔm, Li nom, nəm (p. 482) ]

As a result, the scope of inquiry expands beyond Vietnamese–Chinese (越漢 YuèHàn, or 'Sinitic-Vietnamese') cognates to encompass etymologies distributed across the broader Yue and Sino-Tibetan spectra. This expanded scope includes reflexes traceable to Old Chinese (上古漢語 Shànggǔ Hànyǔ) and pre‑Qin-Han strata, with evidence of bidirectional lexical transfer between ancestral Yue (越) and Sinitic (漢 Hàn) domains. In doing so, the analysis directly challenges established Austroasiatic theories that assert a Mon‑Khmer (MK) origin for Vietnamese. Backed by Sino-Tibetan etyma ('Bod', or 蕃), it offers substantial support to the Sino-Tibetan hypothesis. This re‑evaluation is grounded in shared phonological innovations, semantic correspondences, and structural patterns documented across the Sino-Tibetan continuum, all framed within the polysyllabicity principle for rigorous cross‑linguistic comparison.

II) Historical roots and Yue influence

Archaeological findings and early chronicles converge on a shared narrative: Yue communities inhabited the southern reaches of what is now China and northern Vietnam for centuries prior to Qin unification. Their languages contributed essential phonological structures, core lexicon, and syntactic preferences to the proto‑Vietic substrate.

These Yue, the pre‑Han populations of the region, served as linguistic architects of proto‑Vietic, supplying phonological and semantic building blocks that later absorbed Han‑ and Tang‑era vocabulary through successive waves of contact. The name "Việtnam" itself ('Yue people of the South') encodes this dual inheritance. Cultural and linguistic exchange unfolded in tandem with political annexation, particularly following the Han conquest of NamViệt in 111 BCE. Yue‑origin forms persist in modern Vietnamese, from zodiacal terms such as "mẹo" (卯) to agricultural and kinship lexicon.

Traditional Austroasiatic classifications place Vietnamese within the Mon‑Khmer branch. This chapter reconsiders that placement, presenting phonological and semantic correspondences with Sino‑Tibetan lects that support an alternative alignment. What Indo‑European scholars have labeled ‘Austro‑Asiatic’ was, in effect, the linguistic domain of Yue communities inhabiting China South (華南, Hoanam) prior to the arrival of populations who would later be called 'Chinese'. This is the case often described as China before the Chinese, a framing that also resonates with the qualified question: What makes Chinese so Vietnamese? For Vietnamese of Yue origin, and for the Vietnamese polity, the enduring presence of the meme "Việt", that is, "Yue", represents both survival and sovereignty of identity.

In this study, the former indigenous inhabitants are designated as 'Taic'. From this population emerged the Daic‑Kadai, the Yue, and the Austroasiatic Mon‑Khmer, the latter incorporating both Taic and Yue components. Later waves of migration gave rise to Sino‑Tibetan groups with Taic and proto‑Tibetan elements; to the Han (Chinese), formed through a fusion of Taic + Yue + Sino‑Tibetan components; and to the Vietnamese, whose linguistic and cultural profile reflects a synthesis of Yue and Han elements.

On the premise that prehistoric southern China was originally inhabited by ancient Yue aborigines, early Chinese populations emerged from the fusion of these Yue with proto‑Tibetan migrants from the southwestern plateau, further mixing with Tartar groups from the southern periphery of Siberia. These elements coalesced into the diverse populations of various pre‑Chinese polities in the centuries before the Qin conquest (秦國), and continued through successive historical transformations to shape the demographic and cultural landscape well into the twentieth century.

Until 111 BCE, the NamViệt Kingdom stood in the south alongside the Han Empire. In that year, however, the Han, the dynasty founded by Liu Bang, annexed Triệu's NamViệt, inaugurating a prolonged era of Chinese rule and intensive Sinicization. Only in 939 CE, after more than a millennium under Chinese dominion, did the ancient Annam prefecture, located in what is now northern Vietnam, achieve independence from the NamHan State (南漢國 NánHàn Guó).

As a foundation for this premise, it is widely acknowledged in academic discourse that the Sino‑Tibetan and the proto‑Chinese peoples were absent from the geographic regions they now inhabit roughly 5,000 years before present. The term 'Chinese' has never denoted a racial category, but rather a cultural construct shaped by a historical experience in which the prevailing mentality was that of emigrants repeatedly seeking to leave the often repressive yet persistently compelling polity of mainland China. This trajectory began with the Qin Dynasty (221 BCE–207 BCE), and after its collapse, its authoritarian legacy was assumed by the Han Empire (漢朝) and perpetuated by successive Chinese dynasties.

The discussion of Yue entities in ancient Annam gains further depth when situated within this broader arc of early Chinese history. From this perspective, the introduction of Sinitic elements was preceded by the long‑established presence of Yue communities. Evidence for this sequence is found in both cultural artifacts, such as the twelve‑animal Zodiac system, and in lexical correspondences — for example, /krong/ 'river', cognate with 江 (jiāng) as in 'Sông Dươngtử' 揚子江 (Yángzǐjiāng, 'Yangtze River'), in contrast to 'Hoànghà' 黃河 (Huánghé, 'Yellow River'). Both names were recorded by early Chinese sources for two great rivers that have long defined China's geopolitical and cultural identity. Historically and linguistically, these two river systems marked the boundary between the Yue and Han spheres.

Yue‑derived forms embedded within the Sinitic branch of the Sino‑Tibetan language family are preserved in much of Vietnam's foundational lexicon. Notable examples include 'voi' (elephant) aligned with 為 (wéi), 'chuột' 鼠 (shǔ, 'mouse'), and 'bò' 牝 (bì, 'ox'), among others. (See Chapter 10 - Parallels with the Sino-Tibetan Languages.)

The Sinitic-Vietnamese layer of Vietnamese vocabulary developed primarily during and after the Han colonial periods. Illustrations include "gà" ('chicken') corresponding to 雞 (jī), "buồng" ('room') 房 (fáng), 羅 (luó) reflected in "chài" and "lưới" ('net'), and 車 (chē) aligned with "xe" ('carriage'). See Chapter 11 - Vietnamese and Chinese Cognates in Basic Vocabulary Stratum for further comparative data.

Taken together, these features attest to the deep interweaving of Yue and Sinitic elements in the linguistic foundation of Vietnamese and support a reconsideration of its etymological origins. The lines of inquiry outlined here will be pursued in greater detail in subsequent chapters.

Table 1.1: Proto-Tibetan Migration and Shu Contact 

Proto-Tibetan groups are believed to have originated in the highlands of southwestern China, particularly in regions bordering modern-day Yunnan and Sichuan.

  • The Shu polity (蜀國), centered in Sichuan, was known for its early bronze culture and distinct linguistic profile.

  • Archaeological findings from sites such as Sanxingdui and Jinsha reveal material assemblages unrelated to central plains cultures, suggesting contact with highland populations.

  • Migration patterns inferred from burial styles and ceramic typologies indicate northward movement along Yangtze tributaries, consistent with the migration account given in this chapter.

Extinct Populations and Material Assemblages

Isolated archaeological sites in Sichuan and adjacent regions show evidence of cultural discontinuity, abrupt shifts in material culture that suggest population replacement or extinction.

  • These assemblages often include non-Han artifacts, such as stylized masks, ritual bronzes, and unique pottery forms.

  • Linguistic extinction is inferred from the absence of direct descendants in modern Sino-Tibetan languages, though substratal influence may persist in phonology and syntax.

    (See Comparative Sino-Tibetan Etymologies)

In a more remote epoch, Proto‑Tibetan groups—originating in the southwestern highlands of ancient China—migrated northward, interacting with indigenous communities along the periphery of the Shu polity (蜀國) in present‑day Sichuan. Their migratory paths extended toward the northeastern tributaries of the Yangtze River. Archaeological evidence from isolated sites, distinguished by unique material assemblages, indicates that these populations have since become extinct.

The fusion of Taic‑Yue aboriginals with Proto‑Tibetan nomads migrating from what is now southwestern China ultimately gave rise to the broader Sino‑Tibetan ethnolinguistic complex. This included the proto‑Chinese founders of the Xia Dynasty, dated to nearly 5,000 years ago.

According to both legend and the Chinese historical record, these populations established the Yin polity (殷朝, "NhàÂn", 1600 B.C.–1046 B.C.), initiating the Yin‑Shang Dynasty. Between approximately 1225 B.C. and 1220 B.C., the Yin are recorded as having invaded ancient Annam. Over the subsequent two millennia, pre‑Chinese populations merged with Taic‑Yue communities, forming the ethnolinguistic matrix later identified as 'Chinese' well before the Qin‑Han consolidation. Among the Yue were lineages diverging from the same Taic substratum as the founders of the Chu polity, including ancestral Zhuang (百) communities that later established both the Yue (越國) and Eastern Yue (東粤) states.

As the Yin ("Ân") advanced southward, Yue populations were displaced, migrating deeper into the southern regions. The Qin‑Yue admixture, shaped over successive millennia, dispersed along both northern and southern migratory corridors. These trajectories extended from a pivot in present‑day Yunnan through Zhejiang and Fujian provinces; turning southward, they traversed Hubei, Jiangxi, and Jiangsu, ultimately reaching territories now encompassed within the Austric, Austronesian, Austroasiatic, and Austro‑Thai hypotheses, both in anthropological and linguistic classification. Across this expanse, the languages exhibit demonstrable relatedness; divergences arise primarily from the multiplicity of nomenclatures under which they have been categorized (cf. Terrien de Lacouperie 1965 [1887]).

In modern taxonomy, 'Chinese' lects and their dialects and sub-dialects are classified under Sinitic—not because Sinitic predates Yue, but because the designation reflects their Sino-Tibetan affiliation. Likewise, as used here, the term Yue (越) (M), or more precisely, 'Viet', does not imply that either the ancient Yue aboriginals or the modern descendants of the "LạcViệt" (雒越, LuoYue) constitute a homogeneous ethnolinguistic population.

To clarify: the Yue, whether Eastern Yue (東越) in the Zhejiang region or Southern Yue (南越) in Guangdong—corresponding to the Wu groups and Cantonese speakers respectively—were not direct ancestors of the modern Daic peoples. Rather, both descended from a shared ancestral Taic lineage. This same origin plausibly extends to the Chu State and the NamViệt Kingdom, by analogy. That explains the mutual unintelligibility of their lects. Nevertheless, their shared Yue features and indigenous etyma across the languages of China South and North Vietnam produced numerous cognate doublets, many of which are preserved in the Chinese classical tradition. For example, the Kangxi Dictionary (康熙字典) records 淂 dé (SV "đắc"), linked to Old Viet /dák/ 'water', alongside 水 shuǐ (SV "thuỷ", "đák"; cf. 踏 tǎ, VS "đạp"), meaning 'water' or 'river', which also appears as 川 chuān (SV "xuyên") and 江 jiāng (SV "giang") for Vietnamese "sông" ('river').

These doublets preserve vestiges of archaic speech from native populations within the bounds of ancient states later absorbed into the Chinese empire, including the states of Shu (蜀國 'NướcThục'), Chu (楚國 'NướcSở'), Yue (越國 'NướcViệt'), and the NamViệt Kingdom (南越王國 'Vươngquốc NamViệt'). Their territories were home to ethnically composite groups such as the Luo Yue (雒越 'LạcViệt'), Xi'Ou (西甌 'TâyÂu'), Ou Yue (歐越 'ÂuViệt'), Dong'Ou (東甌 'ĐôngÂu'), and MinYue (閩越 'MânViệt'), tribal confederations of considerable diversity.

Table 1.2: ÂUVIỆT

The ÂuViệt or OuYue (Chinese: 甌越) was an ancient conglomeration of Baiyue tribes living in what is today the mountainous regions of northernmost Vietnam, western Guangdong, and northern Guangxi, China, since at least the third century BCE. They were believed to have belonged to the Tai-Kadai language group. In eastern China, the Ouyue established the Dong'Ou or Eastern Ou kingdom. The Western Ou (西甌; pinyin: Xī'Ōu; Tây meaning "western") were other Baiyue tribes, with short hair and tattoos, who blackened their teeth and are the ancestors of the modern upland Tai-speaking minority groups in Vietnam such as the Nùng and Tay, as well as the closely related Zhuang people of Guangxi.

The  ÂuViệt traded with the LạcViệt, the inhabitants of the state of Văn Lang, located in the lowland plains to ÂuViệt's south, in what is today the Red River Delta of northern Vietnam, until 258 or 257 BCE, when Thục Phán, the leader of an alliance of ÂuViệt tribes, invaded Vănlang and defeated the last Hùng king. He named the new nation "ÂuLạc", proclaiming himself "Andươngvương" (literally "Peaceful Virile King"). The origins of Thục Phán are uncertain. According to traditional Vietnamese historiography, he was the prince or king of the Kingdom of Shu (in modern Sichuan). However the kingdom of Shu was conquered by the Qin in 316 BCE, making it chronologically improbable that Thục Phán was Shu royalty a hundred years later. There may be some merit to the story due to archaeological evidence of cultural ties between Yunnan and the Proto-Vietnamese, but possibly as a result of the gap in time between the origin of the story and when it was recorded, the location could have been changed to Shu or simply mistaken due to erroneous geographical knowledge. According to a translated oral account of a Tày legend, the western part of ÂuViệt's land became the Namcương Kingdom, whose capital was located in what is today the Caobằng Province of Northeast Vietnam. It was there that Thục Phán hailed from. The authenticity of this account is considered suspect by some historians. It was published in 1963 as a translation while no extant copy of the original Tày text exists. The title of the story contains many Vietnamese words with slight tonal and spelling differences rather than Tai words. It is uncertain what text the translation originated from.

According to Chinese historians:

The Qin Dynasty conquered the State of Chu, unifying China. Qin abolished the noble status of the royal descendants of the State of Yue. After some years, Qin Shihuang sent an army of 500,000 to conquer the West Ou. After three years, Qin forces killed West Ou chief Yiyusong (譯籲宋). Even so, West Ou waged guerilla warfare against Qin and slew Qin commander Tu Sui (屠睢) in retaliation.

Before the Han Dynasty, the East and West Ou regained independence. The Eastern Ou was attacked by the MinYue Kingdom, and Emperor Wu of Han allowed them to move to the region between the Yangtze and the Huai rivers. The Western Ou paid tribute to NanYue until it was conquered by the Han. Descendants of these kings later lost their royal status. Ou (區), Ou (歐) and Ouyang (歐陽) remain as family names.

According to Vietnamese historians:

257 BCE, Andươngvương 安陽王 unified the LạcViệt tribe (Austroasiatic) (chiefdom) of Hung Kings 雄王 (Hùngvương) with his ÂuViệt tribe (Tai-Kadai) (chiefdom) into a single tribe (The ÂuLạc chiefdom).

208 BCE, Zhao Tuo captured ÂuLạc and incorporated it into his Han kingdom of NanYue, which was ruled by the Han Dynasty.

Source: https://en.wikipedia.org/wiki/Âu Việt


Prior to the first century B.C., the Chinese-Han population had already emerged as an anthropological fusion of proto-Tibetan groups and Yue indigenous peoples, forming the core population of the Qin state. This population was drawn from six other ancient states, with a notable contribution from Chu subjects, including Daic and Yue peoples.

Following the annexation of the NamViệt Kingdom into the Han Empire in 111 B.C., the Yue people became further intermixed with Han subjects, expanding beyond the Lingnan southern region. This process of integration repeated itself continuously across both space and time.

The ethnic composition of the Han populace likely preserved the same proportionate racial fusion that had characterized the Chu polity by the time the Han Empire was founded. However, the total population must have declined due to the preceding wars. Crucially, Yue-Daic elements remained predominant among Han subjects following the fall of Chu, given that Chu itself had originated as a Daic polity. This continuity is historically significant: the founding emperor of Han, Liu Bang (劉邦), along with his generals, sub-commanders, and much of the infantry, were originally Chu fighters who had resisted the Qin army prior to the Han's eventual triumph. This fact merits renewed emphasis for its broader ethnohistorical implications.

The term Han and Han-related designations originated from Hanzhong (漢中, SV "Hántrung"), a remote enclave in present-day Shaanxi Province. Liu Bang (劉邦, SV "Lưu Bang") had been appointed viceroy of Hanzhong by General Xiang Yu (項羽, SV "Hạng Võ"), the last Duke of Chu, acting on behalf of the final King of Chu. However, Liu Bang and Xiang Yu later turned against one another (楚漢戰爭, 206–202 B.C.), and the victorious Han faction subsequently dissociated from Chu heritage, identifying themselves as the "Han people"—that is, the followers of the Hanzhong viceroy.

As a result, 'Han' entities emerged alongside terms such as 'Chinese' (from 'China') and 'Sinitic' or 'Sino' (from 'Qin'). While alternative names such as 'Cathay', 'Tang', or 'Qing' have also been applied to the Han entity, their actual racial composition reflects a unification of states within China, shaped by centuries of admixture and regional consolidation. (C)

 


Figure 1.1: Map of territories of dynasties in China
Source: https://en.wikipedia.org/wiki/File:Territories_of_Dynasties_in_China.gif

More than 4,000 years later, the subjects of the newly unified Qin State (秦國, 221 B.C.) included the Taic-Daic peoples of Chu (楚國) and the Yue descendants who formed the southern Yue State (越國) and the vassal State of Wu (吳國) as recorded in the Chinese chronologies of the Spring and Autumn (770-476 B.C.) and Warring States (475-221 B.C.) periods. Prior to the Western Zhou era of this period, the aforementioned southern Yue polities had already existed in a tributary relationship.

After the Han faction supplanted the Qin State and consolidated power over the Middle Kingdom, including its southern territories, the newly established Han Empire required all subjects to adopt the official court language (X). That language, reportedly spoken by Liu Bang, the founding emperor of the Han Dynasty, was likely a Chu dialect, a Taic-Daic language reflecting his origins in the Chu realm. The Yue peoples, along with the ancestral Zhuang populations in the far south, adopted this language following the annexation of the NamViệt Kingdom in 111 B.C. In this compound, 'Nam' (南) means 'south' and 'Viet' (越) means 'Yue' (Cantonese /jyut6/). In antiquity, these characters were pronounced similarly in Vietnamese of the Luo-Yue branch and in Cantonese of Eastern Yue. That phonological correspondence has been preserved in historical transcription.

Many of the shared Yue etyma can be traced to remote antiquity, when proto‑Tibetan and ancestral Yue languages came into contact with the later Viet‑Muong language, which itself has roots in the Taic linguistic family. Proto‑Yue languages were once widely spoken by the aboriginal peoples of a vast region in South China, and their domains extended into parts of northern China along the banks of the Yangtze River (揚子江, also known as 長江). These areas formed the natural habitats of ancient Taic indigenous peoples, who in time gave rise to the Yue people, known as "Bách Việt" (百越) or Bai Yue, a term possibly derived from the Tibetan "Bod". (華) 

In the Vietnamese case, successive north-south migrations across both geographic and historical scales displaced indigenous populations from their fertile lowland settlements into less arable, mountainous zones. Following the Qin-Han period, incoming settlers from southern China introduced their own languages, which gradually blended over the subsequent millennium with local Tai-Yue speech forms, namely Dai, Thái, Tày, Nùng, alongside other Viet-Mường dialects.

Prior to 939, when both Annam and Canton remained under the rule of the NamHan Kingdom (南漢帝國), the speech of their inhabitants appears to have been mutually intelligible, at least through a vernacular form of regional Mandarin. During this period, Annamese scholars actively participated in administrative affairs and literary production with the Tang imperial court. This is evidenced by the literary record and the development of a fully articulated Hán-Việt (漢越) or Sino-Vietnamese lexicon, presumably transmitted directly from Middle Chinese, particularly during the 289 years of Tang rule.

Historical records indicate that large-scale migration from southern China into "Annam" occurred not only during the millennium of Chinese colonial administration (111 B.C.-939 A.D.), but also well into the modern era, continuing beyond 1949 and, notably, spilling over into the 21st century, when Chinese laborers have been establishing Chinatown-style enclaves across the country.

Such ease of communicative transition is exceptional in this context when compared to other Mon-Khmer speakers, with the partial exception of later contact effects on neighboring Mường groups. These communities, having diverged from earlier Viet-Muong populations that resisted Han colonization, withdrew into remote highland zones where they coexisted with Mon-Khmer speakers. This interaction resulted in distancing lowland Yue-specific commonalities from Mon-Khmer lexicons, whose resemblances appear to have emerged only through later contact. Archaeological and historical evidence suggests that Mon-Khmer groups migrated into the Red River Delta approximately 6,000 years ago (Nguyễn Ngọc San 1993, p. 43).

Over time, the Annamese vernacular retained only a limited proportion of Yue elements.  The long process of Vietnam's national formation began with the biological composition of its early population, descended from the racially mixed Yue, LạcViệt, XiLuo, and OuLuo communities of the Nam Việt Kingdom. Consequently, it is unlikely that the ancestors of the Vietnamese remained genetically pure descendants of the original Yue tribes, even before the 1,004 years of Chinese rule that ended in 939 A.D.

The linguistic traits introduced by these immigrant populations, including tonal patterns and phonological features characteristic of Cantonese, Hainanese, Chaozhou, Amoy, and Hokkien, entered Annamese as integrated structural components, rather than as merely external overlays influencing tone or syllabic configuration. A comparable process of contact-driven change can be observed in the development of Cantonese.

From a linguistic perspective, the exchange of vocabulary between host and migrant communities rendered native Yue elements complementary to the expanding Sinitic domain, rather than replacing its structural foundations. This process is broadly comparable to the way Chinese lexical material was assimilated into the formation of new Japanese concepts.

Following independence, the Vietnamese population, then referred to as the Annamese, established a sovereign polity corresponding to present-day Vietnam ("越南" Yuè-Nán), literally 'the Yue of the South'. This interpretation stands in contrast to the mistaken view that "越" signifies 'advancing to the south', a misconception rooted in its semantic association with 'advance' or 'surpass'. In fact, ancient Chinese transcriptions of 'Việt' ("越") include variant graphs such as "戉", "粵", and "鉞", each denoting implements or weapons resembling an axe that are likewise associated with "Yue". This distinction is significant, as it separates the early ethnonymic identity from the territorial expansion that occurred long after the 10th century.

Over successive millennia, and through a sustained southward migration from a polity referred to as "Vănlang"—likely a transcription of the early sound '賓郎 Bīnláng' [← 'blau' = 'trầu', cf. "檳城 Bīnchéng" ('Bếnthành' ~ 'Penang') or 'betel']—located in what is now northern Vietnam, the later Vietnamese emerged as a composite population, hybrid in origin, incorporating Chamic and Mon-Khmer elements along the migratory corridor. Archaeological and anthropological evidence consistently supports this view, framing modern Vietnamese ethnogenesis as the result of stratified admixture rather than a linear descent from preexisting ethnic groups in either the north or the south.


Just as no population can claim to be ‘purely Chinese’, there is no entirely ‘pure’ Vietnamese lineage. Vietnam's history is marked by the fusion of Chinese settlers and Southern Yue communities, many hailing from what is now southern China. The very name ‘Việtnam’, translating as “Yue people of the South”, embodies this shared legacy. Unlike ethnic Chinese communities elsewhere in Southeast Asia, those in Vietnam integrate readily; within two generations at most, descendants born and raised on Vietnamese soil commonly self-identify as Kinh in census data. Through successive waves of southward settlement and integration with indigenous groups, these blended communities coalesced into the Kinh majority that defines contemporary Vietnam.


As previously noted, the early Annamese population emerged through centuries of intermixture between indigenous Yue groups and Han colonial settlers, culminating in the formation of the Kinh (京族, Jīngzú; VS "tộcKinh"), the descendants of this complex hybridity. The enduring interaction between these communities became a recurring theme in nationalist discourse, particularly in view of the many Han settlers who fled the upheavals accompanying dynastic transitions in China, from the fall of the Tang Dynasty in the 10th century to the rise of communist rule after 1949, and who remained permanently in the southern territories.

This demographic pattern has continued into the contemporary period. Reports indicate that since 1990, over one million mainland Chinese have established permanent residence in Vietnam, according to figures compiled from annual Chinese diaspora assemblies held in major Vietnamese cities (see factsanddetails.com).

This deep historical interconnectedness explains the shared etyma derived from a common ancient linguistic substrate, close enough that some have inferred Vietnamese etyma evolved from Cantonese. In fact, both languages share a substantial Middle Chinese inheritance from the Tang period, reinforced by large‑scale migrations during the 'An Lushan Rebellion' (755–763), which devastated the Central Plain and sent many northerners to the Lingnan region. That is why early communities of present‑day Cantonese speakers identified themselves as 'Tang people' (唐人, Tong4jan4), whereas the Vietnamese ethnonym "Việtnam" (越南), again, literally means 'the Yue people of the South'.



Figure 1.2: Map of the Yangtze River Basin
Source: http://en.wikipedia.org/wiki/File:Map_of_the_Yangtze_River.gif

Regarding the proto‑Vietic language, the split within the Viet‑Muong groups marked a decisive divide between indigenous people who resisted Han occupation of their ancestral land and those who submitted to and collaborated with Chinese colonizers. In a manner comparable to the evolution of Cantonese speech, early Sino‑Vietnamese forms were actively integrated into the ancient Vietic language, which over time developed into early Annamese. This process unfolded over centuries and culminated in the Middle Vietnamese period, particularly through the absorption of Tang‑era linguistic variants by the emerging "Kinh" elite. It involved the localization of Middle Chinese vocabulary and expressions, together with gradual, nuanced changes in phonology, syntax, and semantics.

The entire process likely began before and extended well beyond the fall of the Tang Dynasty (618–906). It entailed the adaptation and localization of Middle Chinese lexical stock during periods of colonization, aligning with the broader evolution of Chinese lexicography, a trajectory shaped by shifting patterns of phonological and semantic crystallization across the Han and Tang dynasties (Tang Lan, 1965, p. 110).

Sharing the same historical background, the sound‑change patterns of Sino‑Vietnamese and Cantonese, both originating from Middle Chinese, appear to have followed similar phonological paradigms in written registers, such as literature and scholarship, as well as in spoken forms. This parallel evolution persisted until at least the 10th century, after which the two languages diverged from their shared path. During their shared period, both made use of Middle Chinese as the lingua franca of the NamHan Kingdom. Over time, items in their respective vocabulary stocks either disappeared because of lexical redundancy in the form of doublets or stabilized into distinct forms, as seen in Sino‑Vietnamese on one hand and the so‑called 'Tang language', now commonly associated with Cantonese, on the other.

For general readers, Sinitic‑Vietnamese etyma often appear indistinguishable, not only to novices without formal linguistic training, but even to language educators, particularly in matters of sound change and etymological divergence between formal and colloquial registers. This observation is based on the author's survey of bilingual teachers in general subject areas, such as language arts and ESL, in U.S. schools. Many of these teachers candidly acknowledged that they had never noticed lexical correspondences between the two languages in literary lexicons. For example, none were aware that the SV "quốcgia" (國家 guójiā, 'nation') directly matches Cantonese "/gok7ga5/" ('nation'), let alone that its vernacular Vietnamese synonym "nướcnhà" carries the same meaning.

In the case of the latter, laypersons with some exposure to historical linguistics may recognize such correspondences when they are explained through regular patterns of sound change, yet they often resist the idea that "nướcnhà" shares a common root with "quốcgia". This resistance is partly rooted in a poetic interpretation of "nướcnhà" as a compound of "nước" ('water') and "nhà" ('home'), reflecting an idealized vision of Vietnam as a land of virtuous governance cherished by Confucian scholars who love to compose Tang poems. Such a reading, however, implicitly obscures the Chinese etymology of 水 (shuǐ, SV "thuỷ") and 家 (jiā, SV "gia"), as well as the compound 國家 (guójiā), despite the well‑known early 20th‑century classroom chant in village schools "gia/nhà, quốc/nước" from the primer Tam Thiên Tự Kinh ('Prime Book of 3000 Characters'), in which the latter pairing conveyed an abstract sense approximating 'country'.

While the poetic interpretation is semantically plausible, it obscures the phonological continuity linking "nướcnhà" to "quốcgia" and Cantonese "/gok7ga5/". Adding further complexity, the more recent form "nhànước", meaning 'ruling body of government', reverses the original syllabic‑morphemic order, creating an inversion that introduces yet another layer of morphological and semantic development.

Long after the NamHan Kingdom ceased to exist in 971, and despite Annam's separation from its control in 939, Cantonese and Sino-Vietnamese may still have retained notable phonological similarities inherited from late Tang-era speech. By that time, however, the two languages were already distinct, much as their divergence is evident today. A comparable situation is observed in the localized variant of Cantonese spoken in the Guangxi Autonomous Region, known as Baihua (白話).

This transformation resulted from layered ethnic blending with migrants from northern regions of the Tang Empire. Southern China, especially the Guangzhou prefecture, experienced major influxes of settlers due to upheavals such as the An Lushan Rebellion during the reign of Emperor Tang Ming Huang. Widespread famine further altered the region's demographic balance. The conflict led to mass displacement and mortality, as documented by Bo Yang (1982–1992, Vol. 49).

Meanwhile, Cantonese speech underwent repeated phases of transformation shaped by surrounding sociohistorical forces. Until the tenth century, it is plausible that Cantonese speakers in Guangzhou and Annamese speakers in Tonkin could still communicate using Sinicized speech forms, such as Yue's Baihua, as previously noted in accounts of interaction between the peoples of Guangdong and Guangxi, not to mention their shared basic lexical stock. For instance, within the aboriginal Yue substratum, several foundational etyma shared by Cantonese and Vietnamese, such as "lưỡi" 脷 (tongue) [Cant.: /lej6/], "bông" 花 (flower) [Cant.: /fa1/], "biếu" 畀 (give) [Cant.: /pej3/], "khui" 開 (open) [Cant.: /hoj5/], "xơi" 食 (eat) [Cant.: /sik8/], "uống" 飲 (drink) [Cant.: /jam3/], "thấy" 睇 (see) [Cant.: /taj3/], "đéo" 屌 (curse) [Cant.: /tjew3/], and "ỉa" 屙 (defecate) [Cant.: /o5/], represent only a small portion of the indigenous Yue layer that persists across both languages.

Similarly, while Cantonese retains the Middle Chinese‑derived pronunciation of 走 as "zow3" meaning 'go', the Sino‑Vietnamese term "tẩu" (təw3) has shifted in modern usage to "chạy", denoting 'run'. This latter sense aligns with Mandarin "qù" and Cantonese "hoeỉ3/hoeỉ2", and is etymologically linked to 去 qù (SV "khứ"). The connection extends to a set of Sino‑Vietnamese doublets such as "khu", "khử", and "khứ", along with variants including "khừ", "khự", "khử", "khứa", and "đi", as well as the Hanoi sub‑dialect form /xɨ5/.  In the Han‑period stage of Ancient Chinese, these terms shared a unified core meaning. Over time, however, their semantic range broadened to encompass related notions such as 'eliminate', 'get rid of', and 'cut off'.

Additional etyma likely reflect remnants of the Taic-Yue substratum found in both Vietnamese and Cantonese. These languages emerged from distinct Tai-Kadai branches long before their speakers were unified under the NamViệt Kingdom in 204 BCE. For example, the term 雞公 (jīgōng) for 'rooster' corresponds to Vietnamese "gàtrống" and to archaic Cantonese /kaj5koŋʷ1/. This rare but compelling correspondence offers evidence of a shared Yue linguistic affiliation at a substratal level, suggesting that both forms derived from the same source prior to Sinicization.

The modern grammatical pattern in which an adjective precedes the noun it modifies, as in Mandarin gōngjī (公雞), syntactically equivalent to the English 'male bird', reflects Sinitic elements that developed atop an aboriginal Yue substratum. In Vietnamese, the corresponding term is "gàcồ", which follows the [noun + modifier] order. It is likely that in earlier stages of language development — when both systems were still in the formative phase of polysyllabicity during the late Ancient or Early Middle Chinese period — the two languages shared a much greater degree of structural similarity, particularly in the official court languages beginning in the Han colonial era.

As disyllabic words became more common, along with synonymous constructions, Sinitic speakers began to differentiate homophones by placing modifiers before the main morphemic syllable to create new polysyllabic words. Vietnamese, in contrast, retained a Yue speech habit that tended to reverse this order, placing the noun before the modifier.

From a linguistic standpoint, Vietnamese speakers across social backgrounds can readily acquire the Cantonese dialect or approximate its pronunciation, reflecting a degree of accessibility between the two languages. The prominent Sinitic elements in Cantonese are broadly similar to those found in Sino-Vietnamese lexemes, as both developed through sustained contact with northern migrants from successive Chinese dynasties over the past two millennia. Despite this influence, Cantonese speakers and their language remain distinct—not only from Vietnamese, but also from other regional varieties within China.

Over time, the ethnic composition of Cantonese communities became increasingly mixed due to the influx of migrants from northern regions of the Great Tang Empire. This process continued as mainland China came under the control of various northern rulers, including the Khitans of the Liao, the Jurchens of the Jin, the Mongols, and the Manchurians. Notably, the Cantonese population was primarily composed of descendants who identified as Tang subjects, particularly from the seventh century onward. These descendants are recognized as the Hoa ethnic group in Vietnam and other parts of Southeast Asia.

In contemporary usage, Vietnamese and Cantonese no longer exhibit the semantic and syntactic parallels they once shared. For example, the modern Vietnamese term gàtrống contrasts with its earlier Cantonese counterpart "gung1gai1" (公雞), a divergence that reflects historical shifts in linguistic affinity.

These differences are further shaped by varying degrees of Chinese influence. The impact of Han Chinese, both prior to 111 BCE and during the Middle Chinese period beginning in the seventh century, left enduring phonological and semantic imprints. For example, Vietnamese continues to use "đôiđũa", a term cognate with Han Chinese zhùzi (箸子) for "chopsticks". In contrast, Cantonese, like Mandarin, avoids the term 箸 because of its phonetic resemblance to "đổ" 倒 (dǎo, SV "đảo"), meaning 'capsize', which carries negative connotations. Instead, it favors kuàizi (筷子) or faai3zi2, where 筷 is homophonous with 快 (kuài, VS "mau"), meaning 'fast', a term associated with auspiciousness in southern Chinese culture, particularly in regions where boat travel was historically common.

Although Cantonese preserves ancestral Yue substratal elements like Vietnamese, it is still classified within the Sino‑Tibetan language family. This classification is grounded primarily in its substantial Middle Chinese lexical stratum, which outweighs the influence of ancient Yue etyma. Throughout its history, Cantonese has remained firmly within the Sinosphere, with a continuous lineage as a living language traceable at least to its historical presence during the era of Zhao Tuo of NamViệt Kingdom, later reinforced by waves of immigrants during the flourishing of the Tang Empire. It is therefore unsurprising that Cantonese has been informally referred to as "the Tang language" (唐話, tong4waa6‑2).

The placement of Cantonese in the Sino‑Tibetan family is well‑founded, resting on both quantitative and qualitative considerations. As noted earlier, apart from a limited number of fundamental lexemes shared with Sinitic‑Vietnamese, a substantial portion of the core vocabulary of both Cantonese and Sino‑Vietnamese derives from the same Middle Chinese source. This common origin reinforces Cantonese's inclusion in the Sino‑Tibetan framework and, by extension, invites a reassessment of whether Vietnamese might also be situated within this classification.

The present task, then, is to advance comparative analyses that assess the position of Sino‑Vietnamese and Cantonese in the broader context of Middle Chinese historical linguistics.  Anthropologically, in considering the Yue‑before‑Sinitic substratum, both Zhuang and Vietnamese traditions suggest that the Vietnamese (越, Việt) and Cantonese (粵, Jyut) peoples may have descended from distinct branches of the Yue (戉) prior to the second century BCE (cf. Truyệncổ Dòng BáchViệt - dchph, on the legend of the magic sword Thần cung Bảo kiếm). The earlier Jyut-speaking communities, associated with Báihuà (白話), were likely of Zhuang (壯族) origin, expanding from Guangdong (廣東) into what is now Guangxi (廣西). The correspondence between these two toponyms reinforces the linkage between TâyÂu (西甌 Xī’Ōu) and ĐôngÂu (東甌 Dōng'Ōu), wherein the phonological parallel of 壯 (OC /ʔsraŋs/) and 廣 (OC /kʷaːŋʔ/) reflects a pattern of regional continuity. The Zhuang self‑designation /Bố‑/ stands in contrast to the /Bod/ ethnonym discussed earlier.

This distribution of BaiYue tribes encompassed the region historically known as the Southern Mountainous Range (嶺南道 Lingnan Dao). Notably, a lexical chain links terms such as Bốchuang, Bốthổ, Bốỷ, Bốbản, and Bốviệt with the etymon Bod, which is cognate with BaiYue, BáViệt, and BáchViệt—names once used to designate indigenous populations.

III) Linguistic evolution through dynasties

This section investigates Sinitic-Vietnamese terminology whose etyma are traceable to Old Chinese, a historical branch of the Sino-Tibetan family. It also explores foundational Vietnamese cognates attested across Sino-Tibetan languages that appear to descend from the ancient Taic-Yue linguistic complex—a substrate that flourished throughout China South long before the emergence of Chinese civilization.

Sinitic-Vietnamese development proceeded through successive dynasties:

  • Han period: Initial Old Chinese loans, particularly in governance, military, and agriculture.

  • Tang period: Enrichment from high‑register Middle Chinese, reinforcing tonal and phonological complexity.

  • Post‑Tang independence: ChữNho retained as the prestige written medium; chữNôm created for vernacular literature.

  • Colonial to modern: Romanized Quốcngữ script codified all registers, that is, literary Sino-Vietnamese, vernacular Sinitic Vietnamese, and indigenous vocabulary, into a unified orthography.

To fully appreciate the argument presented above within a historical timeline, we must examine both the prehistoric and historical periods in China and Vietnam, a perspective that the Austroasiatic theory overlooks. A historical review of Yue entities is essential for understanding that modern Vietnamese emerged as a very late product; moving beyond a strictly Mon-Khmer framework reveals many fundamental Vietnamese words with Sino-Tibetan etymologies, thus reviving the former Sino-Tibetan theory that began to emerge in the late 19th century but has yet to be fully realized in the 21st century.

First, let us establish a historical picture of the prehistoric era, approximately 5000 years BP, when southern China was inhabited by the indigenous Taic or proto-Yue peoples, terms used here for the groups later designated as Yue (越, 粵, 戉, 鉞, etc.) in Chinese history. This period predates the arrival of itinerant proto-Tibetan nomads in search of fertile lands. Later proto-Chinese resettlers, who were formidable warriors conquering on horseback, colonized and subjugated the indigenous vassal states across the fertile mainland. Over time, successive dynasties, including the Xia (夏), Yin (殷, SV "Ân"), Shang (商), and Zhou (周), brought under their control indigenous states such as Qin (秦 SV "Tần"), Chu (楚), Yue (越), Wu (吳), Yan (燕), and Qi (齊), with all these entities eventually subjugated by the Zhou kings (1045 B.C.–256 B.C.). By the end of the Eastern Zhou period, in 221 B.C., the Qin state had conquered its remaining opponents, forging the first unified Middle Kingdom, later known as China. Etymologically, the term 'China' derives from variants such as 'Cin' and 'Chine', which in turn originate from 'Qin' (秦).

The brief Qin Dynasty (秦朝, 221–207 B.C.) was succeeded by the Han Dynasty (漢朝), founded by Monarch Liu Bang (劉邦), known as Han Gaozu, who emerged victorious in the final battle against the resurgent Chu State in 206 B.C. to claim the imperial crown of the nascent Flowery Empire. Meanwhile, in the war-torn southern region, Triệu Đà (趙佗 Zhào Tuó), formerly a Qin general and viceroy, gathered breakaway Yue colonies from southern China and established the NamViet Kingdom (南越 王國, "NamViệt Vươngquốc") in 204 B.C., a polity that endured for 93 years (see Keith Weller Taylor, The Birth of Vietnam [1983] as quoted by Bùi Khánh-Thế in APPENDIX I).

Figure 1.3: Map of the Qin State

The emergence of Vietnamese statehood can be traced to the period following 111 B.C., when the Han Dynasty annexed the NamViet Kingdom. The region that would later be known as "Annam", a name derived from the Tang-era administrative unit 'Protectorate of the Pacified South' (安南 都護府, SV "Annam Đôhộphủ"), was subsequently absorbed into the Chinese empire and governed by successive dynasties for nearly a millennium. This imperial control persisted until the early 10th century, when the collapse of the Tang Dynasty in 906 A.D. fractured the empire into nine independent states. Amid the ensuing fragmentation, the people of Annam broke free from the disintegrating NamHán Kingdom (南漢 帝國) and established an independent polity in 939 A.D. (Bo Yang, Sima Guang Zizhi Tongjian, Vol. 69, p. 209, 1993).

Following independence, the former Annam territory was renamed ĐạiViệt (大越) in 1054 and later Việtnam (越南) in 1804. Vietnam stands apart as the only state founded by early descendants of the proto-Yue peoples, ancestral to the later Sinicized Yue populations across southern China. In contrast, other Yue groups in the region, now identified as the Cantonese in Guangdong, the Wu in Jiangsu, the MinNan in Fujian, the Zhuang in Guangxi, the Gan in Jiangxi, and various ethnic communities throughout Yunnan, Guizhou, and neighboring provinces, were all gradually absorbed into the Chinese imperial structure over successive dynastic periods.

The ancestral subjects of the ancient NamViet Kingdom who settled in Annam actively participated in the struggle for independence and endured over a millennium of Chinese domination and successive invasions. Despite repeated subjugation by every Chinese monarch, and later by heads of state from a rising empire that has long exerted influence over the region, from Mao Zedong and Deng Xiaoping to  Jiang Zemin and the current, indefinite-term General Secretary-President Xi Jinping, Vietnam retained her sovereignty. This history underscores the enduring impact of Chinese political dynamics on territories south of the border.

As a matter of fact, while the Middle Kingdom often succeeded in suppressing internal uprisings, it developed a recurring tendency to lose wars to foreign invaders. Among these were the Khitans (契丹), Jurchens (女真), Mongols, and Manchurians, who went on to establish ruling dynasties in China: the Liao (遼), Jin (金), Yuan (元), and Qing (清) dynasties, respectively.

Changes in dynasties within the Middle Kingdom have led the outside world to recognize the region under one common name, China. In discussions of "Sinicization," the transformative power of Chinese heritage and culture is inescapable, as it has long absorbed foreign elements and made them integral to its identity. For example, the official court language, Mandarin (官話), was adopted by various regimes of northern origin, including the Liao, Jin, Yuan, and Qing dynasties, all of which were led by Tartar or Turkish-derived elites. Linguistically, Mandarin absorbed numerous foreign influences: its original eight-tone system was reduced to four tones under the impact of non-tonal Altaic languages, and final consonants such as /-p/, /-t/, and /-k/ disappeared, changes that departed markedly from its ancestral Middle Chinese characteristics. Despite these shifts, Mandarin evolved into Putonghua, today's national language of China, reflecting its adoption and adaptation by predominantly northern rulers.

After Qin unification, Sinitic elements circulated back into major Yue lects. Wu, Min (Hokkien or Fukienese), Cantonese, and Vietnamese progressively absorbed these features, layering them over an older Yue foundation and producing highly Sinicized Yue speeches (cf. Comparative Sino-Tibetan Etymologies.) This cyclical traffic helps explain Vietnamese cognates aligned with Sino‑Tibetan fundamental etyma. Like other Yue lects, the ancient Vietic language participated in shaping the Sinitic subfamily until Annamese diverged following political independence in the 10th century.

Over subsequent centuries, Yue roots embedded in Old and Ancient Chinese resurfaced across Sinitic languages in repackaged forms. Alongside broadly comparable tonal systems (from roughly three to ten tones), many items vary only subtly in regional articulation. The pattern is especially clear in lexical doublets, words tracing to the same ancestral root, notably from proto‑Taic, Taic, and Tai‑Kadai. For example, Vietnamese "gạo" aligns with Chinese "dào" 稻 ('rice'), and analogous correspondences appear for animals such as elephant, whale, fox, and rhinoceros (see APPENDIX G: Tsu-lin Mei, The case of "ngà").

A frequently cited illustration is the set of twelve animals in the well-known Chinese zodiac, many of which were borrowed and repurposed in a range of southern Chinese minority languages. The sole exception is the 'hare' 兔 (tù), an auspicious creature in both Chinese and Altaic traditions, rendered in Vietnamese as  "thỏ" . The other eleven zodiac animal names in modern Vietnamese trace their origins to shared indigenous sources, with cognates attested among diverse ethnolinguistic groups of the China South.

Historical sources record that lexical material from both aboriginal Yue and proto‑Chinese merged into a shared diplomatic koine known as Yǎyǔ (雅語, 'elegant speech'), employed among pre‑imperial polities, as noted in early Chinese annals. This lingua franca likely originated in Taic, the speech of the subjects of the Chu State (楚國) during the Spring and Autumn Period (春秋時代, 771 B.C.–403 B.C.). From this base, Taic developed into the modern Daic–Kadai languages spoken today by the Dai, Thai, and related peoples such as the Lao of Laos and the Tày of Vietnam. Yue, as a descendant subbranch of Taic, likewise constitutes a primary substrate in the ancestral Vietnamese lexicon.

An early stage of Vietnamese, historically referred to as 'ancient Annamese', began to take shape with the introduction of Old Chinese elements during the Western Han period (206 B.C.–24 A.D.), brought into the Annamese territories under Han colonial administration. These Ancient Chinese influences continued to evolve across subsequent dynasties. By the time Annam achieved sovereignty in 939 A.D., Chinese characters known locally as chữNho (儒字), or Classical Chinese (文言文 wényánwén), remained the official medium of administration and scholarship. The Vietnamese language in the form recognized today, however, did not fully crystallize until the 12th century (Nguyễn Ngọc San, 1993, p. 5).

From the 15th century onward, vernacular literary works began to appear in chữNôm (𡨸喃) (字), a modified script derived from Chinese characters. In the 17th century, confronted with the complexity of these Vietnamized character systems, Western missionaries devised a Romanized orthography for Vietnamese. This Latin‑based script gained wide currency in the early 20th century owing to its relative simplicity, though it was not officially adopted until 1945. By then, the national script known as Quốcngữ had already received active promotion by the French colonial government as a means of reducing Chinese cultural influence in Annam.

In practice, the new Romanized script functioned chiefly as a transcription system for both Hán and chữNôm ('pure Vietnamese') vocabulary. It encompassed the full range of Sinitic‑Vietnamese and Sino‑Vietnamese lexicons, integrating them seamlessly into Romanized spelling. By contrast, French borrowings contributed fewer than one thousand low‑frequency items to the modern language. (APPENDIX A-V Polysyllabic Vietnamized English and French words)

In an article published in Tập san Khoa học, Trường Đại học Khoa học Xã hội & Nhân văn, National University of Hồ Chí Minh City, issue 38 (2007, pp. 3–10), Prof. Bùi Khánh‑Thế examines the interaction and interchange of Chinese in Vietnam's linguistic history. Citing his own mentors, including Nguyễn Tài Cẩn (1998), he condenses key points in the summary table reproduced below.

Table 1.3 Division of Historical Periods in the Development of the Vietnamese language

A) Proto-Vietnamese (the 8th and 9th centuries)
   Languages in use (2): Ancient Chinese (a vernacular Mandarin spoken by the ruling class) and Vietnamese
   Writing scripts (1): Chinese

B) Archaic Vietnamese (the 10th, 11th, and 12th centuries)
   Languages in use (2): Ancient Chinese and Archaic Vietnamese (spoken by the ruling class)
   Writing scripts (1): Chinese

C) Ancient Vietnamese (the 13th, 14th, 15th, and 16th centuries)
   Languages in use (2): Ancient Vietnamese and Classical Chinese
   Writing scripts (2): Chinese and Chinese-based Nôm

D) Middle Vietnamese (the 17th, 18th, and the first half of the 19th centuries)
   Languages in use (2): Middle Vietnamese and Classical Written Chinese
   Writing scripts (3): Chinese, Nôm, and National Romanized Quốcngữ

E) Early contemporary Vietnamese (during the rule of the French colonial government)
   Languages in use (3): French, Vietnamese, and Classical Written Chinese
   Writing scripts (4): French, Chinese, Nôm, and National Romanized Quốcngữ

F) Modern Vietnamese (from 1945 until present)
   Languages in use (1): Vietnamese
   Writing scripts (1): National Romanized Quốcngữ

Based on the formation of the Hán-Việt pronunciation of Middle Chinese, Annam Dịchngữ (安南譯語 'Translated Annamese Words'), and the Annamese-Latin-Portuguese Dictionary by Alexandre de Rhodes (1651), H. Maspero devised a similar division into five developmental periods:

A) Proto-Việt (prior to the 9th century)
B) Archaic Vietnamese: the 10th century (formation of the Hán-Việt)
C) Ancient Vietnamese: the 15th century (Annam Dịchngữ)
D) Middle Vietnamese: the 17th century (Dictionary by A. de Rhodes 1651)
E) Contemporary Vietnamese (19th century)

Source: Table 1 by Nguyễn Tài Cẩn (1998, p. 8) quoted by Bùi Khánh-Thế. (See Appendix I)

This work advances the thesis that core Chinese and Vietnamese vocabulary shares Yue etyma, called "Việt" (越, Yuè) in Vietnamese and "Jyut6" (粵, Yuè) in Cantonese, layered atop a Sino-Tibetan stratum. The classical literary language of later periods incorporated many native items cataloged under Yǎyǔ (雅語) (De Lacouperie 1887). That diplomatic koine provided a matrix from which Old Chinese, Ancient Chinese, and Middle Chinese took shape.

Table 1.4. HISTORY IN A NUTSHELL

Archaeological evidence and historical records show that the region of modern southern China, located below the Yangtze River (揚子江), was originally home to the ancient Yue aborigines. During the Zhou Dynasty (1045 B.C.–256 B.C.), and especially toward the end of the late Eastern Zhou period (culminating in 221 B.C.), these indigenous peoples formed the bulk of the population in the seven states that would later fall to the Qin Dynasty. The Qin, emerging as the strongest state, unified these territories under the banner of the Middle Kingdom (中國).

After their conquest, the Taic-Yue natives were incorporated first into the Qin Empire (秦朝, 221 B.C.–207 B.C.) and subsequently into the Han dynasties. Over time, many of these peoples came to identify as "Han" (漢人), a name derived from the Han Dynasty (漢朝) founded by Liu Bang (劉邦), who himself had once been a subject of Chu (楚國人). Successive Han rulers continued to displace the independent Yue groups in southern China, driving them further south.

In the land later known as Annam, ruled for a significant period by the Han, the distinction between the original Yue and the later Han immigrants gradually diminished. Waves of Chinese settlers fleeing the recurring dynastic upheavals in northern China blended with the indigenous inhabitants, effectively erasing clear-cut ethnic boundaries. 

This historical layering survives today in Vietnam, the sole state emerging from the ruins of ancient cultures such as Chu (楚 Sở), Shu (蜀 Thục), Yue (粵 Việt), NanYue (南越 NamViệt), Dali (大理 Đạilý), and Nanzhao (南詔 Namchiếu). The Vietnamese (the people of Việtnam) represent the enduring legacy of the Southern Yue. Ironically, the same expansionist processes that once characterized Chinese history were mirrored later by the Vietnamese. After achieving sovereignty, Vietnam expanded its territory further south, culminating in the downfall of the Kingdom of Champa and the annexation of parts of the eastern flank of the old Khmer Empire.

In many respects, the historical trajectories of Vietnam and China were deeply entwined until Annam secured independence from Chinese rule. Vietnam's own written historiography did not cohere until well after the 10th century; before then, accounts of its past were drawn chiefly from Chinese chronicles, often without corroboration from alternative sources. The same axiom applies in linguistics: any comprehensive treatment of Vietnamese or Chinese remains incomplete without the other, especially in discussions of Old Chinese, Sinitic-Vietnamese etyma, and shared structural peculiarities (see Wang Li, 1957).

For over two centuries prior to 939 A.D., ancient Vietnam functioned as a Chinese prefecture known as the Annam Protectorate (679–860, 863–906), a historical condition that accounts for the extensive presence of Middle Chinese loanwords in Vietnamese. The final phase of influence, following the collapse of the NamHan State during the post-Tang period, proved especially consequential: it disseminated elite court vocabulary into broader usage—much like the incorporation of Latin and Greek terms into English—and reinforced a Middle Chinese lexical substratum within Vietnamese. This substratum contributed to Vietnamese's resemblance to Cantonese, particularly through retention of the full eight-tone system, including the eighth tone, "thanhnhập" 入聲 ('Rusheng', or 'Entering Tone').

Contrary to common belief, Vietnamese aligns more closely with Mandarin—a court language—in its colloquial uptake of northern vernacular elements than with Cantonese, which reflects a Tang-era literary register. Among Vietnamese's distinguishing phonological traits are finals such as /‑owŋ/, which contribute to its unique acoustic profile and tonal architecture. The scope and transmission routes of Mandarin influence will be addressed in detail in subsequent chapters.

Both literary and colloquial forms derived from Tang-period speech were thoroughly integrated into Annamese (a term used here to avoid the retrospective label 'Vietnamese', paralleling the terminological ambiguity surrounding 'Chinese')(H). These forms circulated widely across social domains, not only among the literati but also within the general populace. This widespread adoption explains why Vietnamese speech often bears Mandarin-like expression and cadence.

This historical reality also accounts for the persistence of systematic Hán‑Việt (Sino‑Vietnamese) variants and the extensive Middle Chinese lexical substratum long after Vietnam's political independence in the 10th century. These elements, phonological, lexical, and syntactic, contributed to the formation of Ancient Vietnamese, became foundational to Middle Vietnamese, and remain integral to the modern language as it is known today. Their presence also explains why Vietnamese, despite its Yue-Taic substrate, retains structural affinities with Cantonese, particularly in tone contour and compound formation. (差)

Sinitic influence is not the whole story, though. Older Yue elements lie beneath the heavy Sinitic overlay, and many indigenous Taic-Yue words have been misidentified as Chinese, a pattern mirrored in Vietnam, where such items are paradoxically labeled 'thuầnViệt' or 'pure Vietnamese'. Vietnamese thus preserves Yue-descended survivals whose archaic features are realized in distinctively Vietnamese ways; Chinese-layered variants can act, in effect, as tonal modulators for toneless items in several other Sino-Tibetan languages. While Yue-origin words were often masked as Chinese, Taic-Yue terms that moved into Sinitic languages were simultaneously preserved within the ancient proto-Vietic layer. Across these eras, Sinitic-Vietnamese interacted with Yue and Taic speech habits, producing unique word order patterns (e.g., noun+modifier as in "gàcồ" vs. Mandarin 公雞 gōngjī).

This distinction underpins the claim that the 'Yue' people predate the arrival of early Sino-Tibetan speakers, the forebears of the Chinese, in China South. Fundamental cognates shared among Taic-Yue, Chinese, and Vietnamese etyma across many Sino-Tibetan etymologies will be treated in Chapter 10.

From an anthropological perspective, the Taic peoples preceded the Yue, followed by the Dai, who at one point held dominion over the Chu State. Within this Chu cultural sphere, Liu Bang rose as a subject of Chu and ultimately founded the Han Dynasty. His ascent is linked to his appointment as viceroy of the Hanzhong region, situated in present-day southern Shaanxi, where Chu forces had earlier triumphed over the Qin.

Technically, Sino-Vietnamese and Sinitic-Vietnamese are distinct lexical classes; the latter comprises multiple layers of doublets superimposed on the former, driven by vernacular Mandarin forms that spread from at least the Han in the 2nd century B.C. through the Ming in the 15th century.

The persistence of fixed expressions in Vietnamese that align with those found in modern Mandarin suggests that Early Mandarin may have functioned as a concurrent spoken language among mandarins for official purposes, as evidenced by Prof. Nguyễn Tài Cẩn's analysis in Table 1.1 above (W). Such uses would have included imperial decrees, legal documentation, and reports to the Tang imperial court in Chang'an (長安, SV "Tràngan"), now Xi’an City (西安市) (安). As a protectorate throughout the Tang Dynasty, old Annam contributed to the imperial court through administrative internship, scholarship, and artisanship, channels that introduced higher-register Middle Chinese vocabulary, the same that circulated in Cantonese, into Annamese during the Tang period (618–906).

From the Tang era until its gradual decline toward the end of the 19th century, Classical Chinese style (文言文) was extensively employed in Vietnamese letters. Its dense, allusive register shaped Tang-style verse and Vietnamese literary prose alike, until Romanized Quốcngữ ushered in a shift toward a more colloquial written style (see Nguyễn Thị Chân-Quỳnh, 1995).

As ancient Vietnamese transitioned into late Middle Vietnamese, the emergence of new function words became essential for constructing sentences that increasingly mirrored French syntactic patterns, particularly by the early 20th century. Lexically, a stratum of Sino-Vietnamese items, likely rooted in Tang-era vernacular, was retained in a markedly Sinitic register, comparable in style to spoken Cantonese. By the 16th century, numerous Middle Chinese lexemes had evolved into Sinitic-Vietnamese function words ('虛辭'), and these lexemes became indispensable in Vietnamese vocabulary, serving grammatical roles analogous to English particles and prepositions such as 's', 'of', 'although', 'not', 'in', 'at', 'from', 'hence', 'herewith', 'albeit', and others. 

These elements became syntactically necessary for managing non-inflectional grammar in both Vietnamese and Chinese, facilitating syntactic cohesion without morphological variation (cf. Nguyễn Ngọc San, 1993, pp. 138–142). More broadly, these items belong to a set of Chinese-origin vocabulary systematically localized through pronunciation rooted in a variety of Middle Chinese, plausibly related to an ancient Shaanxi dialect.

Through successive periods of contact, the Han and Tang lexicons introduced successive waves of new vocabulary into the Sinitic-Vietnamese layer. This pattern is comparable to the influence of Middle Chinese on Cantonese, much as, in an earlier era, Qin-Han Old Chinese shaped Southern Min (NanMin) varieties such as Hokkien, Amoy (Xiamen or 'Hạmôn'), Hainanese, and Chaozhou (Teochew).

From a typological perspective, when major southern Chinese lects are strongly marked by Sinitic features within the Sino-Tibetan family, classification is determined by dominant attributes. The situation parallels other hybrid outcomes: Latin-influenced French versus Anglo-Saxon-dominant English; Australian English versus Indian English; Bulgarian and Afrikaans in relation to Dutch; Latin-French in contrast to Gaulish; or Haitian French in contrast to Moroccan French.


Table 1.5: The Case Of Afrikaans

Afrikaans, also known as Cape Dutch, is one of the eleven official languages of South Africa. It originated in the 17th century from the Zuid-Holland (South Holland) dialect used by Dutch settlers in South Africa. The language was spoken by Dutch, French, and German settlers, as well as by the people they enslaved. From the 18th century onward, Afrikaans gradually developed distinct linguistic features.

Afrikaans borrowed vocabulary from English, German, and French, reflecting the cultural and linguistic backgrounds of European settlers in South Africa. It also incorporated words from indigenous African languages. Its grammar underwent simplification, such as the omission of verb endings that indicate tense. Phonetically, changes included simplifying the Dutch "sch" sound to "sk" (e.g., the Dutch word "schoen" became "skoen," meaning "shoe").

Until the mid-19th century, Afrikaans was primarily a spoken language, with Standard Dutch being used for writing. Later, a movement emerged to promote Afrikaans as a literary language. The language gradually found its way into journalism, schools, and churches. In 1925, Afrikaans officially replaced Standard Dutch.

Today, Afrikaans is predominantly used in South Africa and Namibia, with lesser usage in Botswana, Zambia, and Zimbabwe. Estimates from 2020 suggest that the number of Afrikaans speakers ranges between 15 and 23 million. Some linguists classify Afrikaans as a creole or partially creolized language.

It is estimated that approximately 90%-95% of Afrikaans vocabulary originates from Dutch, with additional words borrowed from other languages, including German and South Africa's Khoisan languages. Distinctions from Dutch include more analytic morphology and grammar, as well as certain phonetic differences. The written forms of Afrikaans and Dutch maintain a high degree of mutual intelligibility. In May 2022, Afrikaans was officially recognized as an indigenous language of South Africa.


IV) Comparative and etymological challenges

Comparative Sino‑Tibetan etymologies suggest that the diachronic evolution of modern Vietnamese mirrors a historical trajectory in which early Southern Yue populations established autonomous polities throughout China South prior to the consolidation of Han imperial authority. It is not within the purview of Sino-Tibetan linguistics to classify Vietnamese as a member of the Sino-Tibetan family, whether by subsuming it under the Sinitic branch or by drawing analogies to Cantonese or any other Chinese lect, even though Vietnamese is characteristically on a par with them. Such a classification demands a broader base of etymological evidence and a more rigorous linguistic framework.

Lexical recycling has persisted into the modern era, evident in the transregional circulation of terms such as "cộnghoà" 共和 (gònghé, 'republic') and "dânchủ" 民主 (mínzhǔ, 'democratic'). These items originated as Japanese neologisms constructed from Chinese morphemes, were subsequently re-borrowed into Chinese, and eventually permeated Vietnamese usage. Their trajectory exemplifies the ongoing exchange of linguistic material across Sinitic, Japonic, and Vietic domains.

In parallel, the examples below illustrate the process of localization, whereby Sino-Vietnamese lexemes undergo phonological and semantic nativization to become Sinitic-Vietnamese. In such cases, original senses are not always preserved—a phenomenon more prevalent in Japanese Kanji than in Sino-Vietnamese. For instance, "lịchsự" (polite) derives from 歴事 lìshì (originally 'experience'), and "tửtế" (kind) from 仔細 zǐxì ('meticulously').

Contrary to common belief, Vietnamese is best understood as a Sinitic-dominant language in a way that Japanese or Korean is not. It inherits a rich Middle Chinese lexicon, and many items classified as Sino-Vietnamese overlap with Sinitic-Vietnamese due to their integration into everyday speech alongside native vocabulary. For example, the etymon 順 (shùn, SV "thuận") exhibits context-dependent variation: 順利 (shùnlì, VS "suônsẻ"), 孝順 (xiàoshùn, VS "hiếuthảo"), 順便 (shùnbiàn, VS "sẵntiện"), 逆順 (nìshùn, VS "ngượcxuôi"), among others.

Table 1.6: A case study of Sinitic-Vietnamese neologism formed with Chinese lexemes


The Vietnamese term 'côngcuộc'—now familiar in modern discourse as a formal compound meaning 'cause', 'process', or 'undertaking'—is a persistent source of lexical confusion and scholarly intrigue. While often misinterpreted as a Sino-Vietnamese compound mapping straight onto Chinese 公局 or 工局 (Mandarin gōngjú 'public bureau', 'work office'), its correct etymological genesis instead lies in 工作 (gōngzuò, 'task', 'work'), with the element 'cuộc' emerging not from 局 (jú) but from 作 (zuò). The fact that 'cuộc' in Vietnamese phonologically and semantically diverges from both its Sino-Vietnamese dictionary reading (tác) and its expected Mandarin reflex (zuò) reflects a network of historical sound change, sandhi assimilation, and semantic-phonetic association—processes that collectively illuminate the complex history of Chinese lexical influence in Vietnam.

The Vietnamese word 'côngcuộc' functions in modern written and spoken Vietnamese to denote a significant collective undertaking—'project', 'cause', 'the course of'—especially in governmental or historical phrasing (e.g., "côngcuộc khángchiến" 'resistance war', "côngcuộc đổimới" 'the undertaking of renovation/reform'). It is a compound of 'công' (from 工 'work; labor') and 'cuộc'.

The confusion with 公局 or 工局 is understandable, as both 公 and 工 read 'công' in Sino-Vietnamese, and 局 (SV: cục) is a common bound morpheme for official entities. However, 'côngcuộc' is a modern compound built on the model of Chinese 工作 (gōngzuò), but adapted phonetically and semantically within the Vietnamese system. While 'công tác' is the canonical Sino-Vietnamese reading for 工作, 'côngcuộc' emerged as a neologism where 'cuộc' operates as a native or nativized reflex of 作, rather than 局.

The emergence of Sino-Vietnamese compounds such as 'côngcuộc' reflects longstanding processes of borrowing and semantic adaptation widespread across the Sinosphere, i.e., Japan, Korea, and Vietnam, collectively referred to as the 'Sino-Xenic' realm. In these contexts, new words for modern concepts were often coined using Chinese morphemes and then mapped phonologically into the target language in a regularized, but sometimes innovative, fashion.

Middle Chinese, as represented in rime dictionaries such as the Qieyun (7th century), had a richly articulated syllable template. For the character 作 (Mandarin: zuò), used in 工作 (gōngzuò), the reconstructed MC pronunciation is commonly given as */tsak/ or /tsak-s/, with the following features:
  • initial: ts- (voiceless alveolar affricate)
  • vowel and medial: /a/ as nucleus, sometimes with a palatal medial in some dialects
  • final: -k (voiceless velar stop), a classic 'entering tone' coda
  • tone: entering (rusheng), which has phonological and tonal correlates in Sino-Vietnamese

Sino-Vietnamese readings for 作 are systematically 'tác', following the regular sound correspondences established for Chinese readings in Vietnamese. Key observations:
  1. The initial [ts-] to [k-] shift is irregular (i.e., not predicted by the regular SV correspondence), suggesting non-Sino-Vietnamese, perhaps colloquial or nativized, development.

  2. Labiovelar final [‑əwkpʔ] is robustly preserved in 'cuộc', with the final -k and medial -w- (from /ua/ or /uə/) mapping closely to MC -ak, and aligning phonotactically with native Vietnamese coda structure.

  3. The resultant tone is nặng [˧ˀ˩ʔ], consistent with the entering (rusheng) tone category linked to -k finals in Han-Viet transmission.

Semantic-phonetic association: 'cuộc' vs. 'cục' and the shadow of 局, the homophony and semantic overlap

One reason for the widespread misreading of 'côngcuộc' as 公局 or 工局 is the phonological and structural near-identity between 'cuộc' and 'cục' (局):
  • 'cục' SV: cục, Mandarin jú, MC *kɨwk; used for administrative, governmental, and physical 'units' or 'offices'

  • 'cuộc', derived via the above pathway from 作, but due to similar form and function, is often reanalyzed by speakers and writers as rooted in 局, especially in compounds

The confusion is exacerbated by the convergence of rimes and finals, both 'cục' /kʊkpʔ/ and 'cuộc' /kəwkpʔ/ conforming to the [k•w•k•p̚] structure, with heavy final closure and possible central or back rounded vowels.

Semantic blending in compound formation: semantic overlap also drives this folk association

In both Sinitic and Vietnamese, compounds involving 工作 (work), 局 (office), and 作 (to do/make) are semantically related to tasks, operations, or affairs, domains where 'cuộc' has come to be used.

For example, in classical Chinese, 局 (jú) denoted physical bureaus ('bureaus', 'games') and by extension 'affairs' or 'situations', while 作 (zuò) in compounds implied 'doing', 'working', or 'acting upon' something, matching the function of 'cuộc' in syntagms such as "côngcuộc vậnđộng" 'the campaign task'.

Consequently, the phonetic resemblance between 'cuộc' and 'cục' enables semantic-phonetic association (lexical contamination or 'folk etymology'), especially when context or classical literacy is limited.

This phenomenon is here termed 'sandhi assimilation' or 'assimilative association'; it recurs throughout the realm of Sinitic-Vietnamese.

Conclusion

The analysis of 'côngcuộc', especially the sound change underlying 'cuộc', is a case study in the stratification, innovation, and reanalysis inherent to Sinitic-Vietnamese contact linguistics. Through the transformation of Middle Chinese *tsak to Vietnamese 'cuộc', we witness the interplay of phonological adaptation, semantic reinterpretation, and structural assimilation:

 ▪ The initial [ts-] > [k-] shift, though irregular, is emblematic of colloquial nativization and possibly dialectal borrowing 

 ▪ The preservation of labiovelar coda [-əwkpʔ] aligns with Vietnamese phonotactics, fostering both the formation of new compound morphemes and confusion with native terms like 'cục' 

 ▪ The importance of sandhi, compound formation, and semantic blending means the etymological and structural boundaries between Sinitic and native vocabulary are porous. 

Comparative evidence across Sino-Xenic languages highlights both shared roots and Vietnamese-specific pathways. While 'côngcuộc' initially traces to 工作, its contemporary form and meaning exemplify Vietnam's creative synthesis of linguistic inheritance, local adaptation, and ongoing lexical renewal.


In this paper, when discussing the etymology of Sinitic-Vietnamese words, the author restricts his analysis to locally influential references within the Sinitic framework for comparative purposes. In other words, he focuses only on those etyma that exist concurrently in both Chinese and Vietnamese, including foreign words that entered Vietnamese through a Chinese intermediary. For example, the Vietnamese word "mắt" 'eye', rendered as 目 mù (SV mục) in Chinese, may be related to Malay 'mata', and "gạo" ('rice'), represented by 稻 dào, might compare to Thai /gaw/. Other foreign-derived words include SV "kỹsư" (技師 jìshī, 'engineer', borrowed from Japanese "gishi", as opposed to the modern Chinese meaning 'technician'); "bệnhviện" (病院 bìngyuàn, 'hospital'), also from Japanese usage; "ưumặc" (幽默 yōumò, 'humor'), "câulạcbộ" (俱樂部 jùlèbù, 'club'), and country names such as "Anh" (英 Yīng, 'England') and "Mỹ" (美 Měi, 'America'), all from English; "Pháp" (法 Fǎ, 'France') from French; and "Đức" (德 Dé, 'Germany') from German 'Deutsch'.

The sound change patterns observed in core vocabulary across Chinese and Vietnamese suggest the preservation of substratal residues from an earlier "Yue" linguistic layer. These exchanges demonstrably predate the Qin–Han expansion into the southern regions of China (221 B.C.–220 A.D.). Numerous lexical items from this substratum are securely attested in the Kangxi Dictionary 康熙字典, the Qing-era compendium commissioned by Emperor Kangxi, underscoring their deep historical entrenchment. (Y)

From a linguistic standpoint, the predominance of Sinitic features across Vietnamese etyma (including tonality, morphological structure, phonological traits, and disyllabicity) has led many scholars to infer a Chinese origin. However, as phonological and semantic convergence increases, so too does the likelihood of borrowing. This is especially evident in Tai-Kadai languages, and most prominently within the Tai-Kam-Sui subgroup, where nearly all lexical items appear to derive from Chinese sources (cf. Comparative Sino-Tibetan Etymologies).

Consider Vietnamese "gạo" ('hulled rice'), often compared to Thai /kao/, or "nếp" ('sticky rice') to Thai /nɛp/ and Lao /nèep/. These correspondences align with Chinese 稻 dào (SV đạo), or 糯 nuò (SV nọ), both of which are themselves loanwords in Chinese. In contrast, Vietnamese "lúa" ('paddy rice') appears to be a native Yue-Taic term, corresponding to Lao /lua/ and Zhuang /luə/, with no direct Chinese cognate. This challenges the assertion by S. A. Starostin (1953–2005) that "lúa" reflects an archaic Chinese loanword derived from 稻 dào, reconstructed as [ lhu:ʔ < Protoform ly:wH ], encompassing meanings such as 'rice', 'grain', and 'paddy'. Starostin's broader comparative framework includes Burmese /luh/ ('a grain species', Panicum paspalum), Kachin /c^əkhrau1/ ('paddy ready for husking'), and Kiranti lV ('millet'), which he interprets as part of a native Chinese semantic field. Nonetheless, this view necessitates careful differentiation between inherited Yue-Taic forms (from languages with which Vietnamese also shares its word order) and Sinitic overlays.

A similar caution applies to items traditionally assigned to the Austroasiatic Mon-Khmer layer. Where phonological correspondences fail to conform to established patterns of sound change or semantic alignment, such items are more plausibly interpreted as intergroup loanwords, facilitated by geographic proximity and prolonged contact.

Table 1.7.  Glyph origins and etymological convergence: 來 and 麥, and the case of Vietnamese "lúa" and "lại"

    The character 來, now widely interpreted as 'to come', originated as a pictogram (象形) depicting wheat. It is the original form of the characters 麥 (OC *mrɯːɡ, 'wheat') and 麳 (OC *rɯː, 'wheat'). In early script forms, the central vertical line represented the ear of wheat, flanked by upward strokes for leaves and downward strokes for stem and roots. An additional horizontal line was often added at the top, possibly to emphasize the ear. Compare 禾, which shares structural parallels.

    This glyph was borrowed for the meaning 'to come' as early as the oracle bone script. During the Western Zhou and Warring States periods, semantic components such as 止 ('foot') and 辵 ('walk') were appended to distinguish the original agricultural sense from the emerging verbal usage. These additions, however, were not retained in later script traditions. Some scholars interpret the derivative 麥, formed by adding 夊 ('to walk slowly'), as the original glyph for 'to come'. If so, the meanings of 來 and 麥 may have interchanged due to the dominant use of 來 in verbal contexts.

    Shuowen connects the semantic domains of 'wheat' and 'arrival' mythologically: 天所來也 ('it comes from the heavens'). This interpretation may be supported by archaeological evidence suggesting that wheat was not indigenous to China, but introduced from elsewhere.

    Phonologically, both 來 and 麥 have been reconstructed with initial *mr- in Old Chinese. In 來, the liquid onset /l/ is retained, while 麥 preserves the nasal /m/. Etymologically, 來 derives from Proto-Sino-Tibetan *la-j ~ *ra ('to come') (STEDT), and is cognate with:

  • 迨 (OC *l'ɯːʔ, 'reach; until')
  • 賚 (OC *rɯːs, 'bestow')
  • 蒞 (OC *rɯbs, 'arrive') — Schuessler (2007)
  • Burmese လာ (la, 'come')
  • Proto-Vietic *laːjʔ

    The Vietnamese reflex "lại" (SV lai) is possibly related to Chinese 來 (MC lʌi, ləj 'to come; to arrive').

    Baxter–Sagart (2014) note that 來 shows irregular development, possibly due to the loss of final *-k in an unstressed form that was later restressed:

來 *mə.rˤək > *mə.rˤə > *rˤə > loj > lái 'come'

    This trajectory, however, does not fully explain the irregular presence of final -ʔ (nặng tone) in Vietnamese. If we posit an intermediate stage where *-k > *-ʔ occurred and was subsequently lost, allowing for borrowing into Vietic during that window, the tone could be accounted for. Yet the Vietnamese form lacks expected traces of *-rˤ- (e.g., ‹r› or ‹s›), suggesting a late loan, after *r(ˤ) > l had already occurred. This raises further questions about tonal interpretation and phonological alignment.

    For comparative reference, Zhuang (or Nùng) /lai/ aligns with Proto-Tai *ʰlaːjᴬ ('many; much') [ cf. Vietnamese "lắm" ], itself derived from Old Chinese 多 (OC *t.lˤaj). Cognates include:

  • Thai lǎai
  • Lao lāi
  • Lü l̇aay
  • Shan lǎay
  • Bouyei laail
  • Saek หล่าย
  • Jizhao laːi²¹

    These forms suggest a broader semantic and phonological network in which 來 participates, spanning Sino-Tibetan, Vietic, and Tai-Kadai domains.


This principle is not universal. In some cases, a single-morpheme syllable categorized as a "word" is primarily governed by phonological alternation while showing additional features (beyond tonality) that do not neatly fit established patterns. For instance, "tỏi" 蒜 suàn (SV toán, 'garlic') shows the correspondences /s- ~ t-/ and /-n ~ -i/, whereas "chua" 酸 suān (SV toan, 'sour') does not follow the same pattern. Nonetheless, such items are still classified as loanwords based on overall affinity. Consider 兒 ér, which corresponds to SV "nhi" and yields VS "nhỏ" ('child'), VS "nhí" ('baby'), and "nhínhảnh" (with "nhảnh" as a reduplicative morphemic syllable conveying 'childish', analogous to English "-ish"), as opposed to "nhỏ" 孺 rú (SV nhụ, 'young'). We may thus conclude that the etymon "nhi" entered via Middle Chinese and that its cited derivatives are all Chinese loanwords.

In comparison with other southern Sinitic dialects, and contrary to common assumption, Vietnamese, beyond sharing a similarly broad tonal range (up to nine tones), aligns more closely with Mandarin than with Cantonese, Min Nan, or Wu varieties, particularly in the lexical domain. Only a small number of indigenous Cantonese words have cognates in "thuầnViệt" ('basic native Vietnamese'), such as:

  • sik6 → ("xơi", 'eat')
  • jam2 → ("uống", 'drink')
  • gai1 → ("gà", 'chicken')

    By contrast, rarer Cantonese forms lack direct Vietnamese matches, e.g.:

    • fajng1kao1 ('sleep') ≠  M 卧 wò that corresponds to SV "ngoạ"→VS "ngủ"
    • pin5tow2 ('where') ≠ M 哪裏 nǎlǐ that corresponds to SV "nalí"→"nơinào"
    • tzuo3 ('already') ≠  M 了 liǎo that corresponds to SV "liễu"→"rồi"

      Who are the Cantonese speakers in historical-linguistic terms? This population historically occupied a substantial portion of the ancient NamViệt Kingdom (204–111 BCE), whose capital, Phiênngung (番禺; present-day Panyu district, Guangzhou), was governed by its founding monarch Triệu Đà (趙佗; Zhao Tuo) and his dynastic successors. This region formed the southern frontier of early Sinitic expansion. (V).

      Following the annexation of NamViệt into the Middle Kingdom (中國), the preexisting process of Sinicization intensified. This catalyzed the divergence of Cantonese and Vietnamese into two distinct linguistic and cultural entities. Each followed separate historical trajectories, with only Annam ultimately achieving independence from Chinese rule in 939 CE.

      The genetic and cultural composition of modern Cantonese speakers differs markedly from that of their pre-Han ancestors and from populations inhabiting the region up to the tenth century. It is plausible that some kin groups migrated southward into Annamese territories, a phenomenon repeated across centuries of intertwined regional histories. In China, such migrations often occurred in response to famine, repression, or political upheaval. Similarly, ancient Annamese populations moved further south to evade imperial reach.

      By the time these migrations occurred, settlers in new regions would have encountered populations not vastly different from themselves, especially under shared or adjacent statehoods. The border between China and Vietnam remained relatively permeable throughout history, facilitating such movements until its closure in 1949 under Maoist rule.

      Had Annam remained under Chinese dominion into the present, its national trajectory might have mirrored that of NamViệt (Cantonese: NamJyut6), now subsumed within Guangdong Province. Historically, Guangdong produced millions of emigrants who dispersed globally, including to Annam and other Southeast Asian polities. Conversely, had the greater Canton region achieved statehood akin to Annam's, it might have retained linguistic sovereignty. Its language, like Annamese, could have preserved distinct typological features, prompting reevaluation of its classification within the Sino‑Tibetan family. Similar speculation applies to Fukienese (Hokkien) and Hainanese.

      Modern Cantonese descendants, now fully Sinicized, can only access their pre-Han heritage through archaeological vestiges such as the mausoleums of NamViệt kings in present-day Guangzhou. The orthography of NamViệt may be rendered phonetically where appropriate to reflect its historical pronunciation.

      The immersive Sinicization of the Canton region profoundly shaped its linguistic identity. Cantonese, as a Sinicized Yue language, stands in contrast to Vietnamese—a distinction rooted in their respective historical paths. Cantonese remained within China from 111 BCE onward, while Vietnam extricated itself from Chinese rule in 939 CE. This divergence is foundational to Vietnam's national identity.

      During the Ming Dynasty's two-decade occupation of Vietnam in the early fifteenth century (1407–1427), Chinese influence left indelible marks. A particularly devastating episode occurred when Ming forces destroyed Vietnam's entire written library (Nguyễn Tài Cẩn, 1998). Over centuries, Vietnam navigated a complex sovereignty, alternating between vassalage and independence, adapting to the shifting power dynamics of its northern neighbor. Even after more than a millennium since the end of China's 1,004-year colonial rule, this balancing act remains central to Vietnam's historical narrative.

      Despite their shared Yue ancestry, Vietnamese speakers often express nostalgia for their Yue heritage, whereas many Cantonese speakers remain unaware of or indifferent to their Yue origins. The Cantonese model is instructive: the Sinicization of Yue subjects in NamViệt deeply influenced the ethnic and linguistic evolution of the ancient Yue. Records of Canton's OuYue (甌越) exhibit striking parallels to Annam's LuoYue (雒越). The Han colonization extended into the Sông Hồng Basin (Red River Delta), which became part of southwestern NamViệt following the conquest of 111 BCE.

      Han imperial policies left enduring Sinitic imprints on the emerging Yue languages, which over centuries evolved into Cantonese and Vietnamese. While these languages share notable features, they are not linguistically bonded as kin. This is evident in the limited number of newly identified Sinitic‑Vietnamese etyma with shared ancestral roots. For instance, the legend of the Magic Sword, which recounts the shared ancestry of the Zhuang and Vietnamese peoples—once self-identified by the same ethnonym—underscores their connection to ancient Cantonese traditions. (Z).

      Conversely, the Chinese affiliation of Sino‑Vietnamese etyma and Sinitic vocabulary in Cantonese is unequivocal. This is attested by their shared usage of Middle Chinese variants and phonological commonalities, including tonality (e.g., 8-toned Vietnamese vs. 9-toned Cantonese) and final consonants (e.g., ‑m, ‑p, ‑t, ‑k).

      Among the Sino-Vietnamese lexemes derived from Middle Chinese etyma, one of the most controversial cases involves the naming of the 'duodenary zodiac system'. This system reveals how substratal pathways in Sinitic-Vietnamese zodiac terminology, originating from ancient Yue and passing through Old Chinese, trace long and intricate trajectories before entering Vietnamese. These forms are conspicuously absent in Cantonese, likely due to its deeper Sinicization. Cultural elements such as the duodenary cycle of twelve zodiac animals, shared among the Chinese, ethnic minorities in southern China, Vietnam, and southern Mon-Khmer cultures, exemplify this substratal retention.

      For example, the Year of the Horse (馬年) in 2014 was also referred to as 'Jiawu Year' (甲午年, Jiǎwǔ Nián) or "Năm GiápNgọ" in Vietnamese. Here, the term 'Ngọ' (午), an ancient Yue loanword for 'horse' (contrasting with the native Vietnamese word 'ngựa'), exemplifies the linguistic imprint of Yue heritage. Although nomenclature like 'Jiawu Year' may sound foreign to modern Chinese ears, it remained prevalent until the early 20th century. A notable instance of this usage is tied to the Xinhai Revolution (辛亥革命) of 1911, which overthrew the Manchu Qing Dynasty and established the Republic of China (中華民國). The branch 亥 (hài), signifying 'pig', is another ancient Yue loanword. In Sino-Vietnamese, it appears as 'hợi', while in Sinitic-Vietnamese, it is rendered as 'heo'. Thus, 1911 is recognized as the "Xinhai Year" or "Year of the Boar" in modern usage (Boltz, William G., 1991, "Old Chinese Terrestrial Names in Saek").

      A notable example is "mẹo", an older Sinitic-Vietnamese reflex of 卯 (M 'máo'), later reintroduced as the Sino-Vietnamese "mão". In Vietnamese tradition, 卯 denotes the fourth position in the zodiac, but unlike Chinese usage where it corresponds to 兔 ('tù', SV "thố", VS "thỏ", 'hare'), Vietnamese associates it with "mèo" ('cat'), an animal culturally unwelcome in Chinese contexts. Thus, while Chinese marks 兔年 ('Tùnián', 'Year of the Hare'), Vietnamese calls the same year 卯年 (M 'Máonián'), rendered SV "Mãoniên", VS "nămMão", "nămMẹo", or colloquially "nămMèo". This divergence demonstrates that the Vietnamese 'Year of the Cat' is not a reinterpretation of the Chinese 'Year of the Hare' but a retention of an older, likely Yue-origin association, contradicting the claims of many Chinese Sinologists, whose interpretations may be based on misreading or deliberate distortion.

      A parallel case involves 未 (M 'wèi') and the Vietnamese "dê" (/ze1/, 'goat'). The original southern concept of 未 as 'goat' was later supplanted by northern terms for 'ram' or 'sheep' (or VS "cừu" 羯 jié, SV "kiết", or 羭 'yú' or SV "du"), even though 羊 ('yáng') still denotes 'goat' in many southern lects. This semantic shift reflects northern influence, where 羊 was associated with 'sheep' or 'lamb' (羔 'gāo', VS "cừu"!). Crucially, 未 should be understood as 'goat' in any case, corresponding to SV "dương" (羊 'yáng') and VS "dê" (/ze1/). This pronunciation aligns with southern Sinitic varieties such as Teochew (/jẽw1/), Amoy (/jũ1/), and Hainanese (/jew1/), all unequivocally meaning 'goat'; the modern disyllabic compound 山羊 ('shānyáng', VS "dênúi", 'mountain goat') reinforces this interpretation.

      It is plausible that 未 ('wèi') descends from an ancient Yue form approximating /ze1/ or /je1/, entering Chinese through its integration into the zodiac system. In this context, 未 may have been adapted to transcribe a foreign term for 'goat', replacing 羊 ('yáng'), which northern nomadic cultures more commonly associated with 'sheep' (羯), as noted above. The Sinitic-Vietnamese "dê" (/je1/) thus preserves a substratal pronunciation that diverges from Mandarin /wèi/.

      Middle Chinese pronunciations of 未 varied considerably—/mwe̯i/, /mĭwəi/, /miuəi/, /mʉi/, /mʷɨi/, /muj/—and eventually bifurcated into SV "vị" (/vjej6/, VS southern /zjej6/, 'upcoming') and SV "mùi" (/mʷɨi2/, 'goat'). The phonological shift from /v-/ to /j-/ or /z-/ in VS "dê" suggests a southern borrowing instead, possibly mediated through an intermediate /wj-/ stage. In this scenario, Mandarin 'wèi' may represent a back-loan from Old Chinese */mɯds/, as noted in 《說文》: 未, 味也!

      The character 未 thus bifurcates semantically and phonetically into SV "vị" (indicating 'not yet', 'future'), as in "vợchưacưới" (未婚妻 'wèihūnqī', SV "vịhônthê"), and SV "mùi" ('goat'), as in "NămẤtMùi" (乙未年 'YǐWèiNián', 'Year of the Goat'). It is plausible that 未 was introduced by Yue-speaking populations of NamViệt or Annam prior to the Old Chinese period. While neither ancient Chinese nor Vietnamese possessed a native /v-/ onset, southern dialects likely preserved a form closer to /jej/ or /zjej/.

      To further complicate the etymology, the Vietnamese "dê" may also be a doublet cognate of 羊 ('yáng'), reflected in VS "dê" and SV "dương" (/jɨəŋ1/), and paralleled in Teochew "yeo" (/jẽw1/), all denoting 'goat'. These forms nonetheless reinforce the hypothesis that Vietnamese retains a substratal lexical layer distinct from northern Sinitic developments.

      In zodiac reckoning, years such as 1955, 2015, and 2075, formally designated in Vietnamese as "NămẤtMùi" (乙未年 'YǐWèinián'), are now more commonly referred to in mainland Chinese usage as 羊年 ('Yángnián', 'Year of the Goat', VS "nămDê"). Notably, younger Chinese speakers often do not recognize the calendrical significance of 乙未年, whereas Vietnamese youth remain familiar with both "NămẤtMùi" and "nămDê". This is reflected in expressions such as "我生於乙未年" ('Wǒ shēng yú YǐWèinián'; "Tôi sanh NămẤtMùi") and "我的生肖屬羊" ('Wǒde shēngxiào shǔ yáng'; "Tôi cầmtinh conDê"), or simply "我屬羊" ('Wǒ shǔ yáng'; "Tôi tuổi Dê"), all conveying 'I was born in the Year of the Goat'.
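      The stem-branch designations cited in this section (2014 as 甲午, 1911 as 辛亥, and 1955, 2015, 2075 as 乙未) follow from simple modular arithmetic. The sketch below is illustrative only: the function name `stem_branch` and the spaced romanization of the Sino-Vietnamese readings are assumptions made for this example, and the calculation relies on the conventional anchoring of 4 CE as a 甲子 year. It also ignores the fact that the cycle year begins at the lunar New Year, so dates in January or early February belong to the preceding cycle year.

```python
# Heavenly Stems (10) and Earthly Branches (12) with Sino-Vietnamese readings.
STEMS = ["甲", "乙", "丙", "丁", "戊", "己", "庚", "辛", "壬", "癸"]
STEMS_SV = ["Giáp", "Ất", "Bính", "Đinh", "Mậu", "Kỷ", "Canh", "Tân", "Nhâm", "Quý"]
BRANCHES = ["子", "丑", "寅", "卯", "辰", "巳", "午", "未", "申", "酉", "戌", "亥"]
BRANCHES_SV = ["Tý", "Sửu", "Dần", "Mão", "Thìn", "Tỵ",
               "Ngọ", "Mùi", "Thân", "Dậu", "Tuất", "Hợi"]


def stem_branch(year: int) -> tuple[str, str]:
    """Return (Chinese, Sino-Vietnamese) stem-branch name for a Gregorian year.

    Assumption: 4 CE is taken as a 甲子 year, so both cycles are counted
    from that anchor. The lunar New Year boundary is not modeled.
    """
    offset = year - 4
    stem = offset % 10      # position in the 10-stem cycle
    branch = offset % 12    # position in the 12-branch cycle
    chinese = STEMS[stem] + BRANCHES[branch]
    sino_vietnamese = f"{STEMS_SV[stem]} {BRANCHES_SV[branch]}"
    return chinese, sino_vietnamese


if __name__ == "__main__":
    for y in (1911, 1955, 2014, 2015, 2075):
        print(y, *stem_branch(y))
```

      Because the stem cycle has 10 members and the branch cycle 12, the combined name repeats every 60 years, which is why 1955, 2015, and 2075 share the designation 乙未 (Ất Mùi).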

      This cultural continuity supports the hypothesis that 未 ('wèi') originated as a Yue loanword, plausibly reconstructed as /zẽ/ or /jẽ/, and distinct from 羊 ('yáng'), a pictograph depicting the head of a goat or sheep. The semantic and phonological interplay between 未 and 羊 is further illustrated in the character 美 ('měi', SV "mỹ" /mej4/, 'beautiful'), where 羊 placed over 大 ('dà', 'big') metaphorically conveys 'beautiful taste'. The etymological links between 美 and 未, particularly through SV "mùi" (/mʷɨi2/, 'goat'), reinforce their shared heritage and suggest that Vietnamese preserves substratal lexical and symbolic associations that diverge from later northern Chinese reinterpretations. (未)

      These two zodiac cases 卯 and 未 have broader implications for Sino-Tibetan comparative work. Further analysis could examine Vietnamese cognates such as SV "ngọ" (VS "ngựa", 午 'wǔ', 'horse') and SV "sửu" (VS "trâu", 丑 'chǒu' < MC ʈʰuw < OC *n̥ʰuʔ, 'buffalo'). Additional parallels include:

              Vietnamese   Gloss   Old Tibetan   Note
              cẳng         foot    rkań          Phonological alignment
              mắt          eye     mig           Semantic stability
              sông         river   kluń          Cf. Viet-Muong */krong/
              bò           cow     ba            Lexical continuity

      Such correspondences suggest that these terms may have existed in proto-Vietic or evolved independently before later Sinitic influence. They open new avenues for exploring Vietnamese affiliations within the Sino-Tibetan family, as will be illustrated in later sections using Shafer's comparative wordlists (1966–1974). (S)

      Table 1.8: The case of "the Year of the Cat"

      According to Nguyễn Cung Thông, the connection between Mão, Mẹo, and mèo is quite straightforward: these sounds all belong to the "low-pitched" tonal category, and Mẹo and mèo share the vowel e, which is an older form compared to the vowel a of Mão. Examples of such VS/SV correspondences include hè/hạ, xe/xa, keo/giao, vẽ/hoạ, mè/ma, chè/trà, beo/báo, etc. The confusion between cats and rabbits in Chinese culture is evident in the case of Thốtôn (兔猻), a type of wildcat that is gradually disappearing. This animal, found in Central Asia, Siberia, Kashmir, Nepal, Qinghai, Inner Mongolia, Hebei, Sichuan, Tibet, and Xinjiang, is also known as Xálịtôn (猞猁孫) or Steppe cat in English, and it typically inhabits desert regions.

      When the Han people expanded southward and westward, the phenomenon of "mistaking cats for rabbits" (similar to the Vietnamese idiom "mistaking a chicken for a quail") became apparent, as seen in the naming of thốtôn. This confusion partly explains why the fourth Earthly Branch (Mão, Mẹo) is associated with cats rather than rabbits in its original context. Thốtôn (兔猻) is also referred to as dươngxálị (洋猞猁), ôluân (烏倫), mãnão (瑪瑙), or mãnãolặc (瑪瑙勒). The term xálị (猞猁) refers to a type of wildcat (lynx). The Sino-Vietnamese word miêu (貓) means "cat," but in ancient Chinese, miêu referred to a type of hairless tiger rather than a domestic cat. This evidence supports the idea that Mão (卯) was a phonetic transcription of a foreign word (likely an ancient Vietnamese term) that entered the Chinese language.

      The definition of miêu in the Erya (Nhĩnhã) states: "A tiger with sparse fur is called 虦貓 (sạnmiêu)." According to the Ngọc Thiên dictionary, sạn/sàn (虦) also refers to a cat. The character (a rare variant written as 虥) denotes a striped wildcat. Meanwhile, Thố/thỏ (鵵) in its ancient sense referred to a type of bird, and mãn (梚, a rare character) referred to a type of tree in ancient Chinese texts. In the Hakka dialect, thỏ is pronounced t'u2 (similar to thổ), which contrasts with the pronunciations of mãn (cat) and thố/thỏ.

      To understand why the Vietnamese associate cats with the Earthly Branch Mão (卯), one common explanation in Chinese sources is that the sound of Mão when adopted into Vietnamese resembled mèo or miêu (Sino-Vietnamese for "cat"). Thus, the Vietnamese used cats as the symbol for this branch instead of rabbits. If mèo sounded similar to Mão and was used as the symbolic animal for this branch, it is difficult to explain why nga (wild goose or seabird), which is closely associated with Vietnamese life (fishing, coastal living), and whose ancient pronunciation ngwa resembles Ngọ (午), was not chosen as the symbol for the Earthly Branch Ngọ. Similarly, the ancient pronunciation of Mùi (未) for the eighth branch is closer to muỗi (mosquito), yet the Vietnamese chose goats instead of mosquitoes. There are many other such phonetic parallels.

      Although the Nôm script is relatively "young" for analyzing the phonetic connections of the 12 zodiac animals, some notable points include the use of mèo (and meo) with the Sino-Vietnamese character miêu (貓), as seen in Nguyễn Bỉnh Khiêm's Bạch Vân Thi tập (1491–1585): "Lẻo lẻo doành xanh con mắt mèo" ("Bright green eyes of the cat"). Meanwhile, méo in Nôm uses the character Mão (卯), sometimes with additional diacritical marks, as in Hồng Đức Quốc Âm Thi Tập (compiled by Lê Thánh Tông, 1442–1497): "Tròn tròn méo méo in đòi thuở" ("Round and round, distorted through time"). Thus, the distinction between Mão and mèo has existed since at least the Lê dynasty, and the likelihood of confusion between Mão (Middle Chinese pronunciation, reintroduced into Vietnam during the Tang-Song period) and mèo (ancient Vietnamese pronunciation) is minimal.

      Human writing systems naturally tend to evolve from the concrete and simple to the abstract. For example, animal names are often extended to more abstract meanings, such as "mouse face" (compared to "dragon face"), "ox-like body," "eating like a cat sniffing," or "snake-like temperament." Therefore, deriving mèo from Mão does not align with this natural tendency; rather, it is more logical for the concrete term mèo (animal) to give rise to the abstract term Mão (timekeeping system, divination). The system of naming specific animals (simple) familiar to farmers was integrated into Chinese culture and transformed into a system for recording time and divination (abstract, complex). This 12-zodiac system flourished as Chinese culture reached its peak (Qin, Han, Tang, Song dynasties) and influenced surrounding regions, including Vietnam. This phenomenon of "reverse borrowing" is often overlooked in Vietnam's case.

      In reality, Vietnamese people do not need to overanalyze the natural connection between Mão, Mẹo, and mèo, just as they do not question the links between Tý (mouse), Ngọ (horse), Hợi (pig), or Sửu (ox). Unlike Chinese culture, which uses compound terms like Mão Thố (卯兔, "Rabbit of Mão"), Tý Thử (子鼠, "Mouse of Tý"), or Sửu Ngưu (丑牛, "Ox of Sửu") to emphasize these connections, Vietnamese culture inherently recognizes the associations between Mão and mèo, Tý and chuột, or Sửu and trâu.

      Source: Nguyễn Cung Thông: "Nguồn gốc Việt (Nam) của tên 12 con giáp - Mão/Mẹo/mèo"

      Our revised hypothesis, as elaborated etymologically above, is substantiated by Vietnamese etyma that exhibit direct cognacy with Sino-Tibetan roots. These etyma appear to descend from other Sino-Tibetan languages rather than through Chinese transmission. Such correspondences are too frequent and too consistent to dismiss as coincidental. Consequently, we propose a novel linguistic classification: a distinct category termed Sinitic-Vietnamese. This classification may warrant equal footing with the Sinitic branch itself, given the historical precedence of Yue substrata over proto-Chinese, as previously discussed. Moreover, the Vietnamese fundamental words cited in Chapter 10 demonstrate clear cognate relationships with Sino-Tibetan etyma, lending further credence to this theorization.

      Analytically, the new etymological survey presented in this paper integrates the historical perspective outlined above, examining linguistic development through both synchrony and diachrony. This methodology resembles capturing motion-picture frames in a historical reel—allowing for fast-forwarding, rewinding, zooming in, and zooming out to contextualize lexical evolution. However, the chronological placement of certain etyma remains ambiguous.

      For example, "béo" ('greasy') aligns with 油 yóu as in 油膩 yóunì (VS "béongậy"), illustrating the ¶ /y- ~ b-/ pattern in Mandarin–Vietnamese sound correspondences. Other examples include 郵 (yóu, SV "bưu", 'postal'), 由 (yóu, VS "bởi", 'because'), 柚 (yóu, VS "bưởi", 'pomelo'), and 游 (yóu, VS "bơi", 'swim')—all of which conform to the Sinitic-Vietnamese phonological contour. While such interchanges are plausible, identifying the latest sound splits depends on the comparative methodologies introduced in later chapters.

      Given that all Vietnamese sister languages in 'China South', including regional Chinese lects, are classified under the 'Sino-Tibetan' family, how, then, has Vietnamese come to be categorized as a member of the 'Austroasiatic' family, specifically the 'Mon-Khmer' subbranch? How does this classification reconcile with the Sino-Tibetan and ancient Yue etymological evidence presented in this paper?

      The challenge lies not in the data, but in the mindset of those committed to inherited frameworks. Reevaluating Vietnamese classification requires confronting entrenched assumptions and acknowledging the complexity of its linguistic ancestry.

      V) Cultural integration and beyond

      From an ethnic-historical standpoint, theorists within the 'Austroasiatic' framework have posited that the origins of Vietnamese, both its people and its language, are primarily traceable to "Mon-Khmer" speakers. This hypothesis finds support in the composition of Vietnam's ethnic minorities, whom we classify as later arrivals. Alongside other populations of "Yue" derivation, both major and minor, these groups now comprise a total of 54 officially recognized ethnicities (as recorded in the 2023 census), with many communities speaking at least one "Mon-Khmer" language, especially those inhabiting the western highlands and southernmost provinces of Vietnam.

      Notably, the majority of these groups have been categorized under the 'Austroasiatic language family' by linguists active in the latter half of the 20th century. (See The distribution map of the Austro-Asiatic languages before the Vietnamese migrated into the central region from the 12th century onward.)

      With regard to the assertion, within our racial-component perspective, that the Mon-Khmer elements were merely latecomers, note that Vietnam acquired its southernmost territory from the ancient Khmer Kingdom only about 325 years ago. In the contemporary era, Vietnam's geopolitical territory is many times larger than the ancient Annamese land of two millennia ago (excluding the portion of the former NamViet Kingdom that now belongs to Guangdong Province of China). From an Austroasiatic viewpoint, modern Vietnam encompasses even more indigenous Mon-Khmer ethnic minorities who have inhabited their ancestral lands for over 2,210 years, or since prehistoric times, as some Austroasiatic Mon-Khmer theorists propose.

      Ethnically, as of late 2023, Vietnam’s population surpassed 100 million, with over 85.7 percent identified as the "Kinh" majority. The ancestral roots of the Kinh trace largely to Sinicized 'Yue' emigrants who migrated southward from China into the region now known as northern Vietnam. Over centuries, these groups gradually intermingled with indigenous populations, including 'Chamic' and 'Khmer' communities situated south of the 16th parallel, especially following the 12th century.

      Linguistically, on the other hand, evidence from the Sino-Tibetan family indicates that Sinitic-Vietnamese elements constitute more than 95 percent of the Vietnamese lexical inventory. This includes not only basic and foundational vocabulary of Tai origin, recurrent across ancient linguistic strata, but also a rich array of shared features and structural peculiarities that remain indispensable in modern Vietnamese usage.

      Drawing on archaeological excavation, proponents of the 'Austroasiatic' school have argued that the Indo-Chinese peninsula serves as the cradle of "Khmer" ethnogenesis. Within this framework, the indigenous substratum, reflected in the "Mon-Khmer" foundational vocabulary embedded in Vietnamese, was reinterpreted as a layer of Chinese loanwords, allegedly introduced by emigrants from 'China South' who settled in Vietnam. This theorization was designed to reject the notion that ancient "Yue" entities represented a veiled Austroasiatic presence.

      However, two critical oversights emerge from this dismissal. First, the "Yue" and Austroasiatic populations may share ancestral ties with the native inhabitants of 'China South', suggesting a deeper ethnohistorical convergence. Second, the Vietnamese are a racially composite people: they include descendants of the "Yue" as well as earlier settlers in the Red River Delta—those whom Austroasiatic theorists identify as ancient "Mon-Khmer" speakers who later established autonomous polities in the southern territories.

      Over the past millennium, this demographic mosaic expanded to include successive waves of "Chamic" and "Khmer" populations crisscrossing in the region, whose ancestors were gradually assimilated into the region's geopolitical landscape beginning in the 12th century.

      The author's position on this issue is that, although the Austroasiatic language family may have given rise to the Mon-Khmer languages, it is not directly ancestral to modern Vietnamese (see Table 1 above). Anthropologically, prior to the arrival of Mon-Khmer groups from the southwest, local aboriginal populations and early settlers likely intermingled in the same region. These groups are believed to have descended from shared Taic ancestors in the Red River Delta of northern Vietnam, a region that extended into what is now southern China, as discussed earlier. This scenario is proposed solely to account for the commonality of several shared fundamental words between Vietnamese and Mon-Khmer languages.

      According to Chinese sources, anthropological evidence suggests that early immigrants from both the southwestern and northern neighbors of ancient "Annam", in what is now northern Vietnam, were present in the northwest long before these regions were incorporated into Annam's geopolitical domain. These populations were of mixed "Taic" descent, presumed to be descendants of ancient "Daic" peoples. A similar pattern recurred with later "Mon-Khmer" migrants. The integration of these groups into the existing ethnically diverse population did not significantly alter Vietnam's overall ethnic composition as the Annam polity expanded westward and southward.

      It was not until the late 17th century, about 325 years ago as noted above, that the western territories of the old "Khmer" Kingdom were annexed into Vietnam, with their inhabitants now classified as ethnic minorities. A comparable process had already occurred along the central coast, south of the 16th parallel, where Chamic natives were gradually incorporated between the 12th and 18th centuries.

      Archaeologically, this southward expansion contributed to the formation of contemporary Vietnamese communities stretching from the central coastline to the tip of Camau Cape. The cultural artifacts excavated to date from these ancestral lands were neither created by nor exclusively associated with the forebears who founded Annam or with modern Vietnamese, and their linguistic items must be evaluated accordingly. Under linguistic scrutiny, the early Annamese language appears to have undergone only limited transformation after prolonged exposure to local speech, presumably belonging to the Austronesian Chamic or Austroasiatic Mon-Khmer language families. In fact, aside from the adoption of a few local elements, such as placenames and foundational lexicons encountered along the southward migratory routes, the developments south of the 16th parallel during that period bear minimal anthropological or linguistic connection to ancient Vietnamese identity, despite assertions made by Austroasiatic theorists.

      Meanwhile, new Vietnamese nationals, such as late Ming refugees from Chaozhou (the Teochew people, possibly the group underlying the modern derogatory term "Tàu"), fled southward by boat in large numbers during the 17th century, as Qing Manchurian forces advanced to occupy mainland China. These refugees eventually resettled in what is now the southwestern region of Vietnam. As a result, their presence, along with the Teochew language, has continuously infused the Vietnamese lexicon with new phonological layers atop the older lexical substrate, whereas Khmer contributed only a little.

      Culturally, Chinese society has long absorbed traditions from northern peoples, including the Moon Festival, which some attribute to Altaic or Korean influences from the northeastern frontier, while also retaining deeply rooted ancestral Yue elements. As noted earlier, one such Yue contribution is the duodenary cycle of twelve animals, a system that has served as a chronological marker for years across centuries.

      Chinese identity, it is clear, is fundamentally cultural rather than racial. There is no distinct Han race; instead, we speak of the Han people, much as older Pekingese once referred to themselves as Qírén (旗人), reflecting Manchurian or Jurchen ancestry, or as veteran Cantonese still identify as Tang subjects (唐人), denoting their heritage as citizens of the Great Tang Empire (大唐帝國), even as they have migrated across the globe. In practice, the Chinese are an ethnically mixed population, unified by a shared national identity. This is evident in how overseas Chinese continue to regard themselves as Chinese, regardless of whether they hold citizenship in Taiwan, Malaysia, Singapore, Canada, or the United States.

      In contrast, Vietnamese national identity encompasses not only the tangible legacy inherited from the extinct Champa and Khmer kingdoms—their lands, peoples, languages, and cultural artifacts—but also the intangible spirit of nationalism and valor passed down from generations who resisted repeated Chinese aggressions. Anyone who has read all 72 volumes of Bo Yang's edition of Sima Guang's Zizhi Tongjian (資治通鑑, 1983–93), which chronicles Chinese governance from antiquity through the Song Dynasty (宋朝), will have encountered the harsh realities imposed by successive Middle Kingdom regimes upon their own subjects. These narratives reveal the suffering of commoners, including those in the colonized vassal state of ancient Annam, and help explain why modern Vietnam endures as a nation sustained by a resilient national spirit—a collective will to resist foreign domination and preserve cultural integrity.

      Vietnam, uniquely, is remembered for having repelled three Mongol invasions launched by the heirs of Genghis Khan, the same forces that shattered the Song Dynasty and established the Yuan Dynasty (元朝) on Chinese soil, a regime that endured for nearly a century.

      Here, nationalism refers to the indomitable spirit of the Vietnamese people and their hard-won independence, a spirit they have consistently defended. This fervent nationalism has shaped their anthropological identity, especially their national language. It helps explain why many Vietnamese reject genetic affiliation with the Chinese and question aspects of the Austroasiatic theory, instead affirming an ancestral connection to the Yue, a non-Chinese lineage, an interpretation steadfastly upheld by patriotic Vietnamese scholars.

      In an ethnically diverse society, elements assimilated into the Vietnamese melting pot emerge distinctly as Vietnamese, regardless of whether a person is of Chinese, Chamic, or Khmer descent. The history of the nation known as Vietnam is a chronicle of descendants from those who arrived either as conquerors or as refugees fleeing hunger and oppression from the north. Their long southern journey, culminating at the tip of the Indo-Chinese peninsula, spanned nearly ten centuries during which they waged continuous wars against northern and southern external enemies, beginning as early as 939 A.D., in the relentless pursuit of national sovereignty.

      Vietnamese history is shaped not only by resistance wars but also by ongoing patterns of immigration and emigration, much like China's. Consider Taiwan, where modern migration trends mirror those familiar in Vietnam: successive waves of Chinese migrants from the mainland settled over generations, while hundreds of thousands of Vietnamese women married into Taiwanese families. This long-standing exchange continues today.

      In other words, the history of the Vietnamese people is also the story of descendants of racially mixed immigrants from southern China. These groups included refugees fleeing war-ravaged regions, as well as outcast proletarians from newly affluent provinces. Notably, Ming loyalists fleeing execution after the Manchurian conquest and the founding of the Qing Dynasty (1644–1912) contributed to Vietnam's migratory mosaic. This is reflected in the prevalence of Chinese surnames among the Kinh majority.

      In the 21st century, Vietnam continues to receive immigrants from its northern border with China, including economically disadvantaged laborers and so-called 'technical' workers, many of whom, critics argue, form a 'Chinese fifth column' after overstaying their visas. Regardless of origin, many Chinese emigrants from inland provinces along the northern frontier have, over time, come to identify as Vietnamese. Since the 1990s, over one million new migrants from mainland China have settled permanently in Vietnam, often through marriage into Vietnamese families, a trend well documented at annual gatherings of Chinese expatriates.

      The formation of the Kinh majority was shaped not only by immigration but also by domestic emigration. Hanoi, much like Shanghai, underwent significant demographic shifts as its original residents relocated, some moving south during the great migration of 1954–55, others departing overseas after the Vietnam War ended on April 30, 1975. As middle-class urban dwellers left in search of opportunities abroad, their absence was gradually filled by incoming villagers, who arrived as new migrant laborers to occupy the growing vacancies in the city.

      Taken together, these demographic shifts reveal that modern Vietnamese identity, and the Vietnamese language, cannot be traced solely to Mon-Khmer origins. Instead, contemporary Vietnam reflects a complex mosaic of ancestry. Its citizens are primarily of mixed Chinese descent, tracing back to the ancestral Yue of Zhou Dynasty vassal states and the Yue-influenced Han of the Chu region more than 2,100 years ago. They also carry genetic contributions from native Mon, Chamic, and Khmer populations from the 12th century onward, along with more recent admixtures, such as Euro-Asian children born to American servicemen during the Vietnam War (1965–1975), which added over 50,000 individuals to South Vietnam's population of 20 million by 1975. This extensive intermingling underscores the profound racial mixing that defines Vietnam.

      Linguistically, Austroasiatic theorists have pointed to Mon-Khmer basic words in Vietnamese as evidence for their theory. For example, they cite the numerals from one to five; yet these items do not align with Vietnamese counting from six to ten at all, and they bear no genetic relationship to the core vocabulary. Like any living language, Vietnamese has absorbed a wide range of loanwords over time, including those from Daic, Thai, and Malay, as well as English and French, alongside contributions from the Austroasiatic family.

      Statistically, the rate of foreign lexical infiltration in Vietnamese remains modest. Even the decade of active American presence during the Vietnam War failed to significantly reshape the language, leaving only a small set of persistent English terms, such as 'hello', 'okay', 'bye-bye', 'number-one', 'one-two-three', 'snack-bar', 'cowboy', '(bus)boy', 'hippy', and 'jeep', in stark contrast to the enduring Sinitic influence.

      In fact, the situation became somewhat farcical when certain French institutions sponsored Vietnamese scholars to publish works on French influence in Vietnamese, including one that argued for a French origin of select Vietnamese words (see Cao Xuân-Hạo, 2001). Had the French colonial presence in Annam lasted longer, it is conceivable that roughly 400 French loanwords might have entered mainstream usage. By proportion, French loanwords, remnants of the 96-year colonial legacy ending on July 20, 1954, number several hundred in Vietnamese (see APPENDIX A (5)). Common terms in some Vietnamese circles, such as 'moi' 'I', 'toi' 'you', 'monsieur' 'mister', 'madame' 'madam', and various modern grammatical constructions, of course, do not reflect a deep-rooted etymological bond.

      This stands in contrast to entrenched Chinese pronunciations in Vietnamese, such as "anh" (兄 xiōng, SV "huynh", 'brother'), "em" (俺 ǎn, SV "am", 'younger sibling'), "chị" (姊 zǐ, SV "tỷ", 'sister'), "cô" (姑 gū, SV "cô", 'miss'), and "mẹ" (母 mǔ, SV "mẫu", 'mother'), as well as the many modern Chinese loans that remain popular today, including "bảotrọng" (保重 bǎozhòng, 'take care'), "đảmbảo" (擔保 dānbǎo, 'guarantee'), "thịphạm" (示范 shìfàn, 'demonstrate'), "đạocụ" (道具 dàojù, 'prop set'), and "giaođãi" (交待 jiāodài, 'to brief').

      VI) Key contributions to linguistics

      Anthropologically, in addressing the origin of Vietnamese etymology, the author advances an independent argument grounded in data analysis to counter the claims put forth by the Austroasiatic linguistic camp, which he regards as having introduced a distracting agenda into the debate. Advocates of this camp approach the issue from a southern geospheric perspective, focusing on regions where the Austroasiatic boundary intersects with the Austronesian racial substratum—particularly among Chamic populations in the Indo-Chinese peninsula—and extending across the archipelagos of Malaysia and Indonesia, the western islands of the Philippines, and Taiwan, formerly known as Formosa.

      Why did Austroasiatic theorists group the Vietnamese language into the Mon-Khmer branch in the first place? The Austroasiatic hypothesis took root largely because the Mon-Khmer populations dominated the Indo-Chinese peninsula and permeated deeply into the local demographics. Additionally, this hypothesis emerged during the 'gold rush' era of historical linguistics in the late 19th century, when Western linguists were yet to hear of the Yue people and their linguistic legacy. By contrast, Mon-Khmer speakers in Southeast Asia resonated with the grandeur of the ancient Khmer Empire, a past that captured admiration and envy. This led to the creation of the Viet-Muong subdivision within the Austroasiatic Mon-Khmer linguistic subfamily as scholars sought connections among these groups.

      In response, the author firmly establishes the theory that the Vietnamese people descend primarily from ancient Yue ancestry in southern China, having intermixed with Han settlers during the millennium of Chinese domination following 111 B.C. As the Annamese polity expanded southward into what is now central Vietnam, further admixture occurred with Austronesian Chamic and Austroasiatic Mon-Khmer populations. Consequently, the modern Vietnamese population reflects a racially composite lineage shaped by centuries of migration, integration, and cultural synthesis.

      This is the author's antithesis to what the Austroasiatic Mon-Khmer theorists have long argued, namely that the Sinicization of indigenous Mon-Khmer people in ancient Annam was the real process that produced the Vietnamese identity. Their viewpoint largely ignored the recorded history of the Yue people, considered ancestors of early Annamese populations, who had advanced further south and bridged the anthropological gap leading to modern Vietnamese fusion. According to the Austroasiatic camp, the intermingling of Mon-Khmer people with Chinese resettlers during the colonial period was the origin of the Vietnamese. They claimed that Mon-Khmer peoples from the Indo-Chinese peninsula were the direct ancestors of modern Vietnamese. Crucially, the 'Vietnamization of the Mon-Khmer' factor seems to have been overlooked, possibly because the timeframe in which Mon-Khmer groups purportedly arrived in the Red River Basin, already inhabited by Daic populations, remains vague.

      Archaeological findings in Central Vietnam further affirm that the inhabitants prior to these migrations bore no ancestral connection to the Vietnamese. Historically speaking, early Vietnamese emigrants ventured into the southern Indo-Chinese peninsula only after the 12th century, where they first mixed with the Chamic people. This mixing was facilitated by the concession of two Chamic prefectures as a gift to the Tran Dynasty through a royal interracial marriage between the King of Champa and a Vietnamese princess, Huyềntrân Côngchúa. That is how the later Vietnamese came to appear along Vietnam's central coastline and in its southwestern part.

      Intriguingly, the Austroasiatic hypothesis aligned neatly with domains historically attributed to the Yue as recorded in ancient Chinese annals, a coincidence that blurred distinctions between Yue and Austroasiatic entities. The Austroasiatic Mon-Khmer theorists discreetly adopted this notion while sidestepping the complexities of Sinitic-Vietnamese linguistics. It was certainly simpler to identify a set of basic words shared by Mon-Khmer and Vietnamese and then draw conclusions about their shared roots, rather than confronting more intricate etymological challenges.

      In effect, it has often proved formidable and challenging for many Western-educated scholars to delve deeply into ancient Chinese classics to uncover the intricate etymological roots of Vietnamese. While their linguistic expertise often excelled in the realms of proto-Chinese, Old Chinese, and Middle Chinese, utilizing phonetic sound rules and methodologies, this approach fell short in the case of Vietnamese, both historically and in contemporary studies.

      Unsurprisingly, it was not until the early 20th century that Sinology became an established discipline, and even then, very few scholars could confidently substantiate the connection between Sinology and the exploration of Vietic roots. Renowned linguists such as De Lacouperie, Maspero, Haudricourt, Shafer, Forrest, and Karlgren were among the select few whose work pointed to Sinology as a vital key for understanding Vietnamese etymology. Without a deep knowledge of Chinese language and history, no one could reliably offer a comprehensive view of Vietnamese linguistic origin.

      Despite these competing frameworks, the broader picture can be synthesized by integrating the perspectives of Yue and Austroasiatic Mon-Khmer into one concept, the "Bod" (Terrien De Lacouperie, 1887). It is conceivable that Indo-European theorists may have deliberately substituted the term Austroasiatic for Yue in order to reframe aboriginal Yue entities along a continuum that aligns with established historical linguistic models. This interpretive shift, whether intentional or methodological, echoes earlier typological depictions found in the works of T. D. Lacouperie (1887) and R. A. D. Forrest (1948).

      Geographically, by replacing the term 'Austroasiatic' with the Yue ("Bod" or "BáchViệt"), the author traces the movements of early indigenous Yue emigrants from China South to northern Vietnam, across vast areas of Southeast Asia, and beyond. These emigrants included the LuoYue (雒越), the OuYue (歐越) or Xi'Ou (西甌), and the MinYue (閩越) or Dong'Ou (東甌), as well as racially mixed groups like the Qin-modified Shu (巴蜀 BaShǔ, "BaThục"), Yue-modified Chu (楚, Chǔ, "Sở"), Yue-modified Han (漢), Hakka (客家, Kèjiā or "Cácchú"), Hokkien, Hainanese, Cantonese, etc. These groups advanced southward, resettled, and intermingled with native inhabitants along their journey, in the case of Vietnam fusing with the Chamic and Mon-Khmer peoples. In a sense, this process is encapsulated in the official name "Việtnam", which first appeared in 1802. This designation can also be read as a reverse form of "NamViệt", meaning 'the Việt of the South', which is usually misinterpreted as 'to surpass in the south' or 'advance southward'. Such connotations highlight the migratory pattern of the ancestral Yue, whose emigration from China South became more pronounced around 300 B.C. in response to Qin expansion (Lu Shih Peng, 1964).



      Figure 1.5: Map of the historical ancient proto-Chinese migratory routes
      Source: Multiple sources on the internet

      The author's perspective on the southward geo-spherical migration of the Yue originating from a northern axis and radiating toward the southern hemisphere can be expanded without invoking competing theories regarding the origins of Austronesian populations, whose dispersal spans the eastern hemisphere over a timeline of 3,000 to 4,000 years, as supported by available historical records. (A) This framework aligns with archaeological evidence indicating that the Yue were not the exclusive creators of bronze drums, artifacts that have also been unearthed in the Shu State (蜀國) of Sichuan in southern China and across parts of Indonesia. In these regions, Austronesian interpretations have informed alternative hypotheses, including the Austro-Thai theory. Fundamentally, all southern migratory trajectories appear to originate from northern sources.

      Practically speaking, the Austroasiatic hypothesis overlooks alternative perspectives on the proto-Yue presence, which extended as far northeast as the Yangtze River and up to the Yellow River basin. For example, the proto-Yue were present in the ancient Lu State (魯國) within Shandong Province (山東), as suggested by the broader ethnological framework of the Taic-Yue stock originating from the Chu State (楚國) near present-day Hubei (湖北) and Anhui (安徽) provinces. Vietnamese legends, too, recount that their earliest ancestors emerged from the Dongtinghu Lake area (洞庭湖) in Hunan Province (湖南), south of Hubei. Together, these regions form a contiguous zone representing the racial principality of the Taic stock.

      The author's postulated frameworks for both "Yue" and "Austroasiatic" theories are further synchronized with long-standing ancient Chinese legends and history. Different tribes of the ancient Taic-Yue people spread both eastward and westward, contributing to the racial composition of the pre-Qin (先秦) era, a view backed by evidence that includes early human fossils discovered in ancient Sichuan Province, where the Bashu State (巴蜀) was once located. These tribes collectively introduced new cultural elements to the pre-Han (前漢) populace, with the key difference being in name changes over time. Notably, Liu Bang, the first monarch of the Han Dynasty, and his generals and followers were originally subjects of Chu (楚), as repeatedly emphasized. Had the last Duke of Chu, Xiang Yu (項羽), defeated Liu Bang in the decisive battle, the dynasty might well have been named 'Chu' rather than 'Han'.

      As previously mentioned, after the Han forces defeated Chu, the subjects within the Han Empire's periphery gradually came to identify as Han people (漢人 Hànrén), a process that took considerable time. This marked the emergence of the Chinese Han from a racially mixed population composed of pre-Han peoples and Taic-Yue descendants. These included groups from six ancient states conquered and unified under Qin rule in 221 B.C. The racial composition of the Chu subjects primarily consisted of Taic-Yue descendants, who in turn gave rise to the Southern Yue tribes (百越 BǎiYuè, SV "BáchViệt", 'Bod') through various historical stages spanning the Zhou, Qin, and Han periods.

      In essence, the Vietnamese ethnogenesis reflects a layered process: rooted in ancient Yue ancestry from southern China, subsequently intermixed with Han settlers during a millennium of Chinese rule beginning in 111 B.C. As the Annamese advanced into central and southern Vietnam, further admixture occurred with Austronesian Chamic and Austroasiatic Mon-Khmer populations. The result is a modern Vietnamese demographic profile shaped by centuries of migration, integration, and cultural synthesis.

      The demographic evolution of ancient Annamese populations initially paralleled that of other Southern Yue-descended groups, including the Cantonese (粵), Fukienese (閩越, 'Hokkien'), and WuYue (吳越). Yet this resemblance proved short-lived. The Vietnamese historical trajectory diverged markedly under prolonged Chinese domination, spanning from 235 B.C. to 939 A.D., punctuated only by brief episodes of autonomy. Following the 12th century, the emergent Annamese polity began a sustained southward expansion beyond the 16th parallel, gradually consolidating its territorial reach over the next 1080 years. This arc culminated in 1989, when Vietnam withdrew from Cambodia (formerly Kampuchea) and restored its pre-1979 borders.

      Figure 1.6. The distribution of indigenous languages before the Vietnamese



      Map of the Austroasiatic languages per the Austroasiatic view
      Source: Multiple sources on the internet
      x X x

      The nature of a people's mother tongue, as commonly perceived, often reflects their racial composition, and vice versa. The Austroasiatic Mon-Khmer hypothesis for Vietnamese appears to align with this notion. A playful way to frame this theory is to liken the Vietnamese language to the product of a "forced marriage" between Mon-Khmer and Chinese influences. From an anthropological standpoint, the prolonged colonization of early Annamese populations might reflect a dynamic of role reversals: the "guests" (early Kinh settlers) ultimately became the new sovereign majority, while the indigenous natives assumed subordinate roles in their own land, newly annexed into a foreign state.

      As life progressed in the resettlement, separate from mainland China, let us envision a "what-if" scenario. Imagine a family of new homeowners moving into a residence previously inhabited by others. While settling in, the new occupants discover cultural artifacts buried on the property. The head of the household could easily claim ownership of the artifacts, but it would be dishonest to present them as ancestral heirlooms, treasures passed down by their forebears. Meanwhile, their descendants adopt new surnames, such as Phạm or Trần, except for cases of Chamic or Khmer heritage, marked by surnames like Chế or Thạch. This illustrates how the Vietnamese identity absorbed not only Chinese surnames from a broader set of Chinese-origin names but also names rooted in Chamic or Khmer lineage.

      Linguistically, a nation's language does not always reflect the tongue spoken by its ancestors. Analogous phenomena exist worldwide: for instance, modern French is distinct from the Gaulish language of ancient France, and people in former French colonies like Morocco or Haiti continue to speak French, albeit with distinctive local accents. For the Austroasiatic view, rooted in the heritability of language based on racial identity, to hold water, Vietnamese speakers would need to be "racially pure" Mon-Khmer, or at least comparable to the Muong linguistic stock. However, this does not seem to align with the evidence, just as Cantonese and Fukienese remain grouped within Chinese dialectology despite their divergence. To enforce such a standard would risk undermining broader notions of national identity, particularly for larger nations such as China.

      It is also worth recalling that modern Vietnamese as a fully formed language did not emerge until after Vietnam gained independence from China in the 10th century.

      Etymologically, the commonalities in certain basic words can be explained as the result of linguistic contact. Words from one Mon-Khmer language spilled over into the Muong subdialects, which in turn influenced the Vietnamese language. This was made possible by their geographical proximity, particularly in mountainous regions further south, where aboriginal populations retreated in the face of Chinese occupation and Sinicization. Even though mutual intelligibility between Viet-Muong and Muong languages waned long after their split, Muong speakers have remained anthropologically and culturally connected to the Vietnamese Kinh as neighboring kin in many ways.

      Additionally, these shared basic words spread between Vietnamese and Mon-Khmer languages through everyday activities such as trade, bartering, agricultural exchanges, handicraft production, and shared farming practices. In other words, while the Kinh collaborated with Chinese occupiers, they also maintained ties with other diasporas within their territory. This interaction bridged linguistic gaps between Vietnamese and Mon-Khmer languages. These encounters trace back to prehistoric times, starting with the first wave of Mon-Khmer speakers moving into the Red River Delta from southwestern Lower Laos (Nguyễn Ngọc-San, 1993, p. 43).

      Methodologically, Austroasiatic linguists grouped related basic etyma spanning many Mon-Khmer languages into a broad linguistic spectrum of mixed elements. However, some of the Mon-Khmer basic words found in Vietnamese also have cognates in Chinese and Sino-Tibetan languages, referred to in this paper as Sinitic-Vietnamese words. Many fundamental etyma in Vietnamese reveal roots in Yue-related languages like Cantonese, Teochew, Hainanese, and Fukienese, as well as Sino-Tibetan etymologies, further complicating the Austroasiatic hypothesis. The Austroasiatic theorists appear to have grouped these elements under the Mon-Khmer umbrella without addressing their potential origin elsewhere, while the pervasive influence of Khmer served as an overwhelming and, at times, unchallenged foundation for their claims.

      The Austroasiatic theory emerged with its Mon-Khmer linguistic subfamily as a focal point but has also engaged in dismissing the Sino-Tibetan theory, which predates it and posits an alternative root for the Vietnamese language. The issue of linguistic affiliation thus involves not only Austroasiatic Mon-Khmer versus Yue but also Sinitic-Vietnamese versus Sino-Tibetan frameworks. This dynamic is further complicated by the vast number of Sino-Tibetan cognates in Sinitic-Vietnamese and the unique linguistic features shared between Vietnamese and Chinese. While it may be simpler to accept that ancient Annamese developed from a Yue linguistic foundation layered upon a Taic base, the claim that certain basic lexicons in the Viet-Muong subdialects could be loanwords from neighboring Mon-Khmer languages aligns with the understanding that these languages were part of a broader family spanning southern China hundreds of years ago.

      In any case, whether or not the Vietnamese language belongs to the Sino-Tibetan linguistic family, Austroasiatic Mon-Khmer theorists remain focused on genetic classification, proposing that Austroasiatic Mon-Khmer is the mother language that gave rise to Vietnamese, as their theory asserts. Meanwhile, the Sino-Tibetan camp highlights the Sinitic affinity of Vietnamese, as explored in this paper, tracing its historical foundations back approximately 3,000 years, a timeline notably absent in the prehistoric Austroasiatic Mon-Khmer framework.

      Regarding the timeframe in historical linguistics and their affiliations, Merritt Ruhlen, in The Origin of Language (1994 [1944]), quotes Hans Henrich Hock:

      "We can never prove that two given languages are not related. It is always conceivable that they are in fact related, but that the relationship is of such an ancient date that millennia of divergent linguistic changes have completely obscured the original relationship.

      Ultimately, this issue is tied up with the question of whether there was a single or a multiple origin of Language (writ large). And this question can be answered only in terms of unverifiable speculations, given the fact that even the added time depth provided by reconstruction, our knowledge of the history of human languages does not extend beyond ca. 5,000 B.C., a small 'slice' indeed out of the long prehistory of language." (Hock 1986:566).

      In his work, the author explores both perspectives, i.e., genetic affinity and historical settings, and this research aligns many basic words with Chinese-Vietnamese interchanges, drawing on more than 400 fundamental lexical items from a wide range of Sino-Tibetan etymologies. To make their cognacy more plausible, these items are supported by elaboration on Chinese linguistic peculiarities shared by Vietnamese, which substantiate the core matter of the Sino-Tibetan theory as presented in this paper.

      The author also set up new methodological foundations to approach his Sino-Tibetan Sinitic-Vietnamese theory. In contrast to the accelerated information-gathering capabilities of the Artificial Intelligence (AI) era, the author's research methodology originates in the pre-internet age, grounded in traditional scholarship. His findings were documented the old-fashioned way, that is, through direct engagement with printed books, hundreds of them, examined one at a time, page by page and line by line. Each insight was manually recorded on index cards, extracted from a vast corpus of publications. As of 2025, only about one third of these titles have been entered into the Bibliography and References, with compilation still ongoing, a time-consuming but meticulous task.

      Over the span of more than 20 years, the author accumulated over 20,000 research notes by the year 2000, just as the world was entering the full momentum of the internet-information era. These notes were not simply archived but deeply internalized, forming a durable cognitive framework that continues to shape his analytical process. Rarely needing to revisit the original index cards, he has mentally constructed from this foundation a comprehensive perspective on the structural and semantic essence of Vietnamese etymology, recovering a linguistic heritage that had long receded from scholarly view.

      With this substantial body of evidence, he is now systematically assembling corroborative data to support the argument that the majority of cited ST Sinitic-Vietnamese lexical items can be traced to at least one cognate in Chinese, thereby reinforcing the historical and linguistic continuity between the two traditions.

      Digitally, the author has entered the electronic era and continues to advance the project through incremental releases on his website, prioritizing mobile accessibility and modular presentation. This format ensures broad public access while preserving editorial clarity and semantic precision, allowing the research to scale without compromising its methodological rigor.

      The overarching goal remains eventual publication in mainstream print, aimed at advancing a thesis that invites renewed linguistic inquiry into the Sino‑Tibetan continuum. Central to this thesis is the proposition that modern Sinitic lects emerged as branches shaped primarily by the fusion of Taic‑Yue substrata with proto‑Tibetan, culminating in proto‑Chinese and the layered pre‑Qin-Han linguistic strata across ancient China. By analogy, the integration of Yue elements into the Annamese sphere during the Qin and Han periods likely contributed to the genesis of early Vietnamese, crystallizing around the 10th century.

      Perceptually, regardless of the final contours of this narrative, the author asserts that languages must be approached as holistic, living systems, to be understood in their full contemporary complexity rather than reduced to their earliest reconstructible stages, whether 3,000 or 5,000 years prior. This view parallels our treatment of English: not merely as an Indo‑European relic, but as a dynamic amalgam of Anglo‑Saxon, Germanic, Norman, Romance, Latin, Greek, and other influences altogether that have shaped its present form.

      Methodically, this paper's technical arrangement is designed to engage both novices in Vietnamese historical linguistics and specialists in Chinese and Vietnamese philology. Vietnamese learners with a solid understanding of Mandarin (M), known in Vietnamese as "Quanthoại" (QT, sometimes referred to as "tiếng Quanhoả" or 官話 Guanhua) and now officially named 'Putonghua' (普通話), and a foundational grasp of historical linguistics will likely find this study particularly enriching. While certain explanations of widely recognized points may seem overly detailed or repetitive to experts, and occasional gaps might challenge general readers, these choices aim to strike a balance that caters to diverse audiences. Introductory resources on historical linguistics can be found in the bibliography at the end of this paper.

      This introductory section aims to establish rapport with readers and clarify the author's perspective ahead of the detailed scope of this study, forging an academic connection with those seeking insight into the linguistic origins of Vietnamese. It does not present itself as a formal scientific paper replete with data tables and statistical modeling; rather, it offers a narrative exploration of the Vietnamese language, framed under the Vietnamese subtitle Ýthức Mới Về Nguồngốc TiếngViệt—literally, "New recognition on the origin of Vietnamese."

      For readers who remain skeptical yet intrigued by the etymological ties between Vietnamese and Chinese, it may be worthwhile to await the full book publication of this research. Printed works often invite deeper engagement and a more sustained openness to complex arguments. It is unlikely that the full contours of this inquiry will be patiently absorbed in an online format. In print, however, the material may be approached with greater impartiality, supported by quotable evidence—contingent, of course, on the author's success in securing a reputable publisher.

      Human nature tends to favor ideas that resonate with instinctive beliefs. To fully appreciate the insights offered here, readers ideally should possess a foundational understanding of the historical interplay between Vietnam and China. That said, newcomers are welcome, provided they bring sincere curiosity to the subject. This research, at its core, is a retelling of history, a compelling linguistic and cultural narrative that may captivate both seasoned scholars and engaged lay readers alike.

      It would be ideal if we shared mutual interests, as this commonality builds trust and belief, akin to the solidarity of believers in the same religion. A new theory typically begins with foundational premises, facts, quotations, supportive evidence, rules, paradigms, analogies, logic, etc., from which we adopt a shared perspective, accepting them as a basis for further discussion as we progress. For instance, if we propose that 雞 (jī, SV kê) = VS "gà" (chicken) and 蛋 (dàn, SV đản) = VS "trứng" (egg) are cognates of indisputable origin, both likely stemming from Yue roots simply because both existed long before the Chinese, then there is no need to belabor proof. Once these are accepted as premises, our focus shifts to examining whether the bird originated from the south or the north, or even delving into the age-old question of whether the egg predates the chicken.

      From a historical linguistics standpoint, it is apparent that, in cases of language contact between two groups, the dominant language tends to assimilate the less dominant one over time. The details of this process hinge upon factors such as the prowess, population size, and cultural sophistication of the groups involved. For example, the language of a conquering population, following an extended period of bilingualism, ultimately becomes adopted by the subjugated group (Robert J. Jeffers, et al. 1979, p. 142).

      While novices in historical linguistics may initially struggle to embrace such reasoning, this reflects the realities of history. Specifically, in the Sinitic-Vietnamese context, the process of Sinicization in Annam spanned hundreds of years. The author has no intention of debating with detractors who have contested his arguments in the past or will do so in the future. Likewise, the author does not aim to recruit followers among those resistant to the Sinitic-Vietnamese theory, as there is no absolute truth in historical linguistics; the field is shaped by interpretation and perspective. People's beliefs are often deeply entrenched, guided by instinct or predisposition. Comments such as "Chinazi propagandist!", "Wikipedia sources are unreliable and unquotable!", or "Bogus views!" are predictable reactions from those opposing this perspective, especially in the AI era.

      To such readers, the author encourages them to leave this forum if they do not align with what has been discussed thus far. This work seeks engagement, not discord.

      How did the author arrive at this juncture in his etymological exploration? Admittedly, he is not formally trained as a historical linguist specializing in Vietnamese. Yet his journey began with fortunate exposure to foundational linguistics courses taught by three towering figures in the field: Professors Nguyễn Tài Cẩn, Hoàng Tuệ, and Bùi Khánh Thế, renowned scholars at the former Saigon University during the late 1970s. What began as academic curiosity soon evolved into a lifelong devotion to the study of Vietnamese etymology and its Sinitic underpinnings.

      The author vividly recalls his first assigned project under Professor Hoàng Tuệ: an inquiry into the term "tiếng" ('sound') in Vietnamese. This deceptively simple word encapsulates a constellation of meanings: sound, morpheme, syllable, word, and language. His comparative research into its Chinese counterpart 聲 (shēng, SV "thanh") proved transformative. The semantic breadth of 'shēng', especially its appearance in expressions like 蠻聲 (Mánshēng), referring to "tiếngMôn" in the Shaozhou Tuhua (韶州土話) dialects of Guangdong, Hunan, and Guangxi, revealed a profound linguistic resonance across cultural boundaries.

      For the author, tiếng and 聲 have become the "Đạo" (道 Dào, 'the Way') through which the vaults of Sinitic-Vietnamese etymology are unlocked. This guiding principle has propelled him to examine other Chinese characters whose meanings stretch far beyond their conventional semantic domains. Such is the enchantment of language: a system at once rigid and fluid, historical and living.

      Why does the author advocate so confidently for a Sinitic hypothesis while remaining skeptical of Austroasiatic models? The interplay between Chinese and Vietnamese is intricate, and the author approaches it with a blend of scholarly rigor and tongue-in-cheek candor, for his is a hypothesis rooted in experience and observation. Few are willing to wade deeply into these debates, which often resist definitive resolution. Linguistic affiliation theories, especially those modeled on Indo-European paradigms, tend to falter when applied to Austroasiatic contexts. Dissenters from orthodoxy are sometimes dismissed as "uninformed", yet linguistics is a field where science, history, and human insight converge. Progress often comes from those bold enough to challenge prevailing narratives. Fueled by enduring fascination, the author has spent decades immersed in self-directed study of Vietnamese and Chinese historical linguistics. His efforts have culminated in the painstaking construction of an online dictionary of Nôm words of Chinese origin, an annotated repository of Sinitic-Vietnamese etyma built one entry at a time.

      Over the past thirty years, the author's exposure to Chinese has deepened through scholarly engagement and personal life. His mastery of modern Mandarin ('Putonghua') has been shaped by daily conversations with his Chinese-native wife, extensive reading of Chinese literature, and regular consumption of Chinese media, from satellite broadcasts to contemporary dramas on YouTube. This sustained immersion has sharpened his insights into the etymological ties between Chinese and Vietnamese. What captivates him most is the striking proximity, beyond mere lexical overlap, between modern Mandarin expressions and their Vietnamese counterparts in everyday usage. These parallels, observed in sitcoms and colloquial speech, reinforce his conviction that the linguistic bond between the two languages runs deeper than traditional Chinese linguistics often acknowledges. This affinity also surfaces in classical Chinese novels dating back to the 12th century, suggesting a long-standing intertextual and intercultural dialogue.

      The author believes that any Vietnamese scholar fluent in modern Mandarin and equipped with an etymological lens would likely recognize the validity of this perspective. Yet he cautions against reducing Vietnamese to a mere Yue-descendant variant, akin to Cantonese, Fukienese, Zhuang, or Daic. These languages, shaped by centuries of Chinese rule, have undergone extensive Sinicization, often to the point of being subsumed within the Sino-Tibetan classification, especially the modern Kadai-Daic languages. Vietnamese, however, resists such simplification. Its historical trajectory and linguistic architecture demand a more nuanced and independent recognition.

      Admittedly, this endeavor is not without its moments of monotony. The author often finds himself wondering why he has committed so deeply to this pursuit. What, exactly, is he doing? There is no material reward awaiting him. Who, after all, truly cares whether a Chinese etymon is of Yue origin, or vice versa? Regardless of the outcome, Vietnamese will likely continue to be classified under either the Austroasiatic or Sino-Tibetan linguistic family. Yet, as long as the author retains the energy and passion to press forward, he says: let us continue this journey together, until the day he can no longer do so.

      Like a pilgrim in search of sacred revelations, the author approaches this etymological journey with wonder and resolve. Each discovery, whether a breakthrough or a setback, deepens his understanding of the Vietnamese linguistic landscape. Years of exploring Chinese historical linguistics have fueled his curiosity about China's linguistic past. This experience, akin to the fascination English learners feel when delving into Greek, Latin, and Romance languages, has broadened his grasp of Vietnamese etymology while enriching his knowledge of Chinese itself.

      In earlier stages of his research, the author accepted the prevailing view among Vietnamese specialists that emphasized the Mon-Khmer connection. That perspective, however, belongs to the past. With time, experience, and sustained inquiry, he has cultivated a more nuanced understanding. As he delves deeper, he observes that Vietnamese shares more linguistic commonalities with Sinitic languages than Sino-Tibetan languages do within their own family. These parallels extend beyond basic vocabulary and into the realm of expressions and structural features that historical linguists use to establish genetic affiliations.

      In fact, the Austroasiatic school focuses primarily on shared elements between Vietnamese and Mon-Khmer languages. Yet it overlooks key findings in Sino-Tibetan etymology, particularly the basic words that appear consistently in both Chinese and Vietnamese over time, which Austroasiatic theorists often attribute to Mon-Khmer origins. In light of the many Vietnamese words that align etymologically with Sino-Tibetan languages, the proposed Mon-Khmer connections lack essential linguistic features, including disyllabicity and tonality, hallmarks shared between Vietnamese and Chinese. These traits overwhelmingly outweigh the evidence presented by Mon-Khmer proponents within the Austroasiatic camp.

      If Vietnamese linguistic characteristics were systematically tabulated and compared in detail alongside those of Chinese historical linguistics, it would become evident that Vietnamese is, in many respects, a modified form of Chinese. This conviction has driven the author to sort through these complexities and compile this work over more than 25 years. The Sinitic-Vietnamese theory he proposes is not based solely on a comparative list of over 400 fundamental cognates with Sino-Tibetan etymologies, which will be elaborated in the chapter addressing Sino-Tibetan. It is also supported by extensive evidence from anthropology, archaeology, and historical records.

      Due to the scarcity of historical documentation, Austroasiatic specialists are often left to speculate strictly based on linguistic sound change rules. Consequently, their focus has shifted toward comparative analyses of Mon-Khmer basic words, many of which have gradually been reclassified as belonging to other linguistic families. In the absence of conclusive linguistic proof, they have sometimes redirected their attention to neighboring languages, a tendency that has undermined the validity of their arguments concerning Vietnamese-Chinese etymological connections.

      Consider, for example, the Vietnamese word "vịt" ('duck'). It lacks cognates in Mon-Khmer languages. To address this, Austroasiatic scholars have proposed a connection to the Thai word เป็ด /pĕd/, despite knowing that Thai descends from the Daic languages, which in turn originate from the Taic family, that is, the same lineage that gave rise to the Yue languages, ancestors of the ancient Viet-Muong language!

      If, however, we hypothesize an etymological link between "vịt" and Chinese 鴨 yā (SV "áp"), historical records may offer supporting evidence. Dong Zuobin (董作賓, 1933), in Discussing Tan (《譠》), p. 162, references a location in the Tan State of the Shang Dynasty (present-day Shandong Province) called 武原城 Wǔyuánchéng (Vietnamese: Thành Vũnguyên). Locals referred to it as 鵝鴨城 Éyāchéng (Vietnamese: Thành Nganvịt, literally 'Citadel of Ducks and Geese'), likely due to phonetic resemblance in their dialect at the time.

      This historical dimension of Sinitic etymology, exemplified by cases like "vịt" and "ngan" (also "ngỗng", both of which can be used to reconstruct words such as 源 yuán for "nguồn" and "dòng"), underscores the depth of the Chinese-Vietnamese linguistic connection, an area where the Austroasiatic Mon-Khmer hypothesis falls short.

      Built upon the historical framework outlined above, and expanded in subsequent chapters, this research offers a comprehensive account of how the modern Vietnamese language evolved—both diachronically and synchronically—and how it relates to other Sinitic segments within the Sinitic sub-branch of the Sino-Tibetan linguistic family.

      Another key contribution from the author is a clarification regarding the use of the term 'Sinitic', used here as a practical convention to denote elements associated with Chinese linguistic and cultural domains. As previously noted, 'Chinese' is not an ethno-religious designation like 'Jewish', but rather a cultural construct, akin to 'American'. From a linguistic standpoint, the concept of 'Chinese', derived from 'China' as a unified polity, only emerged following the establishment of the Qin Dynasty, when a variant of proto-Tibetan was layered atop preexisting Taic-Yue substrates. The language we now call Chinese was named after this political consolidation, and thus carries a distinct historical trajectory intertwined with the evolution of the Middle Kingdom.

      The term "Sinitic", or 'Chinese' in broader usage, derives from the unification of ancient states into the Qin Empire, an event comparable in scope to the formation of the European Union in modern times. During this period, vestiges of indigenous Taic-Yue linguistic elements permeated the emerging imperial lexicon, whether acknowledged by Chinese linguists and Sinologists or not. Southern Yue languages, including Cantonese, Fukienese, and Wu, have since been institutionally classified within the Sino-Tibetan linguistic family, often through official imperial decree. Lexicographically, 'Chinese' has come to encompass all these dialects as part of a unified linguistic identity (see Tang Lan, 1965, p. 184).

      Had history unfolded differently—say, if the Chu state had triumphed over both Qin and Han in their decisive campaigns—China might today be known as "Chu". Historical records suggest that Chu was a Daic-Yue polity, likely of Taic origin, ancestral to the Daic and Zhuang peoples. Its population may have spoken a variant of an ancient 'Chu-nese' language, rather than the form now conventionally labeled 'Chinese'. Likewise, had the NamViệt Kingdom succeeded in overtaking the Han Empire, the dominant term might have become 'Việt', or /Jyut6/, or 'Yue' as pronounced in Mandarin (see Lu Shih-Peng, 1964; Bo Yang, 1983–93).

      In essence, 'Chinese' is not a fixed ethno-racial identity, but a cultural construct confined within the evolving boundaries of the Chinese polity. Its name may shift with regimes, but its linguistic continuity transcends nomenclature. These hypotheticals underscore the contingency of naming: terminological conventions are shaped by historical victors and political consolidation, while the deeper linguistic substrate often endures across dynastic transitions. For instance, during the Manchu Qing dynasty (1644–1911), the polity was officially designated as "Qing", yet its linguistic and cultural core remained recognizably Chinese because the Manchurians were a part of it. 

      Consider further the hypothetical in which Imperial Japan had won World War II and rebranded the Middle Kingdom as "Đại Đông Á" (Great East Asia). In such a scenario, the term 'Sinitic' might have been supplanted by an entirely different designation, perhaps 'not-X'. This thought experiment illustrates how linguistic nomenclature is often a product of convenience, necessity, and power, rather than intrinsic linguistic reality. When juxtaposed with Taic-Yue or Austroasiatic Mon-Khmer frameworks, such naming conventions can obscure deeper continuities. Ultimately, the linguistic essence transcends the labels imposed upon it.

      Regarding the integrity of this survey, the author affirms that it is an original and human-authored work, especially in light of its digital format as an ongoing research project. AI serves only as a tool for final proofreading and surface-level editing. Without the author's creative intellect and scholarly commitment, this work would not, and could not, exist.

      It is worth acknowledging the skepticism some readers express toward academic studies published exclusively online, often dismissing them as “bogus” or likely AI-generated. While digital formats offer undeniable advantages in accessibility and scalability, concerns about reliability and longevity remain valid. Online works are subject to constant revision, and their long-term availability is far from guaranteed. Over time, websites may vanish from search indexes due to inactivity, or disappear altogether when hosting services lapse or accounts go unpaid.

      In this context, the author's decision to publish incrementally online reflects both necessity and intent: to share findings in real time while preserving the human voice behind the research. The enduring value of this work lies not in its format, but in the originality of its insights and the rigor of its methodology.

      This document should be regarded as a prelude to the forthcoming printed edition. In the realm of linguistic inquiry, no conclusion is ever truly final, and this research is no exception, regardless of whether it ultimately appears in bound form. The author maintains that readers are generally less inclined to engage with an online publication in its entirety, as they might with a physical volume acquired at considerable expense.

      In practice, the author approaches the reference works cited in the bibliography with similar reverence, though, as previously noted, the bibliography remains incomplete. Hundreds of titles, meticulously arranged across his personal bookshelves, are consulted with care and deliberation, forming the intellectual scaffolding upon which this study rests.

      This paper adopts a nontraditional approach by not devoting an entire section to exhaustively listing all sound change rules, natural or conditioned, between Sinitic-Vietnamese and Chinese loanwords. Such comprehensive treatments, as exemplified in Nguyễn Tài Cẩn's studies of the Sino-Vietnamese sound system (1979, 2000, 2001), are often expected in research of this scope. Instead, readers will encounter a synopsis of phonological patterns illustrated through examples and concise commentary. The emphasis is placed on irregular or distinctive sound correspondences, such as the ¶ /y- ~ b-/ pattern: 由 "bởi" (because), 油 "béo" (greasy), 邮 "bưu" (post), 柚 "bưởi" (pomelo), 游 "bơi" (swim), all pronounced /yóu/ in Mandarin. Another example is 公母 (gōngmǔ), which corresponds to Vietnamese expressions such as "trốngmái" (male and female), "sốngmái" (life-or-death struggle), or "vợchồng" (husband and wife).
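      As a rough illustration of how such a correspondence might be tabulated, the following minimal Python sketch tallies the initial consonants of the five 由/油/邮/柚/游 pairs cited above. The data are taken directly from the paragraph; the helper names (base_letters, onset) and the crude rule of treating everything before the first vowel letter as the initial are assumptions made for this example only, not the author's method, and a real study would need a proper phonological transcription.

```python
import unicodedata
from collections import Counter

# (character, Mandarin pinyin, Vietnamese form, gloss), copied from the text above
PAIRS = [
    ("由", "yóu", "bởi", "because"),
    ("油", "yóu", "béo", "greasy"),
    ("邮", "yóu", "bưu", "post"),
    ("柚", "yóu", "bưởi", "pomelo"),
    ("游", "yóu", "bơi", "swim"),
]

VOWEL_LETTERS = set("aeiou")  # assumption: 'y' counts as an initial, as in pinyin


def base_letters(syllable: str) -> str:
    """Strip tone and vowel-quality diacritics, keeping only base letters."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def onset(syllable: str) -> str:
    """Return the letters before the first vowel letter (a crude initial)."""
    letters = base_letters(syllable)
    for i, ch in enumerate(letters):
        if ch in VOWEL_LETTERS:
            return letters[:i]
    return letters


# Count how often each (Mandarin initial, Vietnamese initial) pair occurs
tally = Counter((onset(pinyin), onset(viet)) for _, pinyin, viet, _ in PAIRS)

for (m_init, v_init), count in tally.items():
    print(f"/{m_init}- ~ {v_init}-/ attested in {count} pair(s)")
```

      A tally of this kind only counts surface spellings; it says nothing about whether the pairs are true cognates, which remains a matter for the etymological argument itself.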

      Should this work later prove to have academic value, specialists in specific fields, such as lexical data tabulation and categorization, can undertake the task of establishing possible sound change patterns and formulating their corresponding rules. This type of endeavor is extraordinarily detailed, if not inherently complex, given that frequency-dependent sound changes tend to occur in synchrony and are often irregular, rather than uniformly systematic as observed in, for example, Germanic languages. While such phenomena are not uncommon, these irregularities are particularly pronounced in the Sinitic-Vietnamese context.
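
      The tabulation task described above can be sketched in a few lines of Python. The example below is only an illustration, not the author's procedure: it takes the five ¶ /y- ~ b-/ pairs cited earlier, strips diacritics, and counts how often each pairing of initials occurs. The helper names (strip_marks, initial, tabulate) are hypothetical, and the onset is approximated as the letters before the first vowel, which is an assumption that breaks down for words beginning with a vocalic "y".

```python
import unicodedata
from collections import Counter

# Word pairs taken from the /y- ~ b-/ examples in the text:
# (character, pinyin, Vietnamese form).
PAIRS = [
    ("由", "yóu", "bởi"),
    ("油", "yóu", "béo"),
    ("邮", "yóu", "bưu"),
    ("柚", "yóu", "bưởi"),
    ("游", "yóu", "bơi"),
]

# Simplifying assumption: "y" is treated as a consonant, as in pinyin.
VOWELS = set("aeiou")


def strip_marks(syllable: str) -> str:
    """Remove tone and vowel-quality diacritics, keeping base letters."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def initial(syllable: str) -> str:
    """Return the letters before the first vowel (a simplified onset)."""
    base = strip_marks(syllable)
    for index, ch in enumerate(base):
        if ch in VOWELS:
            return base[:index]
    return base


def tabulate(pairs):
    """Count each (Mandarin initial, Vietnamese initial) pairing."""
    return Counter((initial(p), initial(v)) for _, p, v in pairs)


if __name__ == "__main__":
    for (mandarin, vietnamese), count in tabulate(PAIRS).items():
        print(f"/{mandarin}- ~ {vietnamese}-/: {count}")
```

      A count of this kind only shows how often a correspondence recurs in a chosen sample; whether the pairs are true cognates remains a matter for the etymological argument itself.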

      Readers inclined to skim for illustrative examples may freely navigate between sections or pursue areas of personal interest. In doing so, they will encounter scattered yet thematically linked instances throughout the text. However, to fellow scholars, the author offers a word of caution: please avoid quoting passages out of context or drawing conclusions from isolated errors or incomplete datasets. Such imperfections are inevitable in a work still undergoing revision, and occasional typographic lapses may persist. Premature judgments, such as those the author has previously endured, often result in unwarranted criticism. One notable example involved an exploratory link between 將 (jiāng) and Vietnamese "sẽ" ('will'), which was dismissed as "unreliable" and "bogus" by a linguistic forum due to a misalignment between "nướctương" 醬油 (jiāngyóu, 'soy sauce') and "xìdầu" 豉油 (chǐyóu, 'bean sauce'), an error stemming from careless data handling. In such cases, readers may be tempted to infer exceptional sound change rules, such as "jiāng" ~ "sẽ" via the speculative pattern ¶ /j- ~ s-/, /-iang ~ -Ø/. Yet a single misstep does not invalidate the broader inquiry into phonological correspondences.

      High-profile etyma requiring detailed treatment may unavoidably occupy substantial space. Exceptional or anomalous cases often resist neat categorization and highlight why enumerating sound change rules can become unwieldy, sometimes warranting independent study. These irregularities, which do not generalize across similar phonological environments, demand careful deliberation. The goal is to equip readers to either interpret such subtleties through conventional linguistic frameworks or explore emergent patterns via unconventional heuristics. Ultimately, this endeavor underscores the speculative nature of historical phonology and the interpretive latitude inherent in linguistic reconstruction.

      Rather than presenting exhaustive lists of mechanical sound change rules, often overlooked or unread, we will prioritize engaging case studies and targeted examples. These will illuminate the specific processes by which conclusions regarding Sinitic-Vietnamese etyma have been reached. By venturing beyond the well-trodden paths of frequently cited correspondences, readers are invited to navigate the complexities of sound change and cultivate the analytical tools necessary to extract and apply linguistic rules independently.

      While regularity governs most phonological transformations, this research foregrounds examples involving Chinese lexemes in their diverse forms and phonetic variants, many of which have permeated the Vietnamese lexicon since antiquity. This linguistic infiltration spans multiple historical phases, notably the millennium following 111 B.C., when the Annamese region was under Chinese rule until its liberation in 939. During the Ming Dynasty's incursion in 1410, Mandarin briefly reemerged as the official language, playing a prominent role in diplomatic and administrative exchanges with the Chinese imperial court. (囯)

      Phonetically, there are instances where sound changes have given rise to multiple Vietnamese variants of a single etymon. Similar cases can be observed in the Japanese Kan-on and Go-on readings of individual Chinese characters. Take 道 dào ('way') as an example. In Vietnamese, we can identify several distinct "readings" that convey different concepts, most of which, interestingly, correspond to the range of meanings found in the Chinese equivalents. For instance:

      • 'đạo' (way, religion, sect, morals, skill, line),
      • 'dạo' (time),
      • 'đường' (road, line),
      • 'nẻo' (path),
      • 'nói' (speak),
      • 'bảo' (tell),
      • 'tưởng' (suppose), etc.

        Each of these Vietnamese words may seem like a translated version of the Chinese word, but this is not necessarily the case. Rather, each derived Sinitic-Vietnamese form is a variant that is cognate with the same Chinese etymon 道. This phenomenon would be easier to understand if the old Chinese-based Nôm characters were still widely used in Vietnamese writing. Unfortunately, this was not always the case, especially given that modern Putonghua syllables are shorter than their Middle Chinese counterparts.

        The phonological change rules illustrated in this paper are neither exhaustive nor intended as definitive references. As this research remains a work in progress, it continues to undergo revision and refinement, with plans for a first print edition to reach select university campuses—ideally those with active communities of historical linguists. The methodologies presented here are exploratory and suggestive rather than conclusive, though their foundational principles remain consistent unless explicitly revised.

        Given the evolving nature of this study, the demonstrated approaches should be understood as practical models—examples of how the author has applied two innovative etymological frameworks to generate preliminary results. Readers will observe the investigative process used to identify Vietnamese words of Chinese origin (Sinitic-Vietnamese [VS]) and, in turn, gain the tools to replicate this process with confidence and clarity.

        These newly developed methodologies have proven effective in uncovering the etymology of Sinitic-Vietnamese words and in formulating tentative sound change rules, tracking transformations between forms, or identifying 'what changes into what.' For instance, this approach underpins the analysis of 道 dào, as previously discussed. Readers will have the opportunity to apply these techniques in later chapters, particularly through the worksheets provided in Chapter 13. They will also encounter a curated selection of Sinitic-Vietnamese etyma, a small but meaningful subset of the broader findings presented in this research.

        Caution is warranted when interpreting loanwords among the examples presented. As a general rule, if a Vietnamese word closely resembles its Chinese counterpart in both phonological form and semantic meaning, it is likely a direct loan. Recognizing such cases is essential for distinguishing inherited etyma from later borrowings and for maintaining analytical precision throughout this study.

        While the linguistic resemblance between Vietnamese and Chinese will be addressed in greater detail in later chapters, it is worth noting here that their structural and lexical affinities are significantly closer than those observed between Chinese and many other Sino-Tibetan languages. The term Sinitic-Vietnamese (VS), also referred to as HánNôm (漢喃), encompassing both Hán and Nôm strata, is used to denote either Vietnamese words of Chinese origin or cognates shared by both languages that descend from common ancestral roots. Examples include "sông" 江 (jiāng, 'river'), "ngà" 牙 (yá, 'tusk'), and "dừa" 椰 (yé, 'coconut').

        Among their shared linguistic features, beyond morphonological and semantic parallels, nearly every linguistic trait present in Chinese finds an equivalent in Sinitic-Vietnamese. These features are so deeply embedded in Vietnamese usage that they are often mistaken for indigenous Vietic words or regarded as 'pure' Vietnamese. Some are considered quasi-Sino-Vietnamese variants, especially those represented by Nôm characters incorporating Chinese components.

        For the Sinitic-Vietnamese etyma investigated here and identified as having Chinese roots, such conclusions are based on holistic alignment with Chinese linguistic attributes. These include phonetic and morphemic structure, phonological and semantic traits, syntactic and lexical parallels, tonal systems, CVC syllabic architecture, and grammatical arrangements in sentence construction.

        The closer a Vietnamese word resembles its Chinese counterpart, the more likely it is to be a loanword. However, this research also examines whether resemblance necessarily implies borrowing. For example, the Vietnamese "tếu" 'funny' may be hypothesized as a loan from 笑 xiào (SV "tiếu" 'laugh'), which is cognate with VS "cười". Alternatively, "tếu" may be cognate with 逗 dòu /tow⁴/ 'tease', SV "đậu" /ɗɐw⁶/, where the voiced /ɗ-/ reflects an older development and the unvoiced /t-/ a more recent one. This word may have been reintroduced into Middle Vietnamese via spoken Mandarin, likely during the Ming Dynasty. Readers may compare the contemporary usage of 逗 in Chinese with its appearance in classical literature such as Dream of the Red Chamber (紅樓夢 Hónglóumèng).

        With the findings presented in the Sino-Tibetan chapter, including the genetic affinity demonstrated through shared linguistic peculiarities and cognates, it becomes increasingly plausible to reconsider Vietnamese as part of the Sino-Tibetan linguistic family. Such a reclassification could be achieved through the methodologies outlined in this research, which adopt broader and innovative approaches. These can be applied alongside existing tools from Chinese historical linguistics, offering insights into Vietnamese etymology across disciplines such as anthropology, archaeology, and history, particularly regarding the origins and biological composition of the Vietnamese people and their state. The underlying premise is that populations of shared racial ancestry tend to speak variant languages of common origin.

        Throughout this paper, each etymon is accompanied by its corresponding Chinese character and pinyin (拼音) transcription to facilitate sound identification. In many cases, the pinyin alone suffices and may be less visually distracting than the character itself, especially when the character is constructed with "giảtá" (假借) or 'loangraph', which requires readers to decipher embedded phonetic codes. A loangraph refers to a Chinese character borrowed solely for its phonetic value and repurposed for a different concept. For example, the Vietnamese "lại" 來 (lái, 'come') may have originally been associated with "lúa" ('paddy, millet, grain'). If loangraphs were transcribed only in pinyin, they might resemble English homophones with divergent meanings such as 'yard', 'glass', 'page', and 'lie'.

        Pinyin, the official romanization system of the People's Republic of China for transcribing Mandarin (普通話 pǔtōnghuà, 'common speech'), has gained widespread global adoption, including in Taiwan, which began integrating it nearly three decades ago.

        For accurate sound transcription, this study primarily employs the International Phonetic Alphabet (IPA). IPA symbols are used to represent dialectal and archaic pronunciations, as well as precise phonetic values, enclosed in square brackets [xxx], in contrast to approximate phonemic values indicated by slashes /xxx/. This distinction helps clarify subtle phonetic nuances in the cited lexicons.

        Examples include:

        • 'dung' [juŋʷ1], [jowŋʷ1], [zʊŋʷ1] /zowng1/ (not [duŋ1])
        • 'thìn' [t'ɨjn2], /tʰɤjn2/, /tʰɨn2/, /tʰejn2/ (not [thin2])
        • 'thu' [t'ʊ1], /thow1/, /tʰʊ1/ (not precisely [thu:1] /thu:1/).

          These distinctions are especially relevant in cases involving diphthongs, where comparative analysis depends on capturing fine phonemic variation. For instance:

          • 'tin' [tin1], /tin1/, /tɪn1/ (not [tɤjn1] /tein1/) (音)

          To streamline typographic presentation, phonetic symbols may be rendered in simplified forms such as [-ow-] and [-ejn], or alternatively as /-ou-/ and /-ein/, when the intended sound values are contextually clear and unambiguous. This convention will be applied consistently across other phonetic environments, with supplementary notes and examples provided throughout the text to ensure clarity and continuity.

          In many instances, IPA transcriptions offer a more precise reflection of Vietnamese phonetic values, especially in relation to Chinese character correspondences, than conventional pinyin. For example:

          • Pinyin d aligns with [t]

          • Pinyin t corresponds to [tʰ] or /th/

          • Pinyin r maps to /j/

          • Pinyin gu and ku are phonetically realized as [ku] and [kʰu], not [gu] and [ku], respectively

          This transcriptional approach parallels the methodology employed by Pulleyblank (1984) in his reconstruction of Old Chinese (OC), where he explored phonetic values ambiguously recorded in classical annals and inscriptions.

          To avoid typographic clutter and potential confusion with IPA diacritics, tonal numerals (ranging from 1 to 9) will be appended to each phonetic form. These numerals indicate tonal categories across various Chinese dialects—such as Cantonese (Guangzhou), Fukienese (Hokkien, Fuzhou, Amoy), Teochew (Chaozhou), and Hainanese—as well as other regional languages including Daic, Thai, and Vietnamese. This system ensures both phonetic precision and cross-linguistic comparability.

          Tonal numeral symbols are conventionally used in the transcription of Cantonese, Fukienese, and other Chinese dialects to indicate pitch contours and tonal categories. In the case of Vietnamese, tones are annotated following the traditional eight-tone framework—more precisely, a system of four tonal categories bifurcated into upper and lower registers. This structure is rooted in classical sources such as the Guǎngyùn 廣韻, Jerry Norman’s Chinese (1988, p. 55), and foundational Vietnamese linguistic studies, notably Nguồn gốc và Quá trình Hình thành Cách đọc Âm Hán-Việt (“The Origin and Transformational Process of the Sino-Vietnamese Pronunciation”) by Nguyễn Tài Cẩn (1979, 2001).

          The tonal categories are as follows:

          Upper register: 1. (no mark)   3. ʔ   5. ´   7. ´ with finals -p, -t, -c, -ch

          Lower register: 2. `   4. ~   6. .   8. . with finals -p, -t, -c, -ch
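
          As a rough illustration of the numbering above, the following Python sketch assigns a tonal numeral to a syllable written in Quốcngữ. It is a minimal sketch under stated assumptions, not a tool used in this study: the function name tone_numeral and the ASCII tone labels are hypothetical, and the mapping of diacritics to numerals simply follows the two rows of the table (odd numerals for the upper register, even for the lower, with 7 and 8 reserved for syllables ending in -p, -t, -c, -ch).

```python
import unicodedata

# Combining marks used for Quốcngữ tone diacritics (Unicode code points).
TONE_MARKS = {
    "\u0300": "huyen",  # grave accent
    "\u0309": "hoi",    # hook above
    "\u0303": "nga",    # tilde
    "\u0301": "sac",    # acute accent
    "\u0323": "nang",   # dot below
}

# Numerals for syllables that do not end in a stop.
OPEN_TONES = {"ngang": 1, "huyen": 2, "hoi": 3, "nga": 4, "sac": 5, "nang": 6}
# Numerals for syllables ending in -p, -t, -c, -ch.
CHECKED_TONES = {"sac": 7, "nang": 8}
STOP_FINALS = ("p", "t", "c", "ch")


def tone_numeral(syllable: str) -> int:
    """Return the tonal numeral (1-8) for one Quốcngữ syllable."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"  # no tone mark means tone 1
    for ch in decomposed:
        tone = TONE_MARKS.get(ch, tone)
    # Drop all combining marks so the final consonant can be inspected.
    letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if letters.endswith(STOP_FINALS):
        if tone not in CHECKED_TONES:
            raise ValueError(f"unexpected tone on stop-final syllable: {syllable}")
        return CHECKED_TONES[tone]
    return OPEN_TONES[tone]


if __name__ == "__main__":
    for word in ["sông", "đường", "tưởng", "sẽ", "nói", "đạo", "các", "Việt"]:
        print(word, tone_numeral(word))
```

          The sketch reads only the modern diacritic and the final consonant; it says nothing about the historical tone category of the corresponding Chinese etymon.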

          The use of tonal numerals will be limited and reserved for cases where clarification is essential, particularly to prevent misinterpretation across Chinese dialects. Tonal values assigned to the same numerical markers often vary significantly between dialects. For example, Mandarin tones (1, 2, 3, 4) differ markedly from those in Cantonese (1, 2, 3, 4), as documented by Wang Li et al. (1953), and diverge further from Vietnamese tonal conventions.

          To maintain clarity in Vietnamese phonetic transcription, modern diacritics will be the primary notation system, used alongside IPA symbols, for instance, [à], [ả], [ã], etc., except where such usage risks confusion with IPA phonetic values (e.g., nasalized /ã/). For precise tonal interpretation, readers may consult Quốcngữ diacritics for Vietnamese or Pinyin tone marks for Mandarin (e.g., ā, á, ǎ, à, a), both of which offer distinct tonal representations despite superficial visual overlap.

          In select cases, tonal markings will be deliberately omitted. This reflects the author's view that tonal values in many Sino-Vietnamese and Sinitic-Vietnamese forms, like their Chinese dialectal counterparts, have undergone extensive historical shifts. These tonal evolutions, often cyclical and unpredictable, lack a universally reliable rule for reconstruction. In some instances, tones may even revert to their original contours as they existed at the time of lexical absorption into Vietnamese. Such tonal fluidity is well attested in Chinese historical phonology, alongside other systemic changes such as shifts in initial consonants and syllabic finals (see Chao Yuen-Ren, Tone and Intonation in Chinese, 1933, pp. 119–134).

          Phonemically, Vietnamese initial and medial consonants exhibit a range of articulatory values that are not always transparently reflected in the orthography. For instance, the following correspondences are commonly observed:

          • b- → [ɓ]
          • đ- → [ɗ]
          • ch- → [ʨ]
          • kh- → [kʰ]
          • ph- → [pf]
          • r- → [ʐ]
          • th- → [tʰ]
          • tr- → [ʈ]
          • nh- → [ɲ], occasionally rendered as ɲ-, jn-, or nh- depending on typographic or contextual constraints

            Similarly, vowel clusters such as -uy and -iê are more accurately transcribed in IPA as [wej] and [iə], rather than [wi] and [ie], reflecting the true phonetic realization rather than the orthographic approximation. Vietnamese spelling conventions often obscure these distinctions, particularly in final consonant environments.

          To ensure clarity and consistency, final consonants will be transcribed using the following IPA representations:

          • -p → [p]
          • -t → [t]
          • -ch → [jt]
          • -c → [k]
          • -nh → [jŋ]
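
          The initial and final correspondences listed above lend themselves to a simple lookup. The Python sketch below is offered only as an illustration: the dictionary names and the function edge_ipa are hypothetical, the values are copied from the two lists in this section, and the implosive [ɗ] is keyed to the letter đ, which is an assumption about the intended entry. Longer spellings are tried first so that "ch" is not mistaken for "c".

```python
# Initial correspondences as listed in the text (orthography -> IPA).
INITIALS = {
    "b": "ɓ",
    "đ": "ɗ",   # assumption: the implosive is written with the letter đ
    "ch": "ʨ",
    "kh": "kʰ",
    "ph": "pf",
    "r": "ʐ",
    "th": "tʰ",
    "tr": "ʈ",
    "nh": "ɲ",
}

# Final correspondences as listed in the text (orthography -> IPA).
FINALS = {
    "p": "p",
    "t": "t",
    "ch": "jt",
    "c": "k",
    "nh": "jŋ",
}


def longest_first(table):
    """Return the keys of a table, longest spelling first."""
    return sorted(table, key=len, reverse=True)


def edge_ipa(syllable: str):
    """Look up the IPA value of the initial and the final of a syllable.

    Returns a pair (initial, final); an element is None when the
    spelling is not covered by the lists in the text.
    """
    s = syllable.lower()
    onset = next((INITIALS[k] for k in longest_first(INITIALS) if s.startswith(k)), None)
    coda = next((FINALS[k] for k in longest_first(FINALS) if s.endswith(k)), None)
    return onset, coda


if __name__ == "__main__":
    for word in ["trách", "bánh", "thác", "phát", "tin"]:
        print(word, edge_ipa(word))
```

          Spellings outside the two lists, such as the initial t- or the final -n, are deliberately left unresolved rather than guessed.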

          In cases involving labiovelar articulation, especially when preceded by a rounded vowel (e.g., o-, ɔ-) or a glide medial (-w-), the following transcriptions will be used:

          • [-kʷ] → -kw, -wk, or -kʷ
          • [-ŋʷ] → -wŋ, -ŋw, or -ŋʷ

          The velar nasal /ng/ will be rendered as either [ŋ] or [ng], contingent on its phonetic environment and the need for typographic clarity. These conventions will be applied systematically throughout the text to maintain phonological precision and editorial coherence.

          Subsequent chapters elaborate on all preceding elements and extend each example through polysyllabic grouping across Chinese, pinyin, and Vietnamese. This includes:

          • Detailed correspondences with Middle Chinese finals and tonal categories
          • Chronologically layered borrowing trajectories
          • Diagnostic markers of Yue substratal influence
          • A comprehensive polysyllabic lexicon, indexed by Chinese characters, pinyin forms, and Vietnamese equivalents

          The overarching objective is to produce a synthesis that is ready for publication—methodologically rigorous, typographically exact, and fully transparent in its analytical claims.

          CONCLUSION

          The title What Makes Chinese So Vietnamese? distills the chapter’s central thesis: long before the emergence of a unified Chinese identity, there was Yue, and its imprint remains indelibly woven into the phonological and structural fabric of modern Vietnamese. Sinitic-Vietnamese is not merely a collection of borrowed forms; it constitutes a naturalized linguistic heritage, accessible through the lens of polysyllabicity and comparative historical methodology.

          Drawing from Sino-Tibetan research that reveals patterned cognates and structural parallels between Vietnamese and Chinese, this work invites a reconsideration of Vietnamese as a potential member of the Sino-Tibetan family. Such a reclassification would not rely solely on traditional comparative methods, but rather on the expanded, polysyllabic framework developed herein, one that complements and extends the tools of Chinese historical linguistics. The implications reach beyond linguistics, offering new perspectives for anthropology, archaeology, and population history, especially regarding the deep ancestral ties and sustained cultural contact that often underlie linguistic convergence.

          The practical value of Chinese linguistic studies is well established, with global institutions dedicating significant resources to its advancement. This momentum can, and should, be leveraged to benefit Vietnam in three key areas. 

          First, Vietnamese etymological research stands to gain by integrating Sinitic-Vietnamese studies into the broader cognitive and methodological domain outlined in this work.

          Second, the structural parallels between Chinese and Vietnamese, particularly the shared polysyllabic tendencies, suggest a compelling model for orthographic reform. Chinese, now formally recognized as polysyllabic and reflected in its Pinyin system, offers a precedent for modernizing Vietnam’s outdated monosyllabic script. Originally devised in the 17th century by European missionaries for religious dissemination, the current Vietnamese writing system remains largely unchanged, functioning like a colonial-era locomotive dragging its digital-age passengers across deteriorating tracks. A polysyllabic reform would not only enhance cognitive accessibility but also align Vietnamese orthography with its linguistic reality.

          Finally, the findings presented here lay the groundwork for a new generation of Vietnamese lexicography—one that includes etymological explanations for each entry, a feature conspicuously absent from existing dictionaries. Such a development would represent a transformative leap forward in Vietnamese linguistic scholarship, anchoring future research in a framework that is historically grounded, methodologically sound, and intellectually expansive.


          SYMBOLS AND CONVENTIONS

          Finally, a few housekeeping details regarding terminologies, conventions, and classifications will be addressed to ensure a consistent framework for discussing Sinitic-Vietnamese subjects throughout this work.

          In this paper, the author will employ conventions commonly utilized in the field of historical linguistics, alongside their alternate usages and some custom-made symbols of his own design. Readers are expected to already have familiarity with standard linguistic symbols, the International Phonetic Alphabet (IPA), and Vietnamese orthography (Quốcngữ) (Q).

          At the conclusion of this research, you will find an extensive bibliography listing references. For books in print, many of these may still be accessible in the libraries of academic institutions across the United States. While numerous related linguistic websites are valuable resources, some may eventually become unavailable over time. Readers seeking outdated URLs cited in this research may refer to the Internet Archive (https://archive.org/), using the syntax https://web.archive.org/web/http:..../ to display the posting history of a given page.
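
          For readers who wish to look up many archived pages at once, the address pattern can be assembled mechanically. The Python sketch below is an illustration only: the function names are hypothetical, the sample address http://example.com/page is a placeholder rather than a URL cited in this research, and the three address forms reflect the Wayback Machine's URL layout as commonly used at the time of writing, which may change.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def latest_capture(url: str) -> str:
    """Address that leads to the most recent archived copy of `url`."""
    return WAYBACK_PREFIX + url


def capture_calendar(url: str) -> str:
    """Address of the overview listing all archived copies of `url`."""
    return f"{WAYBACK_PREFIX}*/{url}"


def capture_near(url: str, timestamp: str) -> str:
    """Address of the archived copy closest to `timestamp`.

    `timestamp` is a string of digits in the form YYYYMMDDhhmmss;
    a shorter prefix such as YYYY or YYYYMMDD is assumed to work too.
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits (YYYY[MMDDhhmmss])")
    return f"{WAYBACK_PREFIX}{timestamp}/{url}"


if __name__ == "__main__":
    sample = "http://example.com/page"  # placeholder address
    print(latest_capture(sample))
    print(capture_calendar(sample))
    print(capture_near(sample, "20150401"))
```

          The functions only build addresses; whether a given page was ever archived can be confirmed only by opening the resulting link.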

          Abbreviations and acronyms will be defined upon their first introduction. To improve clarity, examples cited within paragraphs will be wrapped onto separate lines and numbered or bulleted (•). Lengthier comments on patterns of sound change and the evolution of Vietnamese words under examination will be enclosed in square brackets, such as [xxx yyy zzz], to provide detailed explanations supporting arguments related to the listed etymologies. This is, after all, the central purpose of this research. English translations of cited vocabulary will be included as needed following each term, though these may not be exhaustive. In cases where translation is irrelevant, it may be omitted.

          The commonly used symbols include:

          • ">" indicating "evolves into" (diachronically)
          • "<" signifying "derived from" (diachronically)
          • "=>" meaning "giving rise to" (by a phonetic rule)
          • "~>" representing "giving rise to" (by analogy)
          • "<=" indicating "built with"
          • "~" denoting "alternating with," "correspondent to," or "cognate to" (synchronically)
          • "$" used for literary forms, as opposed to vernacular or colloquial usage
          • "#" signifying metathesis or reverse order ("iro")
          • "@" marking association, assimilation, or sandhi processes ("liên tưởng," "đồng hóa")
          • "©" indicating archaic or obsolete usage ("cổ")
          • "®" signifying contraction, clipping, sound dropping, or deletion ("rụng")
          • "§" used for comparison ("so sánh," cf., confer)
          • "¶" or |P signifying patterns of sound change
          • "%" representing possible alternatives
          • "&" indicating combination with
          • "/" indicating conditional change (e.g., x > y /_V#) or alternation (x/y)
          • "\" representing alternative influence or equivalency with "/"
          • "[xxxx]" denoting exact phonetic value or providing explanatory notes
          • "/xxxx/" marking approximate phonetic value
          • Nasalized vowels: /ã/, /ẽ/, /õ/, etc.
          • Vh indicating "Vietnamized," i.e., localized
          • Capital letters like "X-, -Y-, Z-" symbolizing consonant articulation classes in phonetic transcriptions, e.g., /P-/ for labial sounds
          • "=" indicating equivalence
          • "(?)" denoting unidentified or unknown sources
          • /-ʔ/ marking guttural endings or syllabic breaks in diphthongs or triphthongs
          • /Ø-/ marking guttural initials such as ŋ- or ʔ-
          • "|" signifying syllabic division, homonymous elements, opposition, or parallel forms (also used as || for separation)
          • "{x- ~ y-}" marking conditioned sound changes or interchanges
          • "*" and "**" indicating hypothetically reconstructed forms: * for ancient (AD to Middle Age) sounds and ** for archaic Proto-forms (B.C., potentially pre-historic).

            Most images, maps, and illustrations are either original creations by the author or sourced from publicly available resources, including Wikipedia. These are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).

            x X x


            ENDNOTES


            (車)^ According to Starostin, in Middle Chinese 車 also reads /tʂa/, FQ 尺遮 (whence Mand. chē, Viet. xa), but this reading is rather recent (judging from rhymes in Guangyun 廣韻, not earlier than Eastern Han) and must have stemmed from some Old Chinese (OC) dialect. Vietnamese also has a colloquial loan from the same source, "xe" /sɛ/. If the OC reconstruction is indeed *kla, one could posit an early borrowing from OC, which would account for Vietnamese "cộ". In addition, "cộ" has a cognate in 檋 jù (SV cục), along with its variants 輂, 輁, and 梮 jù, all of which share the phonetic 車 chē, jū, jù [ MC cʰia, kɨə̆ (also given as kʊ) < OC *kʰlja, *kla ]; these characters are, as usual, later developments of the word-formation module {ideographic radical + signific stem}.

            (蕃)^ "An exonym for Tibet that appeared in Tang dynasty. Some scholars argue the second syllable, 蕃, was originally read with the -n coda in Middle Chinese (i.e. pʉɐn or bʉɐn, the former of which regularly gives rise to modern Mandarin fān). They argue that the modern Tǔbō reading is recent, possibly originating from French sinologist Jean-Pierre Abel-Rémusat's (1788-1832) argument that the second syllable should be pronounced this way to match Old Tibetan བོད་ (bod, "Tibet") (Pelliot, 1915). Rhymes in poetry from Tang and Yuan dynasties also suggest that the second syllable 蕃 was read with the -n coda during those times (Yao, 2014). " (See 吐蕃 - Wiktionary)

            (工)^ For example, the aboriginal form /krong/ is cognate with both VS sông (river) and Chinese 江 jiāng (Cantonese: /kong1/). While the former is accepted as presented, the latter can be substantiated through its phonetic stem 工 gōng (SV công). This phonetic correlation requires no further proof, as the variant pronunciation derived from 工 /kong/, which contributes to the phonetic structure of 江 jiāng, reinforces the legitimacy of the Yue pre-existing etymological root.

            (百)^ "Bod" is simply another form of the name “Bak,” as in 百姓 (Baixing), 百越 (Bách Việt or Bai Yue), discussed by Lacouperie (ibid., see Chapter 9):

            "Bak was an ethnic and nothing else. We may refer, as proof, to the similar name—rendered, however, by different symbols—which they gave to several of their early capitals: PUK, POK, PAK, all names known to us after ages, and whose similarity to Pak and Bak cannot be denied. In the region from which they had come, Bak was a well‑known ethnic name; for instance, Bakh in Bakhdhi (Bactra), Bagistan, Bagdada, etc., and it is explained as meaning ‘fortunate, flourishing."

            This interpretation aligns with what the same author discusses in Chapter Six (Lacouperie, ibid., pp. 116‑119) concerning the ancestral Bak of the early Chinese, in contrast to the pre‑Chinese populations.

            (M) Linguistic Considerations in Transliteration: In this paper, all transliterations of historical names follow Mandarin pronunciations for ease of reference, though their modern phonetic forms may not accurately reflect how they were originally spoken. For instance, the contemporary Mandarin forms below correspond to the following Vietnamese names:

              • Yue (越, 粵, 戉, 鉞) → Việt
              • NanYue (南越) → NamViệt
              • OuYue (歐越) → ÂuViệt
              • Annan (安南) → Annam
              • LuoYue (雒越) → LạcViệt
              • MinYue (閩越) → MânViệt
              • DongYue (東越) → ĐôngViệt
              • WuYue (吳越) → NgôViệt

              Additionally, phonetic reconstructions vary, and not everyone agrees on the ancient pronunciation. Some scholars propose /Viet8/, while others favor /Jyet8/ or /Jyut6/. This uncertainty is reflected in modern Vietnamese dialectal pronunciation, where Việt is articulated differently in the southern sub-dialect, alternating between /v-/, /j-/, and /z-/.

              (C) Hanzhong: In the Qin dynasty the area was governed as the Hanzhong Commandery, whose seat was in present-day Nanzheng County, south of the Hanzhong urban area. In 207 BC, the Qin dynasty collapsed. Liu Bang, who would later become the founding emperor of the Han dynasty, was made lord of Hanzhong. He spent several years there before raising an army to challenge his arch-rival, Xiang Yu, during the Chu–Han Contention. In 202 BC, after the victory at Gaixia, Liu Bang named his imperial dynasty after his native district, as was customary. However, he chose Hanzhong rather than his birthplace Pei County (present-day Xuzhou, Jiangsu Province). Thus, Hanzhong gave its name to the Han dynasty. (Source: Wikipedia)

              (X)Political Influence on Linguistic Policy: The People's Republic of China's language policies under Xi Jinping's administration (beginning in 2017) explicitly restricted local TV programs from broadcasting in regional dialects, mandating exclusive use of Northern Putonghua. This exemplifies political intervention in linguistic development, a subject explored in greater depth in forthcoming chapters.

              (華)Yue Loanwords in Chinese: Examples of Yue-derived loanwords in Chinese include:

                • đường: 糖 táng (sugar)
                • dừa: 椰 yé (coconut)
                • trầu: 檳榔 bīngláng (betel nut, cf. Muong blau)
                • sông: 江 jiāng (river, cf. Muong krong)
                • chó: 狗 gǒu (dog, cf. Proto-Vietic klo).

              (H)Persistence of 'Annamese' in Hainanese Speech: The term 'Annamese' (安南話) remains in use within Hainanese speech, pronounced as /A1nam2we1/. Hainanese is a MinNan sub-dialect, part of the Fukienese (Hokkien, Amoy) linguistic group, spoken by inhabitants of Hainan Province, China.

              (W)^ As an aside, since the place name Tràngan appears multiple times in linguistic discussions throughout this paper, it is worth noting its historical and cultural significance. Located in present-day Ninhbình Province in northern Vietnam, where the first king of the Lê Dynasty established his initial capital, Tràngan is renowned for its breathtaking waterways, framed by towering limestone karsts.

              (秦)^ (1) Tần, (2) Chệt, (3) Tầu, (4) Tàu 秦 Qín (Tần) [ M 秦 Qín < Middle Chinese tʂjin < OC *tʂin | Chinese dialects: Cant. ceon4, Hẹ cin2, Tn ćhiẽ 12, Ta ćiẽ 12, Dc ćhĩ 12, Nx chin12

              Kangxi Dictionary: Entry for Qin (秦):

              Ancient Forms and Pronunciation:
              In Tang Yun, Guang Yun, Ji Yun, Lei Pian, Yun Hui, and Zheng Yun: Pronounced qín, with fanqie reading 匠隣切 (jiang lin qie) or 慈隣切 (ci lin qie). Sound: qín (螓).
              Definition:
              A country name.
              According to Shuowen Jiezi, Qin was the territory bestowed upon the descendants of Bo Yi. It is fertile land suitable for grain cultivation.
              In Book of Songs·Qin Feng·Che Lin Commentary, Qin refers to a valley name in Longxi, located northeast of Bird Rat Mountain in Yongzhou.
              Annotations: Today, it is Qin Pavilion and Qin Valley.
              Historical Context:
              During the Spring and Autumn Period, the State of Qin existed. The Han Dynasty established Tianshui Commandery there, which was later renamed Qinzhou during the Northern Wei Dynasty.
              In Shiming, Qin means "crossing" (津), as its terrain is fertile and enriched with moisture.
              Three Qin:
              In Records of the Grand Historian·Xiang Yu: Xiang Yu divided Guanzhong into three regions, granting the surrendered generals titles:
              Zhang Han as King of Yong (雍王),
              Sima Xin as King of Sai (塞王),
              Dong Yi as King of Zhai (翟王).
              Together, they were referred to as Three Qin.
              DaQin (大秦): 
              In Later Han·Records of the Western Regions, Da Qin refers to the region west of the sea (also called Sea West Country). Its inhabitants were tall and upright, resembling the people of China, hence the name Da Qin.

              Notes: In phonology, the character 秦 Qín ("Tần") ends in an open nasal -n, making it difficult to transform into -w, a rounded, closed-lip sound. According to Shuowen Jiezi, the pronunciation of 秦 Qín (originally referring to a type of grain) was akin to 舂 chōng (SV "thông", corresponding to VS "tàu"). Comparing phonological transformation patterns, this resembles the shift seen in 痛 tòng → "đau." Additionally, it was borrowed for the pronunciation of 牆 qiáng ("wall"), which corresponds to SV "thương" ~ VS "đau."

                      Before and after the Warring States period (Eastern Zhou), the term 秦 Qín was used across various regions in what is now China to refer to the State of Qin. Qin Shi Huang, who took the throne in 246 B.C., went on to conquer the six other states and complete unification in 221 B.C. In Vietnamese culture, the Double Fifth Festival (Tết Đoanngọ), celebrated on the 5th day of the 5th lunar month, was once a major folk tradition. One custom involved wrapping and throwing rice cakes into the river to prevent fish from consuming the remains of Qu Yuan (Khuất Nguyên), a loyal scholar of the State of Chu, who drowned himself rather than be captured by Qin forces.

                      Based on this historical context, there is an immediate association with resistance, and even contempt, when referring to the Qin State (Tần). Today, the region that was once the State of Chu is located in Hubei Province, which may have once been a part of or closely linked to the southern BáchViệt (Hundred Yue) territories. These included provinces such as Yunnan, Guangxi, Hunan, Guangdong, Fujian, Zhejiang, Jiangsu, and others over 2,000 years ago. This phonetic connection further supports the plausible link between Tần and Tàu, as in the Vietnamese word Tàuô, which aligns with the black-colored uniforms worn by Qin officials.

                      Some argue that "Tàu" derives from "tàughe" ('boats'), and that ngườiTàu ('Chinese people') refers to those arriving in Vietnam by boat or living aboard ships. However, this interpretation is merely speculative. The most reasonable linguistic link remains Tần = "Tàu". During that era, the people of the former Warring States, which were conquered by Qin, deeply resented Tần ("Tàu").


              Another relevant linguistic observation involves Cantonese speakers in Vietnam, who often refer to themselves as Thòngdành (唐人 Tángrén, "Tang people") or người Đường ("Tang people"). In phonological transformation, thòng in 唐人 Tángrén or Thòngdành could have evolved into tàu. However, it is worth noting that 唐 Táng = SV đàng, đường ends in an open final, yet follows a phonological pattern wherein /-ương/ shifts to /-au/. In ancient usage, 唐 Táng carried meanings such as "great road" or "main path" (đường cái, đàng cái).

              Despite this possibility, the explanation that Tần = Tàu is stronger. Unlike their animosity toward Qin, the Vietnamese did not harbor the same resentment toward Cantonese speakers. While Cantonese people are commonly referred to as ngườiTàu in Vietnam, collective consciousness suggests that Vietnamese speakers may have internally recognized that Cantonese belonged to a different branch of the BáchViệt people, one that had been completely Sinicized (Hánhoá). This is reflected in historical figures such as Triệu Đà, who declared himself King of NamViệt (NanYue), with his capital at Phiênngung, now modern-day Guangzhou.

              Additionally, in phonological analysis, 中 Zhōng ("Trung") could have also evolved into "Tàu" due to phonetic shifts: /ʈ-/ → /t-/, and /-ŋʷ/ → /-w/. This follows similar phonetic transformations observed in 痛 tòng (SV thống) → "đau". Hence, "Trung" could plausibly have shifted into "Tàu".

              Examples:

                • 秦晋之緣 Qín-Jìnzhīyuán ("KếtduyênTần-Tấn", 'Alliance between Qin and Jin')
                • 秦人 Qínrén ("Người Tàu", 'Chinese people')
                • 三秦 Sān Qín ("Ba Tàu", 'Three Qin regions')
                • 秦越 Qín-Yuè ("Tàu-Việt", 'Qin-Yue').

                The terms China and Chinese trace their origins to the Qin Dynasty (221–206 B.C.). Qin also appears as a family surname, a tribal name, and a designation for regions in ancient China (including Shaanxi Province). In Vietnamese, the term Chệt or Chệc carries a derogatory tone, though it is believed to derive from 潮 cháo (Teochew 潮州 Cháozhōu). The phonetic progression 潮 cháo → Triều → Tiều could have eventually resulted in Tàu.

                三秦 Sānqín (1) TamTần, (2) BaTàu [ @ M 三秦 Sānqín \ Vh @ 三 sān ~ ba (cf. 仨 sa), @ 秦 Qín ~ 'Tàu' | M 三 sān, sàn, sā, sēn < Middle Chinese sɑm, sʌm < OC *sjə:m, *sjə:ms | FQ 蘇甘, 蘇暫 || M 秦 Qín < MC tʂjin < OC *tʂin (See 'Tàu') || Handian: ◎ Three Qin (三秦 Sānqín) refers to the Guanzhong region. After Xiang Yu defeated Qin and entered Guanzhong, he divided the territory among the surrendered Qin generals Zhang Han, Sima Xin, and Dong Yi, thus calling the Guanzhong area Three Qin. ◎ "The city towers support Three Qin, smoke watches over Five Crossings" , Tang Dynasty, Wang Bo's "To Du Shaofu upon His Appointment to Shu Prefecture."

                (1) After the fall of Qin, Xiang Yu divided Guanzhong into three regions, appointing the surrendered Qin generals:
                Zhang Han as King of Yong (雍王)
                Sima Xin as King of Sai (塞王)
                Dong Yi as King of Zhai (翟王).
                        Together, they were known as Three Qin. See Records of the Grand Historian: Qin Shi Huang Chronicle (《史記·秦始皇本紀》). Later, Three Qin came to refer to the region now known as Shaanxi Province. Wang Bo's poem "To Du Shaofu upon His Appointment to Shu Prefecture" describes it: "The city towers support Three Qin, wind and smoke overlook Five Crossings."
                        Feng Bi's poem "Map of Rivers and Mountains" further mentions: "The terrain extends west to control the distant Three Qin, the river flows south to encompass Two Hua."
                (2) Three Qin also refers collectively to:
                Qinzhou (秦州),
                Eastern Qinzhou (東秦州),
                Southern Qinzhou (南秦州).
                In The Book of Wei (《魏書·尒朱天光傳》): "From Three Qin, the He River, Wei River, Gua Prefecture, Liang Prefecture, and Shanshan, all came to submit."
                This text is also cited in Comprehensive Mirror in Aid of Governance (《資治通鑑·梁武帝中大通二年》), with historian Hu Sanxing annotating: "Three Qin refers to Qinzhou, Eastern Qinzhou, and Southern Qinzhou."  
                Note: San Qin denotes the central Shaanxi plain; the Vietnamese "BaTàu" is a derogatory term for the Chinese.

                (字)^ The name Chữ Nôm is rendered in the script itself as 𡨸喃, which is also written as 字喃.

                (差)^ Let's examine another case, one that is arguably more "Vietnamese" than "Chinese," though still constructed using Chinese linguistic material. Consider phải and trái, which are distinct from their Chinese equivalents but conceptually align with the notions of "right and wrong" versus "left and right."

                In Vietnamese, trái denotes both "wrong" and "left." The former meaning may be linked to sai trái 差錯 chācuò (SV saitô, "wrong"), where 錯 cuò is associated with 差 chā ("sai" in Vietnamese). Phonologically, VS trái appears connected to 左 zuǒ (SV tả, "left"). Meanwhile, the concept of phải functions similarly to the English word "right" in both the directional and moral senses, as seen in phảichăng 平等 píngděng (SV bìnhđẳng, "equal, righteous"). This association extends to the phrase phảitrái 是非 shìfēi (SV thịphi), meaning "right and wrong."

                Notably, Vietnamese phải ('right' in the sense of correctness) does not derive from the Chinese word 右 yòu (SV hữu, 'right side'), as seen in "tảhữu" (左右 zuǒyòu, 'left and right'). However, phonological parallels suggest an underlying relationship between 右 yòu and phải within the {¶ /y- ~ B-/} transformation pattern. This pattern appears in pairs such as:

                  • 郵 yóu (SV bưu, "post")
                  • 由 yóu (VS bởi, "because")
                  • 柚 yòu (VS bưởi, "grapefruit")
                  • 游 yóu (VS bơi, "swim").

                        Such correspondences imply that 右 yòu may have historically shared phonetic characteristics with VS phải. It is plausible that phải once sounded closer to /bɨw/ in prehistoric times.

                The broader takeaway here is that many modern Vietnamese words have been coined using Chinese linguistic material. The pair phải and trái (是非 shìfēi) reflect a pattern of antonymous disyllabic word formation in Vietnamese, paralleling structures found in:

                  • cao thấp 高低 gāodī ("height"),
                  • to nhỏ 大小 dàxiǎo ("size"),
                  • nặng nhẹ 輕重 qīngzhòng ("weight").

                (安)Linguistic Parallels in Former Colonies: Similar to the role of English as a global lingua franca in former British colonies, early Mandarin may have functioned in a comparable capacity in Annam prior to 939 AD. Even today, Hanoi residents continue to associate refinement and elegance with Tràngan people, referring to themselves with a sense of cultural prestige. This metaphor mirrors the early 20th-century sentiment of "Saïgon est Paris de l’Orient," despite the fact that the French only arrived in Saigon in 1859 and their colonial presence in Vietnam lasted until 1954.

                (Y)Pig Terminology in Vietnamese and Its Yue Origins: For "pig," northern Vietnamese speakers use lợn (豚 tún, SV độn), whereas in the south, it is called heo (亥 hài, SV hợi). The latter is an archaic, authentic Yue term found in both Vietnamese and Chinese zodiac systems, where 亥年 Hàinián (VS NămHợi or NămHeo) corresponds to the "Year of the Boar." Meanwhile, lợn 豚 tún (SV độn), appearing in the Kangxi Dictionary, is more accurately a doublet of 豘 tún, which carries the same meaning.

                The key point to emphasize is that Yue linguistic elements predate Chinese ones, as 亥 hài was likely transcribed from an ancient Yue term for heo, both etymologically and culturally. (See APPENDIX D, E, F, G.)

                (V)NanYue (Chinese: 南越; pinyin: NánYuè; Cantonese Yale: Nàahm-yuht; Vietnamese: NamViệt) was an ancient kingdom encompassing parts of present-day Guangdong, Guangxi, and Yunnan in China, as well as northern Vietnam. Today, visitors can explore the magnificent ruins of mausoleums once built by the kings of NanYue, located in Guangzhou City, Guangdong Province, China.

                (Z)Shared Folktales Between Zhuang and Vietnamese Cultures: The Zhuang folktale of the Magic Sword and the Vietnamese legend of Trọng Thuỷ and Mỵ Châu narrate strikingly similar stories, both detailing the historical transition of Âu Lạc (歐雒) into the Nam Việt Kingdom. (Cf. Truyệncổ Dòng BáchViệt and https://vi.wikipedia.org/wiki/Mỵ_Châu.)

                (未)Goat and Its Linguistic Associations: The Chinese character 未 wèi can be transliterated as both Sino-Vietnamese vị ("upcoming") and SV mùi, as seen in Năm ẤtMùi 乙未年 Yǐwèinián ("Year of the Goat"). In Sinitic-Vietnamese, dê (goat) is cognate with 羊 yáng (SV dương, VS dê), which aligns with Teochew /jẽ/, all denoting "goat." The zodiac name 羊年 Yángnián ("Year of the Goat") corresponds with Sinitic-Vietnamese NămDê.

                        An important elaboration here is that 未 wèi originated as a loanword from the ancient Yue linguistic family, whereas 羊 yáng is a pictograph depicting the head of a goat or sheep. Linguistically, 未 wèi and 羊 yáng may be considered doublets, connected both semantically and phonetically. This relationship is exemplified in 美 měi (SV mỹ, "beautiful"), where 羊 yáng above 大 dà ("big") metaphorically conveys "beautiful taste" or "deliciousness." Furthermore, 美 měi and 未 wèi (cf. mùi) exhibit phonetic and semantic connections.

                   It is plausible that an early form of "dê" entered the Chinese language in dual forms for zodiac classification, possibly sounding similar to 未 (wèi) centuries before being reintroduced to the Yue populace of the NamViệt Kingdom or Annam.

                (S)A classic example of a Sinitic-Vietnamese word is 江 jiāng (VS sông, ‘river’), which was an ancient loan from the Yue form /krong/. Similarly, 目 mù and VS mắt ('eye') may have originated from a shared ancestral root, likely tracing back to a pre-Taic linguistic stratum in the distant prehistoric past. Other notable examples include the following: 子鼠 Zǐshǔ ("Týchuột", 'Tý rat'), 丑牛 Chǒuniú (SửuTrâu 'Sửu buffalo'), 寅虎 Yínhǔ (Dầncọp 'Dần tiger'), 卯貓 Mǎomāo (Mãomẹo 'Mão cat') [ NOT => 卯兔 Mǎotù? ("Mão thỏ"? 'Mão rabbit') ], 辰龍 Chénlóng (Thìnrồng 'Thìn dragon'), 巳蛇 Sìshé (Tỵrắn 'Tỵ snake'), 午馬 Wǔmǎ ("nămNgọ", 'Ngọ horse'), 未羊 Wèiyáng (Mùidê 'Mùi goat'), 申猴 Shēnhóu (Thânkhỉ 'Thân monkey'), 酉雞 Yǒujī ("Dậugà", 'Dậu chicken'), 戌狗 Xūgǒu ("Tuấtchó", 'Tuất dog'), 亥猪 Hàizhū ("Hợitrư", 'Hợi pig').

                (A)^ Western theories often overlook historical Yue linguistic and cultural facts, favoring new constructs over existing knowledge. Many Western scholars have hesitated to engage deeply with older historical sources, particularly those requiring proficiency in Chinese, leading them to invent frameworks from scratch rather than building on established research.

                (T)^ See APPENDIX L: Bùi Khánh-Thế, Ứng xử Ngôn ngữ của Người Việt đối với các Yếu tố gốc Hán.

                (文)Historical Linguistic Transformation: The evolution of Han-Viet lexicons, as reflected in both daily speech and literature, is further corroborated by historical events that unfolded in ancient Annam following the collapse of the Tang Empire (906–939 AD). For an in-depth analysis of the transformation from Middle Chinese to Sino-Vietnamese, see Nguyễn Tài Cẩn's Nguồn gốc và Quá trình Hình thành Cách đọc Âm Hán-Việt (1979).

                (普)A few key points before proceeding: For general readers, here are a few introductory guidelines before delving further into this work.

                1. Time commitment: This research is intended for publication in print format and is not suited for cursory browsing on the internet. Be prepared to invest ample time in engaging with its content.

                2. Conceptual framework: If the introductory chapter feels dense or difficult to grasp, do not be discouraged. If you are eager to learn, consider a simplified perspective: treat Austroasiatic as a linguistic branch stemming from pre-Yue Taic languages, and build your understanding from that premise. Alternatively, you may begin with the assumption that Yue, distinct from both Sinitic and Austroasiatic, serves as the foundation for this discussion. This approach clarifies why Austroasiatic classifications tend to be retroactive, tracing a circuitous route from south to north.

                3. Navigating Austroasiatic research: Do not let the overwhelming amount of Austroasiatic information online intimidate you. Much of it reiterates the same interpretations drawn from similar sources. Scholars in the Sino-Tibetan linguistic circle (focused on Yue studies) understand the limitations of such analyses. The author assumes that if you have read this far, you align with the Sino-Tibetan perspective; otherwise, you likely would not have had the patience to engage with these discussions, let alone with the equivalent of hundreds of printed pages ahead. To maintain clarity, avoid reactive engagement with Austroasiatic arguments, as they often lead to distractions rather than progress.

                Linguistic insights for different audiences:

                • For language learners: Much like the thrill of tasting "phở" for the first time, learners may be intrigued to learn that "phở" is etymologically cognate with 粉 fěn (SV "phấn", meaning 'noodle'). This root has branched into several Vietnamese words, including "phấn" 'chalk', "bún" 'noodle', "bột" 'flour', and "bụi" 'dust', all tracing back to the same semantic origin. (See Han-Viet.com)

                • For linguists: Experts may immediately recognize the plausibility of cognates such as:

                  • 雞 jī (SV "kê" ~ VS "gà", 'chicken') 
                  • 蛋 dàn (SV "đản" ~ VS "trứng", 'egg') 
                  • 蒜 suàn (SV "toán" ~ VS "tỏi", 'garlic') 
                  • 打 dǎ (SV "đả" ~ VS "đánh", 'strike')
                  • 公 gōng (SV "công" ~ VS "ông", 'mister') ~ 翁 wēng (VS "ông", 'old man', etymologically linked to "lông", 'feather, hair')

                  However, for general readers, digesting these etymological connections requires time and effort. Explanatory elaborations may help, but some assumptions should be accepted as foundational premises without excessive scrutiny, such as the correspondence between 打 dǎ and "đánh". Further phonetic details, like its association with 丁 dīng (SV "đinh", 'young man'), would only add complexity. 丁 dīng also gave rise to words like 釘 dīng (SV "đinh", 'nail') and 打包 dǎbāo, which corresponds to Vietnamese "đóngbao" 'to package'. Readers may, of course, question whether "trai" 'young man' originated from 丁 dīng, but such inquiries extend beyond the immediate scope of this work.

                (囯)Austroasiatic Interpretations of Sino-Vietnamese Usage: Although unproven, this perspective is noteworthy as it provides Austroasiatic scholars with a rationale for the widespread use of Sino-Vietnamese words in daily Vietnamese speech. Their argument suggests that these words were adopted into common usage through linguistic evolution rather than being inherently native expressions belonging to speakers of the same language.

                (音)Phonological Insights for Chinese Philologists: Chinese philologists may find value in examining subtle articulation discrepancies in Vietnamese, which could offer solutions to complexities such as chongniu (重紐, rime doublets) and phonemic division patterns (I, II 等, first and second class distinctions) in Middle Chinese historical phonology.

                (P)The Singular 'They': Regarding pronoun usage, the author acknowledges that the singular they is increasingly recognized as a practical alternative to "she," "he," or "s/he" in various contexts. The Washington Post formally adopted this usage in its stylebook in December 2015, and the U.S. Examiner followed suit on September 22, 2016. Furthermore, they was named Word of the Year by the American Dialect Society in 2015.

                (Q)^ For guidance on approximate pronunciation in modern Vietnamese, consult the Vietnamese-English Dictionary by Nguyễn Đình-Hoà (1966) or that of Nguyễn Văn Khôn (1967).