Hypothesis of Common Yue Origin of Vietnamese and Chinese
Vietnamese history is inseparable from the long arc of Yue–Han contact. From the Red River Delta through successive dynasties, waves of migration, colonization, and cultural exchange layered the language with Sinitic elements. Archaeological finds such as Đôngsơn drums, genetic studies of Yue‑Han admixture, and the persistence of Yue cultural practices all point to a shared foundation between southern China and northern Vietnam. Over centuries, Vietnamese identity emerged not as an isolated branch but as a composite, shaped by Yue substrata, Han overlay, and later southward expansion into Chamic and Khmer territories.
The etymological core of Vietnamese is not a single inheritance but a stratified lexicon. Sino‑Vietnamese readings, vernacular Sinitic‑Vietnamese forms, and native substrata coexist, often in doublets or alternates. What appears as 'pure Vietnamese' frequently reveals deeper connections: chim 'bird' with Mon‑Khmer parallels, cá 'fish' with Austroasiatic cognates, thỏ 'hare' with Chinese roots. The integrity of the language lies not in purity but in the interplay of these strata. Vietnamese etymology must therefore be studied as a layered system, where cultural history and linguistic borrowing are inseparable.
This study advances the hypothesis that Vietnamese and Chinese share a common Yue substratum, reflected in deep lexical and phonological correspondences. Rather than positioning Vietnamese as an Austroasiatic language with heavy Sinitic borrowing, the argument here reframes Vietnamese as a co‑participant in a Sino‑Tibetan/Yue continuum. Evidence is drawn from core vocabulary – kinship terms (mẹ, bố, con), natural elements (trời, nước, đất), and human categories (người) – which align systematically with Chinese equivalents (母 mǔ, 父 fù, 子 zǐ, 天 tiān, 水 shuǐ, 土 tǔ, 人 rén). These parallels suggest not mere borrowing but shared ancestry, with Vietnamese preserving substratal forms while Chinese evolved through its own historical trajectory. The article invites readers to reconsider entrenched Austroasiatic assumptions and to explore the possibility of a deeper genetic affiliation between Vietnamese and Chinese.
I) Foundations of the hypothesis
This study applies a comparative-historical linguistic approach to examine the hypothesis of a common Yue origin for significant portions of the Vietnamese and Chinese lexicons. Historical records, archaeological evidence, and phonological reconstruction are brought together to trace lexical strata across time and space. The primary dataset consists of Sinitic-Vietnamese vocabulary, encompassing both formal Sino-Vietnamese readings and colloquial Vietnamese forms. Supplementary data include Mon-Khmer basic cognates whose relationship to a Yue origin remains unresolved. Lexical items are drawn from historical texts, modern usage, and dialectal sources in both Vietnamese and Chinese.
Sound correspondences between Vietnamese and Chinese forms are identified, with attention to regular shifts, loanword adaptation patterns, and the retention of archaic phonemes. Special focus is given to features preserved in Vietnamese that have been lost or altered in modern Chinese dialects. Lexical meaning is examined diachronically, noting cases of semantic retention, narrowing, broadening, or shift, and cultural context is considered, especially for terms tied to indigenous practices, folk concepts, and ceremonial vocabulary.
Linguistic data are situated within the historical timeline of Yue-Han contact, including periods of colonization, migration, and trade. Archaeological evidence such as inscriptions and artifacts is used to corroborate linguistic findings. Where possible, the study assesses whether a given lexical item is more plausibly a Vietnamese loan into Chinese, a Chinese loan into Vietnamese, or a shared inheritance from Yue, weighing phonological conservatism, semantic distribution, and historical plausibility. By combining philological rigor with comparative reconstruction, the analysis aims to clarify the depth and nature of Yue influence on Vietnamese and to situate that influence within the broader Sino-Tibetan and Mon-Khmer linguistic landscape.
Perpetually situated within China’s cultural and political orbit, Vietnam has navigated a delicate balance, acting as a subordinate state while striving to preserve its sovereignty. Unlike Japan or South Korea, which have decisively stepped out from China’s civilizational shadow, Vietnam has remained closely bound within its gravitational pull. These Sino-centric dynamics form the backdrop for the hypothesis that an identified stratum of the Vietnamese lexicon, commonly referred to as Sinitic-Vietnamese (VS), is of Chinese origin. Across Vietnam’s long and turbulent history, the specter of Chinese invasion has recurred with a rhythm as familiar as seasonal illness, shaping a persistent backdrop of geopolitical tension between the two nations. Each Vietnamese generation has regarded these threats as tangible and pressing, particularly during periods when the northern empire expanded beyond its bounds.
Having established the historical and methodological foundations, the discussion now turns to the corpus itself. The following datasets present the lexical evidence on which this study’s arguments rest, organized to reveal both the depth and breadth of Sinitic-Vietnamese strata. Each entry is drawn from the primary and supplementary sources described above, and is aligned with its Modern Mandarin form, Middle Chinese reconstructions (Baxter, Pulleyblank), and Old Chinese reconstructions (Baxter-Sagart, Zhengzhang Shangfang), among others. Where relevant, parallels in other Chinese dialects, Mon-Khmer languages, and Sino-Tibetan languages are noted to illuminate phonological correspondences and semantic relationships.
The corpus is arranged to allow the reader to trace individual items across time and space, from their earliest attested forms to their modern reflexes in Vietnamese. This structure makes it possible to observe regular sound changes, loanword adaptation patterns, and the retention of archaic features. It also highlights semantic developments, including narrowing, broadening, or shift, and situates each item within its cultural and historical context.
By moving directly from the conceptual framework into the data, the reader can see how the comparative-historical approach operates in practice. The tables and annotations that follow are intended not only to document the lexical material but also to demonstrate the analytical process, showing how each item contributes to the larger picture of Yue influence on Vietnamese and the interplay between inherited vocabulary and loanwords.
The hypothesis of a common Yue origin rests on the observation that Vietnamese and Chinese share a stratum of basic vocabulary that cannot be explained solely by borrowing. Kinship terms such as mẹ (母 mǔ, SV mẫu) and bố (父 fù, SV phụ) are not peripheral but belong to the core lexicon, the words most resistant to change. Similarly, con (子 zǐ, SV tử) and người (人 rén, SV nhân) show systematic correspondences with Chinese, suggesting a substratal link.
These correspondences extend to natural elements: trời (天 tiān, SV thiên) and nước (水 shuǐ, SV thuỷ), alongside đất (土 tǔ, SV thổ). The coexistence of native forms (trời, nước, đất) with Sino‑Vietnamese overlays (thiên, quốc, địa) illustrates a doublet system: substratal Yue roots preserved in daily speech, while Sinitic prestige forms entered ritual, administrative, and scholarly domains.
The persistence of these doublets points to a shared ancestry rather than unilateral borrowing. Vietnamese and Chinese appear to have diverged from a common Yue foundation, with each language developing distinct phonological and semantic trajectories while retaining recognizable cognates.
A. Historical background
Table 1 – TIMELINE OF VIETNAM'S HISTORY*
For most of its history, the territory of present-day Vietnam encompassed three ethnically distinct nations: a Vietnamese nation, a Cham nation, and part of the Khmer Empire.
The Viet nation originated in the Red River Delta in present-day northern Vietnam and expanded over its history to the current boundary. Its name changed many times, with Vănlang in use the longest. The names are summarized below:
| Period | Country Name | Time Frame | Boundary |
|---|---|---|---|
| BaiYue (Prehistoric Yue tribes) | | 2879-2524 B.C. | Stretching from the near bank of the Yangtze River to the southernmost area now called Quảng Trị, adjacent to Champa Kingdom, including the Yunnan, Kweichow, Hunan, Kwangsi and Kwangtung provinces of China. |
| Hồngbàng Dynasty | Vănlang | 2524-258 B.C. | It was bordered to the east by the East Sea, to the west by Ba Thục (today Sichuan), to the north by Dongting Lake (Hunan), and to the south by Lake Tôn (Champa). The Red River Delta is the home of the LạcViệt culture. |
| Thục Dynasty | ÂuLạc | 257-207 B.C. | Red River delta and its adjoining north and west mountain regions. |
| Triệu Dynasty | NamViệt | 207-111 B.C. | ÂuLạc, Guangdong, and Guangxi. |
| Han Domination | Giaochỉ (Jiaozhi) | 111 B.C.-39 AD | Present-day north and north-central of Vietnam (southern border expanded down to the Ma River and Ca River delta), Guangdong, and Guangxi. |
| Trưng Sisters | Lĩnhnam | 40-43 | Present-day north and north-central of Vietnam (southern border expanded down to the Ma River and Ca River delta). |
| Han to Eastern Wu Domination | Giaochỉ | 43-229 | Present-day north and north-central of Vietnam (southern border expanded down to the Ma River and Ca River delta), Guangdong, and Guangxi. |
| Eastern Wu to Liang Domination | Giaochâu (Jiaozhou) | 229-544 | Same as above |
| Anterior Lý Dynasty | Vạnxuân | 544-602 | Same as above. |
| Sui Domination | Giaochâu | 602-618 | Same as above |
| Tang Domination | Annam | 618-866 | Same as above |
| Tang Domination, Autonomy (Khúc family, Dương Đình Nghệ, and Kiều Công Tiễn), Ngô Dynasty | Tĩnh Hảiquân | 866-967 | Same as above |
| Đinh, Anterior Lê and Lý Dynasty | ĐạicồViệt | 968-1054 | Same as above. |
| Lý and Trần Dynasty | ĐạiViệt | 1054-1400 | Southern border expanded down to present-day Huế area. |
| Hồ Dynasty | Đạingu | 1400–1407 | Same as above. |
| Ming Domination and Posterior Trần Dynasty | Giaochỉ | 1407–1427 | Same as above. |
| Lê, Mạc, Trịnh–Nguyễn Lords, Tâysơn Dynasty, Nguyễn Dynasty | ĐạiViệt | 1428-1804 | Gradually expanded to the boundary of present day Vietnam. |
| Nguyễn Dynasty | Việtnam | 1804–1839 | Present-day Vietnam plus some occupied territories in Laos and Cambodia. |
| Nguyễn Dynasty | Đạinam | 1839–1887 | Same as above |
| Nguyễn Dynasty and French Protectorate | French Indochina, consisting of Cochinchina (southern Vietnam), Annam (central Vietnam), Tonkin (northern Vietnam), Cambodia, and Laos | 1887–1945 | Present-day Vietnam, Laos, and Cambodia. |
| Republican Era | Việt Nam (with variants such as Democratic Republic, State of Vietnam, Republic of Vietnam, Socialist Republic) | Democratic Republic of Vietnam (1945–1976 in North Vietnam), State of Vietnam (1949–1955), Republic of Vietnam (1955–1975 in South Vietnam), Socialist Republic of Vietnam (1976–present) | Present-day Vietnam. |
Almost all Vietnamese dynasties are named after the king's family name, unlike the Chinese dynasties, whose names are dictated by the dynasty founders and often used as the country's name.
The Hồngbàng Dynasty was a dynasty of the LạcViệt nation before recorded history. The Thục, Triệu, Anterior Lý, Ngô, Đinh, Anterior Lê, Lý, Trần, Hồ, Lê, Mạc, Tâysơn, and Nguyễn are usually regarded by historians as formal dynasties. Nguyễn Huệ's "Tâysơn Dynasty" is rather a name created by historians to avoid confusion with Nguyễn Ánh's Nguyễn Dynasty.
*Compiled from source: https://en.wikipedia.org/wiki/Early_Lê_Dynasty
Historically, both prior to and following China’s domination of Annam, Chinese immigrants undeniably introduced Confucian culture into Vietnamese society. In Vietnam, numerous facets of daily life, ranging from customs and traditions to family names and place names, mirror their Chinese counterparts, often to the point of replication. Chinese culture has long been held in high esteem and practiced with such rigidity that changes, whether beneficial or detrimental, have occurred gradually. This stands in contrast to Japan and Korea, where Chinese festivals and holidays were decisively abandoned in favor of localized cultural identities.
For example, while the Lunar New Year Festival (Tết) and the Mid-Autumn Festival (Tết Trung Thu) have largely disappeared from public life in Korea and Japan, they continue to be passionately celebrated in Vietnam. These festivities are deeply intertwined with ancestral tomb-clearing rituals observed during the Spring and Winter Solstices, practices that remain firmly rooted in the Vietnamese cultural psyche.
However, other Chinese-origin festivals such as Tết Nguyên Tiêu (元宵節 'Lantern Festival') and Tết Đoan ngọ (端午節 'Dragon Boat Festival') have seen a decline in observance, particularly due to the disruptions caused by prolonged warfare, most notably during the final decades of the 20th century. Despite this decline, there are clear signs of a cultural resurgence: these traditions have begun to reemerge in the early 21st century with renewed vigor and public enthusiasm.
In the wake of the 1979 border conflict with China, a wave of nationalism emerged in Vietnam. During this period, authorities attempted to recalibrate the national Lunar New Year Festival by aligning it with the Vietnamese lunar calendar in such a way that it would precede China’s celebration by one month. The result? In early 1985, Vietnamese citizens ended up celebrating Tết twice, once in January according to the revised calendar and again in February in sync with China’s traditional date. Ironically, the second celebration was even more elaborate and joyous than the first. To appreciate the cultural magnitude of Tết in Vietnam, one need only consider that its duration and significance rival the combined festivities of Thanksgiving, Christmas, and New Year in the West.
"1985 is one of the few years where Vietnamese and Chinese calendars differ significantly: the Vietnamese New Year was 1 month earlier than the Chinese one. The reason can be detected from the above table (informatik.uni-leipzig.de). The Winter Solstice 1984 falls on 12/21/1984 Hanoi time, but on 12/22/1984 Beijing time, the same day as the New Moon. The month 11 of the Chinese year must contain the Winter Solstice, so it is not the month from 11/23/1984 to 12/21/1984 like in the Vietnamese calendar, but the one starting 12/22/1984. Consequently, the subsequent months (12, 1,...) also start about one month later than the corresponding months of the Vietnamese calendar. While New Year in Vietnam falls on 1/21/1985, it is on 2/20/1985 in China. The two calendars agree again after a leap month is inserted to the Vietnamese calendar (month from 3/21/1985 to 4/19/1985, as seen above). Also, in year 1984 the Chinese lunar month from 11/23/1984 to 12/21/1984 is the first lunar month after Winter Solstice 1983 that does not contain a Major Term and is therefore a leap month."
In the 21st century "there are 3 years where the Lunar New Year begins at different dates in Vietnam and in China. In 2007 the Vietnamese New Year is on 2/17/2007, the Chinese one on 2/18/2007. In 2030 the dates are 2/2/2030 and 2/3/2030, and in 2053 they are 2/18/2053 and 2/19/2053."
Source: http://www.informatik.uni-leipzig.de/~duc/amlich/calrules_en.html
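The time-zone effect described in the quoted passage can be illustrated with a minimal Python sketch. It rests on two assumptions not found in the source: the solstice instant (taken here as roughly 16:23 UTC on 21 December 1984, an approximate astronomical value) and the hypothetical helper names `local_date` and `month_11_start`. The two month-start dates are taken from the quotation. This is only an illustration of the stated rule that month 11 must contain the Winter Solstice, not a full calendar implementation.

```python
from datetime import date, datetime, timedelta, timezone

# ASSUMPTION: approximate instant of the December 1984 solstice in UTC.
# The quoted source gives only the resulting local dates, not this time.
SOLSTICE_1984_UTC = datetime(1984, 12, 21, 16, 23, tzinfo=timezone.utc)

HANOI = timezone(timedelta(hours=7))    # UTC+7, reference for the Vietnamese calendar
BEIJING = timezone(timedelta(hours=8))  # UTC+8, reference for the Chinese calendar

# Lunar month start dates (new-moon days) as given in the quoted passage.
MONTH_STARTS = [date(1984, 11, 23), date(1984, 12, 22)]


def local_date(instant, tz):
    """Return the civil date of an instant in the given time zone."""
    return instant.astimezone(tz).date()


def month_11_start(solstice_day, month_starts):
    """Return the start of the lunar month that contains the solstice day.

    Under the rule quoted above, that month is numbered 11.
    """
    return max(start for start in month_starts if start <= solstice_day)


hanoi_solstice = local_date(SOLSTICE_1984_UTC, HANOI)
beijing_solstice = local_date(SOLSTICE_1984_UTC, BEIJING)

hanoi_month_11 = month_11_start(hanoi_solstice, MONTH_STARTS)
beijing_month_11 = month_11_start(beijing_solstice, MONTH_STARTS)

print("Hanoi:  solstice", hanoi_solstice, "month 11 starts", hanoi_month_11)
print("Beijing: solstice", beijing_solstice, "month 11 starts", beijing_month_11)
```

Under the assumed solstice time, the one-hour offset places the solstice on 21 December in Hanoi and on 22 December in Beijing, so month 11 begins on 11/23/1984 in the Vietnamese reckoning and on 12/22/1984 in the Chinese one, matching the one-month gap the quotation describes.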
Over time, Vietnamese cultural and linguistic elements have decisively supplanted the indigenous Chamic and Khmer characteristics that once defined the central territories of the now-vanished Champa and Khmer kingdoms – regions annexed by Việtnam in the relatively recent past. Strikingly, many of their placenames are direct duplicates of those found in the old Middle Kingdom.
With the exception of Vietnam’s northwestern provinces – where native toponyms reflect the languages of indigenous minority groups who remain the demographic majority – most other regions have undergone sweeping renaming. Late resettlers from northern Vietnam, including migrants from southern regions of China, gradually replaced local placenames from north to south. This transformation extends from the current northern central territory at the 16th parallel all the way to the southernmost tip of Càmau Province, facing the Gulf of Thailand, spanning more than 3,260 kilometers of coastline.
Illustrative examples include:
- Tháinguyên (Tàiyuán 太原)
- Sơntây (Shānxī 山西)
- Hànội (Hénèi 河內)
- Hànam (Hénán 河南)
- Hàbắc (Héběi 河北)
- Hàđông (Hédōng 河東)
- Hàtây (Héxī 河西)
- Trùngkhánh (Chóngqìng 重慶)
- Tràngan (Cháng’ān 長安) – also known as Trườngyên, both used for the 10th-century capital during the Lý Dynasty and still applied to Hànội in early 20th-century usage
- Bắcninh and Tâyninh (Běiníng 北寧 ‘Pacified North’ and Xīníng 西寧 ‘Pacified West’) – paralleling Xīníng in Qīnghăi and contrasting with Nánníng 南寧 ‘Pacified South’ in Guăngxī
- Thuậnhoá (Shùnhuá 順化)
- Quảngnam (Guăngnán 廣南 ‘Greater South’) – in contrast to Guăngdōng 廣東 ‘Greater East’ and Guăngxī 廣西 ‘Greater West’
Even ostensibly native Vietnamese terms often reveal Sino-Vietnamese roots. For example:
- Kẻchợ (Jīngchéng 京城, SV kinhđô) meaning 'Capital', comparable to Japanese Keijō
- Đànẵng, historically transcribed as Xiàngăng 峴港 (SV Hiệncảng), was pronounced Kẻon by Fukienese and Hainanese communities as early as the 18th century
Beyond major cities, countless townships and villages bear names constructed from Sino–Vietnamese elements, such as:
- Hoàihương (Huáixiāng 懷鄉)
- Bồngsơn (Péngshān 蓬山)
- Bìnhtân (Píngjīn 平津)
- Longan (Lóng’ān 隆安)
- Gianghĩa (Jiāyì 嘉義)
- Longxuyên (Lóngchuān 龍川)
This naming convention mirrors colonial practices in the United States, where English placenames were transplanted to the East Coast, e.g., New England, New York, New Hampshire.
The number of Chinese placenames used in Vietnam is virtually incalculable. In addition to inherited names, new ones have been coined using Sino-Vietnamese vocabulary, particularly in territories acquired from the Champa and Khmer kingdoms as recently as the 18th century. These names evoke a nostalgic familiarity with HánViệt tradition while still retaining traces of aboriginal identity. Examples include Quynhơn, Nhatrang, Phanrang, Sóctrăng, each reflecting the southward expansion of early Vietnamese migrants over the past few centuries.
Figure 1 – Map of the ancient Kingdom of Champa (Campadesa - 2nd to 18th century)
The territory of Champa, depicted in green, lay along the coast of present-day southern Vietnam. To the north (in yellow) lay ĐạiViệt; to the west (in blue), Angkor.
(Source: https://en.wikipedia.org/wiki/Champa)
Archaeological excavations in Việtnam have unearthed bronze drums, most notably the Đôngsơn and Ngọclữ types, buried deep beneath thick layers of earth. These artifacts reflect a highly advanced metallurgical tradition, yet the modern Vietnamese inhabitants living atop these layers had no prior knowledge of their existence. Nevertheless, they claim descent from the creators of these drums. The decorative motifs etched onto the surface and rim depicting wooden boats and long-feathered birds are attributed to the LạcViệt 鵅越 (LuóYuè) people and closely resemble designs found on bronze drums still used by the Zhuang ethnic group, known in Vietnamese as Nùng.
The Zhuang are the largest minority in southern China’s Guangxi (Quảngtây) Zhuang Autonomous Region, numbering over 18 million, not including those residing in Vietnam’s northern highlands. Unlike the forgotten relics buried in Vietnamese soil, Zhuang communities continue to use these drums in ritual and ceremonial contexts, preserving a cultural continuity that suggests direct descent from the original artisans. In contrast, the self-identified Yue descendants, namely, the Vietnamese, appear disconnected from the spiritual and technical heritage embedded in these artifacts.
This disconnect may stem from centuries of warfare that fractured aboriginal cultural links. The survival of bronze drums in Vietnam is likely due to their burial, which spared them from the Han Dynasty’s widespread melt-down policies. Despite the enduring bronze drum subculture in China South, which evidences Yue cultural roots in ancient Annam, prolonged Chinese rule contributed to the extinction of Yue metallurgical knowledge. Following Han domination, waves of Chinese immigrants accelerated the colonization of Annam.
Vietnamese archaeologists have claimed ownership of these relics, asserting they belong to native Yue ancestors who once inhabited northern Việtnam. However, the Bronze Age predates the emergence of the Annamese, an ethnic amalgam of Yue and Han. Thus, the terms Annamese or Vietnamese denote evolving indigenous identities shaped by successive Han migrations. This mirrors how the term Sinitic came to represent the broader concept of “Chinese.”
Claims by overzealous Vietnamese nationalist scholars regarding artifact origins in southern Vietnam are historically tenuous. The assertion that these relics were created by "Vietnamese ancestors" is unfounded, as the region was only annexed in the late 18th century. Cultural artifacts found there belonged to the ancient Chamic and Khmer civilizations. As recently as five centuries ago, the border of ĐạiViệt ended at present-day Thanhhoá Province. The Chamic kingdoms to the south had ruled the area for over a millennium under hereditary monarchies. Only after their decline in the 13th century did Annamese dynasties begin territorial expansion. From that point, Kinh settlers migrated en masse beyond Thuậnhoá Province, where the capital Huế was established in the early 19th century. By then, Việtnam had extended its reach to the southern tip of Càmau Province, facing the Gulf of Thailand.
Anthropological evidence from the past six decades supports the hypothesis that Taic aboriginals, ancestors of the Yue, intermixed with nomadic Tibetan-origin peoples to form the proto-Chinese population. These pre-Sinitic groups migrated from infertile northwestern regions toward the fertile lands of China South as early as 4,000 years ago (see Shifan Peng, 1987). The terms Dai and Tai are often used interchangeably, with Taiwanese scholars in the 1960s using 臺 for Tai (Ding Bangxin, 1977) and 傣 for Dai.
Over millennia, facing invasions by northern Tartarian horsemen, various Yue tribes (BáchViệt 百越) fled southward, eventually reaching the Indo-Chinese peninsula, including northeastern Myanmar, southeastern India, Thailand, and Càmau Cape (formerly Ttœ̆kkhmau, 'Black Ink'), Cambodia. This migratory pattern aligns with Austroasiatic linguistic theories, linking Yue ancestry to groups like the Munda, Mon-Khmer, and Chamic branches of Austroasiatic and Austronesian languages. Shared lexical items, such as Khmer numbers 1–5 or Chamic demonstratives, support this connection.
Historically, the Khmer and Chamic peoples founded two of Southeast Asia’s most powerful kingdoms: the Khmer Empire and the Champa Kingdom. Genetic studies link the ancient Chamic people to the Li ethnic group of Hainan Island. These Li may not have known that their southern cousins built a kingdom lasting over 1,600 years, from the 2nd century to 1832, recorded in Chinese history as Lâmấp (林邑) and Chiêmthành (占城).
As for Austroasiatic linguistic influence on Vietnamese, its roots lie deep in aboriginal substrata. Basic lexical remnants, such as 'cá' (fish) from OC */nga/ and 'mắt' (eye) from OC */mukw8/, appear across regional languages: Khmer /ka:/, Proto-Austroasiatic /*ka/, and Malay 'mata'. These shared forms suggest that Khmer and Chamic lexicons are embedded in Vietnamese, classified under the Austroasiatic Mon-Khmer family. Yet this phenomenon likely reflects the result of geographic contact and lexical diffusion, rather than direct lineage.
For example, Chinese zodiac animal names were first adopted by pre-Qin-Han peoples and later integrated into Vietnamese with localized forms: SV 'tý' 子 zǐ (VS chuột), 'sửu' 丑 chǒu (trâu), 'dần' 寅 yín (cọp = chằn), 'mão' 卯 máo (mẹo = mèo), and so on (see An Chi, Rong chơi Miền Chữ nghĩa, 2016, Vol.1, pp. 80–86, 159–183).
Ethnically speaking, it is unsurprising that in the 21st century, with Vietnam’s population officially comprising 54 recognized ethnic groups, any citizen may trace their ancestry to one of them. This is especially plausible when considering historical and geographic factors. For instance, native archaeologists born in Sahuỳnh, a region once part of the Champa kingdom and annexed by the Trần Dynasty in the 13th century, may proudly assert that the cultural artifacts of the Sahuỳnh Civilization unearthed in their homeland were created by their ancestors, regardless of whether those ancestors were originally indigenous.
However, from a strict national perspective, it is inaccurate to claim that the creators of these artifacts were truly ancestors of the Vietnamese people. The Kinh majority, now 85.32% of the population and distinct from the ethnic minorities, settled near these archaeological sites only gradually, over the course of nearly 1,000 years, following a southward migratory trajectory from present-day Thanhhoá Province. This long historical movement complicates any direct ancestral claims and underscores the layered and composite nature of Vietnamese identity.
The core issue discussed above pertains specifically to Vietnamese nationals identified as of Kinh ethnicity in the most recent census, including those whose family lineage traces back to ancestors born and raised in regions where archaeological artifacts have been discovered. Such claims are only justifiable within a historical timeline that aligns with the full annexation of lands following the decline of the Champa kingdom or Khmer empire. Individuals of Chamic or Khmer descent may rightly assert ancestral ties to these artifacts. The author himself was born in the Champa region in central Vietnam, as were his parents; however, cultural relics excavated from that area do not belong to long-deceased native artisans from his paternal line. Considering the age of these artifacts, it would be historically inaccurate to claim descent from their original creators, especially when the author's paternal ancestors migrated from China as recently as the 19th century.
As previously emphasized, the Kinh people, whose early ancestors formed the demographic majority, emerged from a genetic amalgam symbolized here as {4Y6Z8H}, representing grafted Yue-Han lineages rooted in Taic ancestry. Anthropologically, Han dynasty Chinese, descendants of ancient Chu and other Yue states, invaded Annam and intermingled with indigenous populations in the Red River Basin, many of whom also traced their origins to China South. Over the past 900 years, Vietnamese forebears replicated this migratory pattern, gradually expanding southward. The final leg of this journey brought the Kinh majority to Càmau Cape at the southern tip of the Indochinese peninsula. Thus, the rightful cultural ownership of artifacts found in central and southern Vietnam depends heavily on the time period in which those relics were created, represented here as {4Y6Z8H+CMK} versus {CMK}.
When discussing "roots", we return to the biological foundation, genomes that manifest in physical traits such as appearance and complexion. To the untrained eye, even Western observers may struggle to distinguish Vietnamese individuals from Chinese in mixed groups, such as second-generation students in American institutions. Similarly, from a linguistic standpoint, while Westerners may easily differentiate Vietnamese from Mon-Khmer languages, they often find it difficult to distinguish between Cantonese and Vietnamese speakers. This is due to the fact that Vietnamese functions more as a Sino-xenic language, Yue-based but infused with extensive Sinitic elements, than as a Mon-Khmer tongue.
It is no secret that Vietnamese shares a substantial portion of its vocabulary with Chinese, more so than with any other linguistic source. These shared features stem from deep historical imprints left by ancient Chinese forms and regional dialects. Why do their speech patterns resemble each other so closely? Evidence suggests a biological connection. In terms of genetic affiliation with neighboring populations in China South, advances in DNA biotechnology are poised to help anthropologists uncover more precise genetic data regarding the Vietnamese people’s composition, herein symbolized as {4Y6Z8H+CMK}.
Indeed, on the genetic side, new scientific studies are now readily available online. The abstract quoted below, from https://www.ncbi.nlm.nih.gov/, is one example.
Table 3 - HLA-DR and -DQB1 DNA polymorphisms in a Vietnamese Kinh population from Hanoi.
Vu-Trieu A, Djoulah S, Tran-Thi C, Nguyen-Thanh T[sic], Le Monnier De Gouville I, Hors J, Sanchez-Mazas A.
Source: Department of Immunology and Physiopathology, Medical College of Hanoi, Vietnam.
Abstract:
We report here the DNA polymerase chain reaction sequence-specific oligonucleotide (PCR-SSO) typing of the HLA-DR B1, B3, B4, B5 and DQB1 loci for a sample of 103 Vietnamese Kinh from Hanoi, and compare their allele and haplotype frequencies to other East Asiatic and Oceanian populations studied during the 11th and 12th International HLA Workshops. The Kinh exhibit some very high-frequency alleles both at DRB1 (1202, which has been confirmed by DNA sequencing, and 0901) and DQB1 (0301, 03032, 0501) loci, which make them one of the most homogeneous population tested so far for HLA class II in East Asia. Three haplotypes account for almost 50% of the total haplotype frequencies in the Vietnamese. The most frequent haplotype is HLA-DRB1*1202-DRB3*0301-DQB1*0301 (28%), which is also predominant in Southern Chinese, Micronesians and Javanese. On the other hand, DRB1*1201 (frequent in the Pacific) is virtually absent in the Vietnamese. The second most frequent haplotype is DRB1*0901-DRB4*01011-DQB1*03032 (14%), which is also commonly observed in Chinese populations from different origins, but with a different accessory chain (DRB4*0301) in most ethnic groups. Genetic distances computed for a set of Asiatic and Oceanian populations tested for DRB1 and DQB1 and their significance indicate that the Vietnamese are close to the Thai, and to the Chinese from different locations. These results, which are in agreement with archaeological and linguistic evidence, contribute to a better understanding of the origin of the Vietnamese population, which has until now not been clear.
PMID: 9442802 [PubMed - indexed for MEDLINE]
From the first chapter onward, the author has gone to great lengths to substantiate the hypothesis that today's Vietnamese Kinh population descends from a mixed stock, and that their language is likewise a product of proto-Chinese groups moving into China South from the southwest hundreds of years before the Western Han period (206 B.C.). Over the following centuries, the newly mixed populace continued to migrate southward on a larger scale into what is now northern Vietnam, part of which was later annexed into the Han empire. Prehistoric archaeological evidence points in the same direction:
Table 4 - Affinities of the Mạnbắc people to later early Metal Age Đôngsơn Vietnamese
The excavation of the Man Bac site (c. 3800–3500 years BP) in Ninh Binh Province, Northern Vietnam, yielded a large mortuary assemblage. A total of 31 inhumations were recovered during the 2004–2005 excavation. Multivariate comparisons using cranial and dental metrics demonstrated close affinities of the Man Bac people to later early Metal Age Dong Son Vietnamese and early and modern samples from southern China including the Neolithic to Western Han period samples from the Yangtze Basin. In contrast, large morphological gaps were found between the Man Bac people, except for a single individual, and the other earlier prehistoric Vietnamese samples represented by Hoabinhian and early Neolithic Bac Son and Da But cultural contexts. These findings suggest the initial appearance of immigrants in northern Vietnam, who were biologically related to pre- or early historic population stocks in northern or eastern peripheral areas, including Southern China. The Man Bac skeletons support the ‘two-layer’ hypothesis in discussions pertaining to the population history of Southeast Asia.
(See Morphometric affinity of the late Neolithic human remains from Man Bac, Ninh Binh Province, Vietnam: key skeletons with which to debate the ‘two layer’ hypothesis, co-authored by Hirofumi MATSUMURA, Marc F. OXENHAM, Yukio DODO, Kate DOMETT, Nguyen Kim THUY, Nguyen Lan CUONG, Nguyen Kim DUNG, Damien HUFFER, Mariko YAMAGATA (2007), at http://www.jstage.jst.go.jp/article/ase/116/2/135/_pdf)
The findings above support not only the theory behind the formation of Vietnam’s dominant Kinh population, but also reflect broader ethnological patterns observed across China South and Southeast Asia. Over thousands of years, successive waves of Chinese migrants, some of Altaic origin from China North, including eastern Hakka groups, migrated southward, gradually displacing native populations in regions such as Mạnbắc of Ninhbình Province.
Interestingly, in the area settled by these early arrivals, one of the most picturesque sites in the province, where water and mountains converge, was named 'Tràngan', a name echoing Chang’an (長安), the ancient capital of imperial China. This makes Tràngan the second location in Vietnam, alongside Hanoi, to bear a name directly inspired by the historical heart of the Middle Kingdom. (It is worth noting that the original Chang’an is now known as Xi’an, located in Shaanxi Province, China, and served as the capital during the Tang dynasty.)
This migratory process may have occurred at various points over the past 3,800 years. According to prevailing hypotheses, the racial composition of today’s Vietnamese Kinh people reflects a blend of displaced Chinese migrants from both China North and China South. The latter are considered descendants of the BaiYue 百越, or BáchViệt, collectively known as the Yue 粵 (also 越, 鉞 in ancient records), encompassing ethnic groups such as Dai 傣 (VS Tày), Zhuang 壯 (Bouxcueng, VS Nùng), Tong 垌, Shui 水, Maonan 毛南 (Môn), Miao 苗 (Mèo, Hmong), and other southern minorities.
The ancestors of the Yue people are believed to have descended from the Taic people (原始 傣族) prior to 3000 B.C. These Taic groups were not only progenitors of the modern Dai people, now found in Yunnan, Guangxi, Thailand, Laos, and northern Vietnam, but also populated early states such as Zhou 周朝, Chu 楚國, and the pre-Qin-Han polities. Over generations, these populations intermingled with other ethnic groups during the Warring States period, and later with Qin 秦 and Western Han 西漢 peoples, forming the Han majority and the broader Chinese national identity. Many of their descendants, including native minorities, still reside in these regions today (see Xu Liting, 1981).
Before the rise of the Yue, especially those of Chu and NamViet polities, and Altaic Turkic groups, the Taic people were dominant across vast territories in China South. Their lands stretched along both banks of the Yangtze River, extending eastward to the East China Sea and southward to provinces such as Yunnan, Sichuan, Guangxi, Guizhou, Hubei, Hunan, Jiangxi, Guangdong, Fujian, and Jiangsu, including northern Vietnam.
Populations of the Eastern Zhou period (770-221 B.C.) are thought to be racially mixed, combining Taic ancestry with earlier Shang Dynasty (1600-1050 B.C.) and Western Zhou (1046-771 B.C.) peoples. These groups may have emerged from interactions between Tibetan nomads and proto-Taic communities during the Xia Dynasty. The Qin State (778-222 B.C.), the most powerful among the Warring States, absorbed many of these populations. Continuous warfare with Yin invaders and other rival states disrupted agricultural life for Yue communities along the southern Yangtze, forcing many to flee southward in search of safety and sustenance.
Terrien de Lacouperie (1887), in The Languages of China Before the Chinese, examined the linguistic heritage of pre-Chinese races. He cited Mencius (孟子 Mengzi, 4th century B.C.), who noted the distinct shrillness of the Chu language compared to that of Qi (齊 of Shandong). In the Zuozhuan (左傳) chronicle (663 B.C.), a Chu child named 'Tou-wutu', whose name combined words for "suckling" ('Tou' or 'nou') and "tiger" ('wutu' 於虎兔), was saved and nursed by a tigress, later becoming Tze-wen, a minister of Chu. The terms "tou" and "wutu" reflect Taic-Shan vocabulary: "dut" in Siamese means "suckle", and "htso", "tso", or "su" refer to "tiger". In Vietnamese, similar expressions exist: 'cọp đút', 'hùm đút', or 'hổ đút'. Though these forms have decayed over time, they persist in the Tchungkia dialects of Jiangxi (江西), within ancient Chu proper, which resemble Taic-Shan speech "to such an extent that Siamese-speaking travelers could without much difficulty understand it." The Erya (爾雅) dictionary contains 928 regional loanwords, many transcribed from Taic-Chu languages using Chinese homonyms. These linguistic remnants suggest that Taic and Yue languages predate Chinese.
During the rise and fall of the Qin Empire (221-206 B.C.), many Yue natives were absorbed into the unified state. The new entity known as either Qin, Chin, Chine, or Chinese, comprised racially mixed populations from conquered northeastern and southeastern territories, roughly equivalent to half of modern China.
After Qin’s collapse, the Chu State was defeated by Liu Bang, the Viceroy of Hanzhong, who founded the Han Empire (206 B.C.). The term "Han" derives from this lineage. Although the Han Chinese identity formally emerged only after 206 B.C., the label is applied retroactively to earlier populations.
Following the Han conquest of the NanYue Kingdom in 111 B.C., colonization of Annam (Giaochâu Prefecture) continued through successive dynasties until 939 A.D. The Sinicization of native Yue peoples, ancestors of today’s Guangxi and Guangdong inhabitants, including those of Giaochi Prefect, accelerated. The formation of the Middle Kingdom can be viewed as the birth of a "Chinese Union".
Ironically, the same expansionist model later applied by Chinese dynasties was replicated by Annamese monarchs. From their base in the Red River Basin, they expanded westward and southward, encountering Austronesian and Austroasiatic speakers, namely, Chamic and Mon-Khmer peoples. Highland minorities who resisted assimilation were labeled Mọi, or "barbarians", echoing the Chinese term Man (蠻 SV Man) for non-Han peoples.
This raises a historical contradiction: if Mon-Khmer groups shared racial ties with Vietnamese Kinh (1), why were they subjected to widespread discrimination? In contrast, northern minorities of Dai or Yue origin received relatively fair treatment. Chinese immigrants, culturally distinct, were often treated favorably (2) and typically assimilated into Vietnamese society within one or two generations.
Today, wherever Vietnamese communities exist globally, individuals can attest to the subtle but pervasive influence of Chinese heritage in Vietnamese identity. It is no surprise, then, that what is "Vietnamese" often feels unmistakably "Chinese".
By now, it is clear that after Annam was established as a prefecture of China, it was administered in much the same way as Fujian and Guangdong provinces. This arrangement persisted until its formal independence in 939. The name 'Vietnam' did not appear in international usage until as late as 1920. The ancient Annamese entity emerged through waves of immigration from China South, under the expansive reach of the Middle Kingdom.
Initially, many of the newcomers to Annam were long-march soldiers, exhausted from continuous campaigns of conquest and pacification. These were followed by émigrés, including large numbers of disgraced political exiles and their families, who had been purged by volatile dynasties they once served (Bo Yang, 1983-1993) (3). Alongside them came a broader influx of resettlers: impoverished peasants fleeing war, famine, and oppression in their native provinces. For most, the journey south was one-way; few ever returned to their homeland.
As previously noted, the decision to settle permanently in what is now northern Vietnam may have stemmed from a deeply rooted migratory impulse. Many of these men married, or were married into, local indigenous families, forming new kinship ties with native wives. This pattern of integration mirrors similar cases observed in modern Taiwan, where migration and intermarriage have likewise shaped cultural identity.
Over many generations, those early immigrants, whether long resettled from regions further north or arriving after the Han Chinese expansion to the south, were fully assimilated into the highly Sinicized Annamese society. This was especially true for southern Chinese groups such as Hakka (客家 Hẹ), Cantonese (廣東 Quảngđông), Hainanese (海南 Hảinam), Fukienese (福建 Phúckiến), and Tchiewchow (潮州 Triềuchâu). Their assimilation may have occurred slowly but steadily, one generation at a time, each eventually merging into the melting pot of racially mixed people identified in official census records as the Kinh ethnicity.
Together, they form the Vietnamese nationals alongside more than 50 other major ethnic groups, such as the Miao (苗 Hmong, or 'Mèo'), Zhuang (壯 Nùng), Dai (傣 Tày), Tai Noir (黑傣 Tháiđen), Tai Blanc (白傣 Tháitrắng), Chamic (占婆 Chămpa), and Khmer (高棉 Caomiên).
Chinese immigrants brought with them their culture and dialects, which infused fresh colloquial elements into the early Vietic language, supplementing the more prestigious Mandarin lingua franca spoken by ruling officials. This process gave rise to both Sinitic-Vietnamese and Sino-Vietnamese. While there are studies analyzing the transformation of Middle Chinese phonology into Sino-Vietnamese and reconstructing its possible sound system (Nguyễn Tài Cẩn, 1979), there is still no comprehensive research on how both official Mandarin and its vernacular forms penetrated ancient Annamese languages, leading to the development of Sinitic-Vietnamese vocabulary after a millennium of Han domination.
This long process shaped both the semantic and phonological aspects of Sinitic-Vietnamese words of Chinese origin. In modern Vietnamese, there are disyllabic forms and common expressions that can be traced to early Mandarin, for example:
- 'bậnviệc' = 忙活 mánghuó (busy)
- 'bưngbít' = 蒙蔽 méngbì (hoodwink)
- 'mắcbịnh' = 犯病 fànbìng (get sick)
- 'ănmày' = 要飯 yàofàn (beggar)
The presence of these etyma suggests the deep influence of ancient Chinese on Vietnamese by the end of the Tang Dynasty (907 A.D.). Etymologically, Sinitic-Vietnamese cognates with basic words in Amoy (廈門 Xiàmén) and Cantonese indicate a shared aboriginal linguistic substratum, likely of Taic origin. Examples include Hainanese and Amoy /bat7/ 'biết' (know), /kẽ/ 'con' (child), /suã/ 'soài' (mango), and Cantonese /t'aj3/ 'thấy' (see), /lei2/ 'lưỡi' (tongue), /o5/ 'ỉa' (poo).
Vietnamese grammar also reveals a distinctive local word order, probably inherited from the aboriginal Yue language, in which adjectives (modifiers) follow nouns (the modified). For example, 'gàcồ' 雞公 (rooster). This reverse word order, similar to that of the Zhuang (Nùng) and Dai (Tày), is a clear inheritance from the original Taic speech. Later Old Chinese grammatical forms of this type can still be found in some southern Chinese dialects, such as Cantonese, Amoy, and Hainanese (/kaj1kong1/).
Chinese academic institutions officially classify these latter varieties as Chinese dialects because their grammar, tonality, and large Sinitic vocabularies are largely on par, both semantically and phonologically, with those of Chinese. (4)
While Cantonese and Fukienese speakers within their own regions continued to be increasingly Sinicized under the influence of the Sino-sphere across successive Chinese dynasties from ancient times to the present, the Annamese people and their independent state, by contrast, have for the past 1,200 years charted their own course and forged a distinct identity. They were fortunate to avoid the fate of their ancient Fukienese and Cantonese neighbors, who became fully Sinicized under northern rulers without even realizing it. As a result, despite sharing certain parallels in historical development during antiquity, the Chinese language never fully supplanted the core syntactic structure of the Vietnamese language. This is why Vietnamese did not become a Chinese dialect, even though the integration of northern Chinese diasporas, such as colonial officers and their accompanying military forces, into the Annamese majority took place over many centuries.
Cultural factors, such as a shared foundation in Confucian values, undoubtedly eased the integration of Chinese immigrants into the Annamese social fabric, which readily absorbed new settlers. Linguistic similarities in the host country further accelerated their assimilation into their new homeland. Without understanding this anthropological accelerant of how the children of Chinese immigrants became members of the Kinh majority, we cannot fully grasp the true nature of both the origins of the Vietnamese language and the identity of its speakers.
By contrast, Chinese immigrants who settled in other Asian countries often experienced a different process of adaptation, one heavily dependent on the generosity and tolerance of the host nation. In many cases, their assimilation into the host culture, language, and society was far less complete than in Vietnam. Nowhere else in Austroasiatic- or Austronesian-speaking regions can we observe a process quite like that in Vietnam. For example, in Malaysia, Indonesia, and even in Confucian societies such as Japan and Korea, Chinese minorities, descendants of immigrants who arrived many generations ago, have often faced persistent discrimination.
In fact, demographic statistics show that in Southeast Asian countries such as Malaysia and Indonesia, the proportion of ethnic Chinese is significantly higher than in Vietnam, where the official 2009 census recorded a mere 1%. This striking fact strongly supports the conclusion that Chinese immigrants in Vietnam have been almost entirely assimilated. It raises the question: "What has actually happened to so many of those earlier Chinese immigrants in Vietnam, now that they seem to have disappeared from census data?" The evident answer is, "They have already become an integrated part of the Kinh populace."
Figure 5 – Vietnam's territorial expansion
Map of Vietnam showing the conquest of the south (the Namtiến, 1069-1757).
The integration of early Chinese immigrants into Annam began in the aftermath of the Qin invasion, when troops marched southward. This was followed by the arrival of Han colonial administrators and their infantry, and later by successive waves of Chinese migrants drawn to Annam’s fertile lands and rich vegetation. These newcomers merged with earlier settlers and their descendants, gradually forming part of the Kinh majority in ancient Vietnamese society. Alongside this demographic transformation, the colonial authorities introduced Mandarin as the lingua franca. Over time, it evolved in tandem with local speech, producing a hybrid language reflected in the extensive presence of both Sinitic-Vietnamese and Sino-Vietnamese vocabulary.
Historically, the annals of China's Annam Prefecture are sparse. Chinese records occasionally mention rebellions and their suppression, but they provide little detail about local uprisings or decisive battles that ultimately led to Annam's independence (see Bo Yang, 1992–93, volumes 52–67). Meanwhile, Annam itself lacked comprehensive historical records prior to the 10th century. Genealogical chronicles of notable families were rare, vague, or of limited historical value. As a result, anthropologists seeking to study the origins of the Vietnamese must often rely on Chinese sources, sometimes discovered only incidentally through citations in unrelated works. Even though Annam produced literary figures of note, such contributions were often dismissed as trivialities in Chinese records. Yet during the Tang Dynasty, Annam was a prosperous region, sending some of its most gifted individuals to serve at the imperial court in Chang’an, where they attained high office. At the same time, Bắcninh Province, next to Vietnam’s own Chang’an, became renowned as the source of hundreds of local women who were sent to the Tang court every year, where they were expected to contribute to the continuation of Chinese royal lineages.
Linguistically, the Sino-Tibetan hypothesis of the Vietnamese language is more persuasive than Austroasiatic or Austronesian theories, largely because historical records support a Sinitic foundation. Nearly every word can be traced to a root. If Chinese philologists can reconstruct Old Chinese, the same methods could be applied to Ancient Vietic and Middle Vietnamese, since Vietnam’s history prior to the 10th century was closely tied to China’s. For example, An Chi (2016, Vol. 1, pp. 177-180) argued that the Vietnamese word 'vượn' (monkey) derives from 申 shēn = 猿 yuán, while 'khỉ' comes from 狐 hú = 猴 hóu. Thus, prior to 939, the language of the early Annamese people can be understood through Sino-Tibetan etymologies, in addition to the Sinitic-Vietnamese lexicon derived from official and vernacular Mandarin as well as regional Chinese dialects.
Analogies from later history illustrate how linguistic and cultural development parallels political events. First, during the French colonization of Indochina (1861-1954), a new intelligentsia emerged, including Vietnam’s last monarch, Bảo Đại, who spoke French more fluently than his native tongue. Second, in terms of racial mixing, during the single decade from 1965 to 1975 the presence of American soldiers in South Vietnam, then a country of fewer than 22 million people, resulted in nearly 50,000 Amerasian children, roughly one for every 440 inhabitants. Third, according to kyotoreview.org, by 2022 more than 133,000 Vietnamese women had married Taiwanese husbands. One might ask: given Taiwan’s population of about 23.6 million, how many mixed-race children have been born into these families? By extension, how many racially mixed Vietnamese were born during the thousand years of Chinese colonization?
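The proportion quoted for the 1965-1975 decade is a matter of simple division. As a rough check, the sketch below uses only the two figures given in the paragraph (a population ceiling of 22 million and about 50,000 Amerasian children); the variable names are illustrative. Note that the result is a ratio against total population, not against the number of births, which the text does not supply.

```python
# Figures taken from the paragraph above; both are approximations.
south_vietnam_population = 22_000_000  # upper bound quoted in the text
amerasian_children = 50_000            # "nearly 50,000"

# Number of inhabitants per Amerasian child.
inhabitants_per_child = south_vietnam_population / amerasian_children
print(inhabitants_per_child)  # 440.0
```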
In this sense, Taiwan today may be seen as a parallel case to Vietnam, and if projected back in time to 939, an independent Taiwan could well resemble Vietnam’s historical trajectory a millennium later.
The analogy of Taiwan is useful in illustrating the process of Sinicization in ancient Annam, which began with a comparatively small population – estimated at around 900,000 inhabitants according to the earliest Han records shortly after 111 B.C. It is important to note that Annam had to absorb a far larger influx of Chinese soldiers, numbering in the hundreds of thousands, who advanced steadily southward from the Qin and Han dynasties over the course of a thousand years beginning in the 2nd century B.C. As Chinese colonists established firm footholds in Annam, additional immigrants from the mainland followed, much like the later pattern observed on the island of Formosa.
Anyone other than a staunch Vietnamese nationalist can readily recognize the reasoning behind the affirmation of Chinese admixture with earlier resettlers in ancient Annam. The origins of Sinitic-Vietnamese etyma can likewise be explained through this analogy. Comparable cases are found elsewhere in the world. For instance, the French no longer speak their ancestral Gaulish tongue but instead use a Romance language of Latin origin, closely related to Italian and Portuguese. Bulgarian, too, is a hybrid language heavily shaped by loanwords. Beyond Europe, in Central and South America, the legacy of Spanish colonization demonstrates both genetic and linguistic transformation: Spanish conquistadors intermingled with indigenous populations, producing entirely new societies within less than four centuries. Conquest, in these cases, profoundly altered the racial and cultural composition of the populace, as is evident today.
In Vietnam, most early Chinese immigrants blended seamlessly into the Kinh majority. However, many later arrivals over the past four centuries retained their distinct Chinese identity if they so chose. These migrants came largely from the southern provinces of Guangxi, Guangdong (Canton), and Fujian ('Hokkien'), and were categorized into groups such as Chaozhou (Teochow), Cantonese, Hakka, Hainanese, and Hokkienese. A particularly significant wave arrived after the fall of the Ming dynasty, known as the Minhhương (明鄉) – descendants of Ming loyalists who fled the Manchu conquest. They were resettled in the vast, sparsely populated southernmost regions of Vietnam under unique geo-historical circumstances. Among them, the Teochow formed the majority of latecomers, and over time they were thoroughly absorbed into Vietnamese society. This assimilation is evident in the localized variants of Chinese surnames: Hoàng vs. Huỳnh (黃), Vũ vs. Võ (武), Hàn vs. Hàng (韓), Lưu vs. Lều (劉), and many others.
In effect, these surnames represent nearly all Vietnamese family names of Chinese origin, serving as living proof of their descent from a broader Chinese lineage inherited across generations. Ask a Vietnamese today about their surname, for example, Trần vs. Chen (陳), Trương vs. Zhang (張), and three or four out of ten may still be able to trace their genealogy back to Chinese roots. (See What Makes Chinese So Vietnamese? - Appendix I.)
The continuous southward migration of Chinese immigrants eventually reshaped the composition of the Kinh ethnicity in Vietnam, just as it transformed the Vietnamese language through the gradual layering of vast stocks of Chinese vocabulary. The emergence of Vietnamese as we know it may also have been the result of the forceful imposition of Chinese as a lingua franca during the centuries when Annam functioned as a prefecture of China. Inevitably, Chinese influence permeated every aspect of Vietnamese, leaving permanent marks across the linguistic spectrum, from the most basic stratum – distinguishable from core indigenous remnants of proto-Taic origin – to the elevated scholarly lexicon. These words remain in widespread use in daily life today. Indeed, if all Chinese-derived vocabulary were removed from modern Vietnamese, it would be nearly impossible to form a complete intelligible sentence; at best, speech would sound rigid and archaic, resembling the classical Chinese style of wenyanwen (文言文), especially since even grammatical function words (虛詞 xūcí, 'hưtừ'), including prepositions, are of Chinese origin.
Vietnam’s path to statehood unfolded under successive names: NamViệt, Annam, Giaochỉ, Giaochâu, ĐạiNgu, ÐạicồViệt, ÐạiViệt, ÐạiNam, and eventually Việtnam. The evolution of its people and language is increasingly corroborated by archaeological and prehistorical discoveries along the routes traversed by their ancestors. In antiquity, transportation and communication were extremely difficult, yet the trajectory of Vietnam’s national development parallels that of other societies worldwide, such as those in South America, South Africa, Singapore, and Taiwan.
The case of Taiwan offers a particularly illuminating comparison. Chinese forces took the island from the Dutch in the 17th century, and some three centuries later, in 1949, Kuomintang rulers and soldiers fleeing the mainland established their government in exile there. To consolidate power, they imposed authoritarian rule on the island’s indigenous peoples – comparable to the fate of the Mường minority in ancient Vietnam – who gradually became a marginalized minority within their ancestral homeland. By 2024, Taiwan’s population had reached approximately 23.6 million, shaped by waves of mainland resettlers and the enforced adoption of Mandarin as the national language. If we project this scenario back to the 2nd century B.C., when Jiaozhi (交趾) had a population of roughly 900,000 according to Han census data, and imagine Taiwan with only about one twenty-sixth of its current population surviving as an independent nation into the 21st century, its trajectory would closely resemble that of Vietnam, both in terms of its people and its language, given the communication limitations of the time.
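The scale of this thought experiment can be checked against the two population figures quoted in the paragraph. The short sketch below is only an arithmetic aid; the variable names are illustrative, and both inputs are the approximate values given in the text.

```python
# Approximate figures as quoted in the paragraph above.
taiwan_population_2024 = 23_600_000
jiaozhi_population_han = 900_000

# How many times larger modern Taiwan is than Han-era Jiaozhi.
scale = round(taiwan_population_2024 / jiaozhi_population_han, 1)
print(scale)  # 26.2
```

On these figures, the Han-era population of Jiaozhi corresponds to roughly one part in twenty-six of Taiwan's present population.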
Colonial dynamics further accelerated the process of assimilation in Annam, producing what can be described as more than a millennium of self-inflicted Sinicization. This term underscores the fact that Annamese monarchs voluntarily adopted the Chinese despotic model, including its linguistic framework. Linguistic adoption was inevitable, as Vietnam relied on the Chinese writing system until the early 20th century. Alongside this, Confucianism, Taoism, and Buddhism profoundly shaped Vietnamese culture and belief systems, later blending with indigenous traditions in locally developed religions such as Caodaism and Hoahaoism, which combined ancestral worship with imported doctrines.
Efforts to establish a distinct national writing system emerged in the 15th century, as seen in works like Phậtthuyết… (Doctrine of Buddhism on…). The creation of Nôm characters (ChữNôm, 𡨸喃, from 字南 ZìNán, SV TựNam) adapted Chinese ideographs to represent native sounds, including local place names and Sinitic-Vietnamese variants of Chinese words. Nôm literature flourished from the 16th century onward. By the late 19th century, however, under French colonial rule, Quốcngữ – the romanized orthography devised by Western missionaries – gradually replaced Chinese characters, enforced both by colonial decrees and national consensus.
Modern Vietnamese orthography thus reflects three major lexical strata:
- HánViệt 漢越 (HànYuè / Sino-Vietnamese, SV): the largest body of vocabulary derived from Chinese.
- HánNôm 漢喃 (HànNán / Sinitic-Vietnamese, VS): lexicons of Chinese origin adapted into Vietnamese usage.
- Native substrata: words from Daic, Chamic, Austroasiatic Mon-Khmer, and other sources, including later loanwords.
The first two, of Chinese origin, dominate the Vietnamese lexicon. Put differently, had Vietnam remained a prefecture of China throughout its history, adding another 1,200 years of direct rule, Vietnamese would almost certainly be classified today as a Sino-Tibetan language, much as Cantonese and Fukienese are. Just as Latin and Greek elements are indispensable to English, Sinitic elements form the very essence of modern Vietnamese. Based on solid linguistic evidence, we can identify clear commonalities between archaic Chinese forms and their Vietnamese equivalents, including a wide range of pre-Sino-Vietnamese (Tiền-HánViệt) forms rooted in proto-Vietic speech. Altogether, hundreds of Old Chinese forms have, over the centuries, found their way into Vietnamese, shaping it in profound and enduring ways.
II) Evidence from core vocabulary
The hypothesis of a common Yue origin is best illustrated through core vocabulary items that are resistant to borrowing. These words – kinship terms, natural elements, and human categories – show systematic correspondences between Vietnamese and Chinese. The grid below highlights these parallels:
| Vietnamese | Chinese | Pinyin | Sino-Vietnamese | Meaning |
|---|---|---|---|---|
| mẹ | 母 | mǔ | mẫu | mother |
| bố | 父 | fù | phụ | father |
| con | 子 | zǐ | tử | child, son |
| người | 人 | rén | nhân | human, person |
| trời | 天 | tiān | thiên | sky |
| nước | 水 | shuǐ | thuỷ | water |
| đất | 土 | tǔ | thổ (cf. 地 địa) | soil, earth |
| ông | 翁 / 公 | wēng / gōng | ông / công | elder, grandfather |
| chị | 姊 | zǐ | tỷ | older sister |
| em | 妹 / 弟 | mèi / dì | muội / đệ | younger sibling |
Notes:
- These are basic words, not specialized borrowings – they belong to the everyday lexicon.
- Vietnamese often preserves substratal forms (trời, nước, đất) alongside Sino‑Vietnamese overlays (thiên, thuỷ, địa).
- The coexistence of native and SV forms creates doublets, a hallmark of shared ancestry plus later prestige borrowing.
- Kinship terms (mẹ, bố, con) show direct cognacy with Chinese equivalents, suggesting a Yue substratum rather than incidental borrowing.
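The doublet pattern described in these notes can be made concrete by treating the grid as data. The following is a minimal sketch, assuming a subset of the rows above and a hypothetical helper named `doublets`; it only lists the pairs in which the everyday form differs from the Sino-Vietnamese reading, and it does not test whether the forms are cognate.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    vietnamese: str       # everyday (vernacular) form
    chinese: str          # character(s) the grid pairs it with
    sino_vietnamese: str  # SV reading(s) as given in the grid
    meaning: str


# A subset of the rows in the grid above, copied as given.
GRID = [
    Entry("mẹ", "母", "mẫu", "mother"),
    Entry("bố", "父", "phụ", "father"),
    Entry("con", "子", "tử", "child, son"),
    Entry("người", "人", "nhân", "human, person"),
    Entry("trời", "天", "thiên", "sky"),
    Entry("nước", "水", "thuỷ", "water"),
    Entry("ông", "翁 / 公", "ông / công", "elder, grandfather"),
]


def doublets(grid):
    """Return (vernacular, SV) pairs whose everyday form is not itself an SV reading."""
    pairs = []
    for entry in grid:
        readings = [r.strip() for r in entry.sino_vietnamese.split("/")]
        if entry.vietnamese not in readings:
            pairs.append((entry.vietnamese, entry.sino_vietnamese))
    return pairs


for vernacular, sv in doublets(GRID):
    print(f"{vernacular} ~ {sv}")
```

In this subset, ông is the only row left out of the result, because its everyday form coincides with one of its Sino-Vietnamese readings.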
A. Core matter of Vietnamese etymology
Cao Xuân Hạo (2001), a renowned contemporary Vietnamese cultural and linguistic scholar, in his article Tiếng Việt là Tiếng Mãlai? (Could Vietnamese be of Malay origin?), argues that most words regarded as original – từ thuầnViệt – are in fact not aboriginally pure. From his perspective, in linguistics there is no such thing as absolute purity. He emphasizes that it matters little whether Vietnamese is classified as having Chinese, Thai, Mon-Khmer, or Austroasiatic cognates; the central issue remains the same. As he illustrates, basic words such as chim ‘bird’ (Mon-Khmer origin), vịt ‘duck’ (Thai origin), cá ‘fish’ (Austroasiatic origin), and thỏ ‘hare’ (Chinese origin) are all still considered ‘pure Vietnamese’ (p. 90).
As has been suggested repeatedly, the story of nation formation is more significant than the question of the precise origin of its people or their ancestral speech. What matters in any language is its integrity as a whole, not the provenance of a handful of basic words. The holistic character of Vietnamese in its present form carries more weight than debates over whether its core lexicon is Austroasiatic or Austronesian, claims that remain inconclusive or speculative. With the discovery of Sino-Tibetan etyma in the Vietnamese basic lexical stock, readers may reconsider which linguistic family Vietnamese most appropriately belongs to (see Chapter 10 on Sino-Tibetan etymologies).
Etymologically, Chinese words entered Vietnamese through borrowings from several dialects, especially Yue 粵 (Cantonese) and Minnan 閩南 (Fukienese, Tchiewchow, Hainanese, etc.), in addition to items already shared in the common lexical pool. For example: Fukienese /kẽ/ ~ VS con 'child'; Teochow /yẽo/ ~ VS dê 'goat'; Hainanese /bat7/ ~ VS biết 'know'. Dialectal variants of the same root, reintroduced by immigrants from different regions of southern China, are recognized as doublets, a phenomenon common in Chinese itself. In this general context, ‘Chinese’ refers broadly to the Sinitic family, encompassing Mandarin and other dialects. Such synonymic layering accounts for the heavy vernacular influence of Chinese lects on Vietnamese across historical stages. For instance, the concept of 'cold' appears in multiple forms: 寒 (hán, SV hàn) vs. Hainanese /kwa2/ → VS cóng ‘freezing cold’; alongside VS giá, rét, and lạnh, cognate with Chinese qī (淒), liè (冽), and lěng (冷), respectively, with usage preferences differing between northern and southern speakers.
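The layering of 'cold' can be laid out as a small table keyed by stratum. This is only an illustrative sketch: the forms and proposed sources are copied from the paragraph above, while the SV/VS labels and the variable names are assumptions of this example, not a claim about how the correspondences were established.

```python
from collections import defaultdict

# (Vietnamese form, layer, proposed Chinese source) as stated in the text.
COLD = [
    ("hàn", "SV", "寒 hán"),
    ("cóng", "VS", "寒, Hainanese /kwa2/"),
    ("giá", "VS", "淒 qī"),
    ("rét", "VS", "冽 liè"),
    ("lạnh", "VS", "冷 lěng"),
]

# Group the Vietnamese forms by lexical layer, keeping their listed order.
by_layer = defaultdict(list)
for form, layer, _source in COLD:
    by_layer[layer].append(form)

print(dict(by_layer))
```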
Later, doublets with identical pronunciations may have been popularized by prominent literati, facilitating their adoption. Some characters were retained in colloquial speech, while others were replaced by forms from different dialectal sources, leading to differentiated contexts. For example, cộ 'carriage' could be written 車, 檋, 輂, 輁, or 梮 (jù). The total number of Chinese glyphs is estimated at around 74,900, while the Kangxi Dictionary records about 50,000 entries, many of them dialectal variants or extinct doublets.
The integration of Old Chinese loanwords into early Vietic was driven by powerful social forces. On one hand, the common people, often illiterate, played a crucial role: sentries, village chiefs, market vendors, artisans, laborers, and especially native wives married (sometimes voluntarily, sometimes by decree of the emperor) to Han soldiers or officials. This explains why so many scholarly Sino-Vietnamese words entered everyday usage. On the other hand, Middle Chinese loanwords from the Tang Dynasty, though originally part of the scholarly court language (Mandarin of the time), also filtered into daily life, much as they did in Cantonese. Without this vernacular adoption, Middle Chinese words in Sino-Vietnamese form could never have achieved such widespread and frequent usage in modern Vietnamese.
In contemporary times, many have witnessed accounts, sometimes dramatized in media, of Vietnamese women, often from impoverished backgrounds, who once served as maids to French colonialists or companions to American soldiers in the recent past. This pattern has continued into the 21st century, now taking the form of Vietnamese women seeking to escape poverty through marriage to men in Singapore, Taiwan, China, and South Korea. Looking back historically, the influence of such unions on local vernaculars has been significant, to the point that new dialects could emerge for practical communication, just as ancient Annamese once did. In a similar way, 'Taiwanese' developed as a distinct speech form. At its core, all of this unfolded for economic reasons.
The author observes that the contextual manner of Vietnamese speech bears notable similarities to northern Mandarin, as illustrated in several examples throughout this survey. Linguistically, this situation resembles that of contemporary Vietnamese brides and their Taiwanese husbands today. Looking further back, during the Qin and Han periods, native brides likely spoke a form of ‘pidginized’ Ancient Chinese in order to communicate with their 'outlander' husbands. Unlike the foot soldiers stationed in southern China, however, the long-march cavalrymen from northern China, by virtue of their higher social status, may have contributed more of their own northern dialectal vocabulary into local communal speech, while simultaneously adopting individual words from the indigenous language.
In either case, these newcomers were adapting to new environments, often with the intention of permanent resettlement. Children born into privileged families of the ruling class were more likely to receive formal education and, as adults, to participate in local governance. Their literary usage, derived from the mainstream Mandarin linguistic stock, was reinforced by the vernacular lingua franca of the imperial court and official correspondence. Within only a few generations, many of these families might fall into decline, a common phenomenon in both China and Vietnam, yet some individuals continued to pursue scholarship, serving as teachers and eventually assimilating into the Kinh majority. These were the people who had once spoken the language of mandarins and officials across successive dynasties.
This process helps explain why so many Middle Chinese scholarly terms remain embedded in modern Vietnamese and are used in everyday contexts. Examples include sínhlễ (聘禮 pìnlǐ, ‘betrothal’), vuquy (于歸 yúguī, ‘bridal nuptial ceremony’), kínhtrọng (敬重 jìngzhòng, ‘respect’), or ẩmthực (飲食 yǐnshí, ‘food and drink’). These words are indispensable in Vietnamese today. At the same time, learned scholars of the past spoke 'Annamese' at home, already infused with numerous early Sinitic-Vietnamese lexical items, some predating the Sino-Vietnamese layer of Middle Chinese, as well as newly coined forms derived from Old Chinese materials such as syllabic stems, roots, and affixes. For example, VS chủxị ('host') < VS chủtiệc < SV chủtịch < 主席 M zhǔxí < MC /tɕiozjek/.
The process of linguistic localization continually accelerated the integration of loanwords into the mainstream Vietnamese lexicon. This mechanism functioned much like the rapid adoption of computer jargon and texting slang in modern times. Such phenomena are common across languages, including those of the Indo-European family, as demonstrated in well-documented cases such as Albanian or Haitian French.
Over time, colloquial Annamese developed into regional subdialects in the north, central, and south. These variants blended indigenous elements with Chinese dialectal influences to varying degrees. For example:
- ăn 唵 vs. xơi 食 shí / 吃 chī (eat)
- uống 飲 yǐn vs. hớp 喝 hē (drink)
- buồn 悶 mèn vs. phiền 煩 fán (sorrow)
- khoái 快 kuài vs. vui 娛 yú (joyful)
- lùn 短 duǎn vs. thấp 低 dī (short)
- bảnh 昺 bǐng vs. sáng 亮 liàng (bright)
- mơi 明 míng vs. mai 明ㄦ mír (tomorrow)
- mới 萌 méng vs. xịn 新 xīn (new)
- cũ 舊 jiù vs. cổ 古 gǔ (ancient)
- heo 亥 hài vs. lợn 腞 dùn (pig)
- cọp 虎 hǔ (SV hổ) vs. hùm 甝 hán (tiger)
Such borrowings and their variants enriched the Sinitic-Vietnamese vocabulary. Over the last two millennia, this process also differentiated homonyms through tonal distinctions and fostered polysyllabicity, particularly disyllabic forms, a trend that continues today.
As these loanwords matured, they became fully localized, appearing in both inseparable compounds and independent forms. Remarkably, the transformation from Chinese to Vietnamese was generally smooth, requiring little intervention from the intelligentsia, who often dismissed colloquial mispronunciations as vulgar. For instance, the disyllabic kinhkhủng (驚恐 jīngkǒng, 'terrifying') evolved to convey both terrifying and, more recently, terrific. In modern usage, khủng alone has come to mean 'terrific', a semantic development absent in contemporary Chinese.
Before the 17th century, literary works from earlier periods required extensive annotation for modern readers to grasp their vocabulary. This reflects the rapid pace of linguistic modernization and the phonological shifts that occurred independently of colloquial speech. At one stage, two literary registers coexisted: Classical Chinese (Wenyanwen 文言文), e.g., SV niên for 年 nián, and the spoken vernacular, e.g., VS năm (𢆥), which remained unrecorded until the emergence of ChữNôm in the 12th century. Nôm adapted Chinese characters to transcribe local phonetics, bridging the gap between written and spoken forms.
Meanwhile, Yue and MinNan dialects (Cantonese and Hokkienese) underwent a different trajectory. After the fall of the NanYue Kingdom in 111 B.C., their speakers remained within the Sino-sphere and, under sustained Sinicization, their languages were absorbed into the Sino-Tibetan family. Earlier Yue substrata were gradually buried beneath layers of Chinese superstrata, though vestiges survive in Vietnamese: 戶 hù = cửa (door), 胡 hú = cổ (neck).
The development of ancient Annamese parallels cultural continuities such as the Đôngsơn and Ngọclữ bronze drum traditions, which trace back to Phùngnguyên culture. Yue entities existed long before Han colonization (111 B.C.-939 A.D.), and remnants of Sahuỳnh, ÓcEo, and Khmer civilizations in the south predated Annamese expansion there. In essence, Taic-Yue elements preceded the emergence of Taic-Han influences, which later recombined with Yue substrata to form what became known as Annamese.
With modernization, phonological change slowed, particularly after the adoption of the Romanized Vietnamese script (Quốcngữ) in the early 20th century, first imposed by the French colonial administration. Today, mass communication, internet access, and rapid transportation have further reduced regional accent gaps and pronunciation discrepancies.
In the modern era, scientific and technological vocabulary has entered Vietnamese largely through Japanese (via Chinese mediation), French, and later English. The influx of new terms continues at a rapid pace, alongside creative acronyms, abbreviations, and texting shorthand. Examples include khủng for kinhkhủng (terrific), ko for không (no), or playful text forms like Hum ni là sn of e, dc gì hit for Hôm nay là sinh nhật của em, đâu có gì hết. Proposals such as Bùi Hiền’s (2018) Tiếq Việt reform, introducing letters like F, W, J, and Z, further illustrate this trend.
Ultimately, the essence of Vietnamese lies in its holistic structure. Both Sino-Vietnamese and Sinitic-Vietnamese elements form the living core of the language. Just as English cannot be understood without its Latin and Greek components, Vietnamese cannot be properly classified by isolating a handful of Austroasiatic residues. To crown Mon-Khmer as the defining feature of Vietnamese on the basis of a few lexical survivals would be misleading. The Sinitic elements are not peripheral; they are the very substance of the language as it exists today.
Table 5 - Cases of Mon-Khmer ~ Vietnamese cognates analogous to ethnic cooking, more than a salad bowl
Specialty dishes often change slightly as they move from one place to another. For instance, a bowl of 'hủtiếu Namvang' (Phnom Penh–style noodle soup) or 'cơmgà Hảinam' (Hainan–style chicken rice) prepared in Saigon can taste even better than in their original homelands of Cambodia or China. In many cookbooks, these recipes are sometimes adjusted by Western chefs of ethnic cuisine, who add extra spice to suit new culinary trends in Western Asian cooking schools. This is metaphorically comparable to Austroasiatic influences in linguistics: Vietnamese dishes that began as Mon-Khmer or Hainanese creations are now infused with local Vietnamese ingredients to satisfy local palates. Similarly, a bowl of 'phở' (beef noodle soup) in Lyon, France, does not taste the same as what a discerning diner might experience in Saigon or California. Historically, however, French beef stocks contributed to the invention of 'phở', which was then cooked with Vietnamese anchovy sauce (though the word 'phở' does not derive from 'pot-au-feu'). In fact, the essence of Vietnamese 'phở' (粉 fěn) lies in its blend of cinnamon and anise – spices long used in Chinese cooking and the very condiments that once drew European colonialists to Asia. At the same time, it was Westerners who eventually introduced the Latin alphabet into the Vietnamese language.
It is crucial not to overlook prominent linguistic features, and newcomers to the field should avoid retracing well‑worn paths that merely reaffirm a small set of Mon‑Khmer cognates in Vietnamese. Such limited evidence cannot substantiate an Austroasiatic origin for the language, since historical linguistics must ultimately be anchored in history itself. A strictly prehistoric Austroasiatic approach risks collapsing into the outdated notion that all languages descend from a single common source. This framework strips Vietnamese of its historical grounding and distorts its overall balance: more than 90 percent of its roughly 420 fundamental items can be traced to Sino‑Tibetan etymologies (see What Makes Chinese So Vietnamese - Chapter 10), layered over only about 10 percent of the hypothesized Austroasiatic Mon‑Khmer base. In contrast, the holistic perspective advanced here emphasizes that it is the totality of the language – not a narrow subset – that truly matters.
Moreover, proponents of the Austroasiatic Mon‑Khmer hypothesis have often overlooked the fact that many of the fundamental words they cite also occur as cross‑cognates in Chinese or other Sino‑Tibetan etymologies within the same basic lexical domains. Although these overlaps are well documented, they were excluded from percentage counts – 'mắt' (目 mù, 'eye'), 'lúa' (來 lái, 'paddy'), 'chim' (禽 qín, 'bird'), 'vịt' (鴄 pī, 'duck'), 'cá' (魚 yú, 'fish'), 'cọp' (虎 hǔ, 'tiger'), etc. – likely due to limited awareness of the deeply intertwined histories and cultures of China and Vietnam when the hypothesis was first advanced. Such crossover phenomena, in which basic Mon‑Khmer words also appear in both Vietnamese and Chinese, may have been observed by Austroasiatic theorists, but they were interpreted narrowly through the lens of genetic affiliation rather than situated within a broader historical‑linguistic framework.
In a parallel development, the neglect of cultural and historical dimensions – such as the intimate kinship terms for parents, siblings, and other close relatives (to be examined in detail later) – may be compared to the unresolved puzzle of how bronze drum relics came to be discovered in the distant Indonesian archipelago. Strikingly, no comparable bronze artifacts have ever been unearthed within the former territories of the ancient Khmer Kingdom. It is implausible to suggest that these drums were transported by sea as tributes from later Chamic Muslims on pilgrimage to Indonesia’s Islamic centers, since the chronology does not align with the early period of the Han invasion of Annam, when the Champa Kingdom was still firmly rooted in Hindu religion and culture.
Archaeological evidence reinforces this point: no bronze artifacts have been found within the boundaries of the former Champa territories, nor within the aboriginal regions inhabited by the proto‑Chamic Li minority (黎族) in the Tongzha mountainous autonomous area of southern Hainan Island, China, from which the prehistoric ancestors of the Cham people of Central Vietnam had emigrated.
From an anthropological perspective, no cultural artifacts comparable to bronze relics have been discovered linking these three regions, nor does any written history support the claim that Austroasiatic Mon‑Khmer peoples formed the aboriginal base for constructing a linguistic‑historical profile.
Thus, the Austroasiatic Mon‑Khmer explanation remains only a tentative hypothesis, especially when weighed against the more than 420 Sino‑Tibetan etyma identified in the Vietnamese core lexicon, which provide a far stronger foundation for classification.
By similar reasoning, an important linguistic question arises: how could several Mon‑Khmer basic words have entered Vietnamese by the end of the 12th century, when the country’s southern frontier still extended no further than Thanhhoá in present‑day north‑central Vietnam? Historical records indicate that it was only in the 13th century and afterward, well after the emergence of ĐạiViệt (大越, the Great Viet) as a consolidated state, that its people began their southward expansion, ultimately conquering and erasing the 1,600‑year‑old Champa Kingdom and annexing nearly one‑third of the territory east of the former Khmer realm. For those etyma whose origins remain uncertain, the answer must be sought through deeper historical inquiry. Linguistic questions cannot stand alone; they must be corroborated by written history.
For instance, history also records that the ancient Khmer Kingdom lost its western lands to Siam (modern Thailand), then inhabited by the Dai (傣) aboriginal people. The Thai are descendants of the Dai, who themselves descended from the Taic peoples that once ruled the Chu State (楚國) in ancient China. Cognates of their basic vocabulary are preserved in the Erya (爾雅) dictionary (De Lacouperie [1887] 1963).
From an analytical perspective, one can observe clear etymological connections between Thai and Vietnamese basic words, as noted by Haudricourt in the early 20th century. Examples include Vietnamese gạo and Thai ข้าว /kʰâːw/ (稻 dào, "rice"), or Vietnamese gà and Thai ไก่ /kài/ (雞 jī, "chicken"). Such cognates point to a shared Taic‑Yue heritage that predates the later southward expansion of ĐạiViệt (5).
These two Vietnamese and Thai cognates appear to date back to a period preceding the split of the hypothetical Taic-Yue (傣越) linguistic family into the Dai (傣, 臺 Tai, as discussed by Ding Bangxin 1977: 36–45) and Yue (越) branches in southern China. Between these developments lies a clear Khmer gap, underscoring the historical reality that Mon‑Khmer basic words entered Vietnamese only in regions where Mon‑Khmer speech communities existed as isoglosses.
Austroasiatic Mon‑Khmer theorists have tended to treat non‑Mon‑Khmer basic words as separate cases, dismissing any fundamental items not cognate with Austroasiatic languages. Their assumption has been that Vietnamese vocabulary must derive from Mon‑Khmer, rather than the reverse. Consequently, they have cast an Austroasiatic blanket over the lexicon, labeling all Sinitic‑Vietnamese words that do not fit the Mon‑Khmer stock as mere Chinese loanwords. Yet if we set aside the term loanword, it becomes clear that many of these items are not borrowed at all. Instead, they display structural and phonological affinities with Cantonese and Fukienese dialects, both classified by Chinese institutes as Sino‑Tibetan, or with other Sino‑Tibetan etymologies, none of which are rooted in Austroasiatic Mon‑Khmer (see Chapter 10 on Sino‑Tibetan Etymologies).
It has long been observed that stronger and more advanced societies exert influence over weaker ones. History affirms this principle, most clearly in the Han dynasty’s domination of Vietnam, enforced through harsh measures such as General Ma Yuan’s (馬援, Mã Viện) destruction and melting down of bronze drums during the early years of conquest. By the same reasoning, certain unverifiable basic words in Vietnamese may trace their origins to the Khmer Empire, which between the 9th and 13th centuries rose to extraordinary power in Southeast Asia. The scale of its dominance remains visible today in the monumental ruins of Angkor Wat and Angkor Thom; they are vast citadels and palaces that continue to inspire awe across the world.
In recent years, as reported by the BBC, laser scanning of Cambodia's tropical jungle has revealed that more walled cities of the ancient Khmer Kingdom remain hidden beneath dense layers of rainforest and may yet be uncovered in the years to come.
During that same period, however, the young Annamese state had only just emerged from its long submergence in the Chinese sphere, in which everything was permeated with Chinese elements, culturally and even racially. It has been postulated that Mon-Khmer isoglosses spread and came into contact with Middle Vietnamese only at a very late date, when ancient Annam still lay north of the 16th parallel. In fact, Vietnam remained a vassal state of China long after its independence and has remained susceptible to Chinese power down to the present day.
In the study of lexical development, etyma that can be traced to verifiable roots within a span of roughly 1,000 to 1,500 years are most plausibly loanwords, transmitted through contact, especially among neighboring languages. In the case of Vietnamese, the evidence points overwhelmingly toward Chinese influence, a consequence of Vietnam's long history as a prefecture under Chinese rule. Geographically, prior to this extended period of contact, the indigenous LạcViệt people were located further north in the Red River Basin, while to the south lay the thousand‑year‑old Kingdom of Champa, positioned between ancient Annam and the Khmer Empire.
Rather than addressing the racial mixture of populations in mainland China, Austroasiatic Mon‑Khmer theorists have attempted to explain the cognateness of Chinese and Vietnamese basic words through cultural causality, arguing that whatever China possessed, Vietnam must also have acquired. Yet the reverse does not hold: China does not share certain uniquely Vietnamese cultural elements. A striking example is nướcmắm ("fish sauce"). Its modern Chinese equivalent, 魚露 (yúlù), is a loan translation whose morphemic structure parallels 魚汁 yúzhī ("fish extract"), deriving from Fukienese, a Minnan dialect. In fact, this semantic pathway even contributed to the English word "catsup" or "ketchup", as previously noted. (See Michael Barris's Ketchup's Chinese origins a sticky subject for US foodies.) Interestingly, in Vietnamese nướcmắm, the order of the morphemes is reversed compared to 鹹液 (xiányè, SV hàmdịch, Cant. /ham2jik8/), suggesting that the original etymon may have referred to anchovy sauce, a condiment long consumed by peoples of Yue origin.
The Austroasiatic Mon‑Khmer theorists, therefore, could not avoid acknowledging that Vietnamese basic cognates reflect not only Mon‑Khmer but also Chinese influence as fundamental to the formation of the modern language. Confronted with the overwhelming influx of Chinese vocabulary, however, they stopped short of classifying Vietnamese as a mixed language. Instead, they treated the approximately 98 percent of its lexicon cognate with Chinese as mere loanwords, while attributing the remaining fraction, less than 2 percent, to Austroasiatic roots. On this basis, they reclassified Vietnamese under a sub‑branch of Mon‑Khmer, rather than considering the reverse possibility.
This reclassification appears to have been largely a matter of convenience. The label "Austroasiatic Mon‑Khmer" was employed to account for basic words shared between Vietnamese and the Mường dialects. Yet Vietnamese has never been the source from which other Mon‑Khmer languages diverged. What was overlooked is the possibility that ancient Taic or Yue languages may represent the deeper ancestral roots of multiple linguistic families – including Chinese and Mon‑Khmer – depending on the perspective adopted. The present author emphasizes this latter view, grounded in the linguistic peculiarities of Vietnamese, particularly its pervasive Chinese elements.
The difficulty lies in the fact that Austroasiatic specialists lacked historical evidence to support their hypothesis. Their arguments rested almost entirely on linguistic mechanics, citing a handful of Mon‑Khmer basic words that may be regarded as etymological relics, preserved mainly among mountain minorities who contributed only marginally to the genetic and linguistic makeup of the Kinh majority. In reality, the Vietnamese have long represented the ancient Yue, historically known as the "Yue of the South", that is, "Vietnam", the final stronghold of the Yue. Thus, Vietnamese historical linguistics must be understood as the study of the Yue language, which was closely related to the ethnic languages of southern China, not to the Mon‑Khmer languages of the Indochinese peninsula. Mon‑Khmer could only be considered part of the Yue continuum if it could be demonstrated that Mon‑Khmer groups migrated southward from China into Indochina centuries before the rise of ancient Annam, rather than originating from the southwest or northwest in the direction of the Munda isoglosses.
From an etymological perspective, geographical proximity and cultural contact naturally facilitated lexical exchange. Just as Mon‑Khmer words entered Vietnamese, practical Vietnamese terms could also have diffused into Mon‑Khmer languages. Many of these words likely reached Vietnamese through the Mường, whose genetic and linguistic closeness to the Vietnamese is undeniable. The Mường, in turn, maintained contact with Mon and Khmer speakers after the Viet‑Mường split, which coincided with the arrival of the Han Chinese. In this sense, the Mon‑Khmer elements in Vietnamese can be understood as linguistic inheritances transmitted through Mường speech, functioning as a buffer between Vietnamese and Mon‑Khmer lexical strata.
From a broader perspective, the author concedes that this interpretation remains a minority view, and that his lone voice is insufficiently persuasive to overturn the entrenched Austroasiatic Mon‑Khmer hypothesis, which continues to be refined and upheld by successive generations of linguists. It is apparent that new entrants to the field often follow the path established by their predecessors, constructing their arguments upon the same old foundational premises. In the debate over tonality, for example, many have embraced Haudricourt’s theory that ancient Annamese was originally toneless due to its Mon‑Khmer origin, and that his model of tonogenesis explains the transformation of atonal Vietnamese into a tonal language, rather than attributing its tonal system to the extensive infusion of Old and Middle Chinese vocabulary.
Theoretically, tones should arise as a natural feature of language, as Haudricourt argued in the case of ancient Annamese. Yet it is clear that spoken languages cannot artificially acquire such attributes; one cannot simply “add” tones to convert a toneless language into a tonal one. In practice, Vietnamese speakers instinctively applied tones even to early French loanwords – such as cờlê, mỏlết, bíttết, bơsữa, and càphê – thereby integrating them seamlessly into the tonal framework of Vietnamese. This demonstrates that tonality is intrinsic to the language. Haudricourt’s hypothesis of Vietnamese tone genesis is therefore untenable: according to his model, Vietnamese only became fully tonal in the 12th century through internal phonological transformations rather than prolonged contact with Chinese, an explanation that is logically inconsistent.
Admittedly, in their earliest stages, all human languages may have originated as toneless or monosyllabic sounds. That, however, is not the central issue here. From a Sinitic perspective, it is more plausible that proto‑Yue or early Taic languages developed tonal distinctions by intonating proto‑consonantal pitches into four tones. For instance, 恐龍 kǒnglóng (SV khủnglong, "dinosaur") can be reconstructed from /klong/, where the disyllabicization of the complex consonantal initial /kl‑/ helps account for the structural similarities.
It is evident that tones did not accompany Chinese loanwords in unrelated non‑tonal languages, nor did those languages subsequently develop tonality. The clearest examples are the toneless Chinese borrowings in Japanese and Korean. In both cases, Chinese loanwords were stripped of their tonal distinctions. During the Tang dynasty (618–907), Japan and Korea systematically borrowed a substantial body of Chinese vocabulary. Remarkably, they also devised ways to extract phonemic values from Chinese characters to create their own national writing systems, a revolutionary departure from the Sino‑centric mindset. Yet, despite this innovation, the intrinsic structures of Japanese and Korean prevented them from accommodating both tones and semantics simultaneously. As a result, they continued to pronounce Chinese loanwords without tonal distinctions. For instance, to Korean ears, 防火 fánghuǒ ("prevent fire") and 放火 fànghuǒ ("set fire") are both rendered as banghwa. By contrast, modern Vietnamese speakers still distinguish phònghoả and phónghoả as two entirely opposite concepts. In other words, every Vietnamese word is morphemized with one or more of the eight tonal categories inherited from Middle Chinese.
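The merger described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as a model of Korean or Japanese phonology. The helper name strip_tone_marks is hypothetical; it deletes every combining diacritic after Unicode decomposition, so it would also remove Vietnamese vowel-quality marks (as in ơ or â), which happens not to affect this particular pair.

```python
import unicodedata


def strip_tone_marks(text: str) -> str:
    """Crudely simulate a toneless rendering of a romanized word.

    Assumption: tone is carried only by combining diacritics, so removing
    every combining mark after NFD decomposition approximates "no tones".
    This also strips vowel-quality marks, so it is only a rough sketch.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Pinyin readings of 防火 ("prevent fire") and 放火 ("set fire").
pinyin_pair = ("fánghuǒ", "fànghuǒ")
# The Vietnamese forms cited in the text.
vietnamese_pair = ("phònghoả", "phónghoả")

for first, second in (pinyin_pair, vietnamese_pair):
    # With tones the two words differ; without tones they collapse.
    print(first, second, "distinct with tones:", first != second)
    print(strip_tone_marks(first), strip_tone_marks(second),
          "merged without tones:",
          strip_tone_marks(first) == strip_tone_marks(second))
```

Run as written, each pair is distinct while the tone marks are present and identical once they are removed, which mirrors the way a toneless rendering leaves the two opposite meanings indistinguishable in form.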
Ironically, modern Putonghua (Mandarin), said to descend directly from Middle Chinese, now retains only four tones. This suggests that northern Chinese populations, many of Manchurian or Altaic origin, were themselves unable to preserve the full tonal system of Middle Chinese, which comprised four tones in two registers (often counted as eight in total). If even they could not maintain the original tonal distinctions, it is implausible to expect speakers of Mon‑Khmer languages, who may struggle to differentiate subtle tonal contrasts, to have done so. The difficulty is evident when Western learners of Mandarin attempt to distinguish the four tones in simple syllables such as ma1, ma2, ma3, and ma4. Austroasiatic theorists, therefore, undermine their own position by overlooking the profound historical impact of both Vietnamese and Chinese tonal systems on the development of Vietnamese.
Academically, the Austroasiatic Mon‑Khmer hypothesis is not the only explanatory path. The portion of Mon‑Khmer cognates in Vietnamese is both scant and of marginal significance. Many of these supposed cognates can be traced instead to Chinese origins, for example, chồmhỗm (“squat”) /chromom/ vs. 犬坐 quánzuò, or chòhõ (“stand at ease with legs apart”) /choho/ (đứng chànghảng) vs. 伸站 shēnzhàn.
A more fruitful approach to Vietnamese etymology requires a new methodology grounded in two pillars: first, the extensive body of Sino‑Tibetan basic vocabulary, over 420 fundamental items by recent counts, demonstrating common roots, as illustrated in Shafer’s monumental Sino‑Tibetan study (1972); and second, the growing body of evidence for cognateness between Chinese and Vietnamese etyma, with undeniable correspondences across virtually every linguistic category.
Initially, the author, too, accepted the Austroasiatic Mon‑Khmer theory, as did author Bình Nguyên Lộc (1972). His conviction rested largely on its wide acceptance and the numerical dominance of its adherents over Sino‑Tibetan proponents, compounded by his own lack of familiarity with Mon‑Khmer. Over time, however, his research in Vietnamese historical linguistics led him to reconsider. He observed that Austroasiatic specialists often employed an umbrella approach, drawing conclusions solely from the Mon‑Khmer elements in Mường sub‑dialects and then extending these to Vietnamese as a whole. This reasoning, however, fails to account for discrepancies, such as the Khmer numerals one through five, which are frequently cited but remain problematic.
The comparative wordlists themselves raise questions. Austroasiatic linguists, trained in Western methods, often relied heavily on local informants or interpreters during fieldwork – individuals who may not have fully understood linguistic principles, let alone comparative or historical linguistics. By contrast, when I began analyzing Shafer’s Sino‑Tibetan listings (1972) alongside reconstructions of Old Chinese phonology by Karlgren, Schuessler, Wang Li, Zhou Fagao, Nguyễn Tài Cẩn, and others, I found that Sinitic‑Vietnamese fundamental vocabulary shared far more commonality with Sino‑Tibetan than with Austroasiatic Mon‑Khmer.
Over the years, I have collected examples from both Chinese classics and modern Chinese media that demonstrate how deeply classical Chinese and vernacular Mandarin have permeated Vietnamese. Interestingly, modern Vietnamese preserves certain lexical usages and expressions absent even in Cantonese or Minnan dialects. For instance, while Cantonese uses /fajng1kao1/ and Hainanese /k’waj5majk8/ for "sleep", VS employs ngủ, a cognate of 臥 wò (SV ngoạ).
Additional Sinitic‑Vietnamese etyma further illustrate the close integration of Vietnamese daily speech with Chinese vernacular forms, independent of the classical written tradition once dominant in Annam. Examples include:
- 何故 hégù ("how come") → cớsao
- 為啥 wèishá ("why") → vìsao
- 卸罪 xièzuì ("to blame") → đổlỗi
- 賴他 lài tā ("because of him") → tạinó
and
- 幹活 gànhuó ("to work") → làmviệc
- 忙活 mánghuó ("to be busy") → bậnviệc
- 生活 shēnghuó ("life") → cuộcsống
- 勤勞 qínláo ("diligent") → làmsiêng
- 勞動 láodòng ("labor") → làmlụng
- 再來 zàilái ("do it again") → làmlại
- 上來 shànglái ("come up here") → lênđây
- 離近 líjìn ("come closer") → lạigần
- 離開 líkāi ("leave") → rờikhỏi
etc.
Taken together, these examples demonstrate that Vietnamese is far more deeply and systematically connected to the Sinitic tradition than the Austroasiatic Mon‑Khmer hypothesis allows.
The cited samples above represent new findings prepared by the author. They may be regarded as a long‑overdue breakthrough in Vietnamese historical linguistics, one that Austroasiatic theorists, despite decades of effort, have not been able to produce. Recall that Austroasiatic specialists have often exaggerated the significance of a handful of "basic words", many of which were collected during field trips with the assistance of local guides from Mon‑Khmer minority communities, often under the sponsorship of short‑term institutes.
Although the broader linguistic community has not embraced the Sino‑Tibetan hypothesis of Vietnamese, Austroasiatic theorists have effectively confined the language within a Mon‑Khmer framework for so long that they have hindered serious progress in Vietnamese etymology for decades. As a result, new researchers entering the field often rely solely on academic coursework and then imitate their predecessors, conducting field surveys, asking questions of local informants, and repeating outdated methods. These antiquated approaches have produced little of substance over the past sixty years, digging deeper trenches without uncovering new insights, and widening the gap that others hesitate to cross.
In truth, the same methodological weaknesses can afflict any camp. A lack of expertise in related languages, such as Chinese dialects or Mon‑Khmer isoglosses, can lead to flawed conclusions. Etymological research requires not only access to resources but also intellectual rigor, analytical precision, and creative insight. It is not enough to manipulate tools, apply mechanical rules, or tabulate word lists to justify the legitimacy of a few shared items. Among more than 20,000 Sinitic‑Vietnamese words in daily use, how many Mon‑Khmer cognates can truly be identified as fundamental? Do those few cognates constitute the essence of the language? Vietnamese speakers can communicate effectively without them, which underscores their marginal role.
No one can master all the relevant languages: Thai, Zhuang, Mon, Khmer, Mandarin, Cantonese, Hainanese, Fukienese, Vietnamese, and others. Students may pass examinations in historical linguistics through aptitude and training, but specialists often reach conclusions based on limited knowledge. Those who begin from flawed premises risk perpetuating errors and passing on misinformation, unless corrected by well‑informed experts.
This research introduces a novel approach to Sinitic‑Vietnamese etymology, offering students a methodology for tracing Vietnamese etyma that align with Sino‑Tibetan and Sinitic roots, comparable to dialectal forms across the Chinese linguistic sphere. The work is original, first appearing online some twenty years ago, and has since been cited as supporting evidence by others in the field.
Comparative historical linguistics demands more than cleverness; it requires sensitivity to cultural identity and linguistic belonging. Nationalism often shapes perceptions: for example, descendants of Chinese immigrants in Vietnam may identify fully as Vietnamese, just as Taiwanese or Singaporeans of Chinese descent embrace their local national identities. Such factors can cloud academic judgment.
The purpose of this research is not to "prove" a Sino‑Tibetan genetic origin for Vietnamese, nor to deny the existence of Mon‑Khmer cognates. Rather, it seeks to introduce new data and methodologies to re‑evaluate thousands of Vietnamese words with demonstrable Chinese roots. These tools can also reassess words previously classified by earlier linguists in both camps.
The lexical commonalities are undeniable. Austroasiatic theorists have highlighted overlaps with Mon‑Khmer, but they cannot account for the breadth of basic vocabulary that clearly aligns with Chinese: nạ = 娘 niáng ("mother"), bố = 父 fù ("father"), mẹ = 母 mǔ ("mother"), xơi = 食 shí ("eat"), ăn = 唵 ǎn ("eat"), uống = 飲 yǐn ("drink"), ngủ = 臥 wò ("sleep"), mắt = 目 mù ("eye"), đầu = 頭 tóu ("head"), sọ = 首 shǒu ("cranium"), ngực = 臆 yì ("chest"), phổi = 肺 fèi ("lung"), bụng = 腹 fù ("stomach"), gạo = 稻 dào ("rice"), chim = 禽 qín ("bird"), cá = 魚 yú ("fish"), lửa = 火 huǒ ("fire"), lá = 葉 yè ("leaf"), nhà = 家 jiā ("home"), lợn = 豚 tún ("pig"), săn = 田 tián ("hunt"), and many others, not to mention etyma whose connection is less obvious.
Unsurprisingly, Vietnamese linguists in the Mon‑Khmer camp will defend their position vigorously. Yet this research demonstrates that the number of Mon‑Khmer basic words cited, about 170 items, pales in comparison to the vast body of Vietnamese words with Chinese origins. If fundamental vocabulary is defined as roughly 200 items, as is widely accepted in linguistics, then the same logic used by Austroasiatic theorists would classify Vietnamese as Sino‑Tibetan.
Cross‑linguistic classification supports this view. Cantonese and Minnan, two major Chinese dialect groups, are classified within the Sinitic branch of Sino‑Tibetan primarily on the basis of shared Sinitic stock, not merely basic vocabulary. By the same reasoning, Vietnamese should be considered alongside them. All of these languages descend from Yue lineages within the Taic‑Yue continuum, with Sino‑Tibetan affinities. Theories of proto‑Taic and pre‑Sinitic contact prior to the Zhou Dynasty (1122–256 BCE) further support this affiliation.
While there is no need to rush to reclassify Vietnamese as Sinitic, the evidence shows that its basic lexicon is plausibly cognate with Sino‑Tibetan. Moreover, Sinitic‑Vietnamese words share not only vocabulary but also structural and phonological traits, including subtle features characteristic of closely related languages.
The sections of the table below present these linguistic similarities in greater detail, with further examples and elaborations to reinforce the points made above.
Table 6 – Core matter of Vietnamese etymology
-
The tonal system
Vietnamese employs an 8‑tone system, often described as 6 in modern orthography, which does not account for the two 'entering tones' (thanhnhập, 入聲 rùshēng). This system corresponds almost perfectly to the Middle Chinese tonal scheme of four tones in two registers. Vietnamese tones can be mapped directly onto those of modern Cantonese and Minnan dialects, and they also allow Tang poetry to be recited in full accordance with its strict rules of tonal melody and rhyming syllabic finals (Xu Liting 1982:219).
Mandarin today preserves only 4 tones, yet their pitch values align closely with Vietnamese homonyms. Other southern Chinese dialects, especially Minnan and Yue, retain between 7 and 10 tones. Cantonese, for example, distinguishes 9 tonal categories: ma1, ma2, ma3, ma4, ma5, ma6, mak7, mak8, mak9, which correspond to Vietnamese ma, mà, mả, mã, má, mạ, mác (mát, máp), mạc (mạt, mạp), mac (mat, map).
Comparable tonal values are also found in minority languages of southern China, such as Zhuang, Dai, and Miao, spoken across Yunnan, Guizhou, Guangxi, Hunan, Guangdong, and Fujian provinces. In these cases, each tone carries nearly the same phonetic value as its equivalent in Chinese dialects, suggesting an inherent and shared tonal system that points to a common linguistic family.
By contrast, Mon‑Khmer languages lack such complexity. Their intonation patterns are roughly equivalent to only two Vietnamese tones, /ma/ and /mà/, underscoring the fundamental difference between the tonal systems of Vietnamese and those of Mon‑Khmer.
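The "four tones in two registers" scheme described in this section can be laid out as a small grid. The sketch below is illustrative only: the pairing of Vietnamese tone names with the Middle Chinese categories follows the conventional description of Sino‑Vietnamese readings and is an assumption not spelled out in the text, and the names `CATEGORIES`, `REGISTERS`, and `TONE_GRID` are hypothetical.

```python
# Conventional 4-category x 2-register layout. The pairing is an assumption
# made for illustration; it is not taken from the article's own tables.
CATEGORIES = ["平 level", "上 rising", "去 departing", "入 entering"]
REGISTERS = ["upper", "lower"]

TONE_GRID = {
    ("平 level", "upper"): "ngang (ma)",
    ("平 level", "lower"): "huyền (mà)",
    ("上 rising", "upper"): "hỏi (mả)",
    ("上 rising", "lower"): "ngã (mã)",
    ("去 departing", "upper"): "sắc (má)",
    ("去 departing", "lower"): "nặng (mạ)",
    ("入 entering", "upper"): "sắc + stop final (mác)",
    ("入 entering", "lower"): "nặng + stop final (mạc)",
}


def tone_count(grid):
    """Number of tone cells once the entering tones are counted separately."""
    return len(grid)


def orthographic_tone_count(grid):
    """Number of distinct tone names; entering tones reuse sắc and nặng."""
    return len({name.split(" ")[0] for name in grid.values()})


for category in CATEGORIES:
    cells = [TONE_GRID[(category, register)] for register in REGISTERS]
    print(f"{category:14} | {cells[0]:24} | {cells[1]}")

print("cells:", tone_count(TONE_GRID))
print("tone names in spelling:", orthographic_tone_count(TONE_GRID))
```

Counting the grid cells gives the 8‑tone figure, while counting distinct tone names gives the 6 of modern orthography, since the two entering tones are written with the sắc and nặng marks.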
-
Main sentence structure:
Both Chinese and Vietnamese share the basic {Subject + Verb + Object} (SVO) pattern:
我 愛 小燕! (Wǒ ài Xiǎoyàn!) → Tôi yêu Tiểu-Yến! 'I love Xiaoyan'.
Variants include object‑fronting:
- 飯 我 吃了. (Fàn wǒ chī le.) → Cơm tôi ăn rồi. 'Meal, I already ate.'
- 這 本書 我 看 了. (Zhè běnshū wǒ kàn le.) → Quyểnsách này tôi xem rồi! 'This book I have already read.'
- 把 水果 帶 過來 請客. (Bǎ shuǐguǒ dài guòlái qǐngkè.) → Bưng tráicây đem quađây mời khách. 'Bring the fruits over here to treat our guests.'
Dual subject:
- 小燕 她 愛我. (Xiǎoyàn tā ài wǒ.) → Tiểu Yến nó yêu tôi. 'Xiaoyan she loves me.'
In the final example, aside from constructions involving direct and indirect objects, neither language permits a dual S+V+OO structure in which the objects are expressed redundantly, as in 'Tiểu Yến nó...' At the same time, both languages allow the omission of either the subject or the object when the referent is contextually understood. This feature is of particular significance in comparative linguistics, as it highlights a structural parallel that serves as an important diagnostic criterion for establishing genetic or areal relationships among languages within the same family.
-
-
"Isolate" construction
Both Chinese and Vietnamese lack inflectional affixes to mark grammatical functions or syntactic relations, unlike Indo‑European languages. Instead, they rely on fixed lexical items to form stative, copulative, passive, active transitive, and qualificative constructions.
Grammatical function words
| Vietnamese | Chinese | Pinyin | English meaning |
|---|---|---|---|
| không | 不 | bù | 'negation' |
| có | 有 | yǒu | 'there is' / 'exist' |
| là (thì, SV thị) | 是 | shì | 'to be' |
| bị | 被 | bèi | 'passive' (in Chinese also active, in Vietnamese only passive) |
| được | 得 | dé | 'active/resultative' |
| nó thôngminh | 她聰明 | tā cōngmíng | 'she is intelligent' |
| cóphải | 是否 | shìfǒu | 'is it…?' |
| cóphảilà | 是不是 | shìbùshì | 'is that…?' |
| không (final particle) | 不 / 否 | bù / fǒu | '…isn’t it?', '…don’t you?' |
Morphemic syllables functioning like affixes
These morphemic forms operate in parallel across both languages because Vietnamese not only borrowed entire Chinese allomorphic sets but also employed them to construct polysyllabic words with identical semantic and structural properties.
| Vietnamese | Chinese | Pinyin | English meaning |
|---|---|---|---|
| hoanhỏ | 花兒 | huār | 'flower' |
| mainày | 明兒 | míngr | 'tomorrow' |
| họcgiả | 學者 | xuézhě | 'scholar' |
| tácgiả | 作者 | zuòzhě | 'author' |
| vôlễ | 無禮 | wúlǐ | 'impolite' |
| vôhiệu | 無效 | wúxiào | 'ineffective' |
| phithường | 非常 | fēicháng | 'extraordinary' |
| phichínhnghĩa | 非正義 | fēizhèngyì | 'injustice' |
| casĩ | 歌手 | gēshǒu | 'singer' (手 shǒu ~ sĩ 士 shì) |
| hoạsĩ | 畫家 | huàjiā | 'painter' (家 jiā ~ sĩ 士 shì) |
| nhàthơ | 詩人 | shīrén | 'poet' (人 rén ~ nhà 家 jiā) |
This table highlights how:
- Both languages use fixed lexical items instead of inflectional affixes.
- Morphemic syllables act like affixes, producing polysyllabic compounds with parallel semantic structures.
- Vietnamese systematically integrated Chinese morphemic sets into its own lexicon.
-
Syllabic structure:
The basic lexical building block in Vietnamese follows the pattern [initial + middle + final], most often CVC (consonant + vowel + consonant). Vietnamese, like Chinese, favors consonant‑initial syllables (with relatively few vowel‑initial words). These syllables share simple consonants without clusters, such as /k/, /c/, /t/, /ʈ/, /n/, /ŋ/, /ɲ/, etc. Medial glides like ‑w‑ and ‑j‑ are also common. For example:
- xoang [swaːŋ˧˧] vs. 腔 (qiāng, MC /hɑŋ⁵⁵/)
- hương [hɨəŋ˧˧] vs. 香 (xiāng, MC /hœːŋ⁵⁵/)
Sinitic‑Vietnamese vocabulary remains especially close to Middle Chinese, particularly in finals that evolved from Old Chinese endings such as /‑wng/ and /‑wk/.
Syllabic correspondences
| Viet. | IPA | MC | Mandarin | Chinese | Meaning |
|---|---|---|---|---|---|
| thống | [tʰəwŋ˧˥] | /thowng5/ | tòng /tʊŋ⁵⁵/ | 痛 | 'pain' |
| đông | [ɗəwŋ˧˧] | /downg1/ | dōng /tong1/ | 東 | 'east' |
| cốc | [kəwk˧˥] | /kowk7/ | gǔ /kʊk̚⁵/ | 榖 | 'cereal' |
| tốc | [təwk˧˥] | /towk7/ | sù /su4/ | 速 | 'fast' |
| quốc | [kwək̚˧˥] | /kwok/ | guó /kuɔ³⁵/ | 國 | 'nation' |
| mục | [muk̚˧˥] | /muwk/ | mù /mu⁵¹/ | 目 | 'eye' |
| bạch | [ɓaɪk̚˧˥] | /baek/ | bái /pai³⁵/ | 白 | 'white' |
| lực | [lɨk̚˧˥] | /liwk/ | lì /li⁵¹/ | 力 | 'strength' |
| học | [hɔk̚˧˥] | /haewk/ | xué /ɕyɛ³⁵/ | 學 | 'study' |
| quách | [kwaːk̚˧˥] | /kwæk/ | guō /kuɔ⁵⁵/ | 郭 | 'outer wall' |
This expanded table illustrates:
- The CVC pattern is dominant in both Vietnamese and Middle Chinese.
- Final consonants like ‑ng, ‑k, ‑c, ‑ch are preserved in Vietnamese, reflecting Middle Chinese phonology.
- Mandarin often simplifies or shifts these finals, but the historical link remains visible.
Table 7 - Finals across Vietnamese, Middle Chinese, Mandarin, Cantonese, and Minnan
| Endings | Vietnamese | MC | Mandarin | Cantonese | Hokkien | Chin. | Meaning |
|---|---|---|---|---|---|---|---|
| ‑ng | thống [tʰəwŋ˧˥], đông [ɗəwŋ˧˧] | /‑wng/ | tòng 痛 /tʊŋ⁵⁵/, dōng 東 /toŋ⁵⁵/ | tung3 痛, dung1 東 | thàng 痛, tang 東 | 痛, 東 | 'pain', 'east' |
| ‑k | cốc [kəwk̚˧˥], tốc [təwk̚˧˥], quốc [kwək̚˧˥] | /‑wk/, /‑ok/ | gǔ 榖 /ku³⁵/, sù 速 /su⁵¹/, guó 國 /kuɔ³⁵/ | guk1 榖, cuk1 速, gwok3 國 | kok 榖, chok 速, kok 國 | 榖, 速, 國 | 'cereal', 'fast', 'nation' |
| ‑t | mật [mət̚˧˥], quật [kwət̚˧˥] | /‑t/ | mì 蜜 /mi⁵¹/, qū 屈 /tɕʰy⁵⁵/ | mat6 蜜, wat1 屈 | bat 蜜, oat 屈 | 蜜, 屈 | 'honey', 'bend' |
| ‑p | hợp [həp̚˧˥], thập [tʰəp̚˧˥] | /‑p/ | hé 合 /xɤ³⁵/, shí 十 /ʂɨ³⁵/ | hap6 合, sap6 十 | hap 合, chap 十 | 合, 十 | 'combine', 'ten' |
| ‑m | tâm [təm˧˥], nam [nam˧˥] | /‑m/ | xīn 心 /ɕin⁵⁵/, nán 南 /nan³⁵/ | sam1 心, naam4 南 | sim 心, lam 南 | 心, 南 | 'heart', 'south' |
| ‑n | ân [ən˧˥], sơn [səːn˧˥] | /‑n/ | ēn 恩 /ən⁵⁵/, shān 山 /ʂan⁵⁵/ | jan1 恩, saan1 山 | in 恩, soaⁿ 山 | 恩, 山 | 'grace', 'mountain' |
- Vietnamese and Cantonese both preserve final stops (‑p, ‑t, ‑k) and nasals (‑m, ‑n, ‑ng), making them closer to Middle Chinese than Mandarin.
- Hokkien also retains these finals, showing strong parallels with Vietnamese.
- Mandarin has lost most final stops, simplifying them into open vowels or tonal changes.
- This preservation explains why Vietnamese and southern Chinese dialects (Cantonese, Hokkien) are especially valuable for reconstructing Tang‑era rhyme schemes and tonal systems.
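The retention pattern summarized in the bullets above can be checked mechanically against the stop‑final rows of Table 7. The sketch below is a rough orthographic check, not a phonological analysis: it looks only at the last letter of each romanized form, and the names `STOP_FINAL_ROWS`, `bare_letters`, and `retention_by_variety` are hypothetical helpers introduced for illustration.

```python
import unicodedata

# Forms copied from the -k, -t, and -p rows of Table 7; no new data is added.
STOP_FINAL_ROWS = {
    "榖": {"Vietnamese": "cốc", "Mandarin": "gǔ", "Cantonese": "guk1", "Hokkien": "kok"},
    "速": {"Vietnamese": "tốc", "Mandarin": "sù", "Cantonese": "cuk1", "Hokkien": "chok"},
    "國": {"Vietnamese": "quốc", "Mandarin": "guó", "Cantonese": "gwok3", "Hokkien": "kok"},
    "蜜": {"Vietnamese": "mật", "Mandarin": "mì", "Cantonese": "mat6", "Hokkien": "bat"},
    "屈": {"Vietnamese": "quật", "Mandarin": "qū", "Cantonese": "wat1", "Hokkien": "oat"},
    "合": {"Vietnamese": "hợp", "Mandarin": "hé", "Cantonese": "hap6", "Hokkien": "hap"},
    "十": {"Vietnamese": "thập", "Mandarin": "shí", "Cantonese": "sap6", "Hokkien": "chap"},
}

# Spellings that mark a final stop; Vietnamese writes final /k/ as "c" or "ch".
STOP_SPELLINGS = ("p", "t", "k", "c", "ch")


def bare_letters(form):
    """Strip tone diacritics and tone digits, keeping only base letters."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if ch.isalpha()).lower()


def keeps_stop_final(form):
    """True when the romanized form ends in a stop spelling."""
    return bare_letters(form).endswith(STOP_SPELLINGS)


def retention_by_variety(rows):
    """Count, per variety, how many forms still end in a stop."""
    counts = {}
    for forms in rows.values():
        for variety, form in forms.items():
            counts[variety] = counts.get(variety, 0) + keeps_stop_final(form)
    return counts


print(retention_by_variety(STOP_FINAL_ROWS))
```

On these seven items the Vietnamese, Cantonese, and Hokkien forms all end in a stop, while none of the Mandarin forms do, which matches the bullets above. A larger word list would be needed before drawing any wider conclusion.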
-
Basic vocabulary stock
The commonality is most prominent in the basic vocabulary stock, where the shared etyma agree across every lexical aspect of each word.
Core vocabulary stock across Vietnamese, Chinese, and reconstruction systems
| Viet. | Chinese | Pinyin | Pulleyblank MC | Baxter–Sagart MC | Zhengzhang MC | Baxter–Sagart *OC | Meaning |
|---|---|---|---|---|---|---|---|
| nạ | 娘 | niáng | nɨaŋ | nɨaŋ | njaŋ | nraŋ | 'mother' |
| tía | 爹 | diē | tja | tja | tja | *jaʔ | 'dad' |
| bố | 父 | fù | bjuH | bjuH | bjuH | pəʔ-s | 'father' |
| xơi | 食 | shí | ʑiək | d͡ʑiɪk | ɕiək | l̥ek | 'eat (meal)' |
| ăn | 唵 | ǎn | ʔomX | ʔomX | ʔomX | ʔˤəmʔ | 'eat' |
| ngủ | 臥 | wò | ŋwaH | ŋwaH | ŋwaH | ŋˤwaʔ | 'sleep' |
| xem | 瞧 | qiáo | dzew | dzew | dzew | dzew | 'look' |
| mắt | 目 | mù | mjuk | mjuwk | mjuk | mˤuk | 'eye' |
| đầu | 頭 | tóu | duw | duw | duw | lˤu | 'head' |
| ngực | 臆 | yì | ʔik | ʔik | ʔik | ʔik | 'chest' |
| phổi | 肺 | fèi | piajH | pɨajH | piajH | pˤi[t]-s | 'lung' |
| gạo | 稻 | dào | dawH | dawH | dawH | lˤuʔ-s | 'rice' |
| cá | 魚 | yú | ŋjo | ŋjo | ŋjo | ŋa | 'fish' |
| lửa | 火 | huǒ | xwaX | xwaX | xwaX | qʰˤajʔ | 'fire' |
| lá | 葉 | yè | jep | jep | jep | lˤap | 'leaf' |
| nhà | 家 | jiā | ka | kra | kra | kra | 'home' |
| lợn | 豚 | tún | dwin | dwon | dwin | lˤu[n] | 'pig' |
| trồng | 種 | zhòng | tsyuwngX | t͡ɕjuwngX | tsyuwŋX | toŋʔ | 'cultivate' |
| săn | 田 | tián | den | den | den | l̥ˤiŋ | 'hunt' |
Key insights
- Vietnamese reflexes often preserve final consonants (‑p, ‑t, ‑k, ‑m, ‑n, ‑ng) that are visible in Middle Chinese and traceable back to Old Chinese.
- Pulleyblank, Baxter–Sagart, and Zhengzhang differ in notation, but all show the same underlying structure.
- Old Chinese reconstructions reveal deeper roots: for example, bố (父) traces back to OC pəʔ-s, while mắt (目) reflects OC mˤuk.
- This demonstrates how Vietnamese, through its Sino‑Vietnamese layer, preserves archaic phonological features that connect directly to Old Chinese.
- Across all three systems, the finals (‑p, ‑t, ‑k, ‑m, ‑n, ‑ng) are consistently preserved, which explains why Vietnamese reflexes remain so close to Middle Chinese.
-
Shares of dialectal origin
Many everyday Vietnamese words and expressions show clear correspondences with southern Chinese dialects (Cantonese, Hainanese, Fukienese/Minnan), reflecting centuries of contact and borrowing.
- Mandarin colloquialisms contributed forms like đúngrồi, đượcrồi, ngàymai.
- Southern dialects (Hainanese, Hokkien, Cantonese) provided many everyday terms: gàcồ, gàmái, mắtkiếng, soài, con, chạy, uống.
- These borrowings highlight the layered dialectal influence on Vietnamese, beyond the classical Sino‑Vietnamese stratum tied to Middle Chinese.
-
Disyllabicity:
Vietnamese vocabulary is overwhelmingly disyllabic, paralleling Chinese compounds. These forms are frequent in daily usage and often show either direct equivalence or creative semantic composition.
Common disyllabic vocabulary
| Vietnamese | Chinese | Pinyin | English meaning |
|---|---|---|---|
| siêngnăng | 勤勉 | qínmiǎn | 'industrious' |
| làmsiêng | 勤勞 | qínláo | 'hardworking' |
| nonsông | 江山 | jiāngshān | 'country' (lit. 'river + mountain') |
| ánhmắt | 目光 | mùguāng | 'the look' |
| ánhnắng | 陽光 | yángguāng | 'sunlight' |
| giàucó | 富有 | fùyǒu | 'wealthy' |
Peculiar Semantic Compositions
| Viet. | Chinese | Pinyin | Baxter–Sagart MC | Literal meaning | Semantic sense |
|---|---|---|---|---|---|
| bàntay | 手板 | shǒubǎn | /*ʑuwX panX/ | 'panel of the palm' | 'hand' |
| cổchân | 腳脖子 | jiǎobózi | /*kjak pwot/ | 'neck of the foot' | 'ankle' |
| khuônmặt | 面孔 | miànkǒng | /*menH khuwngX/ | 'frame of a face' | 'face' |
| dướiquê | 鄉下 | xiāngxià | /*qʰjaŋ hæH/ | '(down there in the) countryside' | 'countryside' |
| đoáhoa | 花朵 | huāduǒ | /*xwa taX/ | '(a stem of) flower' | 'flower' |
Key observations
- High frequency of disyllables: Vietnamese, like Chinese, relies heavily on two‑syllable compounds.
- Semantic creativity: Many compounds are built from vivid metaphors (cổchân = 'neck of the foot').
- Reverse morpheme order: Some Vietnamese disyllables invert the order of their Chinese counterparts, reflecting local syntactic patterns at the time the compounds were formed, which likely happened during the Tang Dynasty.
-
Morphemic syllable, a building unit to coin new words:
Morphemic syllable compounds across Vietnamese, Chinese, and reconstruction systems
| Vietnamese | Chinese | Pinyin / Dialect | Dialectal Source | English meaning |
|---|---|---|---|---|
| đúngrồi | 中了 | zhòngle | Mandarin | 'correct' |
| đượcrồi | 得了 | déle | Mandarin | 'that’s okay!' |
| luônluôn | 老老 / 牢牢 | láoláo | Mandarin | 'always' |
| ngàymai | 明兒 | míngr | Mandarin (colloquial) | 'tomorrow' |
| nóichuyện | 聊天 | liáotiān | Mandarin | 'talk' |
| ngầu | 牛 | niú | Mandarin (slang) | 'hefty, cool' |
| đánhcá | 打魚 | dǎyú | Mandarin | 'net fishing' |
| gàcồ / gàtrống | 雞公 | jīgōng | Hainanese, Fukienese | 'rooster' |
| gàmái | 雞母 | jīmǔ | Hainanese, Fukienese | 'hen' |
| mắtkiếng | 目鏡 | mùjìng | Hainanese | 'eye‑glasses' |
| biết | – | /bat1/ | Hainanese, Fukienese | 'know' |
| soài | 檨 | shē (Fukienese /suã/) | Fukienese (Minnan) | 'mango' |
| con | 囝 | /kiaŋ/, /kiã/, /kẽ/ | Fukien (Fuzhou) | 'son, child' |
| chạy | 走 | /zau2/ | Cantonese | 'run' |
| xơi | 食 | shí /ʂʐ̩³⁵/ | Mandarin / Cantonese | 'eat (meal)' |
| uống | 飲 | /jam3/ | Cantonese | 'drink' |
Key observations
| Viet. | Chinese | Pinyin | Pulleyblank MC | Baxter-Sagart MC | Zhengzhang MC | Baxter-Sagart *OC | Meaning |
|---|---|---|---|---|---|---|---|
| bồihồi | 徘徊 | páihuái | baj hwaj | bɨaj hwaj | bɨaj hwaj | bˤəj ɡʷˤəj | 'melancholy; hesitation' |
| yêuđương | 愛戴 | àidài | ʔaj tajH | ʔajH tajH | ʔaj tajH | qˤəʔ-s lˤəks | 'love' |
| khổsở | 苦楚 | kǔchǔ | khuX tshoX | kʰuX t͡ʂʰoX | kʰuX t͡ʂʰoX | kʰˤaʔ tʰraʔ | 'hardship' |
| mắcbệnh | 犯病 | fànbìng | bjiamH biajŋH | bjomH biajŋH | bjomH biajŋH | bom-s breŋ-s | 'to be sick' |
| bắtcóc | 綁架 | bǎngjià | paŋX kaeH | paŋX kraeH | paŋX kraeH | pˤaŋʔ kraʔ-s | 'kidnap' |
| cẩuthả | 苟且 | gǒuqiě | kuwX tshjaX | kuwX t͡ɕʰjaX | kuwX t͡ɕʰjaX | kˤoʔ tʰjaʔ | 'sloppy; careless' |
Key observations
- Pulleyblank: Compact, practical transcription.
- Baxter-Sagart MC: Detailed, with tone categories (X, H) and vowel quality.
- Zhengzhang: Similar to Baxter–Sagart but with slightly different phonetic assumptions.
- Old Chinese (Baxter-Sagart OC): Pushes the etyma further back, often reconstructing laryngeals, uvulars, and clusters that explain later MC reflexes.
- Vietnamese forms often mirror MC phonology, but their semantic structures are preserved from Chinese compounds.
- Syllabic parallel compounds (in synonymous / antonymous / reduplicative forms)
- Monosyllabic overload: Both languages face heavy homonymy due to limited syllable inventories.
- Compounding as resolution: Disyllabic compounds reduce ambiguity and enrich expression.
- Semantic strategies: Compounds may be built from synonyms, antonyms, reduplication, or parallel morphemes.
- Vietnamese-Chinese parallels: The structural similarity reflects deep historical borrowing and adaptation, especially during the Tang period.
-
Similarities in colloquial and idiomatic expressions:
Both languages, by their intrinsic nature, exhibit distinctive attributes across many dimensions – particularly in dialectal variation and colloquial usage. These parallels are evident in expressions that surface in everyday speech, such as:
- tạitôi (賴我 làiwǒ, 'because of me')
- vìsao (為啥 wèishă, 'how come')
- làmviệc (幹活 gànhuó, 'work')
- chồmhổm (犬坐 quǎnzuò, 'squat')
- răngkhểnh (犬牙 quǎnyá, 'canine')
- saocứ (總是 zǒngshì, 'how come')
- tấtcả (大家 dàjiā, 'everybody')
- mauchóng (馬上 măshàng, 'immediately')
- ítra (起碼 qǐmǎ, 'at least')
- trờinắng (太陽 tàiyáng, 'sunshine')
- đâunào (那裡 nàlǐ, 'where')
- đểý (在意 zàiyì, 'to mind')
- Tênnàythậttếu. (這個人挺逗 zhègèréntǐngdòu, 'this person is really funny')
- uốngnướcnhớnguồn (飲水思源 yǐnshuǐsīyuán, 'drink water and remember its source')
- lárụngvềcội (葉落歸根 yèluòguīgēn, 'a fallen leaf returns to its root')
- ếchngồiđáygiếng (井蛙之見 jǐngwāzhījiàn, 'a frog’s view from the bottom of a well')
- sưtửHàđông (河東獅子 Hédōngshīzǐ, 'tiger wife')
Classifiers and their function as pronouns:
Grammatical and functional classifiers in both Vietnamese and Chinese serve to specify objects, facts, or instances. Typically positioned before nouns, they may also function independently as pronouns. Their usage is virtually parallel in both languages, exemplified by:
- cái (個 gè, 'a unit of')
- chiếc (隻 zhī, 'a piece of')
- đôi (對 duì, 'a pair of')
- con (子 zǐ, 'a head of')
- cuốn (卷 juān, 'a roll of')
- bó (把 bă, 'a bunch of')
- chìa (匙 chí, 'a stick of')
- trang (張 zhāng, 'a sheet of')
- trận (陣 zhèn, 'an instance of')
- cục (塊 kuài, 'a lump of')
- miếng (片 piàn, 'a slice of')
- cơn (場 chăng, 'a round of')
- chuyện (件 jiàn, 'a matter of')
- ván (盤 pán, 'a game of')
- cuộc (局 jú, 'a round of')
- bữa (飯 fàn, 'a meal')
Semantically and syntactically, these classifiers often pair statically with specific lexical items, forming tightly bound units that convey precise categorical meaning. Each classifier anchors a distinct semantic realm – whether quantifying animate beings, abstract events, or physical objects.
Moreover, phonosemantic patterns in Vietnamese suggest that initial consonants such as /b-/, /f-/, /ph-/, and their derivatives /x-/, /gi-/, /z-/ often evoke imagery of gliding, swelling, or airy movement. This is reflected in expressions like:
- phậpphồng (彭彭 péngpéng, 'erratic heartbeat')
- bềnhbồng (泛泛 fànfàn, 'floating and drifting')
- phấtphới (飄飄 piāopiāo, 'wavering')
- phầnphật (翩翩 piānpiān, 'flutter')
Both Vietnamese and Chinese contain a large number of monosyllabic words, many of which are homonyms due to the limited set of possible syllable structures {(C) + V + (C)}. In Chinese alone, nearly 80,000 characters have accumulated within this phonetic framework.
To reduce ambiguity, both languages developed a strong tendency toward compounding. This involves combining two monosyllables – often synonyms, antonyms, or semantically related morphemes – into disyllabic words. Vietnamese also makes extensive use of reduplication and parallel morphemic compounds, closely mirroring Chinese strategies.
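The ambiguity argument above is at bottom an arithmetic one: a small {(C) + V + (C)} inventory yields few distinct syllables, while pairing syllables multiplies the available forms. The sketch below makes that concrete. The inventory sizes are hypothetical round numbers chosen only for illustration, not measured values for Vietnamese or Chinese, and the function names are likewise assumptions.

```python
def syllable_space(initials, vowels, finals, tones):
    """Upper bound on distinct (C)V(C) syllables.

    Each optional consonant slot gains one extra choice for "empty".
    """
    return (initials + 1) * vowels * (finals + 1) * tones


def disyllable_space(monosyllables):
    """Number of ordered two-syllable combinations."""
    return monosyllables ** 2


def average_homophones(characters, syllables):
    """Average number of characters sharing one syllable."""
    return characters / syllables


# Hypothetical inventory sizes, not measured values for either language.
mono = syllable_space(initials=20, vowels=10, finals=8, tones=6)
print("monosyllables:", mono)
print("disyllables:", disyllable_space(mono))
# 80,000 is the character count quoted in the text above.
print("characters per syllable:", round(average_homophones(80_000, mono), 1))
```

Whatever the real inventory sizes are, the shape of the result is the same: the disyllabic space is the square of the monosyllabic one, so pairing morphemes leaves far more room to keep words distinct.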
Synonymous / Antonymous compounds
| Vietnamese | Chinese | Pinyin | Literal meaning | Semantic sense |
|---|---|---|---|---|
| đấtđai | 土地 | tǔdì | 'soil + land' | land |
| thươngyêu | 疼愛 | téngài | 'affection + love' | love |
| buồnrầu | 愁悶 | chóumèn | 'sad + sorrowful' | sorrow |
| chịuđựng | 承受 | chéngshòu | 'take + accept' | endure |
| tìmkiếm | 尋找 | xúnzhǎo | 'seek + search' | search |
| chimchóc | 禽雀 | qínquè | 'fowls + birds' | birds |
| caothấp | 高低 | gāodī | 'high + low' | contrast in height |
| trêndưới | 上下 | shàngxià | 'above + below' | positional relation |
Reduplicative disyllabics
| Vietnamese | Chinese | Pinyin | English meaning |
|---|---|---|---|
| liênmiên | 連綿 | liánmiǎn | continuous |
| mongmanh | 渺茫 | miǎománg | slim, faint |
| lôithôi | 囉嗦 | luōsuō | verbose |
| dễdàng | 容易 | róngyì | easily |
| lòngthòng | 籠統 | lóngtǒng | long-winded, loose |
Morphemic Parallel Compounds
| Vietnamese | Chinese | Pinyin | Literal meaning | Semantic sense |
|---|---|---|---|---|
| cayđắng | 辛苦 | xīnkǔ | 'spicy hot + bitter' | hardship, suffering |
| lạigần | 離近 | líjìn | 'far + near' | get closer |
| dìghẻ | 姨姨 | yíyí | 'aunt + aunt' | stepmother |
Key Observations
These examples illustrate not only lexical convergence but also shared cognitive metaphors embedded in sound symbolism across both languages.
-
Particles:
Grammatical particles are typically appended to the end of a sentence to convey directionality, emotional tone, or the speaker’s attitude toward a given state of affairs. These particles function similarly in both Vietnamese and Chinese, often serving pragmatic or modal roles. Examples include:
- đây as in Lênđây! (上來 Shànglái, 'Come up here!')
- đi as in Vềđi. (回去 Huíqù, 'Go home.')
- ơi as in Trờiơi! (天啊 Tiānna, 'My Lord!')
- nè as in Tôilàynè. (是 我 呢 Shì wǒ ne, 'It's me.')
- nha, nhé as in Tôi ăn nha. (我吃啦 Wǒ chī lā, 'I eat now.')
- rồi as in Chạykhôngnổinữa rồi! (走不了了呢! Zǒu bù liǎo le ne!, 'I cannot walk anymore!')
These particles, though often monosyllabic, carry nuanced semantic weight and are integral to the expressive rhythm of both languages. Their placement and usage reflect shared syntactic tendencies and pragmatic functions across the Sinitic-Vietnamese continuum.
-
Prepositions and Conjunctions
Virtually all Vietnamese prepositions and conjunctions trace their origin to Chinese functional words, known as 虛辭 xūcí (hưtừ). Their semantic roles and syntactic behavior are nearly identical in both languages, forming a shared grammatical substrate. Examples include:
- và (和 hé, 'and')
- với (與 yǔ, 'with')
- từ (自 zì, 'from')
- nếu (若 ruò, 'if')
- vì (為 wèi, 'because')
- nhưngmà (然而 rán’ér, 'but')
- vìthế (於是 yúshì, 'therefore')
- dođó (所以 suǒyǐ, 'hence')
- dùrằng (雖然 suīrán, 'although')
- dovì (bởivì) (由於 yóuyú, 'due to')
These functional elements not only mirror each other in form and meaning but also reflect a deep historical convergence in syntactic logic. Their consistent pairing across both languages underscores the embeddedness of Chinese grammatical architecture within Vietnamese discourse.
-
Grammatical markers:
Grammatical markers in Chinese are lexical units that fulfill syntactic functions by framing, fossilizing, or abstracting fixed expressions into stative or nominalized forms. Many of these have evolved into stand-alone words denoting conditions, circumstances, or abstract states of affairs. Their origins lie in classical Chinese (文言文 wényánwén), and they remained in active usage across both languages well into the early 20th century.
Over time, these markers came to represent syntactic units that encode grammatical abstraction, often stative, circumstantial, or nominal in nature. Most are vestiges of classical constructions, preserved and recontextualized in Vietnamese usage. Examples include:
- sựchuẩnbị (有所準備 yǒusuǒzhǔnbèi, 'a state of being prepared')
- cáigọilà (所謂 suǒwèi, 'the so-called')
- cáitôicó (~>củatôi) (我所有 wǒsuǒyǒu, '(of) mine')
- cáiviệcnólàm (他所作所爲 tāsuǒzuòsuǒwéi, 'what he has done')
- cáikhác (其他 qítā, 'other')
- ởtrong (其中 qízhōng, 'among')
- đằngnầy (~> đằngấy) (我等 wǒděng, 'we all' > 'thou')
- chúngmình (~> chúngta) (咱們 zánmen, 'we all')
These markers reflect a shared grammatical architecture rooted in classical syntax and fossilized constructions. Vietnamese equivalents retain identical functional roles, often mirroring Chinese word order and semantic scope.
Syntactic alignment also extends to pronoun formation and grammatical sequencing.
In essence, virtually every grammatical feature found in Chinese can be identified in Vietnamese and its derivatives – and vice versa. This includes not only lexical items and particles but also structural conventions and word order, underscoring the deep historical entanglement between the two linguistic systems.
-
Analytical convergence of intimate linguistic features
Analytically, these linguistic features represent a uniquely intimate and culturally embedded stratum, peculiar to languages of close affiliation that share internal grammatical traits, phonosemantic patterns, and historical depth. Beyond structural parallels, cultural and emotional nuances are delicately encoded within lexical items and expressions.
For example:
- mẹruột (親媽 qīnmā, 'natural mother')
- charuột (親爹 qīndiē, 'natural father')
- mẹghẻ (繼母 jìmǔ, 'stepmother')
- chaghẻ (繼爹 jìdiē, 'stepfather')
These kinship terms reflect not only semantic equivalence but also shared cultural sentiment and familial hierarchy embedded in both languages.
Furthermore, expressive pragmatics – especially in profanity or emotional outbursts – reveal striking phonetic and semantic convergence. For instance, when a Chinese speaker exclaims Tāmā! (他媽, 'his mother'), the utterance closely mirrors the Vietnamese Đụmá, both in phonetic contour and emotional force. This parallel is reinforced by:
- Cantonese 屌 diu (/tjew3/) ≈ Vietnamese đéo or đụ (semantic equivalent: English "Fuck you!")
Such examples illustrate not only lexical overlap but also a shared expressive logic rooted in colloquial speech and emotional immediacy. The convergence of form, function, and sentiment across these expressions underscores the deep interconnectivity between Vietnamese and Chinese, linguistically, culturally, and historically.
With minimal premeditation and moderate effort, one can often render complete sentences from Chinese into Vietnamese on a near word-for-word basis, wherein each lexical item mirrors its counterpart with striking textual connotation. Clauses and phrases are likewise constructed with parallel syntactic architecture and rhetorical texture, allowing for seamless transposition between the two languages.
This phenomenon is especially evident in the translation of Chinese classics and martial arts novels, such as Romance of the Three Kingdoms, Water Margin by the Ming-era author Shi Nai’an (Thị Nại-Am), Romance of an Archer by Hong Kong’s Jin Yong (Kim Dung), and the works of Gu Long (Cổ Long). Since the early 20th century, these texts have been translated into Vietnamese using a method that preserves their original structure and diction. For native-born speakers, the archaic style poses no barrier to comprehension; rather, it evokes a literary elegance akin to the stylized English of Hamlet or other Shakespearean works.
Modern Vietnamese readers continue to engage with these Chinese classics effortlessly, generation after generation, despite the dense layering of Chinese semantic, syntactic, and lexical features embedded in their Sino-Vietnamese transliterations. These texts, while richly evocative for native audiences, remain challenging for foreign learners of Vietnamese, who must exert considerable effort to master the linguistic intricacies and historical registers encoded within.
In short, if a Vietnamese speaker recognizes approximately 3,000 individual Chinese characters, they can read virtually any classical or modern Chinese literary work with remarkable ease – an ability unmatched by speakers of other languages without considerable effort. This is due to the fact that each character (字) in Chinese corresponds to numerous disyllabic words in Vietnamese, many of which are already embedded in the speaker’s core vocabulary by the time they complete middle school education.
For example, a Westerner learning modern Chinese must treat the following as six distinct lexical items:
- guó 國 ('state')
- jiā 家 ('home')
- guójiā 國家 ('nation')
- fù 婦 ('wife')
- nǚ 女 ('female')
- fùnǚ 婦女 ('woman')
A Vietnamese reader familiar with quốc, gia, quốcgia, phụ, nữ, phụnữ, by contrast, has already internalized these as cohesive units through Sino-Vietnamese vocabulary.
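The six items above reduce to four character readings plus a joining rule, which a short sketch can make explicit. It assumes, as the example itself suggests, that the Sino-Vietnamese reading of a compound is the concatenation of its single-character readings, written joined as elsewhere in this article. The names `SINO_VIETNAMESE` and `read_compound` are hypothetical.

```python
# Character readings taken from the example above (國, 家, 婦, 女).
SINO_VIETNAMESE = {"國": "quốc", "家": "gia", "婦": "phụ", "女": "nữ"}


def read_compound(characters, readings=SINO_VIETNAMESE):
    """Read a Chinese compound character by character.

    Raises KeyError for any character missing from the reading table.
    """
    return "".join(readings[ch] for ch in characters)


print(read_compound("國家"))  # quốcgia
print(read_compound("婦女"))  # phụnữ
```

With only the four single-character readings stored, both compounds come out without separate entries, which is the sense in which the reader already holds them as cohesive units.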
When comparing Vietnamese to Chinese dialects such as Mandarin, Yue (粵), and Minnan (閩南), the linguistic divergence resembles the difference between Vietnamese and Mandarin itself. Despite Vietnam’s political independence since the 10th century, the language absorbed extensive Sinicized elements over two millennia of cultural and administrative contact.
The early Annamese vernacular, spoken by the indigenous masses including Mường groups and local wives of Chinese soldiers, gradually evolved from a Yue-based substratum. Dialects such as Cantonese, Fukienese, and Teochow likely originated from the same aboriginal root, with successive Sinitic layers accumulating over time, namely, Old Chinese, Ancient Chinese, Middle Chinese, and Mandarin. These layers were reinforced by the court language brought by mandarins, embedding Sinitic structures into Vietnamese society for over 2,000 years.
Had Vietnam not secured its sovereignty in the 10th century, its linguistic fate might have mirrored that of Cantonese or Fukienese, languages still considered Chinese dialects by national linguistic institutions.
Even today, Vietnamese retains lexical items on par with those found in both Mandarin and Cantonese. However, semantic discrepancies between Mandarin and Cantonese often reveal subtle shifts. Consider the following comparative examples.
Table 8 - Lexical parallels and semantic drift
| Concept | Cantonese | Mandarin | Sino-Vietnamese | Sinitic-Vietnamese |
|---|---|---|---|---|
| where | /pin5dou2/ | 那裏 nàlǐ | nalí | nơiđâu, nơiấy, nơiđó |
| sleep | /fajng1kao1/ | 睡覺 shuìjiào | thuỵgiác | giấcngủ, đingủ |
| eat | /sək8/ | 吃 chī / 食 shí | ngật / thực | xơi, ăn |
| drink | /jam3/ | 喝 hē / 飲 yǐn | hát / ẩm | uống, hớp, húp |
| urinate | /o5niew2/ | 尿 niào | niếu | tiểu, đái, điđái |
| tired | /kwuj2/ | 累 lèi | luỵ | mỏi |
| see | /tʌj3/ | 見 jiàn | kiến | thấy |
| descend | /lɔt8/ | 下 xià | hạ | xuống |
| take | /lɔ3/ | 拿 ná | nã | lấy |
| go | /hoj1/ | 去 qù | khứ | đi |
| run | /zau2/ | 走 zǒu / 跑 pǎo | tẩu / bào | chạy |
These examples illustrate not only phonological and semantic alignment but also the shared etymological depth across dialects and Vietnamese. In some cases, Vietnamese forms diverge from Mandarin yet remain closer to Yue or Minnan pronunciations, suggesting a layered borrowing process shaped by regional contact and dynastic influence.
Between Vietnamese and Chinese, grammatical discrepancies are minimal, chiefly limited to syntactic word order. Vietnamese typically follows a reversed structure, such as {noun + adjective}, whereas Chinese prefers {adjective + noun}. Despite this inversion, the deeper convergence lies not in syntax but in etymology, where the lexical roots of Vietnamese and Chinese reveal profound historical entanglement, far deeper than proponents of Austroasiatic Mon-Khmer theory might concede.
Etymologically, the following examples demonstrate that it is possible to formulate rules of derivation from a core word-concept, generating plausible cognates across both Chinese and Sinitic-Vietnamese domains. Though restrictive in scope, the principle holds: if a majority of etyma within a semantic category exhibit phonological and contextual alignment, they likely share a common origin, even when phonetic discrepancies suggest otherwise.
The generalization is as follows: if lexical items carry consistent etymological traits, exhibit parallel phonological peculiarities, and encode similar contextual connotations, they may plausibly derive from the same root, most often as loanwords from Chinese.
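The majority criterion stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Pairing` record, the `majority_aligned` function, and the 0.5 threshold are hypothetical names and values introduced here, and the True/False flags stand in for an analyst's judgments rather than for any finding of this study.

```python
from dataclasses import dataclass


@dataclass
class Pairing:
    vietnamese: str
    chinese: str
    phonological: bool  # analyst's judgment: a sound correspondence holds
    contextual: bool    # analyst's judgment: meaning and usage correspond


def majority_aligned(pairs, threshold=0.5):
    """Return True when more than `threshold` of the pairings align on both counts."""
    if not pairs:
        return False
    aligned = sum(1 for p in pairs if p.phonological and p.contextual)
    return aligned / len(pairs) > threshold


# Forms are taken from the body-part pairs discussed in this section.
# The flags are placeholders for illustration only, not results.
body_parts = [
    Pairing("đầu", "頭", True, True),
    Pairing("mắt", "目", True, True),
    Pairing("gan", "肝", True, True),
    Pairing("ruột", "腸", False, True),
]

print(majority_aligned(body_parts))
```

The point of the sketch is that the criterion operates on a whole semantic category at once: a single weak pairing does not overturn the category, and a single strong pairing does not establish it.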
The illustrative list below further demonstrates how lexical transformation can be traced through two complementary methodologies:
- Semantic analogy approach – identifying conceptual parallels across languages
- Disyllabic sound change approach – tracing phonological shifts across cognate forms
Together, these frameworks enable the identification of candidate patterns for sound shifts and semantic evolution, topics to be explored in greater detail as the analysis progresses.
- 'đầu' (head) 頭 tóu – 'sọ' (cranium) 首 shǒu,
- 'mặt' (face) 面 miàn – 'mày' (eyebrow) 眉 méi,
- 'mắt' (eye) 目 mù – 'mũi' (nose) 鼻 bí,
- 'gan' (liver) 肝 gān – 'ruột' (intestines) 腸 cháng,
- 'sống' (live) 生 shēng – 'chết' (die) 死 sǐ,
- 'ăn' (eat) 唵 yān – 'uống' (drink) 飲 yǐn,
- 'khóc' (weep) 哭 kū – 'cười' (laugh) 笑 xiào,
- 'đi' (walk) 去 qù – 'đứng' (stand) 站 zhàn,
- 'chạy' (run) 走 zǒu – 'nhảy' (jump) 跳 tiào,
- 'nặng' (heavy) 重 zhòng – 'nhẹ' (light) 輕 qīng,
- 'cao' (high) 高 gāo – 'thấp' (low) 底 dǐ,
- 'dài' (long) 長 cháng – 'ngắn' (short) 短 duǎn,
- 'lạnh' (cold) 冷 lěng – 'nóng' (hot) 燙 tàng,
- 'hay' (good) 好 hǎo – 'dở' (bad) 亞 yà,
- 'buồn' (sad) 悶 mèn – 'vui' (happy) 快 kuài,
- 'gần' (near) 近 jìn – 'xa' (far) 遐 xiá,
- 'trước' (before) 前 qián – 'sau' (after) 後 hòu,
- 'cũ' (old) 舊 jiù – 'mới' (new) 萌 méng,
- 'đắng' (bitter) 辛 xīn – 'cay' (spicy hot) 苦 kǔ (SV khổ, 'bitter') [the meanings are switched here] (6), etc.
The postulation above is grounded in rational methodology, aiming to relate Sino-Vietnamese and Sinitic-Vietnamese lexicons through a parallel disyllabic approach. This framework enables the identification, analysis, and extraction of monosyllabic base forms from synonymous or compound Vietnamese expressions, such as chàilưới, xecộ, cậumợ, chúbác, and others. By applying this approach, we uncover reliable traces of sound change and semantic shift from Chinese into Vietnamese, even when such correspondences appear unconventional. These patterns have long been posited by Vietnamese linguistic specialists using traditional comparative methods.
For example, the following associations illustrate semantic analogy and phonological convergence:
- voi ~ 為 wéi ('elephant')
- lúa ~ 來 lái ('unhusked rice grain')
- gạo ~ 稻 dào ('rice') [cf. lúa, per Starostin]
- nắng <~ trờinắng ~ 太陽 tàiyáng ('sunshine')
By the same token, indigenous lexicons embedded in the Chinese and Vietnamese zodiac systems reveal deep etymological ties. Each animal name in the Vietnamese cycle corresponds to a Chinese character and phonosemantic root:
| Vietnamese zodiac | Chinese zodiac | Sino-Vietnamese | Sinitic-Vietnamese | English meaning/Notes |
|---|---|---|---|---|
| tý | 子 zǐ | tý | chuột | 'rat' / cf. 鼠 shǔ (SV thử) |
| sửu | 丑 chǒu | sửu | trâu | 'ox' / cf. 牛 niú (SV ngưu, VS 'trâu') |
| dần | 寅 yín | dần | cọp | cf. 虎 hǔ (SV hổ), also hùm, 甝 hán |
| mẹo | 卯 mǎo (replaced by 兔 tù - SV 'thố') | mão | mèo | cf. 貓 māo (SV miêu); no Vietnamese speaker ever calls this sign 兔 tù (SV 'thố', or 'thỏ') |
| thìn | 辰 chén | thìn | rồng | cf. 龍 lóng (SV long) |
| tỵ | 巳 sì | tỵ | rắn | cf. 蛇 shé (SV xà), 巳 as snake pictograph |
| ngọ | 午 wǔ | ngọ | ngựa | perfect cognate |
| mùi | 未 wèi | vị | dê | cf. 羊 yáng (SV dương), phonetic shift /j-/ ~ /d-/ |
| thân | 申 shēn | thân | khỉ | cf. 猴 hóu (SV hầu), 猢 hú (SV hồ), 猻 sūn (SV tôn), 猿 yuán (SV viên, VS vượn) |
| dậu | 酉 yǒu | dậu | gà | cf. 雞 jī (SV kê) |
| tuất | 戌 xū | tuất | chó | cf. 狗 gǒu (SV cẩu), 犬 quǎn (VS cún) |
| hợi | 亥 hài | hợi | heo | cf. 腞 tùn, also lợn (Northern dialect) |
These correspondences affirm that mẹo 卯 (mǎo) must align with mèo ('cat'), not thỏ 兔 tù ('hare'). The origin of the twelve zodiac animals, in fact, likely stems from Southern Yue traditions, later adopted and codified by proto-Chinese civilizations.
This etymological framework, combining semantic analogy and disyllabic sound change, offers a robust methodology for tracing lexical evolution and identifying cognate patterns across Sinitic and Vietnamese strata.
From a series of solid etyma within a shared semantic category, it becomes methodologically plausible to induce parallel derivations for other Vietnamese words. Consider the case of Tết, which can be etymologically traced to 節 jié (SV tiết), as in:
- tiếtxuân (春節 Chūnjié, 'Spring Festival')
- ănTết (過節 guòjié, 'celebrate the Spring Festival')
- TếtNguyênđán (元旦節 Yuándànjié, 'New Year Festival')
- TếtĐoanngọ (端午節 Duānwǔjié, 'Late Spring Festival')
- TếtTrungthu (中秋節 Zhōngqiūjié, 'Mid-Autumn Festival')
This analysis supports the identification of Tết as a direct cognate of 節 jié, with ănTết emerging from 過節 guòjié (SV quátiết). While 過 guò in Mandarin is /kwo4/ ('to pass'), SV quá /wa5/ semantically aligns with ăn. In this context, ăn /ɐn/ evolves beyond its literal meaning of 'to eat' and functions as a prefix denoting 'celebration', 'participation', or 'engagement'.
Thus, 過 guò is reinterpreted as ăn, not merely through phonetic resemblance but via semantic elevation: ănTết becomes a conceptual equivalent of 'feasting', 'rejoicing', or 'partaking in festivities'. This transformation exemplifies what the author terms the principle of the sandhi process of association, wherein a Chinese disyllabic compound converges upon a Vietnamese prefix, forming new lexical constructions:
* 過節 guòjié → ănTết \ M guò /wa5/ → ăn → ăn- / [ xz / x > y > Ø- ]
Once ăn- assumes prefixal status, it becomes a productive morpheme capable of generating extended meanings such as 'to take in', 'to engage in', or 'to undergo'. This semantic expansion is evident in compounds like:
- ăntấtniên (過小年 guòxiǎonián, 'feast before New Year')
- ăntiệc (宴席 yànxí, 'banquet')
- ăncưới (酒席 jiǔxí, 'wedding feast')
- ănmừng (慶祝 qìngzhù, 'celebrate') [@ 祝 zhù ~ 食 shí, 吃 chī]
and further:
- ănmặc (衣食 yīshí, 'lifestyle')
- ănuống (飲食 yǐnshí, SV ẩmthực, 'diet')
- ănngon (吃香 chīxiāng, 'enjoy delicious food')
- ănnói (言語 yányǔ, 'manner of speech')
- ănhiếp (威脅 wēixié, 'bully')
- ăntiền (贏錢 yíngqián, 'win money') / (要錢 yàoqián, 'extort money')
- ănmày (要飯 yàofàn, 'beg')
- ănhàng (吃貨 chīhuò, 'glutton / smuggler')
- ănđòn (挨打 áidǎ, 'get beaten')
- ăncắp (竊案 qiè’àn, 'steal')
- ănbám (白吃 báichī, 'live off others')
- ănnhậu (應酬 yìngchóu, 'social drinking')
- ănthua (輸贏 shūyíng, 'compete / gamble')
As a suffix, ăn continues to evolve:
- làmăn (生意 shēngyì, 'do business')
- đồăn, thứcăn (食物 shíwù, 'food')
- ngánăn, biếngăn, nhịnăn (厭食 yànshí, 'anorexia')
- thamăn, hamăn (貪吃 tānchī, 'gluttony')
- háuăn (好食 hàoshí, 'appetite')
- cóăn (有錢賺 yǒuqiánzhuàn, 'make money')
- ănnóibợmtrợn (胡說霸道 húshuōbàdào, 'talk nonsense')
Phonologically, the prefix ăn- aligns with initial consonants such as y-, w-, sh-, ch-, j-, suggesting a sandhi-driven convergence. This pattern reflects a broader morphophonemic shift where ăn absorbs and reinterprets elements from Chinese disyllabic compounds.
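The claim about initial consonants can be checked mechanically against the ăn- compounds listed above. The sketch below tallies the pinyin initial of the first syllable of each Chinese counterpart. The helper names (`strip_tones`, `pinyin_initial`) and the decision to treat y- and w- as initials, following the wording of this section, are assumptions introduced for illustration; the syllables themselves are copied from the list.

```python
import unicodedata
from collections import Counter

# First-syllable pinyin of the Chinese counterparts in the ăn- prefix list
# (from 過小年 guòxiǎonián down to 輸贏 shūyíng; both readings given for ăntiền).
FIRST_SYLLABLES = [
    "guò", "yàn", "jiǔ", "qìng", "yī", "yǐn", "chī", "yán", "wēi",
    "yíng", "yào", "yào", "chī", "ái", "qiè", "bái", "yìng", "shū",
]


def strip_tones(syllable):
    """Remove tone diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pinyin_initial(syllable):
    """Return the initial consonant; an empty string means no initial."""
    plain = strip_tones(syllable).lower()
    for digraph in ("zh", "ch", "sh"):
        if plain.startswith(digraph):
            return digraph
    if plain and plain[0] in "bpmfdtnlgkhjqxrzcsyw":
        return plain[0]
    return ""


counts = Counter(pinyin_initial(s) for s in FIRST_SYLLABLES)
for initial, n in counts.most_common():
    print(initial or "(none)", n)
```

In this sample y- is the most frequent initial, while a few items (for instance 白 bái and 挨 ái) fall outside the five initials named above, so the tally is best read as a starting point for closer phonological examination.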
Etymologically, both 吃 chī and VS ăn have their own pathways of development. The character 吃 (chī) has complex phonological roots from 喫:
- 喫 chī, jī, jí (吃 SV ngật, cật) < MC kjit < OC *kɯd
- 喫 chǐ < MC kʰek < OC *ŋ̥ʰeːɡ
- 唵 ǎn (SV àm, ảm) < MC ʔəm < OC *qoːmʔ
- 奄 yǎn, yān (SV yễm) < MC ʔɜm < OC *ʔramʔ
Starostin notes that 喫 originally meant 'stammer', later evolving to 'eat', 'drink', 'swallow'. The phonetic stem 乙 yì (SV ất) and the {-t ~ -n} ending suggest a plausible pathway to ăn. Thus, ăn may derive from 唵 ǎn, reinforcing its semantic and phonological legitimacy as a cognate.
The author’s hypothesis posits ăn- as a conceptual umbrella encompassing a wide array of Vietnamese words derived from Chinese etyma. While not all derivations may be universally accepted, the majority exhibit compelling phonosemantic and structural parallels. This approach challenges rigid Austroasiatic Mon-Khmer frameworks and invites a reevaluation of Vietnamese etymology through a Sinitic lens, one that embraces sandhi, semantic drift, and morphemic innovation.
References
- Bousquet, Gisèle Luce & Pierre Brocheux. Viêt nam Exposé: French Scholarship on Twentieth‑Century Vietnamese Society. University of Michigan Press, 2002.
- Brindley, Erica Fox. Ancient China and the Yue: Perceptions and Identities on the Southern Frontier, c.400 BCE–50 CE. Cambridge University Press, 2015.
- Brindley, Erica Fox. “Ancient China and the Yue.” Journal of Chinese History, Cambridge University Press, 2017.
- Chang, Yu‑fen. “Constructing Vietnam, Constructing China: Chinese Scholarship on Vietnam from the Late Nineteenth Century until the Present.” Journal of Asian Studies, 2010.
- Chinese Academy of Social Sciences (CASS). Studies on the Luoyue and Lạc Việt. Beijing: Zhonghua Shuju, 2005.
- Hà Văn Tấn. Lịch sử Việt nam cổ đại. Hà Nội: Nhà xuất bản Đại học Quốc gia, 2002.
- Henry, Eric. The Submerged History of Yue. Sino‑Platonic Papers, No. 176. University of North Carolina.
- Leith, Seamus P. An Investigation into the Tai‑Kadai Substratum in Yue. MA Thesis, Leiden University, 2017.
- Liang Tingwang. The Zhuang and the Ancient Yue. Guangxi Normal University Press, 2012.
- Monnais‑Rousselot, Laurence. Médecine et Colonisation: L’aventure indochinoise, 1860–1939. Paris: CNRS Éditions, 1999.
- Nguyễn Ngọc San. Nguồn gốc người Việt và tiếng Việt. Hà Nội: Nhà xuất bản Văn hoá Thông tin, 1993.
- Nguyễn Tài Cẩn. Nguồn gốc và Quá trình Hình thành tiếng Việt. Hà Nội: Nhà xuất bản Giáo dục, 1995.
- Papin, Philippe. Việt Nam: Histoire et Civilisation. Paris: Éditions Fayard, 2003.
- Phạm Đức Dương. Ngôn ngữ và Văn hoá Việt nam trong bối cảnh Đông Nam Á. Hà Nội: Nhà xuất bản Khoa học Xã hội, 2001.
- Poisson, Emmanuel. Mandarins et Modernité: Les Pratiques Administratives au Vietnam au XIXe siècle. Paris: École Française d’Extrême‑Orient, 2004.
- Trần Quốc Vượng. Việt Nam: Văn hoá và Con người. Hà Nội: Nhà xuất bản Khoa học Xã hội, 1998.
- Yue, Anne O. "The Yue Language." In The Oxford Handbook of Chinese Linguistics. Oxford University Press, 2015.
FOOTNOTES
(1)^ Throughout Vietnam’s history, the descendants of the racially mixed Annamese population – later known as Người Kinh, the Kinh, or 京 Jing (‘the metropolitans’) – established their early dominance in the Red River Delta before gradually migrating westward and southward into new settlements. In the process, they displaced or absorbed many indigenous groups, most notably the Mường and the Mèo (Hmong), along with smaller Daic‑origin communities such as the Tày, who today number over a million and continue to inhabit the remote northern highlands, including the more recently incorporated territories of Laichâu and Điệnbiên.
As Annam expanded further south, the Kinh supplanted the Chamic
populations along the coastal plains and pressed into the southwestern
uplands traditionally occupied by Mon‑Khmer groups. Over time, these
once‑dominant peoples became minorities within their own ancestral
lands. The legacy of this displacement remains visible today. In the
early years of the twenty‑first century, clashes erupted between
Khmer‑descended communities and the Kinh majority, episodes that were at
times tacitly tolerated or even directly sanctioned by the state. The
resulting tensions forced many Khmer minorities to flee across the
border into Cambodia, while hundreds eventually resettled in the United
States in 2005.
(2)^ From a historical perspective, one finds few instances of open racial
conflict between Vietnamese and Chinese communities. Even resentment was
generally directed not at long‑assimilated groups, those who had
intermarried with the Kinh over centuries, but at more recent arrivals,
often fresh from the boat, only two or three generations removed from
China and still trying to maintain a distinct ethnic identity.
Nevertheless, the degree of integration is high as illustrated by the
fact that many celebrated performers in contemporary Vietnam are of Hoa
origin, a reality that contrasts sharply with the violent anti‑Chinese
episodes that have recurred in other Southeast Asian countries such as
Indonesia, the Philippines, or Malaysia.
The large‑scale departure of Chinese minorities from Vietnam between
1979 and 1990 should be seen not as spontaneous hostility from the
populace but as a politically orchestrated policy of the state. Most of
those who left were relatively recent immigrants, with less than a
century of settlement, while earlier arrivals had by then largely
assimilated into Vietnamese society. Their expulsion was closely tied to
the post‑1975 imposition of socialism following national reunification,
which provided the government with grounds to confiscate private assets
– factories, banks, and businesses – and was later intensified by the
Sino‑Vietnamese conflicts of the late 1970s.
In other words, there were no impromptu acts of violence initiated by
ordinary Vietnamese against individual members of the Chinese minority.
Even the 2014 riots, which damaged hundreds of Chinese‑owned factories
in Vietnam, were directed symbolically at Beijing’s leadership in
Zhongnanhai rather than at Chinese laborers working inside the
country.
(3)^ Across the seventy‑two volumes of the Zizhi Tongjian (資治通鑑), Sīmǎ Guāng’s monumental chronicle of Chinese history from the Warring States period through the Five Dynasties (403 BCE–959 CE), one recurring theme is the geography of exile. Again and again, disgraced officials, political rivals, and fallen elites were banished to the far south: the ancient Língnán (嶺南) region – today’s Guangxi, Hunan, and Guangdong – together with the northern reaches of Vietnam, once the NamViệt Kingdom (南越王國), and the island of Hainan. Exile, in other words, was not an occasional punishment but a structural feature of Chinese political life.
(4)^ Had Vietnam not gained independence from China in the 10th century and instead remained a Chinese satellite state, protectorate, or province, modern linguists would likely have classified Vietnamese much as they classify Cantonese and Fukienese: as a dialect within the Sino-Tibetan family.
(5)^ All Thai citations of individual words are quoted from Wiktionary.org.
(6)^ Vietnamese cay corresponds to 苦 kǔ (SV khổ), which combines with 辛 xīn (SV tân) to form cayđắng (辛苦 xīnkǔ, SV tânkhổ, ‘hardship’). The root of cay lies in Middle Chinese 苦 kǔ, with parallels such as 辛酸 xīnsuān (VS chuacay, ‘bitterness’).
[ 苦 kǔ < MC khɔ < OC kha:ʔ. According to Starostin: ‘be bitter’. Also used for a homonymous *kha:ʔ ‘sow‑thistle’ (Sonchus oleraceus?). Vietnamese khó is a colloquial development, restricted to the sense ‘(bitter) > hard, difficult’, which also exists in Chinese. The regular Sino‑Vietnamese form is khổ. ]
In modern Mandarin, ‘spicy hot’ is 辣 là (SV lạt), while 苦 kǔ (SV khổ) corresponds to Vietnamese cay. In archaic Chinese, however, 辣 là was closer to Vietnamese lạt ‘insipid, not salted’.
[ 辣 là < MC ra:t < OC lat. FQ 盧達. According to Starostin: ‘bitter, not sweet’ (Tang). In Vietnamese cf. nhạt ‘insipid, not salted’, written with the same character and possibly a colloquial loan from the same source, though the nasalisation remains uncertain. For r‑ cf. Min forms: Xiamen luat8, luaʔ8, Chaozhou laʔ8, Fuzhou lak8, Jianou luoi8, Jianyang lue8, Shaowu lai6. ]
For Chinese 辛 xīn ‘bitter’ (SV tân), the Vietnamese equivalent is đắng.
[ 辛 xīn < MC sjin < OC sin. According to Starostin, also used for a homonymous *sin ‘be bitter, pungent, painful’. ]
This cluster of correspondences illustrates a common phenomenon in historical linguistics: semantic shifting among archaic roots, where meanings oscillate between ‘bitter’, ‘spicy’, ‘painful’, and ‘insipid’.

