On the One-size-fits-all Conspiracy
by dchph
The Austroasiatic Mon-Khmer hypothesis is critiqued for its methodological rigidity and lack of historical grounding. Western theorists often imposed Indo-European frameworks onto Southeast Asian languages without fluency in Vietnamese or Chinese. The theory’s reliance on basic-word lists and speculative reconstructions has led to circular reasoning and misclassification. Vietnamese etymology, when examined through analogical and dissyllabic methods, reveals stronger ties to Sinitic-Yue than to Mon-Khmer. Examples include mẹ, mợ, mái, gàmẹ, gàmái, and cậumợ, all traceable to Old Chinese 母 mǔ and its derivatives.
I) Misreading Vietnamese: Western linguistics and the Austroasiatic trap
The author has long suspected that the Austroasiatic Mon-Khmer hypothesis was shaped by individuals lacking proficiency in both Vietnamese and Chinese, as well as familiarity with the historical contexts of their respective speech communities. Western neo-theorists, particularly in the post-Industrial Revolution era, often pursued methodological shortcuts, placing all assumptions into the Austroasiatic framework they were constructing. In doing so, they manipulated data to build supplementary models designed to override earlier theorizations, regardless of their historical grounding.
These efforts relied heavily on the authority of Western academic conventions, despite the fact that Chinese language and history remained largely unfamiliar to Western scholars until the early seventeenth century (Knud Lundbæk, 1986).
The author's speculation is rooted in the observable rigidity of outdated data, inflexible presentation formats, frequent misspellings, overgeneralization from narrow samples, and repetitive patterns that fail to account for the linguistic nuance inherent in the lexicon of the target language.
Table 1 - What counts as evidence?
A checklist contrasting speculative reconstructions vs. documented cognates.
| Method | Example | Reliability |
|---|---|---|
| Reconstructed proto-forms | *ba (hypothetical Mon–Khmer root) | Low – speculative |
| Documented cognates | mẹ, mái, mợ with Old Chinese parallels | High – attested |
| Tonal evolution | mẹ ↔ OC mjie | Medium – requires historical context |
| Folk usage | gàmẹ, gàmái, cậumợ | High – culturally embedded |
As languages evolve, semantic shifts often obscure original relationships. This issue evokes the familiar chicken-and-egg dilemma, though in this case the question is not which came first but what form emerged: a chicken, hen, rooster, or cock. For example, 口 kǒu ('mouth') came before 吻 wěn, which once also referred to 'mouth' and later evolved to mean 'kiss', aligning with VS "hôn". In Ancient Chinese, it is reasonably accepted that 土 tǔ ('soil') preceded 地 dì ('land'), and that 口 kǒu ('opening') came before 吻 wěn ('mouth'). Yet can any Sino-Tibetan specialist definitively determine which member of each Vietnamese-Chinese pair, in this case "đất" versus 土 tǔ, "cửa" versus 口 kǒu, or "mồm" versus 吻 wěn, originated first within its respective linguistic trajectory?
If Sino-Tibetan scholars themselves cannot decisively establish the directionality of key linguistic developments, it renders even more tenuous the piecemeal assertions made by Austroasiatic theorists regarding linguistic primacy. This underscores a broader principle: all theories of genetic linguistic affiliation remain provisional, open to revision as new evidence emerges. The Austroasiatic hypothesis, in particular, is hampered by a lack of historical documentation to substantiate its claims. It relies heavily on reconstructed etymons and speculative lexical correspondences, and therefore must be approached with measured skepticism rather than uncritical acceptance.
II) The Mon-Khmer conspiracy: a century of overreach
Let us take a moment to relax and engage in a metaphorical exercise to help visualize the broader linguistic taxonomy at hand. Imagine the Mon-Khmer theory as a handful of specimen fish placed within a much larger basket, one that also contains Austroasiatic, Yue, Taic, and Sino-Tibetan species. Among these, the Sino-Tibetan and Yue-Taic varieties are netted in far greater volume, with early Chinese written records documenting each catch through oracle bone inscriptions, turtle-shell divinations, and bronze tripod engravings. These artifacts fall squarely within the timeframe relevant to our inquiry, unlike the abstract mysticism of Pali or Sanskrit chants, which drift untethered in the air. Notably, Tibetan scholars took extensive notes on these traditions.
From the author’s perspective, alongside the Bodic (Tibetan) languages, the earliest linguistic formations included Taic, followed by Daic and the Yue split. Each of these branches produced lineages that stand on equal footing with the Sino-Tibetan family, which gave rise to its Sinitic descendants. These can be grouped in parallel with both Tibetan and Yue elements.
In contemporary discourse, the Austroasiatic hypothesis attempts to encompass this entire spectrum, positioning itself as a counterpoint to the Sino-Tibetan framework, particularly in relation to the Vietic segment. Although ancient Taic gave rise to Yue and its newer sibling, the Austroasiatic languages, and hence to what we now recognize as Sinitic Vietnamese, the modern Western interpretation has reframed this lineage under a different guise, distancing itself from the Yue theory. The term 越 /Jyet/, transcribed through various homophonous characters in Chinese annals, reflects this historical complexity.
As we examine the southward migratory movement from China South into the Indo-Chinese peninsula, the Sino-Tibetan theory offers a compelling etymological explanation for the cognacy of over 400 fundamental words shared between Vietnamese and Chinese, as documented in this survey. However, pursuing the Sino-Tibetan route requires navigating vast repositories of Chinese historical records, many written in archaic and classical styles, which this study has chosen to engage directly.
All things considered, and for what follows, whether the label is Austroasiatic, Taic, or Sinitic, it is reasonable to assert that the Yue entities existed first and that the others emerged in succession from that foundational stratum.
Table 2 - Austroasiatic controversy or conspiracy? A case of Yue denial.
If the Austroasiatic Mon-Khmer theory were built on the premise that Mon-Khmer aboriginal groups – rather than Daic populations – were the earliest inhabitants of the Red River Basin, then it follows that this region served as the Indo-Chinese cradle for subsequent cultural and linguistic developments. According to this view, Austroasiatic communities were already established prior to the arrival of Yue-Daic migrants, who were mistakenly assumed to have come later. This assumption stands in contradiction to a wide range of historical and archaeological evidence.
Western theorists have often disregarded Yue contributions, constructing new frameworks without engaging with available records. In doing so, they bypassed centuries of documented history, including sources that have long posed interpretive challenges for scholars in mainland China since the early seventeenth century.
The theory further suggests that incoming resettlers intermingled with existing aboriginal populations–identified as Austroasiatic peoples based on Mon-Khmer assumptions–which had already spread across Southeast Asia.
Later waves of Yue-Daic speakers from southwestern regions, including Lower Laos, arrived and were followed by Sinitic-Yue migrants from South China. It was during this period that the linguistic configuration now labeled as the Austroasiatic family began to take shape, eventually extending across the southern and western zones of the Indo-Chinese peninsula.
Sino-Tibetan etymons for fundamental Vietnamese were largely disregarded as Austroasiatic theorists advanced their consensus on the origin of the language. They asserted that Vietnamese was spoken uniformly across the population and derived from a foundational set of Mon-Khmer cognates. These etyma were presented as conclusive and dependable proof of Austroasiatic influence in shaping the Vietnamese linguistic profile.
The Austroasiatic hypothesis, as a matter of fact, could have served as afterthought patchwork to fill every crack where Sinitic elements remained hidden and unnoticed among linguistic pockets scattered across continents some 6,000 to 10,000 years ago (a glottochronological calculation based on the lowest percentage of shared basic cognates might reach the same estimate). It is understandable to find Indic elements of Sanskrit or Pali origin in Khmer and the Chamic languages, since their speakers were long under the strong influence of Buddhism and Hinduism, respectively. Such elements are alien to Vietnamese, however, except in common Buddhist prayers such as 'MôPhật', a shortened form of 'Nammô AdiđàPhật' ('Namo Amitabha'), which is assumed to convey a highly localized context.
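The glottochronological estimate mentioned in passing can be made concrete. The following is a minimal sketch, assuming the classic Swadesh-Lees relation t = ln(c) / (2 ln(r)), where c is the fraction of basic words two languages still share and r is the assumed retention rate per millennium (conventionally about 0.86 for a 100-item list). The function names are hypothetical, and the constant-rate assumption behind the formula is itself widely disputed, so the numbers are illustrative only.

```python
import math

# Assumed retention rate per millennium for a 100-item basic-word list.
# This is the conventional textbook figure, used here as an assumption.
RETENTION_PER_MILLENNIUM = 0.86


def divergence_time(shared_fraction: float, r: float = RETENTION_PER_MILLENNIUM) -> float:
    """Estimated separation time in millennia: t = ln(c) / (2 ln(r))."""
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    return math.log(shared_fraction) / (2.0 * math.log(r))


def shared_fraction_after(millennia: float, r: float = RETENTION_PER_MILLENNIUM) -> float:
    """Inverse relation: expected shared fraction c = r ** (2 t)."""
    return r ** (2.0 * millennia)


if __name__ == "__main__":
    for t in (6.0, 10.0):
        c = shared_fraction_after(t)
        print(f"{t:.0f} millennia -> about {c:.1%} shared basic cognates")
```

Under these assumptions, a separation of 6,000 to 10,000 years corresponds to roughly 16% down to about 5% shared basic vocabulary, which is why such time depths sit at the edge of what the method can resolve.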
What we observe today as the status quo in Vietnamese linguistic classification is the result of a long trajectory of competing hypotheses. These aim either (1) to nullify existing theories of Austroasiatic Mon-Khmer or Sino-Tibetan origin from opposing viewpoints (for instance, China's official institutions classify Cantonese, a Yue language, as part of the Sino-Tibetan family based on its Sinitic etyma), or (2) to construct new frameworks atop the same hypothetical foundations, leveraging modern methodologies. One such example is the Austro-Thai hypothesis proposed by Benedict (1975), which builds on premises similar to those of the Austroasiatic Mon-Khmer model.
In response to earlier theories, Austroasiatic proponents advanced the view that aboriginal populations, retrospectively labeled as Mon-Khmer, were the original inhabitants of the Red River Delta, rather than Yue-Daic resettlers. According to this model, subsequent migratory waves during the Han colonial period brought Tai-Kadai speakers from present-day Lower Laos and racially mixed Sinitic-Yue groups from South China. These later arrivals intermingled with the indigenous populations, producing new ethnolinguistic communities that gradually dispersed across Southeast Asia.
From this Mon-Khmer substratum, the Austroasiatic linguistic family is said to have emerged and expanded northward and westward throughout the Indo-Chinese peninsula. Mon-Khmer speakers are credited with introducing foundational vocabulary to local populations, including those who spoke early forms of Vietic. Even after cognates were later identified between Vietnamese and modern Cambodian etyma, Austroasiatic theorists continued to maintain that Vietnamese originated from Mon-Khmer linguistic roots (see Nguyễn Ngọc San, 1993).
Genetic studies conducted by Vietnam's DNA research institutions further complicate this narrative, though. Recent findings indicate that Vietnamese, Thai, Daic, Yao, Hmong, Mon, Khmer, and southern Chinese populations share similar genetic markers, suggesting a more intertwined ethnolinguistic heritage than previously acknowledged.
Socially and academically, many individuals tend to follow prevailing beliefs, especially when those beliefs are widely accepted and institutionally reinforced. For newcomers, it is often easier to adopt the Austroasiatic Mon-Khmer classification of Vietnamese, which has become one of the most dominant theories in genetic linguistic affiliation. Yet linguistically, the postulated Austroasiatic languages themselves, as mentioned above, evolved from a common Taic-Yue source, one that also gave rise to Yue daughter languages spoken by ethnic groups in China South. This Taic-Yue lineage could plausibly extend to Tai-Kadai, and even Austronesian and Polynesian divisions, all considered branches of the broader Taic linguistic family.
The Austroasiatic theorists, in constructing their Mon-Khmer hypothesis, applied Indo-European methodologies to Vietnamese without fully engaging with its historical and cultural context. As seen in A. Meillet and Marcel Cohen’s Les Langues du Monde (1952), the effort to position Mon-Khmer as a foundational linguistic family did not require deep engagement with Vietnamese or Chinese linguistic traditions. To illustrate this methodological imposition, let us consider a hypothetical case in the Amazon jungle.
Imagine Western linguists arriving to survey two remote Amazonian tribes in an effort to determine their linguistic affiliation. Applying the same logic once used to reframe Vietnamese origins, they approach the task with a mid-19th-century colonial mindset. Upon discovering that speakers in village B share a handful of basic words with those in the previously surveyed village A, they proceed without historical context. Instead of investigating deeper cultural or genealogical ties, they take an academic shortcut: they invent a label such as "Root A" and classify both languages under this newly coined family, assuming that the tribes themselves lack awareness of their linguistic heritage and will simply accept the label imposed on them. In doing so, they exclude the scholarly communities from the process of classification and impose a framework shaped more by external assumptions than by lived reality.
This mirrors the approach taken by Western scholars and missionaries in 17th-century Annam. Confronted with the complexity of the Chinese-based Nôm script, they bypassed Chinese altogether and devised a Romanized orthography for Vietnamese. This system, complete with its own grammar, was tailored to the needs of the largely illiterate population and served the missionaries' evangelical objectives. They believed they had resolved a millennia-old problem that Annamese scholars had failed to address, unlike their smarter Korean and Japanese counterparts, who had successfully developed phonetic systems to complement Chinese ideograms. In essence, the Vietnamese linguistic and cultural mindset had long been shaped within the hardened mold of a Chinese intellectual tradition that presided over every cultural aspect of the country.
It is important to recall that similar Romanization efforts in China ultimately failed. Western missionaries were met with widespread resistance, compounded by high illiteracy and entrenched cultural norms. This failure highlights the limitations of Western intervention in deeply rooted linguistic systems. By contrast, the Latinization of Vietnamese succeeded only with the support of the French colonial administration, which institutionalized the Romanized script.
In the Annamese case, the imposed Romanized orthography bypassed the complexities of Nôm and the classical Chinese script, which had left a vast popular base illiterate. However, this was not a natural linguistic evolution; it was a colonial shortcut. Ironically, the outcome in Vietnam proved more transformative than in China, despite the latter’s longer engagement with foreign missionaries.
The Austroasiatic Mon-Khmer classification, meanwhile, emerged as a technical construct at the turn of the previous century. Western scholars, many of whom lacked proficiency in Chinese, coined the term with strategic intent. "Austro-" was used to denote "south," while "Asiatic" signaled a continental linguistic scope. This allowed them to frame a language family that ostensibly originated in the southern regions of Asia, including South China, as distinct from the north.
By doing so, they effectively created a hypothesis that encompassed nearly every language spoken across Southeast Asia and South China, citing shared lexical roots, even suggesting that Chinese itself borrowed from these sources. In the process, they sidestepped the historical role of Chinese influence and even dismissed terms like "Sino" and "Sinitic" as politically motivated labels, particularly in the classification of Cantonese and Fukienese within the Sino-Tibetan family.
This maneuver, much like the hypothetical case of Amazonian tribal languages discussed earlier, reflects a broader pattern of Western linguistic theorization: one that often privileges convenience and conceptual neatness over historical depth and cultural specificity.
In the real world, linguistic theories, like languages themselves, are subject to change. Their volatility mirrors the dynamic nature of speech communities and the evolving tools used to study them. For example, Zhuang and Daic languages were long classified directly under the Sino-Tibetan family just like other Sinitic lects before being reassigned to the Tai-Kadai language family. This reclassification underscores the provisional nature of linguistic taxonomy, especially in contrast to the relative stability found in the natural sciences.
The Austroasiatic hypothesis, likewise, remains inconclusive. Until every foundational issue is resolved, it must be treated as a working model rather than a definitive account. This stands in contrast to the Indo-European framework, which has achieved broad scholarly consensus and left behind a robust legacy of analytical tools used to trace the origins of languages such as Pali, Sanskrit, Greek, Latin, Germanic, Baltic, and Gaulish.
III) Accommodating Austroasiatic claims: positional designation and diagrammatic nesting
In the case of our Sinitic-Vietnamese study, the initial objective was to reclassify the Vietnamese language into its rightful sub-family, as its historical and linguistic lineage implies. To achieve this, academic consensus must acknowledge the existence of an ancient Yue language family, one that can be verified through historical records. The ancient phonetic form rendered as "Jyet," or possibly "Bjyet," is recognized in modern Mandarin as "Yue," and appears in classical sources such as the Erya (爾雅), which was used for diplomatic communication during the Spring and Autumn Period. This is further evidenced by the continuity of major Yue-Sinitic dialects – Cantonese, Fukienese, and Wu – whose linguistic features elevate their Vietic counterparts to a shared Yue origin.
Historically, Zhuang and Daic languages, now classified under the Tai-Kadai (also known as Kra-Dai) family, were once grouped by Western scholars within the Sino-Tibetan framework. Chinese institutions later reclassified them as distinct branches, yet still under the broader Sino-Tibetan umbrella. For the same purpose, our reference to Cantonese, Fukienese, and Wu dialects is a deliberate attempt to justify the regrouping of Vietnamese into the same linguistic sub-family, aligning it with languages officially endorsed by Chinese academic authorities as part of the Sinitic branch.
In the contemporary battle over linguistic truth, China's information apparatus actively edits and counter-edits digital content on platforms such as Wikipedia and Facebook, well beyond its own autocratic bubble, shaping public perception through curated narratives. To finalize any major additions to the Sino-Tibetan family, comparable linguistic analytical tools must be employed. Historical linguists must also recognize the persistent Sinicizing force that has layered Chinese superstrata over Yue substrata, aboriginal elements that remain embedded beneath the surface.
This is not a distant or speculative past. Evidence from classical texts, including the Erya, confirms that Yue linguistic elements predate many archaic Chinese forms that later became foundational to Sinitic languages (see De Lacouperie, 1965). These elements are central to understanding the evolution of the region’s linguistic landscape.
To reconcile tensions between Sino-Tibetan and Austroasiatic models, our Yue-Sinitic framework adopts the same objective methodologies used by Western Austroasiatic theorists. Rather than reiterating older Sino-Tibetan paradigms, we establish Yue as a foundational stratum, integrating Sino-Tibetan etyma found across diverse dialects, including overlooked varieties such as late Northeastern Mandarin. Semantic shifts, such as 順路 shùnlù (VS thuậnlối) versus 順道 shùndào (SV thuậnđường), both meaning 'on the way', illustrate the nuanced lexical dynamics that inform this analysis.
This methodological parallel allows for a deeper exploration of Vietnamese etymology, using the same mechanisms and tools that Austroasiatic theorists applied to Mon-Khmer languages. Many archaic Chinese words of Sino-Tibetan origin remain dormant, preserved in classical texts and archaeological substrata, long before the Austroasiatic Mon-Khmer theory emerged through Middle Vietnamese contact with Khmer in the south.
It is a historical fact that Vietnamese emigrants to southern territories once inhabited by Chamic peoples of Austronesian Malayo-Polynesian origin only began resettling those lands after the 17th century. Their interaction with Mon-Khmer speakers spans less than 370 years, a relatively recent development in linguistic terms. Middle Vietnamese was not the native language of these regions; it arrived with later migrants who followed the Mekong upstream into areas such as Tonle Sap Lake in Cambodia. In comparative terms, Austroasiatic linguistic claims resemble cultural assertions made by Vietnamese archaeologists who have controversially attributed artifacts from the Sahuỳnh and ÓcEo civilizations to Vietnamese ancestors. These claims, often criticized for their speculative reach, reflect a broader impulse to root national identity in material and linguistic heritage, regardless of historical nuance.
It is somewhat mechanical and dull to quote and re-quote, from one scholar to another, the same old Austroasiatic basic etyma, whose lexical origins were supplied by "seasonal linguists of some summer's institute". Those linguists did not know the Mon-Khmer languages under investigation very well and relied mostly on translations provided by local informants and interpreters, casual renderings made without knowledge of etymological linguistics, in place of true cognates obtained methodically through linguistic rules. In other words, the local guides were not stakeholders, and at the time they may not have been aware that their work would eventually leave so significant an imprint on the Vietnamese historical linguistic record.
Many of the linguistic claims made within the Austroasiatic camp, particularly those concerning Vietnamese etymology, have relied on a narrow set of basic-word cognates between Vietnamese and Mon-Khmer languages. These wordlists, repeatedly cited since the mid-1960s, were often presented as methodologically sound despite their limited scope. Specifically, it is ill-advised to build a robust linguistic theory by revisiting the same handful of examples, such as the five counting numbers, names of local fruits and flora, or other low-frequency items drawn from regionally specific Mon-Khmer isoglosses.
Amusingly, newcomers to the field have continued this pattern, using these dated lists as springboards for new interpretations while remaining tethered to the same foundational assumptions. The result is a circular methodology: wordlists originally compiled during brief, grant-funded fieldwork in the remote highlands of South Vietnam during the Vietnam War era are recycled and elevated without critical reassessment. Many of the linguists involved, along with their Mon-Khmer guides, were only semi-literate in the languages under study. Consequently, the cited vocabulary, if valid at all, likely reflects archaic or borrowed forms whose phonological integrity has long since eroded.
To move beyond this problematic legacy of linguistic admixture and recent discrepancies, it is essential to inspire a new generation of scholars fresh from academic training to engage with Sinitic-Vietnamese historical linguistics. This requires first dismantling the misclassification of Vietnamese as an Austroasiatic Mon-Khmer language. The theory itself dates back to the early 20th century and has persisted largely due to institutional inertia. Young researchers, influenced by mentors steeped in Austroasiatic frameworks, often find themselves defaulting to the Mon-Khmer model from the start, out of familiarity and academic convenience.
IV) Yue denial: rewriting the Red River narrative
Moreover, young researchers continue to rely on outdated data from early fieldwork, which, while pioneering, was riddled with methodological flaws. If we can set aside the bitterness and sarcasm that sometimes accompany theoretical disputes, and if no entrenched interests obstruct critical reassessment, then even flawed past studies may serve as stepping stones toward meaningful breakthroughs.
Table 3 - The Yue continuum
| Zone | Representative Lects | Shared Features | Diffusion Path |
|---|---|---|---|
| Core zones | Cantonese, Fukienese, Teochew, Hakka | Tonality, polysyllabicity, semantic layering | Originating in Lingnan |
| Peripheral zones | Vietnamese, Zhuang, Pinghua | Partial convergence with Yue features | Extending toward the Red River delta |
As we prepare to reinitiate a Sino-Tibetan algorithmic approach to Vietnamese etymology, it is important to affirm that there remains ample room for interpretive freedom within either camp. Whether one leans toward Austroasiatic fieldwork or places trust in Mon-Khmer guides, linguistic competence must be prioritized. Ideally, such guides should be institutionally trained Khmer native speakers fluent in both Vietnamese and a Mon-Khmer language. Even better would be those with prior collaboration experience and familiarity with multiple Mon-Khmer varieties, alongside a strong command of Chinese–especially Archaic Chinese.
This latter qualification becomes crucial when comparing Mon-Khmer wordlists with affirmatively readable Ancient Chinese forms known to have existed in Vietnamese for millennia. Consider, for example, the twelve animals of the earthly zodiac, which the Khmer also share. The Vietnamese term ‘nămMèo’ aligns with 卯年 Mǎonián, which clearly denotes the "Year of the Cat," not the "Rabbit," as it is often mistranslated directly from the Chinese language. Such examples underscore the need for deeper philological rigor and cross-referencing with classical Chinese sources.
The early Austroasiatic Mon-Khmer specialists who first compiled Vietnamese-Mon-Khmer cognate lists often lacked both linguistic sensitivity and sufficient proficiency in the languages under study. True mastery of these living languages, ideally at a near-native level, is not merely desirable but essential. More than translation, what is required is a deep "language feeling", a kind of intuitive grasp that allows one to perceive subtle semantic and phonological resonances that only trained historical linguists can detect.
This "feeling for the language" becomes especially evident when examining the thousands of Chinese-Vietnamese cognates that reveal themselves only to a discerning and linguistically attuned mind. These "exploding words", as they manifest in Vietnamese, follow no parallel pattern in Mon-Khmer languages. Only a competent historical linguist can trace the phonological evolution that clarifies the semantic layering of 母 mǔ (SV mẫu, VS mẹ, mái, mợ, "mother", "female", "aunty") in compound forms such as 繼母 jìmǔ (SV kếmẫu, "stepmother"), 母雞 mǔjī (SV mẫukê, "hen"), and 舅母 jiùmǔ (SV cựumẫu, "maternal uncle's wife"). Notably, the term "mợ" as "Mom" has undergone semantic expansion to encompass broader kinship references, including "uncle and aunt" as addressed by nieces and nephews, and even "parents" in northern dialectal usage. These compounds yield Vietnamese variants like "mẹghẻ" vs. "mẹkế" (stepmother), "gàmẹ" vs. "gàmái" (hen), and "cậumợ" vs. the contracted "mợ" (maternal uncle’s wife).
Each of these Vietnamese forms can be traced back to Old, Middle, modern, or regional Chinese variants, revealing a layered cultural and linguistic inheritance. No Mon-Khmer equivalent exhibits the same depth of semantic nuance or cultural embedding. The distinctions among mẫu, mẹ, mợ, and mái reflect a sophisticated interplay of phonology, kinship semantics, and cultural transmission that is uniquely Sinitic in character.
Furthermore, many of the basic words currently cited as Mon-Khmer–Vietnamese cognates may no longer be credible. Based on new Sino-Tibetan findings presented in this research, several of these items, once thought to be Austroasiatic in origin, now point to deeper roots within Sino-Tibetan etymologies. This reevaluation invites a broader reconsideration of Vietnamese linguistic classification and its historical affiliations. (Shafer, 1966 - 1974. Refer to Sino-Tibetan etyma.)
V) Patchwork etymologies: reconsidering Mon-Khmer claims
The author could go on endlessly discussing Vietnamese etymology in relation to Chinese, elaborating on lexical developments across Ancient Chinese and its dialectal variants. He is not a formally trained historical linguist, nor does he claim proficiency in Khmer. Yet when Henri Maspero (Les Langues du Monde, 1952, pp. 582-83) asserted that Mon-Khmer languages constitute the substratum of Vietnamese and that its grammar reflects Thai and Mon-Khmer structures, the author immediately recognized the flaw in such a statement and understood what led Maspero to that conclusion: a lack of deep engagement with Mon-Khmer linguistic realities.
With his fair mastery of Vietnamese and Chinese, both at native fluency and with academic grounding in their historical linguistics, the author perceives what many Mon-Khmer specialists, even those with comparable linguistic training, have overlooked. On one hand, they may competently analyze individual elements and propose Mon-Khmer cognates for Vietnamese equivalents. On the other hand, they do not speak Vietnamese and Chinese with a "feeling", the intuitive grasp possessed by bilingual native speakers who can articulate etymological relationships with precision and insight.
Specifically in the field of historical linguistics concerning Chinese and Vietnamese etymologies, Sinitic-Vietnamese words often trace back to a single etymon that may give rise to multiple Chinese variants. These may appear in differentiated forms, e.g., "quả" (fruit) as 果 guǒ vs. 菓 guǒ, or "đậu" (bean) as 豆 dòu vs. 荳 dòu, with original forms sometimes recycled to convey new meanings. Syllabically, phonological fragments evolve across multiple strata, and knowing only one or two layers of an etymon is insufficient–since the same concept may manifest through distinct phonologies. As readers will later observe, many Chinese and Vietnamese words have evolved through three or more etymological layers, including the phenomenon of 'doublets', that is, words derived from the same source but diverging in form and meaning.
For instance, Chinese 會 huì (SV hội) may have yielded VS forms such as 'hiểu', 'họp', and 'hụi', meaning 'understand', 'meeting', and 'trust fund', respectively. Similarly, 川 chuān, 水 shuǐ, and 江 jiāng are all possibly cognate to Vietic */krong/ or VS 'sông' (river), with further derivatives: 川 chuān for 'suối' (stream) evolving into 泉 quán (creek); 水 shuǐ for 'nước' (water) leading to 江 jiāng for 'sông' (river), which itself is of the same root as 長江 Chángjiāng (Yangtze River).
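The one-etymon, many-reflexes relationship described above can be sketched as a small lookup structure. This is only an illustrative data model, populated solely with the correspondences proposed in the preceding paragraph for 會 huì; the class names, field names, and the "SV"/"VS" layer tags as string values are hypothetical conveniences, not an established notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reflex:
    form: str        # Vietnamese form
    layer: str       # "SV" = Sino-Vietnamese reading, "VS" = Sinitic-Vietnamese
    gloss: str = ""  # English gloss; empty when the text gives none


@dataclass
class Etymon:
    character: str
    pinyin: str
    reflexes: List[Reflex] = field(default_factory=list)

    def doublets(self) -> List[str]:
        """Return the VS forms proposed as descending from this one etymon."""
        return [r.form for r in self.reflexes if r.layer == "VS"]


# Correspondences as proposed in the text (hypotheses, not settled etymologies).
HUI = Etymon(
    character="會",
    pinyin="huì",
    reflexes=[
        Reflex("hội", "SV"),
        Reflex("hiểu", "VS", "understand"),
        Reflex("họp", "VS", "meeting"),
        Reflex("hụi", "VS", "trust fund"),
    ],
)

if __name__ == "__main__":
    print(HUI.character, HUI.pinyin, "->", ", ".join(HUI.doublets()))
```

Keeping the layer explicit on each reflex makes it possible to separate the learned SV reading from the vernacular VS doublets when further strata are added.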
In these examples, it is no coincidence that many Austroasiatic Mon-Khmer specialists have failed to recognize the formation of polysyllabic doublets derived from Sinitic variants. Their cited Mon-Khmer etyma often appear repackaged from secondary sources, with Vietnamese terms frequently misspelled or mislabeled, even in academic publications, errors that cannot be dismissed as mere typographical oversight. While their contributions to Vietnamese linguistic studies are acknowledged, such omissions render their work incomplete, and at times, methodologically biased.
The presence of Vietnamese cognates in Mon-Khmer and Thai languages does not necessitate a Mon-Khmer origin. These forms may reflect shared proto-roots that penetrated archaic Chinese as well. No finger-pointing is needed, but when examining basic words cited by Henri Maspero (Ibid. 1952, pp. 582–83), we find Vietnamese etyma grouped under Mon-Khmer or Thai roots, e.g., "sông" (river), "rú" (forest), "chim" (bird), "lúa" (paddy), "áo" (shirt) under Mon-Khmer; and "gà" (chicken), "vịt" (duck), "gạo" (rice) under Thai. Yet their Chinese counterparts–江 jiāng, 野 yě, 禽 qín, 來 lái, 襖 ào, 雞 jī, 鴄 pī, 稻 dào–along with doublets such as 水 shuǐ, 粗 cū, 隹 zhuī, 衣 yī, 鷄 jī, 鶩 wù, 穀 gǔ–reveal a deeper Sinitic lineage.
Consider 江 jiāng, which specifically denotes 'river' in China South, as in 湄江 Méijiāng (Mekong, modern 湄公河 Méigōnghé) and 長江 Chángjiāng (Yangtze River), both originating from 三江源 Sānjiāngyuán (Three River Source) in the Tibetan-Qinghai plateau. These do not derive from Cambodia’s Tonle Sap Lake, where 'Tonle' means 'river' and /-krong/ denotes 'city', not 'river'. Amusingly, the Vietnamese name "Sông Hồng" (Red River) is rendered in Chinese as 紅河 Hónghé (SV Hồnghà), not 紅江 Hóngjiāng (SV Hồnggiang), though some Vietnamese even say "Sông Hồnghà", i.e., 紅河江 Hónghéjiāng – mirroring the compound structure of 湄公河 Méigōnghé.
In any case, the term "Austroasiatic" as a linguistic family was coined by Western linguists seeking a swift academic solution to classify Vietnamese without invoking Chinese. Faced with Vietnamese speakers living among Khmer populations in the south, they bypassed the Sinitic influence from the north. At the turn of the 20th century, overwhelmed by the abundance of Sino-Tibetan etyma in Vietnamese, they struggled to assign a proper classification–much like the unresolved status of East Asian languages until the 17th century (see Lunbæk, Knud. 1986. T.S. Bayer (1694-1738), Pioneer Sinologist).
In sum, it is assumed that early Austroasiatic pioneers coined a new term for the unclassified linguistic umbrella they were holding. From the outset, they may have ignored or remained unaware of the linguistic depth in Vietic, Daic, Mon, and other languages spoken across Vietnam, Thailand, Laos, and South China, many of which, under the Sino-Tibetan framework, had already been grouped under the "Yue languages". This concept later expanded to include the Taic family, encompassing Daic-Kadai and Yue languages such as Vietnamese, Zhuang, Cantonese, Fukienese, and Wu.
Historically, the term "Yue" was rendered through various Chinese characters, i.e., 粵, 戉, 鉞, often referencing axe-like weapons and the tribes who wielded them, as noted in classical Chinese annals. The term "Taic" was a later ad hoc addition, used to encompass the ancient language of the Chu State, considered the progenitor of Daic-Kadai daughter languages and their Yue-related sisters, possibly including Zhuang or Nùng in Vietnamese, ancestral to Cantonese long before its Sinicization (De Lacouperie, Ibid. 1887 [1965]).
One may speculate that the old Yue framework was reshuffled to compile a new theoretical deck–later known as the Austroasiatic linguistic family, encompassing the Mon-Khmer sub-branch. In the absence of oriental philological expertise, the classification of Vietnamese as Austroasiatic Mon-Khmer was likely initiated by a new generation of Western linguistic enthusiasts eager to establish a foothold in historical linguistics, amid the rise of novel methodologies in the humanities. Linguists, of course, may name a newly "discovered" family as they see fit, especially at the turn of the 20th century. And so, the Austroasiatic initiators walked away with "Austro-Asiatic" (AA), unchallenged, having carved a convenient shortcut for themselves and for latecomers in a field that, arguably, already existed under another name.
To challenge the misconception that Vietnamese basic words stem from Austroasiatic Mon-Khmer origins, the author coins the academically grounded term "Sinitic-Vietnamese" (VS) and seeks to restore Vietnamese to its rightful place within the Sino-Tibetan classification, treating "Austroasiatic" as a misnomer.
In classifying Vietnamese etymological affiliation, the author introduces the term "Sinitic-Vietnamese" to foreground the role of the Sinitic-Yue linguistic sub-family in shaping the modern Vietnamese language. That is to say, the Vietic linguistic entities emerging after the breakup of the Vietmuong sub-family are best understood as Sinitic overlays atop a Yue substratum. It is plausible that the Sinitic stratum itself was an admixture of Yue elements, explaining the deep cognacy between Chinese and Vietnamese lexemes, such as 江 jiāng for VS 'sông' (river), 椰 yé for VS 'dừa' (coconut), and 糖 táng for SV 'đường' (sugar), all of which are considered aboriginal Yue words within this historical linguistic framework.
To clarify terminological confusion, the author prefers "Sinitic" over "Chinese," as the former denotes a genetic blend of Taic-Yue and possibly ancient Tibetan elements, rather than the vague "extinct foreign elements" referenced by Jerry Norman (1988). The Sinitic-Vietnamese etyma, later labeled simply as "Chinese-origin words," reflect Vietnam’s historical overshadowing by the larger Chinese sphere. Besides, Vietic, conceptually a pre-VietMuong variable within a continuum that stretches from the documented "Yue" in ancient Chinese records to the early formation of Annamese, is grouped within the Sino-Tibetan family.
The term "Sinitic-Yue" enters the discussion to correct the overextension of a purely "Sinitic" canopy. This terminological shift helps avoid ambiguities inherent in the Austroasiatic Mon-Khmer label. For example, while "Sinitic" has been misapplied to exclude Yue, the Austroasiatic designation was originally crafted to encompass languages presumed to originate from regions south of China South. Such framing implies a genetic affiliation rooted in Southeast Asia, leaving little room for alternative interpretations, especially given archaeological artifacts attributed to Mon-Khmer speakers. Yet this does not account for the Yue origin of Sinitic-Vietnamese lexemes such as 'cún' (puppy) and 'lợn' (piglet), which align with Chinese 犬 (quǎn, SV khuyển) and 腞 (dùn, SV đốn), respectively.
Prominent theorists in the Austroasiatic Mon-Khmer tradition have successfully trained generations of Vietnamese studies graduates using Western methodologies rooted in Indo-European linguistics. Rather than perpetuating a "business as usual" stance, it is time to explore a renovative approach to Yue and Sino-Tibetan theorization, as outlined in this survey.
It must be emphasized that the Austroasiatic Mon-Khmer theory, despite its institutional maturity, remains a hypothesis. It is an open-ended framework, subject to valid antitheses as Sinitic-Vietnamese research progresses. The theory continues to evolve, often drifting further from resolution as new scholars expose errors, such as conflating Sino-Vietnamese with Sinitic-Vietnamese forms and drawing flawed conclusions. Western methodology, while logical and systematic, is not infallible. In historical linguistics, there are no absolute maxims. Crucially, the Austroasiatic theory lacks historical documentation to support its claims and has yet to satisfactorily explain the full scope of Sinitic cognacy in Vietnamese. This paper aims to address that gap, revising outdated models in light of new Sino-Tibetan evidence embedded in Vietnamese basic vocabulary.
VI) Westward diffusion: Ferlus and the Annamese hypothesis
The Sino-Tibetan school of thought, with its accumulated scholarship, continues to offer etymological, historical, and theoretical value, such as the identification of Sino-Tibetan cognates, reconstruction of archaic forms, theorization of Old Chinese consonantal clusters and triphthongs, and hypotheses on tonogenesis. These foundations pave the way for a robust Sinitic-Vietnamese framework.
As for long-recognized Austroasiatic Mon-Khmer basic words, the author proposes a reevaluation. Take, for example, 'lá' (leaf):
- Mon-Khmer lineage: < 'ha' < hala < *pa (Chamic)
- Chinese lineage: < 葉 M yè, dié, shè, xiè < MC jiap, ɕiap < OC *leb, *hljeb
- Austroasiatic lineage: < Proto-AA *la, Proto-Katuic *la, Proto-Bahnaric *la, Khmer sla:, Proto-Vietic *laʔ, Proto-Monic *la:ʔ, Proto-Palaungic *laʔ, Proto-Khmu *laʔ, Proto-Viet-Muong *laʔ...
This paper presents newly resurfaced evidence with over 420 fundamental etyma, pointing decisively toward Sino-Tibetan origins based on Shafer's long-standing but underutilized list of Sino-Tibetan etymologies. (Shafer, 1966-1974. Refer to Chapter 10 - Sino-Tibetan etymologies.)
We return now to the matter of Sinitic-Vietnamese core vocabulary, which substantiates the etymological affiliation among Vietnamese, Chinese, and the Yue languages, all together forming the foundation of modern Vietnamese. Their historical interconnection, as documented in Chinese records, dates back less than 3,000 years. Ancient Chinese philologists, within their scholarly capacity, were already aware of these etymological commonalities embedded in the Yue linguistic continuum, postulated as proto-Daic, proto-Vietic, proto-Cantonese, proto-Fukienese, etc., with many lexical variants recorded in the monumental Kangxi Dictionary (康熙字典), offering insights into their origins long before Western linguistic constructs such as the Austroasiatic mainstream emerged.
When Austroasiatic theorists introduced the Mon-Khmer origin hypothesis for Vietnamese, they dismissed the pre-existing classification that had grouped Vietnamese alongside other Chinese dialects such as Cantonese and Fukienese, all within the Sino-Tibetan family, which had evolved independently of state-sponsored linguistic intervention. By coining the term "Mon-Khmer linguistic sub-family" (MK) and embedding it within the broader "Austroasiatic family" (AA), Western linguists effectively sidelined Chinese scholars from advancing Sino-centric theories, despite the wealth of evidence preserved in ancient Chinese rhyme books. Their maneuver, as noted by Lunbæk (1986), appears to have been a strategic detour around the complexities of Sinology, an attempt to bypass the steep learning curve required to engage with Chinese phonological traditions.
This theoretical displacement also obscured the historical role of ancient Yue languages in shaping Chinese lexicons, just as neighboring Mon-Khmer languages influenced Annamese. Many of these Yue-derived forms are buried in Chinese classics and cataloged in the Kangxi Dictionary, e.g., 簍 (lóu) for 'rổ' (basket), possibly a doublet of 籮 (luó), or 帔 (pèi) and 襣 (bì) for 'váy' (skirt).
Ancient Chinese rhyme books, compiled by native philologists, remained underappreciated in Western linguistics until the early 20th century, when scholars such as Haudricourt, Karlgren, Forest, and Maspéro began exploring Chinese historical linguistics. These pioneers recognized the role of Annamese in preserving phonological features of Ancient Chinese. Indeed, Sinitic-Vietnamese phonological values have been instrumental in reconstructing Old Chinese, paralleling the evolutionary trajectories of Cantonese and Fukienese. These features, when examined through the lens of Sinitic-Vietnamese etymology, point unmistakably to Yue origins. (Wang, Li. 王力. 1948.)
As previously noted, one reason Austroasiatic theorists may have opted to construct a new framework was the convenience of starting afresh–rather than acknowledging the longstanding theorization of Yue roots. Estranged from the world of Chinese historical linguistics, intentionally or not, they overlooked records where Yue languages were clearly identifiable. Regardless of the Mon-Khmer parallels, Vietnamese, Cantonese, and Fukienese undeniably share a common Yue ancestry rooted in ancient China South.
Elements of historical Yue languages once spoken throughout China South are also embedded in other major Chinese lects, including northern Mandarin and southern Wu. Yet Vietnamese – despite its unmistakable Sinitic dominance, absent only the square-script orthography – was reclassified under the Austroasiatic Mon-Khmer hypothesis, a reassignment that does not, however, negate the Sino-Tibetan classification already applied to its dialectal counterparts across China South.
Moreover, there are undeniable cognates across many Sino-Tibetan etymologies. This paper will continue to address these ancestral roots and clarify their linguistic affiliations by examining basic words in Vietnamese that align with Sino-Tibetan etymologies. To expedite this process, certain linguistic premises will be assumed as known to journeyman linguists, such as standard sound change rules: 蒜 suàn (SV toán) ~ VS 'tỏi' (garlic), 鮮 xiān (SV tiên) ~ VS 'tươi' (fresh), or 團圓 tuányuán (SV đoànviên) ~ VS 'sumvầy' ('union'), without further elaboration on conditions like [s-, x- ~ t-] for the former lexemes, [-n ~ -i] or [t- ~ s-], [y- ~ v-] for the latter word, etc.
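The correspondences just cited can be written down as a small lookup table, which makes it easier to see exactly what each rule does and does not claim. The following is a minimal sketch, not the author's procedure: the function names are hypothetical, the split of 團圓 tuányuán ~ 'sumvầy' into syllable pairs (tuán ~ sum, yuán ~ vầy) is an assumption, and initials and finals are reduced to single letters, which only suffices for the examples quoted above. A listed match shows that a pair is consistent with a stated correspondence; it does not by itself establish cognacy.

```python
import unicodedata

# Correspondences quoted in the text (Mandarin pinyin ~ Vietnamese).
# Nothing beyond these pairs is implied.
INITIAL_PAIRS = {("s", "t"), ("x", "t"), ("t", "s"), ("y", "v")}
FINAL_PAIRS = {("n", "i")}


def strip_marks(syllable: str) -> str:
    """Drop tone marks and vowel diacritics so only base letters remain."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def check_pair(pinyin: str, vietnamese: str) -> tuple:
    """Return (initial_listed, final_listed) for one syllable pair.

    Simplification (assumption): the initial is the first letter and the
    final is the last letter of the de-accented spelling.
    """
    p, v = strip_marks(pinyin), strip_marks(vietnamese)
    return (p[0], v[0]) in INITIAL_PAIRS, (p[-1], v[-1]) in FINAL_PAIRS


# Syllable pairs taken from the examples in the text.
EXAMPLES = [("suàn", "tỏi"), ("xiān", "tươi"), ("tuán", "sum"), ("yuán", "vầy")]
RESULTS = {pair: check_pair(*pair) for pair in EXAMPLES}

for (pinyin, viet), (ini, fin) in RESULTS.items():
    print(f"{pinyin} ~ {viet}: initial listed={ini}, final listed={fin}")
```

In this sketch the first two pairs satisfy both a listed initial and the listed final [-n ~ -i], while the two syllables of 'sumvầy' match only on their initials, which is all the text claims for them.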
Readers will later observe that the same rationale used to justify cognacy between Vietnamese and Austroasiatic languages applies equally to etyma found in Sino-Tibetan languages. The validity of these connections stands on equal footing. That is to say, if certain Vietnamese words are accepted as cognate with Mon-Khmer forms, the same principle applies to their Sino-Tibetan counterparts–depending only on which theoretical framework was adopted first.
In fact, the extent of cognacy is striking. As will be detailed in the What Makes Chinese So Vietnamese: Sino-Tibetan etymologies, over 420 core items have been identified, each bolted down with fundamental lexical evidence. These shared etyma span essential semantic categories – body parts, kinship terms, natural elements, numerals, and basic verbs – that permeate not only the languages of China South and Southeast Asia, but extend into East Asia as well. Their distribution suggests a deeper, regionally integrated linguistic heritage that transcends the boundaries imposed by modern classification schemes.
If the sole criterion for assigning a language to the Sino-Tibetan family is the presence of similar etyma, without accounting for broader linguistic features, then the Sinitic-Vietnamese lexicon alone would suffice to classify Vietnamese as a Sino-Tibetan language. Its intrinsic Sinicized features, structurally and semantically, mirror those of other recognized members such as Cantonese and Fukienese. By the same logic, the Austroasiatic argument that Vietnamese stands in a parallel relationship with Mon-Khmer languages, based primarily on basic lexical items, rests on an equally reductive axiom. This observation reiterates a point previously discussed: theoretical classification often hinges more on initial framing than on comprehensive linguistic evidence.
Anthropologically, aside from Haudricourt’s influential model of tonogenesis, the Austroasiatic camp has yet to produce historical records detailing how Khmer linguistic elements could have evolved into Vietnamese forms under its proposed framework. Culturally, Austroasiatic specialists have tended to marginalize Mon-Khmer entities from the broader narrative of Annam's cultural synthesis, a synthesis deeply infused with Confucianism, Taoism, and Buddhism. These traditions, foundational to Vietnamese identity, are characteristic of Sinitic-centric languages and notably absent from neighboring Mon-Khmer linguistic environments.
Historical records indicate that, in the course of its national development, Annam first came into contact with Champa, an Indianized polity that later adopted Islam, during the Eastern Han Dynasty, as documented in Chinese annals. Champa, located south of Annam’s border, occupied the territory once held by its precursor state Lâmấp (Linyi 林邑, c. 197-750 A.D.). The Champa Kingdom endured from the 8th to the early 18th centuries, serving as a chronological buffer between Annam in the north and Khmer polities in the south, both before and after Annam’s emergence as a sovereign state in 939.
In a landmark study, Michel Ferlus (2012) proposes that Annamese may have radiated westward, from northeastern Annam toward southwestern zones and even into India. This diffusion, he argues, explains the presence of Annamese-derived loanwords in Mon-Khmer and Munda languages, likely transmitted via trade corridors active between the 3rd and 8th centuries.
Table 4 - Ferlus’s Westward hypothesis
| Period | Region of Origin | Trade Corridor | Linguistic Trail | Cultural Implication |
|---|---|---|---|---|
| 3rd–8th century CE | Northeastern Annam | Red River → India | Annamese → Mon–Khmer → Munda | Loanwords embedded in Austroasiatic lexicons |
| Quote | — | — | “Annamese may have radiated westward…” – Ferlus (2012) | Interdisciplinary insight into linguistic diffusion |
Ferlus's hypothesis underscores the power of interdisciplinary inquiry, especially historical analysis, which remains conspicuously absent from the Austroasiatic Mon–Khmer framework. (1)
To set the record straight, whether one subscribes to the Austroasiatic theory or not, it matters little whether the proto-Vietic languages in ancient times originated from the Yue stock or the Austroasiatic Mon-Khmer family, as posited by Western-trained linguists (2). What truly matters is the holistic composition of Vietnamese as a living language–an organic totality in which all attributes function together as an integrated system. To better grasp this point, consider English as a parallel case. Historically, English has absorbed foreign elements with remarkable openness, layering them atop a relatively modest native core. When we examine modern English, we do not isolate its Anglo-Saxon, Welsh, Scots, Gothic, or Germanic foundations from its Norman, Romance, and Greek influences. Instead, we recognize English as a unified linguistic entity–one shaped by both native and foreign contributions.
By analogy, Vietnamese must be approached with the same comprehensive lens as described above. Its linguistic truth is not confined to any single ancestral lineage, but revealed through the way the entire system presents itself in its modern form: a dynamic, multifaceted language shaped by centuries of cultural and lexical convergence.
In analyzing the traits of a living language, we must also consider the racial composition of its speakers, many of whom may not speak the indigenous languages of the regions they inhabit. In Vietnam, the 3,260-kilometer S-shaped geopolitical map was formed incrementally over two millennia. The Kinh majority emerged through intermarriage with local populations such as Chams, Mon, Khmer, Chinese (especially Teochew), among others, but did not adopt the local languages of the annexed territories. This is evident in the Vietnamese–Khmer dynamic, where Khmer roots were imposed upon ancient Annamese populations, yet the Vietnamese language remained distinct. Vietnamese has never been a "pure" language; etymologically, it is a hybrid Sinitic language.
Globally, there are well-known cases of creole or outlander languages becoming mother tongues. Haitian Creole, a French-based creole spoken by descendants of enslaved Africans, and English, spoken by Jamaicans, Bahamians, and other Afro-Caribbean communities, are examples where linguistic affiliation does not align with genetic ancestry. Analogously, although Vietnamese absorbed Mon-Khmer basic words through contact with local groups – Thai blanc and noir, Daic, Zhuang, Chamic, Mon, Khmer – the Khmer language spoken by modern Cambodians had no linguistic affiliation with ancient Annamese prior to the 10th century. The aboriginal language spoken by the Trưng Sisters and their contemporaries, who rose up against Han Chinese rule some 2,000 years ago, was certainly distinct from modern Vietnamese as spoken in Hanoi today. Thus, the racial-linguistic foundation of Vietnamese was uprooted in prehistoric times, and Mon-Khmer contributions remain limited to loanwords.
Regarding its hybrid nature, Vietnamese is structurally a composite language built on a phonological model of {(C)+V+(C)} sequences. It blends a Yue core with later Sinitic overlays, which themselves may contain earlier Taic substrata, ancestral to the mother tongues of China South and beyond, including languages once spoken by the Chu populace. Over the past 800 years, following Annam’s annexation of neighboring states, Vietnamese also absorbed Chamic and Mon-Khmer lexicons, though these contribute only marginally to its etymological inventory. Many of these are basic words, which Austroasiatic specialists have emphasized in formulating the Mon-Khmer hypothesis.
In fact, attempts by Austroasiatic theorists to align Mon-Khmer languages with Vietnamese have focused on elaborating basic-word cognates. This approach is necessary but insufficient.
Grouping languages based solely on basic vocabulary overlooks deeper structural and historical factors. At best, such languages may be categorized as distant affiliates from a prehistoric past, shaped by early contact through trade and barter.
VII) Sinitic–Vietnamese convergence: cognates, tonality, and historical depth
Over the last two millennia, Chinese components have merged into Vietnamese with documented historical continuity. Therefore, linguistic classification must consider not only basic vocabulary but also the unique traits embedded in each word: its phonological DNA, tonal system, syllabicity, and morphosyntactic structure, so to speak. These features are so distinctive that few Austroasiatic elements match Vietnamese across all dimensions. Such intrinsic commonalities are absent even between most Sino-Tibetan languages, e.g., Tibetan vs. Chinese, let alone in Austroasiatic Mon-Khmer, because languages inevitably change over time.
Terminologically, the term "Sinitic-Yue" is grounded in the historical concept of "Yue" (越), as recorded in Chinese annals. The term "Viet" is avoided here due to its potential misnomer status and phonetic ambiguity, despite its resemblance to ancient pronunciations like /wjat/ or /jyet/, as in Cantonese 粵 /jyut6/. "Yue", like "Sinitic", is adopted for its academic precision, representing all descendants of the ancient Yue. This designation helps counter arguments that Vietnamese and Chinese share linguistic features such as tones, syllabic segments, etc., only due to superficial proximity. In fact, the term "Yue" has long been used to classify Southern Chinese dialects with shared origins.
Under the premise that "Sinitic-Yue" constitutes a legitimate linguistic entity, this paper demonstrates that Vietnamese is structurally and functionally akin to a Chinese dialect. Its etymology, tonal system, phonology, syllabicity, lexical stems, morphemic suffixes, grammatical markers, classifiers, particles, and instrumental prepositions all align with Sinitic norms.
In addition, Vietnamese morpho-syllabic stems independently generate vast localized vocabulary, e.g., 訂婚 dìnghūn ~ 'đámhỏi' (marital engagement), 嫁娶證 jiàqǔzhèng ~ 'giấygiáthú' (marriage certificate), etc. These linguistic traits are interchangeable with those of Cantonese and Fukienese, and together they form an integral whole. For now, before getting into details, Vietnamese can tentatively be grouped with southern Chinese dialects within the Sino-Tibetan family.
With this recognition, the author proposes a distinct linguistic class: the Sinitic-Yue branch, as outlined in the previous chapter. This branch cascades within the Sinitic sub-family and stands on par with other Yue-rooted languages, including Cantonese and Fukienese, which may be classified simultaneously under both Sinitic and Yue.
Speculatively, this concept may be extended to encompass the linguistic roots of indigenous languages spoken by ethnic groups descended from the Yue 越 (or 粵), known historically as BǎiYuè (百越 or SV BáchViệt), or Namman (南蠻, "Southern Barbarians") in ancient Chinese records. These Yue descendants include the Zhuang (壯族, Tráng or Nùng in Vietnamese), the largest minority in China South today, as well as the Dai people of North Vietnam, Laos, and Thailand. Their racial stock has been innovatively classified as "Austro-Thai" by Benedict (1975), further supporting the interconnectedness of the Sinitic-Yue linguistic continuum.
As for the term "the Austroasiatic linguistic family": for the Austroasiatic Mon-Khmer theorization to retain its merit within our comparative framework, it must be situated within historical contexts that accommodate both sides of the narrative, ours and theirs. The entire 'Austro-' perspective, a legacy from the previous century, has reinforced the idea that ancient Yue speakers may have descended from a common ancestral population originating in Southeast Asia or even the southern hemisphere.
However, recent archaeological discoveries have introduced new complexities. Human skeletal remains dated to over 40,000 years ago–far older than the 10,000-year-old specimens found in southern Indonesia–have been unearthed just 50 kilometers southwest of present-day Beijing. This northern locus suggests the possibility that the earliest Asian populations may have originated in the northern sphere (3), thereby offering a potential link to what is referred to as 'proto-Tai' and its Taic affiliations with the arrival of nomadic proto-Tibetan groups from the southwestern corridor.
The term Taic (including proto-Daic and proto-Yue) refers to the indigenous racial stock that once inhabited the southern region of present-day China and may have diversified into numerous distinct ethnic groups. If the historical BaiYue (百越, or "One Hundred Yue Tribes" as commonly referenced) recorded in early Chinese annals are, anthropologically, accurately identified with the "Bod" as postulated by Terrien De Lacouperie (The Languages of China Before the Chinese. 1887), they likely encompassed ancestral populations of the Zhuang and Daic peoples (泰族) known today.
These groups have also been associated with other native communities across China South, occupying a vast expanse below the Yangtze River since prehistoric times. Similar to the Austroasiatic hypothesis, this theorization is built primarily on analogy and inductive reasoning drawn from historical records. The region south of the river's lower basin includes Anhui and Hubei provinces, extending eastward to modern Jiangsu, where the Wu dialect is spoken, and further south through Hunan and Guangxi provinces, ultimately reaching the Red River Delta in northern Vietnam.
Figure 1 - Map of the ancient states in China
Source: Multiple sources in public domains on the internet
So far, we have not yet incorporated "the mystic foreign people in Bashu State (巴蜀, SV Bathục) of ancient Sichuan" into the broader racial framework. However, archaeological excavations have uncovered artifacts suggesting that these now-extinct populations once possessed a highly advanced civilization in the remote past. Unfortunately, there is no definitive evidence linking the Bashu people of southwestern Sichuan to other ethnic groups across China South, including the postulated proto-Taic populations.
While the Chu subjects cannot be directly identified with either the ancient Bashu or proto-Tibetan peoples, historical records indicate that the Chu populace was ethnically mixed, comprising ancient Taic elements, the forebears of both the Daic and Yue branches. These proto-Taic groups contributed to the formation of the Qin-Han populations around the 2nd century B.C. The proto-Taic lineage also served as the ancestral foundation for the Taic-speaking subjects of Chu 楚 and the Yue populations of Zhou 周, Wu 吳, and Yue 越. These groups eventually intermingled with the earlier inhabitants of the Qin State 秦國 (778–207 B.C.), forming the ethnocultural basis of what would become the Chinese people. (4)
In other words, ethnologically, the proto-Taic people had already diverged into smaller branches during prehistoric times, evolving into distinct ethnic groups before the Qin's invasion. As a matter of fact, each emerging tribal lineage would later govern one of the seven Yue polities, as recorded in Chinese historical sources spanning at least two millennia prior to the unification of the Qin Empire under Qin Shihuang (秦始皇), the first emperor of a unified "China," and subsequently, the Han Dynasty.
Following Qin's total victory and the consolidation of power, the subjects of the six other states in the eastern part of what is now China – namely Chu 楚, Zhao 趙, Qi 齊, Wei 魏, Yan 燕, and Han 韓, which had previously existed as vassal states during the Warring States period of the Eastern Zhou (403–221 B.C.) – were absorbed into the Qin polity. These populations, direct descendants of "the proto-Chinese" {XYZ} who had migrated from the southwest, formed the demographic base of the Qin Dynasty 秦朝 (221–206 B.C.). They subsequently merged with the Taic-descended populations of the Chu State to establish the Han Dynasty (漢朝, 206 B.C.-220 A.D.).
Figure 3 - Map of the historical ancient Yue states
Source: Multiple sources in public domains on the internet
The movement of Chinese immigrants from mainland China into Vietnam followed the same pattern, and it affected both the indigenes, with the biometrics {2YMK}, and the earlier emigrants {4Y6Z8H} who had long since resettled in the northern part of today's Vietnam around the Red River Delta. This was already under way before the foot soldiers of the Han Empire, who themselves included Yue people from the states that had earlier fallen under the Qin Dynasty, invaded the northern territory of ancient Vietnam. War-ravaged immigrants followed in their wake, and together these new settlers mixed with the locals to make up the population of ancient Annam.
In other words, all subjects of the Qin – biometrically postulated as "the Early Chinese" {X2Y3Z4H} – were fused within the 'racial melting pot' of the first unified empire, blending with the populations of the six other conquered states to form what came to be known as "the Chinese" {X4Y6Z8H}, a designation later adopted by the West to refer to "China." This newly consolidated entity became the "united states of the Middle Kingdom" (中華 Zhonghua) following the rise of the Han Dynasty, which ruled for more than four centuries and laid the foundation for all successive dynasties thereafter.
During the course of Chinese territorial expansion, many early native Yue groups – most notably the Zhuang, Dong, Yao, and Miao (known as Mèo in Vietnam and Hmong in Laos) – resisted the assimilative pressures of Han cultural integration (Sinicization) and retreated into mountainous enclaves. Over time, descendants of those who remained in isolation but refused collaboration with Han authorities were gradually displaced, migrating southward out of China South into Giaochỉ 交趾 (Jiaozhi; later renamed 交州 Jiaozhou or Giaochâu, the region historically known as Annam 安南). These migrants were later intermixed with successive waves of racially diverse immigrants arriving from the north.
Upon reaching the outer frontier, Han conquerors and colonists, initially sojourners, were often compelled to settle permanently in these territories to fulfill imperial China's 'national policy' which was still being enforced as late as 787 during the Tang Dynasty (Bo Yang, Zizhi Tongjian, Vol. 56, p. 83). These new settlers inevitably intermarried with local populations {2YMK}, either due to limited availability of Han women or through integration with other waves of mixed-stock Han immigrants {X4Y6Z8H} from both China North (華北 Huabei) and China South (華南 Huanan), who followed the military expeditions into the frontier prefecture of Annam.
From this convergence emerged the local "Kinh" {4Y6Z8HMK} people, also known in modern Chinese as Jing 京 ethnicity, i.e., Vietnamese. This migratory and integrative pattern continued over the next two millennia, extending into the present day with over one million Chinese immigrants since 1995. For example, new Chinatowns have proliferated around industrial zones across Vietnam in the past two decades. The descendants of these settlers multiplied over generations, gradually becoming part of the national population and forming the majority of the Kinh demographic in contemporary Vietnam.
The following table is designed to accommodate Austroasiatic without compromising the prominent position of the Yue polities and their own Yue lects, which stand on par with the former entities.
Table 5 - Outline of the isoglottal languages in China South
- 1.0 Taic Languages
  - 1.1 Austroasiatic Linguistic Family
    - 1.1.1 Mon-Khmer Languages
  - 1.2 Yue Languages
    - 1.2.1 Zhuang Language
    - 1.2.2 Daic Language
    - 1.2.3 Miao Languages
    - 1.2.4 Maonan Language
    - 1.2.5 VietMuong Languages
      - 1.2.5.1 Muong Dialects
      - 1.2.5.2 Vietic Language
    - 1.2.6 Proto-Cantonese (NanYue)
    - 1.2.7 Proto-Fukienese (MinYue)
    - 1.2.8 ... etc.
  - 1.3 Proto-Sinitic Languages
  - 1.4 Sinitic-Yue Languages
    - 1.4.1 Ancient Annamese
    - 1.4.2 Sinitic-Vietnamese
    - 1.4.3 Vietnamese
    - 1.4.4 ... etc.
- 2.0 Sino-Tibetan Linguistic Family
  - 2.1 Archaic Chinese
    - 2.1.1 Old Chinese
  - 2.2 Ancient Chinese
    - 2.2.1 Chinese Dialects (Fukienese, Wu Dialects, etc.)
  - 2.3 Early Middle Chinese
  - 2.4 Middle Chinese
    - 2.4.1 Cantonese Dialects
    - 2.4.2 Sino-Vietnamese
    - 2.4.3 ... etc.
  - 2.5 Early Mandarin
  - 2.6 Mandarin
    - 2.6.1 Northwestern Mandarin
    - 2.6.2 Putonghua
    - 2.6.3 Northeastern Mandarin
    - 2.6.4 Southwestern Mandarin
    - 2.6.5 ... etc.
  - 2.7 Cantonese
    - 2.7.1 Guangzhou Dialect
    - 2.7.2 Taishan Dialect
  - 2.8 Fukienese
    - 2.8.1 Xiamen Dialect
    - 2.8.2 Hainanese Dialect
    - 2.8.3 Chaozhou Dialect
  - 2.9 Wu Dialects
    - 2.9.1 Wenzhou Dialect
    - 2.9.2 Shanghainese Dialect
  - 2.10 ... etc.
Under this arrangement, the languages of "the Austroasiatic linguistic family" {1.1} (an anthropological value for a symbolically weighted hierarchy) would have formed out of the Taic languages {1.0} some 6,000 years ago, long before the emergence of the Western Zhou (西周) Dynasty. In other words, they all stemmed from an ancestral proto-Taic linguistic form {1.0}, supposedly spoken by the so-called "larger Taic indigenous people". One line evolved into the linguistic forms of the Yue {1.2}, including the speech currently used by the Zhuang, the Dai, the Miao, the Maonan, the Vietmuong, etc. {1.2.1, 1.2.2, 1.2.3, etc.}, while other branches diverged into the Mon-Khmer languages grouped under what is now universally called "the Austroasiatic linguistic family" {1.1.1, 1.1.2, 1.1.3, etc.}.
During the reigns of the Zhou kings, Taic glosses {1.0} also found their way into Archaic Chinese (ArC) {2.1} and Old Chinese (OC) {2.1.1}, and later into Ancient Chinese (AC) {2.2} of the Later Han, intertwining and merging with them after Chinese had broken off from the Sino-Tibetan route {2.0} and evolved independently (see Brodrick 1942, Norman 1988, Wiens 1967, FitzGerald 1972; cf. the Tibetan and Sinitic linguistic cluster as opposed to the Mon-Khmer and VietMuong cluster). Variants of this early form of OC {2.1.1, 2.1.2, 2.1.3, etc.} were later carried south by 'Han' foot soldiers and emigrants all the way to the Annamese land ("Tonkin"), where they gradually blended with the Vietic language {1.2.5.2} after it had separated from the Viet-Muong group.
Symbolically, in a broader sense, Austroasiatic languages {1.1} may stand on the same footing as the Yue languages {1.2}, with overlapping properties, or may even denote the same thing, since both fall under the Taic stage {1.0} that preceded the emergence of the historical Yue {1.2} language. The concept of the historical 'Yue', however, does not imply that Vietnamese has a direct genetic affinity with the Mon-Khmer sub-family, which is what the Austroasiatic hypothesis is all about {1.1}; rather, the concept of "Austroasiatic" is subsumed in a 'union' with "the Yue languages". The Austroasiatic linguistic family was postulated by its theorists as the ancestral form of the Mon-Khmer languages, which in turn gave rise to proto-Vietmuong and later Vietnamese, as modern linguists commonly hold. In other words, all of them are treated as descendant languages of the larger ancestral Austroasiatic family, with Vietnamese descended directly from the Mon-Khmer branch. That is misleading.
In a broader sense, if we begin with the premise that the Vietnamese language originated from the Taic family, the same framework would logically extend to other linguistic groups such as Zhuang, Daic, Miao, etc. {1.2.1, 1.2.2, 1.2.3...}. Within this view, it may be postulated that the Vietmuong sub-family diverged from the Yue mainstream several centuries ago, giving rise to the Vietic language, evidenced by residual Muong linguistic features embedded in early Annamese. Such a postulation would conveniently position the Viet-Muong group under the Austroasiatic linguistic umbrella {1.1}, placing it on par with other Mon-Khmer daughter languages {1.1.1, 1.1.2, 1.1.3, etc.}, including Vietnamese. This alignment would help account for the commonly cited Mon-Khmer basic words, while simultaneously preserving the etymological continuity of ancestral Yue forms {1.2} within the Taic languages mainstream {1.0}.
This approach is further justified by the relative ease of identifying commonalities between Vietnamese and Daic-Kadai languages, as opposed to the more distant Munda languages of India (see Henri Maspero's "Les Langues Mounda" in Les Langues du Monde, 1952, pp. 624–25).
As for the cognateness of basic vocabulary between Vietnamese and Mon-Khmer languages, their etymological connections do not align diachronically with the Sinitic synchronizing patterns under examination here. From a developmental standpoint, any Chinese lexical traces found in Mon-Khmer languages, if present at all, are relatively recent, likely introduced within the last 300 to 800 years, and plausibly transmitted via trade routes through North Vietnam. Per Ferlus,
"By the period of 3rd-8th centuries, an ancient land trade route linked North Vietnam to the Gulf of Thailand. The circulation of traders and travelers along this route has left cultural and linguistic influences of Ancient China as well as Ancient Vietnam (under Chinese rule) through the Khmer area. (1) Some Chinese words, few but highly significant, were borrowed into Khmer, and later passed in Thai, (2) The names of animals of the duodenary cycle in Ancient Vietnamese were borrowed by the Khmer and are still used today, and (3) The syllabic contrast /Tense ~ Lax/ of Middle Chinese was transferred, with various effects, in Vietic, and thence in Katuic and Pearic." (Michel Ferlus, Linguistic evidence of the trans-peninsular trade route from North Vietnam to the Gulf of Thailand (3rd-8th centuries). 2012.)
Loanwords from Muong and Vietnamese in a contemporary setting may likewise have found their way into the Mon-Khmer wordlists cited in What Makes Chinese So Vietnamese - Chapter 9, on the Mon-Khmer etymologies.
In the case of Vietnamese and its early formation, the historical backdrop begins with the loss of resistance wars against Chinese incursions. The freedom fighters followed native Muong groups who resisted Han rule and retreated into mountainous and remote southern territories. The ancient indigenous Vietmuong languages, originally spoken by Yue natives inhabiting regions from China South to the Red River Basin in northern Annam, eventually bifurcated into proto-Vietic and proto-Muong branches as a result.
Such a proposition could help clarify why the Muong language appears to contain more lexical items close to Mon-Khmer, in addition to what it shares with Vietnamese. There is no contradiction in this triangular interconnection if we consider a parallel phenomenon between Chinese and Tibetan linguistic structures. Despite their genetic affinity, these are two distinct languages, especially in terms of core lexical inventories and grammatical architecture.
Meanwhile, the lingua franca of those who remained in lowland and coastal settlements and cooperated with Han occupiers underwent a process of fusion with evolving forms of Ancient Chinese. These Chinese variants already contained Taic-Yue lexical admixtures, inherited from the Chu State and later from the Yue of the NamViet polity, as previously discussed. This fusion was further reinforced by the arrival of Han colonists and successive waves of emigrants from China South to ancient Annam following its annexation in 111 B.C.
Food for your future linguistics doctoral thesis: "Could it have been that Vietnamese is the result of 'pidginization' of some form of Chinese vernacular starting from the Han Dynasty?" By that time, the early ancient Vietic language had already absorbed a substantial layer of Old Chinese vocabulary, particularly from vernacular Mandarin, and likely began to take shape following its divergence from the Vietmuong group. In effect, under the Han Empire, Vietnamese evolved around the Chinese nucleus. Ancient Chinese elements were adopted and repurposed as lexical raw material by early Vietnamese speakers to coin new terms in the emerging Vietic language, used both by native Annamese and later Chinese resettlers.
Han colonial agents, including administrative officials and stationed soldiers, were gradually assimilated into the local population through intermarriage with native women. Over centuries of sustained Sinicization, they also adopted local customs. This prolonged period of cultural fusion is reflected in the presence of Ancient Chinese etyma in Vietnamese, supporting the argument for deep lexical integration. Examples include: vuquy (于歸 yúguī, 'bridal nuptial'), goábụa (寡婦 guǎfù, 'widow'), trờinắng (太陽 tàiyáng, 'sunshine'), trăngrằm (月圓 yuèyuán, 'full moon'), cửasổ (窗戶 chuānghù, 'window'), xecộ (車子 chēzǐ, 'carriages'), among others. These terms were widely used and embedded in the speech of common people, nurturing the genesis of ancient Annamese. Their prevalence and semantic depth underscore the plausibility of cognacy between Ancient Chinese and Vietnamese basic vocabulary far exceeding, in both quantity and integration, what is found in other Austroasiatic Mon-Khmer languages.
Table 7 - Sinitic–Vietnamese Convergence
| Layer | Feature type | Example pair(s) | Notes |
|---|---|---|---|
| Phonological | Tonal parallels | nạ, má, mạ, mẹ, mợ, mụ | Shared tone contour and syllable nucleus |
| Semantic | Kinship terms | mợ, mái, trống, cồ, gàmẹ, cậumợ | Semantic layering across Yue lects |
| Lexical | Body parts, verbs | trốc, đầu, mắt, chân, ăn, nói, ngủ | Cognates with Cantonese and Fukienese |
| Cultural | Folk usage and idioms | gàmẹ, gàmái, gàtrống, gàcồ (in compound kinship sets) | Embedded in oral tradition and daily speech |
The newly emerged Annamese was characteristically unique, shaped by the habitual speech patterns of its speakers, particularly what may be termed Yue grammar, typified by the syntactic structure {modified + modifier}. This pattern parallels the grammar of Zhuang and other Daic languages and stands in contrast to the syntactic models of Munda or Mon-Khmer languages (see Henri Maspero, Les Langues du Monde, 1952).
Vietic speech was further enriched and reshaped by descendants of racially mixed Yue-Han immigrants who arrived in successive waves, as recorded in Chinese historical sources during the period when Annam, then Giaochỉ County (交趾郡) of the Greater Giaochâu Prefecture (交州), remained under Han rule. Their language reflected a blend of earlier Vietic forms and various Han dialectal pronunciations from different periods. Lexical residues such as Bụt = 佛 Fó (Buddha), bụa = 婦 fù (wife), khơi = 海 hǎi (sea), buồng = 房 fáng (room), giường = 床 chuáng (bed), tủ = 櫝 dú (bedhead case), đũa = 箸 zhù (chopsticks), thìa = 匙 chí (spoon), etc., are foundational Vietnamese words and plausible cognates of their Chinese counterparts across time. (See Wang Li's 安南譯語 'Annamese translated glosses', Bùi Khánh Thế in What Makes Chinese So Vietnamese - Appendix I, Nguyễn Tài Cẩn's Nguồn gốc Hình thành Cách đọc Âm Hán-Việt ('Origin of the formation of Sino-Vietnamese pronunciation'), and What Makes Chinese So Vietnamese - Appendix H.)
While the question of whether the Mon-Khmer affinity of Vietnamese is valid remains open to debate, the primary rationale here is to challenge the Austroasiatic theory that posits a Mon-Khmer origin for the Vietnamese language. What this inquiry truly centers on is the cognateness of fundamental Vietnamese vocabulary items that appear across various Mon-Khmer languages. Such an argument inevitably raises questions like "who borrowed what from whom?" and similar lines of inquiry.
Amusingly, a significant portion of these same etyma also turn out to be cognate with Chinese and other Sino-Tibetan languages. These are what we designate as Sinitic-Vietnamese vocabularies. The phenomenon is largely attributed to centuries of trade and migratory contact, as suggested by Ferlus (2012), extending well into the late 18th century.
Nonetheless, there are no definitive historical records documenting these affiliations – only a limited set of basic words preserved under the umbrella of prehistoric rhetoric. The new findings presented in this research are intended to remain open for further scholarly discussion and investigation.
VIII) Reframing Vietnamese origins: A Sinitic–Yue resolution
The Austroasiatic Mon–Khmer hypothesis, once heralded as a definitive classification for Vietnamese, now stands as a provisional construct, one shaped more by methodological convenience than historical depth. Its reliance on basic-word lists, speculative reconstructions, and Indo-European analogies has obscured the layered etymological reality embedded in Vietnamese.
This survey reclaims the Sinitic–Yue lineage as a foundational stratum, one that predates Austroasiatic overlays and aligns with documented cognacy across Vietnamese, Cantonese, Fukienese, and other Yue-rooted lects. By foregrounding historical records, tonal evolution, and polysyllabic etymologies, the Sinitic–Vietnamese framework restores linguistic agency to the Yue continuum.
Ultimately, the truth of Vietnamese lies not in a single ancestral label, but in its composite structure: a living language shaped by centuries of convergence, migration, and cultural synthesis. To classify it reductively is to miss the point. To study it historically is to begin anew.
References
Benedict, Paul K. Austro-Thai Language and Culture, with a Glossary of Roots. New Haven: HRAF Press, 1975.
Cohen, Marcel, and Antoine Meillet. Les Langues du Monde. Paris: CNRS, 1952.
De Lacouperie, Terrien. The Languages of China Before the Chinese. London: Trübner & Co., 1887. Reprint 1965.
Ferlus, Michel. “Origine de la tonalité dans les langues môn-khmer.” Mon-Khmer Studies 41 (2012): 1–12.
Haudricourt, André-Georges. “De l’origine des tons en vietnamien.” Journal Asiatique 242 (1954): 69–82.
Karlgren, Bernhard. Grammata Serica Recensa. Stockholm: Museum of Far Eastern Antiquities, 1957.
Lunbæk, Knud. T.S. Bayer (1694–1738), Pioneer Sinologist. Copenhagen: The Royal Library, 1986.
Maspero, Henri. “Les Langues austroasiatiques.” In Les Langues du Monde, edited by Antoine Meillet and Marcel Cohen, 582–583. Paris: CNRS, 1952.
Nguyễn Ngọc San. Tìm hiểu về Tiếng Việt Lịch sử. Hà Nội: Nhà xuất bản Giáo dục, 1993.
Norman, Jerry. Chinese. Cambridge: Cambridge University Press, 1988.
Shafer, Robert. Introduction to Sino-Tibetan. Wiesbaden: Otto Harrassowitz, 1966–1974.
Wang Li (王力). Hànyǔ shǐgǎo (汉语史稿) [Draft History of the Chinese Language]. Beijing: Zhonghua Shuju, 1948.
FOOTNOTES
(1)^ Michel Ferlus, Linguistic evidence of the trans-peninsular trade route from North Vietnam to the Gulf of Thailand (3rd-8th centuries). 2012.
(2)^ When one sees Mon-Khmer elements in Vietnamese, it is easier to say that Vietnamese originated from the Mon-Khmer linguistic family, whether or not it initially shared a root with the Mon-Khmer languages. However, most of the specialists of Vietnamese prefer the other view, and this is where all the debates started, even though one could still say that Vietnamese loanwords exist in other Mon-Khmer languages (Ferlus, 2012). See more in What Makes Chinese So Vietnamese - Chapter 8 (The Mon-Khmer Association).
(3)^ The Tianyuan specimen, a partial human skeleton, was unearthed with abundant late Pleistocene faunal remains in 2003 in the Tianyuan Cave near the Zhoukoudian site in northern China, about 50 km southwest of Beijing. The skeleton was radiocarbon-dated to 34,430 ± 510 years before present (BP) (uncalibrated), which corresponds to ~40,000 calendar years BP. A morphological analysis of the skeleton confirms initial assessments that this individual is a modern human, but suggests that it carries some archaic traits that could indicate gene flow from earlier hominin forms. The Tianyuan skeleton is thus one of a small number of early modern humans more than 30,000 years old discovered across Eurasia and an even smaller number known from East Asia.
The Tianyuan skeleton, unearthed with abundant late Pleistocene faunal remains in 2003 in the Tianyuan Cave near the Zhoukoudian site in northern China, about 50 km southwest of Beijing. (Image by GAO Xing)
Source: DNA Analyses Show Early Modern Human 40000 Years Ago in Beijing Area Related to Present-Day Asians and Native Americans (posted 1/27/2013 Chinese Academy of Sciences)
(4)^ As we have learned, at the fall of the Qin Dynasty, Liu Bang (劉邦) and his generals of the resurrected Chu State (楚國) were exiled into the region of Hanzhong (漢中); hence, the name of the Han Dynasty (漢) was picked by Liu Bang after he and his troops defeated General Xiang Yu (項羽) of the Chu State.





