Executive Summary
- The Zen of Sinitic-Vietnamese
-
On the One-size-fits-all Conspiracy
The Austroasiatic Mon-Khmer hypothesis is critiqued for its methodological rigidity and lack of historical grounding. Western theorists often imposed Indo-European frameworks onto Southeast Asian languages without fluency in Vietnamese or Chinese. The theory’s reliance on basic-word lists and speculative reconstructions has led to circular reasoning and misclassification. Vietnamese etymology, when examined through analogical and dissyllabic methods, reveals stronger ties to Sinitic-Yue than to Mon-Khmer. Examples include mẹ, mợ, mái, gàmẹ, gàmái, and cậumợ, all traceable to Old Chinese 母 mǔ and its derivatives.
-
On the Relativity of Historical Phonology and the Limits of Reconstruction
Vietnamese etymology must be approached meditatively, akin to Zen practice, tracing phonetic resonance with historical depth. The term "Sinitic-Vietnamese" reflects layered linguistic convergence between Vietnamese and Chinese, not genetic descent. Ancient Yue entities predate the Qin state, and many Vietnamese etyma—such as cộ, sọ, răng, chua, heo, lợn—align with archaic Sinitic forms documented in classical Chinese texts. These connections, often overlooked by Austroasiatic theorists, are supported by phonetic keys like 讀若, 反切, and 形聲. The Sinitic Vietnamese lexicon reveals deep cognacy with Sinitic-Yue roots, distinct from Mon-Khmer vocabulary.
x X x
Sinitic-Vietnamese (VS) is a linguistic concept that describes the affiliation between Vietnamese and Chinese across multiple dimensions. The term "Sinitic" should not be misunderstood as referring to ancestral origins. It primarily reflects geopolitical and cultural associations with the Chinese mainland. Chinese is not a race but a cultural construct. Just as the label "Mon-Khmer" has gained prominence due to regional influence, "Sinitic" has been elevated because of the historical stature of China, not because the Chinese language gave birth to Vietnamese. In fact, Yue entities existed long before the emergence of the Qin state, which later became the symbolic foundation of what is now called China.
Unlike Vietnam, China does not observe national holidays commemorating founding forefathers or ancestral heroes. Its emperors, including those who established the Han, Tang, Yuan, Song, Ming, and Qing dynasties, are not formally celebrated in this way. The geopolitical map of China is instead a record of historical events that unfolded across millennia.
If one were to divide China's territory into segments based on major historical stages and trace them back to around 3000 B.C., it becomes clear that no entity called China existed at that time. Neither Qin, Han, nor any recognizable Chinese identity had yet formed. The concept of Qin, and by extension "Sinitic", was selected by modern scholars. Had the Chu state defeated the Han in the Chu-Han contention (206 to 202 B.C.), the linguistic label might have been "Chuic" rather than "Sinitic"; as a matter of fact, Liu Bang, who established the Han Dynasty, was himself a subject of the Chu polity.
At a broader level, "Sino-" functions as an anthropological gateway into various branches of Sinology, including historical linguistics. Unlike "Sinitic", which is often associated with the seven major Chinese dialects regardless of genetic affiliation, "Sino-" refers to the larger Sino-Tibetan (ST) family. This includes not only Sinitic languages but also Bodic or Tibetan languages. Conceptually, "Sinitic" is more closely tied to Chinese cultural identity, while "Sino-" is genetically affiliated with ancient Bodic languages. This distinction is similar to how "Indo-" functions in Indo-European studies, whereas "Sanskrit" represents a specific ancient language from prehistoric India. Though Sanskrit is not the progenitor of all Indo-European languages, it plays a central role in their reconstruction. Similarly, "Sino-" and "Sinitic" relate to Sino-Tibetan and Sino-Vietnamese (SV), as well as Sinitic languages and Sinitic-Vietnamese (VS).
Sinitic is one link in the broader chain of Sinology, encompassing various Chinese dialects classified under the Sinitic branch of the Sino-Tibetan family. This is the extent to which "Sino-" and "Sinitic" inform the classification of Sinitic-Vietnamese, which also includes Sino-Vietnamese.
In linguistic terms, "Sinitic" refers to a class of languages that may trace back to a single ancestral tongue predating the Qin Empire. Although the Qin Dynasty (221 to 206 B.C.) was short-lived, "Sinitic" remains the term used to credit its role in unifying the states of the Zhongguo, or Middle Kingdom. This union encompassed all dynasties and vassal states established on the Chinese mainland before and after the third century B.C., comparable to today's European Union.
The term "Sino-" complements "Tibetan" in forming "Sino-Tibetan", a designation for a grand linguistic family that includes both Sinitic and Bodic languages. In a limited sense, "Sino-" and "Sinitic" are academic constructs tied to the political geography of China. This framework places Sinitic-Vietnamese within the broader linguistic family, even though Vietnamese itself originated as a Yue entity. Ideally, it should be classified as Yue-Sinitic. However, following established academic conventions, this paper adopts the term Sinitic-Yue, using "Yue" in place of "Viet", as has long been customary in scholarly discourse. This proposition forms the basis for the reclassification of Vietnamese historical linguistics discussed below.
I) The Zen of Sinitic-Vietnamese
Etymologically, research on the origin of any Vietnamese word should be conducted in an appreciative and meditative manner, just as one would practice Zen or Yoga: slowly and calmly, tracing one's feelings about the sounds as they form on one's lips. To be truthful, the same holds for Chinese. For example, in 弩機 nújī (crossbow trigger), 弩 nú=nổ=nỏ=ná belongs to the same rhyme group as 魚 yú=ngư=nga=ngá=cá, and 機 jī=cơ=cò is used as specifically described in 說文 "Shuowen" by 許慎 Xu Shen of the Han Dynasty, not in the original sense of the 'shuttle' of a weaving device as speculated by 段玉裁 Duan Yucai of the Qing Dynasty (see Xu Zhongshu, 1934, pp. 425, 427, 441).
Contemporary Austroasiatic theorists have focused heavily on their own category of basic words, setting aside tonality and other prominent, intrinsic Sinitic linguistic attributes. The Austroasiatic Mon-Khmer theorists have simply ignored what Chinese historical linguistics holds in store on the Sinitic-Yue (or Sinitic-Vietic) side, across several anthropological dimensions. "Vietic" is a later term for a linguistic sub-family affiliated with ancient Vietnamese, as in Viet-Muong. It corresponds to the concept of ancient Yue isoglosses, nominally transcribed in Chinese records as 粵, 越, 戉, 鉞..., that are now grouped into different southern Chinese dialects such as Cantonese and Fukienese.
Bibliographically speaking, etymological evidence of the Yue etyma is buried deep in ancient Chinese literary and historical books. The plausibility of proven cognates is manifested in part by phonetic keys, or clues, amply noted in numerous ancient classical materials with notations such as "dúruò" 讀若 ('read as', i.e., pronunciation), "fănqiè" 反切 (spelling), and "xíngshēng" 形聲 (phonographic) in traditional Chinese linguistics. For instance, VS "cộ" /ko6/ (carriage) reflects a [轂 gǔ] variant of an archaic reading of 車 chē (carriage): the HòuHànshū (後漢書 'Book of the Later Han') glosses it "dúruò" (read as) 居 jū (SV cư), which is in turn evidenced by its phonetic component 古 gǔ (SV cổ, OC *ku). Many such old characters are listed as etyma in the Kangxi Dictionary 康熙字典, which records some 47,000 of the more than 70,000 single Chinese glyphs ever attested, together with quotations and notations derived from variant dialectal forms and keys to pronunciation. Anybody who uses the Kangxi Dictionary, now conveniently accessible online, can find numerous interesting examples therein, including thousands of ancient ideographs.
In this Sinitic etymological realm alone, more and more obsolete etyma have gradually been identified, tagged, analyzed, and reconstructed over time. The work is painstaking, though, like sieving tiny bits of gold dust from the sandy gravel of streams: one combs the Chinese classics for every single etymon that can or cannot be matched with a modern set of common lexical items, for example,
- "sọ" (head) 首 shǒu (SV thủ),
- "răng" (tooth) 齡 líng (SV linh),
- "ngọt" (sweet) 𩜌 yuē (SV ngạt),
- "chua" (sour) 酸 suān (SV toan),
- "rát" (sore) 熱 rè (SV nhiệt),
- "heo" (pig) 亥 hài (SV hợi),
- "lợn" (pig) 腞 dùn [or 豘 tún] (SV độn), etc.
or those highly plausible cognates, such as
Austroasiatic theorists have overlooked these connections. It is possible that they remain unaware of the linguistic traces linking archaic Yue and Sinitic languages—traces that are gradually being uncovered through etymological analysis. These connections span proto-forms and archaic scripts that evolved into Sinitic etyma, now found in both Chinese lexical items and their Sinitic-Vietnamese equivalents.
All words in this category may be classified as 同源辭 (tóngyuáncí), meaning "etyma" or "words of the same root." For example, 川 chuān, 水 shuǐ, and 江 jiāng all signify "river" or "stream." The Vietnamese word "sông" should not be rigidly tied to 江 jiāng alone. Etymologically, Sinitic-Vietnamese etyma are postulated here as having evolved from Sinitic-Yue forms, as illustrated in the case of "river."
Specifically, the Sinitic-Yue etymon 狗 gǒu corresponds to "chó" (dog), while its Sino-Vietnamese reading "cẩu" aligns with the Sinitic-Vietnamese form "cầy." This coexists with 犬 quǎn, which gives rise to "cún" ('puppy'). Similarly, 眼 yǎn and 目 mù correspond to "mắt" ('eye'), while 首 shǒu and 頭 tóu relate to "trốc" ('head'). These examples reflect layered etymological relationships.
In contrast, the Austroasiatic Mon-Khmer wordlist offers no characteristically parallel forms to those fundamental terms as described above. The absence of such correspondences further underscores the limitations of the Austroasiatic framework in accounting for the depth and complexity of Sinitic-Yue linguistic convergence (see Nguyễn Ngọc San, 1993).
Archaeological excavations of cultural relics from the ancient Chu State (楚國), located in parts of the South China region, have revealed a substantial presence of proto-Taic vestiges embedded within early Yue cultural elements. These findings align closely with developments in historical linguistics, particularly through the identification of Sinitic-Vietnamese etyma, Vietnamese words that share cognates with Sinitic lexical items.
It is worth recalling that Liu Bang (劉邦), the founding emperor of the Han Dynasty (漢高祖), was originally a subject of the Chu polity, as were many of his subordinates and military forces. Following the process of Sinicization (漢化), linguistic features of Chu, rooted in Tai-Yue origins, were gradually absorbed into the Han cultural and linguistic framework. Over time, these Chu elements came to be treated as if they were inherently Han, and thus classified within the Sinitic language sub-family due to their integration with other Han-affiliated forms.
A similar phenomenon can be observed in the Austroasiatic Mon-Khmer context. Early Mon-Khmer resettlers from the southwest of what is now northern Vietnam migrated into the Red River Delta, where they came into direct contact with local Daic populations and later Yue emigrants from the Dongtinghu region of China South. This convergence produced linguistic overlaps that have often been misattributed or oversimplified in classification models (Nguyễn Ngọc San, 1993, p. 43).
Whether or not the findings presented in this survey are sufficient to overturn the foundational assumptions of the Austroasiatic Mon-Khmer theory, they may at least serve as valuable complements to areas where current Sinitic-Vietnamese studies remain underdeveloped. Much of the field still relies heavily on systematic Han-Viet (漢越) readings of Chinese etyma, which follow a diachronic model. In contrast, the development of Sinitic-Vietnamese etyma is fundamentally synchronic.
The credibility of the examples cited in this study stems from the application of two new etymological techniques introduced by the author: (1) the analogical approach and (2) the dissyllabic method.
The analogical approach allows for the positive identification and plausible reconstruction of Vietnamese etyma of Chinese origin, including those obscured beneath basic Mon-Khmer vocabulary substrata. Without this method, many such forms might have gone unnoticed. Examples include "chim" (bird) corresponding to 禽 qín, "chuột" (mouse) to 鼠 shǔ, "ngựa" (horse) to 午 wǔ, "heo" (boar) to 亥 hài, and "trâu" (water buffalo) to 丑 chǒu. These cases illustrate how analogical reasoning can reveal deep etymological relationships otherwise hidden beneath surface-level classifications. (For further examples, see Chapter 9 on comparative Mon-Khmer and Vietnamese basic words.)
The analogical method allows us to examine sibling etyma that may have emerged within the same semantic or cultural category. If a word demonstrably shares linguistic traits and attributes with a Chinese-origin term, its related forms can often be postulated as belonging to the same etymological genre. For example, "đất" corresponds to 土 tǔ ('soil'), while 地 dì ('land') shares overlapping semantic space. Similarly, "nặng" ('heavy') aligns with 重 (zhòng, SV trọng), and "nhẹ" ('light') with 輕 qīng (SV khinh).
In cultural contexts, these relationships extend to morpho-syllabic constructions that express shared conceptual meaning. Examples include 懷念故土 huáiniàngùtǔ, rendered in Vietnamese as nhớvềđấttổ (homesickness); 心地 xīndì (SV tâmđịa) as tấmlòng (the heart's core); 重擔 zhòngdàn as gánhnặng (heavy burden); 輕視 qīngshì as khinhkhi (to look down on) and xemnhẹ (to take lightly); 輕易 qīngyì as khidễ (to despise); and 容易 róngyì as dễdàng (easily), contrasted with 困難 kùnnán as khókhăn (difficult). These examples demonstrate both literal meanings (nghĩađen, 正義 zhèngyì) and figurative meanings (nghĩabóng, 偏義 piānyì).
The second approach focuses on dissyllabic characteristics of two-syllable words. This methodology is used to trace the etymology of many basic Sinitic-Vietnamese terms and often complements the analogical method. For instance, 田地 tiándì corresponds to đồngruộng ('paddy field'), while 下地 xiàdì ('to go to the field') aligns with rađồng. The term 地 dì (SV địa) maps to đồng, and 田 tián (SV điền) connects to both đồng and ruộng. In this case, 地 dì may have evolved into đồng, represented by 垌 tóng ('paddy field'), which is semantically linked to 田 (tián).
The dissyllabicity approach requires historical linguists to treat sound changes in two-syllable Chinese words as synchronic events. Each syllable may function independently as an allophonic unit with its own generative properties. For example, 田 tián originally meant "hunt" in Old Chinese and is plausibly cognate with the Vietnamese word "săn," following a sound change pattern from /t-/ to /s-/ (M 田 tián < MC dɛn < OC *l'iːŋ).
This transformation process is not governed by rigid phonological rules such as those found in Middle Chinese, Sino-Vietnamese, or Cantonese interchanges. Instead, it reflects a range of phonetic outcomes resulting from lexical mutation, contraction, metamorphosis, metathesis, or spoonerism. These are sporadic synchronic events in which each morphemic syllable diverges into multiple phonemic forms.
Ultimately, the implications of these two extended approaches—analogical and dissyllabic—provide evidence of Chinese linguistic traits embedded in Vietnamese. Together, they lay the groundwork for classifying Vietnamese within a new Sinitic-Vietnamese sub-family, placing it alongside other languages in the Sinitic branch of the broader Sino-Tibetan linguistic family.
               / Sinitic > Seven major Chinese dialectal groups
*Sino-Tibetan <
               \ Sinitic-Yue > Sinitic Vietnamese
----------------------
(*) based on the cognacy of some 400+ Sino-Tibetan fundamental words with those of Sinitic Vietnamese
How would the Mon-Khmer etyma, i.e., those Austroasiatic cognates, fit into the diagram above?
To accommodate the Austroasiatic claims, and given the postulated derivation Taic > Mon-Khmer as well, their positional designation is proposed as follows:
              / Taic > Sinitic-Yue > Sinitic > Sinitic Vietnamese* > Annamese > Vietnamese
**proto-Taic <
              \ Taic > ***Austroasiatic > Mon-Khmer > Vietmuong > Vietic > ****Vietnamese
----------------------
(*) including Sino-Vietnamese
(**) linguistic elements that already existed prior to the Austroasiatic class
(***) could be interpreted as the Yue without Sinitic elements
(****) redundant with the Vietnamese above and can be omitted
The Sinitic-Yue theorization presented above suggests that Yue linguistic forms may have originated from a common proto-Taic family, likely dating back 4,000 to 6,000 years before present. This ancestral stratum may also have contributed to the emergence of what is now classified as the proto-Austroasiatic linguistic family. The entire process unfolded in parallel with the dissemination of Dongsonian bronze drum culture, carried by early Yue emigrants who may have introduced these artifacts to the Indonesian archipelago, where they remain visible today.
If this scenario holds, then the previously proposed theory of a northward migration from the Indo-Chinese peninsula cannot be attributed to either the Yue or Austroasiatic populations as traditionally speculated. Both frameworks—Austroasiatic Mon-Khmer and Sinitic-Yue—remain speculative in their own right.
During the same formative period, Yue-Sinitic entities began to take shape through the fusion of elemental linguistic and cultural forms. This transmutation occurred among aboriginal groups who remained in situ and incoming Sino-Tibetan populations migrating from south of the Yellow River, the ancient cradle of the Shang-Yin civilization. From this convergence emerged the genetic and cultural foundations of early Chinese identity.
These early populations shared mythological traditions that continue to resonate in Chinese and Vietnamese cultural memory. Figures such as Yándì 炎帝 (SV Viêmđế), Shénnóng 神農 (SV Thầnnông), and the legendary "children of the Dragon" 龍種 (VS dòngdõi Tiênrồng) reflect a shared symbolic heritage between the descendants of the Yue and the peoples who inhabited the regions north and south of the Yangtze River Basin (see Nguyen Nguyen's work on the origin of Vietnamese).
Whether one considers the Yue as the dominant lineage or the Austroasiatic Mon-Khmer as a lesser branch, both groups appear to have evolved from a shared ancestral strain—commonly referred to as proto-Taic—dating back approximately 4,000 to 8,000 years before present. At that early stage, the boundaries between these populations were fluid and subject to interpretation, based primarily on archaeological findings and oral traditions passed down through generations.
Historically, while some theories remain speculative, others are substantiated by classical records. The existence of Yèlángguó 夜郎國 (SV Dạlangquốc) in Sichuan and the Yue State 越國 Yuèguó (VS NướcViệt) in Zhejiang Province, near present-day Shaoxing 紹興, is well attested. Additional Yue polities include Chu State in the west, 吳國 Wúguó (VS nướcNgô) to the north, and MinYue 閩粵 MǐnYuè (SV MânViệt) in the south.
These well-documented states were inhabited by early Yue populations who spoke proto-Yue languages and preserved distinct cultural identities during the Warring States Period (475 to 221 B.C.). Historical records from this era offer meaningful evidence of proto-Taic-speaking communities, their settlement zones, and the linguistic forms they likely used.
Austroasiatic theorists, however, assigned these populations a generalized label "Austroasiatic" without adequately addressing the complexity of their origins. The prefix "Austro-" simply means "southern," yet its referent remains ambiguous. Is it meant to indicate South China, the southern portion of the Indo-Chinese peninsula, or even the Southern Hemisphere? Such vagueness reflects the limitations of a framework that overlooks the deeper Yue-Taic substratum and its historical continuity.
Linguistically and culturally, Vietnamese literary works remained deeply influenced by Chinese stylistic conventions well into the early 1970s. Both prose and poetry reflected classical Chinese aesthetics: poetry was often composed in Tang-style rhyming patterns and enriched by metaphors and imagery drawn from traditional Chinese settings, with snow-covered landscapes of Suzhou or Hangzhou as common motifs. This stylistic legacy persisted until a new generation of France-educated writers introduced romantic scenes inspired by Paris and the Seine River, marking a shift in literary sensibility (see Tô Kiều Ngân, 2013; Hà Đình Nguyên, 1992).
By contrast, writings in the 21st century, especially those published in official outlets such as the Tuổitrẻ and Thanhniên newspapers, exhibit a distinctly modern and liberal Vietnamese style. These works reflect contemporary themes, colloquial expression, and a departure from classical Chinese literary frameworks, signaling a broader cultural and linguistic transformation.
From the perspective of Sinitic linguistic development, it is postulated that Sinitic-Yue evolved through the fusion of speech forms spoken by indigenous Yue populations and early Sino-Tibetan migrants. These newcomers, possibly nomadic, intelligent, and militarily assertive, were likely ancestral to the subjects of the Yin Dynasty. They resettled south of the Yellow River Basin, entering regions inhabited by Yue communities in what is now southern China.
The etyma associated with this fusion, originally derived from ancient Yue languages, continue to be reflected in the speech of ethnic minorities such as the Zhuang, Dai, Dong, and Miao. These linguistic survivals offer compelling evidence of a deep historical substratum that shaped the evolution of Sinitic-Yue forms still present in southern China today (see Appendix K).
As for Vietnamese specifically, the language historically emerged as a distinctive case of Sinicized Yue speech, heavily mixed with prominent Chinese elements for a simple reason: it spent 1,060 years under imperial Chinese rule as one of the empire's prefectures. Its case, however, is much less Sinicized than, and different from, the process that turned Cantonese and Fukienese into "Chinese" lects since the Han and Tang dynasties, respectively. Those two prefectures remained in the shadow of the Han Chinese, who kept moving in and resettling there throughout their history; in contrast, since the year 939 the ancient Vietnamese speakers of Annam have managed to keep the country a sovereign state and to let its language evolve in its own way. The same Sinicizing process is now recurring in China's Hainan Island province, which has drawn more and more northern Chinese resettlers since the start of the current millennium. Most Vietnamese specialists have not paid enough attention to such historical details.
The classification of Vietnamese within the Sino-Tibetan language family is grounded in substantial evidence drawn from core linguistic elements and etymological patterns associated with Sino-Tibetan origins (see Chapter 10 on the Sino-Tibetan etymologies). Accordingly, the approaches proposed in this study diverge significantly from frameworks that attempt to reinforce the Austroasiatic theory of Vietnamese origin.
In essence, the Sinitic-Vietnamese lexical items examined here stand in contrast to the linguistic traits found in Mon-Khmer lexicons. This opposition will be further illustrated in the comparative analyses presented in the following sections.
II) On the one-size-fits-all conspiracy
The author has long suspected that the Austroasiatic Mon-Khmer hypothesis was shaped by individuals lacking proficiency in both Vietnamese and Chinese, as well as in the historical contexts of their respective speech communities. Western neo-theorists, particularly in the post–Industrial Revolution era, often pursued methodological shortcuts—placing all assumptions into the Austroasiatic framework they were constructing. In doing so, they manipulated data to build supplementary models designed to override earlier theorizations, regardless of their historical grounding.
These efforts relied heavily on the authority of Western academic conventions, despite the fact that Chinese language and history remained largely unfamiliar to Western scholars until the early seventeenth century (Knud Lundbæk, 1986).
The author's speculation is rooted in the observable rigidity of outdated data, inflexible presentation formats, frequent misspellings, overgeneralization from narrow samples, and repetitive patterns that fail to account for the linguistic nuance inherent in the lexicon of the target language.
TABLE 1: Austroasiatic controversy or conspiracy? A case of Yue denial.
If the Austroasiatic Mon-Khmer theory were built on the premise that Mon-Khmer aboriginal groups—rather than Daic populations—were the earliest inhabitants of the Red River Basin, then it follows that this region served as the Indo-Chinese cradle for subsequent cultural and linguistic developments. According to this view, Austroasiatic communities were already established prior to the arrival of Yue-Daic migrants, who were mistakenly assumed to have come later. This assumption stands in contradiction to a wide range of historical and archaeological evidence.
Western theorists have often disregarded Yue contributions, constructing new frameworks without engaging with available records. In doing so, they bypassed centuries of documented history, including sources that have long posed interpretive challenges for scholars in mainland China since the early seventeenth century.
The theory further suggests that incoming resettlers intermingled with existing aboriginal populations—identified as Austroasiatic peoples based on Mon-Khmer assumptions—which had already spread across Southeast Asia.
Later waves of Yue-Daic speakers from southwestern regions, including Lower Laos, arrived and were followed by Sinitic-Yue migrants from South China. It was during this period that the linguistic configuration now labeled as the Austroasiatic family began to take shape, eventually extending across the southern and western zones of the Indo-Chinese peninsula.
Sino-Tibetan etymons for fundamental Vietnamese vocabulary were largely disregarded as Austroasiatic theorists advanced their consensus on the origin of the language. They asserted that Vietnamese was spoken uniformly across the population and derived from a foundational set of Mon-Khmer cognates. These etyma were presented as conclusive and dependable proof of Austroasiatic influence in shaping the Vietnamese linguistic profile.
As languages evolve, semantic shifts often obscure original relationships. This issue evokes the familiar chicken-and-egg dilemma, though here the question is not which came first, but what form emerged: a chicken, hen, rooster, or cock. For example, 口 kǒu 'mouth' came before 吻 wěn, which once also referred to 'mouth' but later evolved to mean 'kiss', aligning with VS "hôn." In Ancient Chinese, it is reasonably accepted that 土 tǔ ('soil') preceded 地 dì ('land'), just as 口 kǒu ('opening') preceded 吻 wěn ('mouth'). Yet can any Sino-Tibetan specialist definitively determine whether the Vietnamese equivalents, in this case "đất" versus 土 tǔ, "cửa" versus 口 kǒu, or "mồm" versus 吻 wěn, originated first within their respective linguistic trajectories?
If Sino-Tibetan scholars themselves cannot decisively establish the directionality of key linguistic developments, it renders even more tenuous the piecemeal assertions made by Austroasiatic theorists regarding linguistic primacy. This underscores a broader principle: all theories of genetic linguistic affiliation remain provisional, open to revision as new evidence emerges. The Austroasiatic hypothesis, in particular, is hampered by a lack of historical documentation to substantiate its claims. It relies heavily on reconstructed etymons and speculative lexical correspondences, and therefore must be approached with measured skepticism rather than uncritical acceptance.
Let us take a moment to relax and engage in a metaphorical exercise to help visualize the broader linguistic taxonomy at hand. Imagine the Mon-Khmer theory as a handful of specimen fish placed within a much larger basket, one that also contains Austroasiatic, Yue, Taic, and Sino-Tibetan species. Among these, the Sino-Tibetan and Yue-Taic varieties are netted in far greater volume, with early Chinese written records documenting each catch through oracle bone inscriptions, turtle-shell divinations, and bronze tripod engravings. These artifacts fall squarely within the timeframe relevant to our inquiry, unlike the abstract mysticism of Pali or Sanskrit chants, which drift untethered in the air. Notably, Tibetan scholars took extensive notes on these traditions.
From the author’s perspective, alongside the Bodic (Tibetan) languages, the earliest linguistic formations included Taic, followed by Daic and the Yue split. Each of these branches produced lineages that stand on equal footing with the Sino-Tibetan family, which gave rise to its Sinitic descendants. These can be grouped in parallel with both Tibetan and Yue elements.
In contemporary discourse, the Austroasiatic hypothesis attempts to encompass this entire spectrum, positioning itself as a counterpoint to the Sino-Tibetan framework, particularly in relation to the Vietic segment. While ancient Taic gave rise to Yue and to its newer sibling, the Austroasiatic languages, and hence to what we now recognize as Sinitic Vietnamese, the modern Western interpretation has reframed this lineage under a different guise, distancing itself from the Yue theory. The term 越 /Jyet/, transcribed through various homophonous characters in Chinese annals, reflects this historical complexity.
As we examine the southward migratory movement from China South into the Indo-Chinese peninsula, the Sino-Tibetan theory offers a compelling etymological explanation for the cognacy of over 400 fundamental words shared between Vietnamese and Chinese, as documented in this survey. However, pursuing the Sino-Tibetan route requires navigating vast repositories of Chinese historical records, many written in archaic and classical styles, which this study has chosen to engage directly.
This is not a matter of privileging larger fry over rare specimen fish. One cannot dismiss the presence of essential Austroasiatic Mon-Khmer cognates, scarce yet foundational, that continue to surface in adjacent linguistic waters, such as the numerals, at least from one to five. The Austroasiatic hypothesis began by postulating links with the distant Munda languages of eastern India; its initiators then extended its reach eastward and netted a modest catch of etyma affiliated with Munda (see Chapter 8 on the Mon-Khmer association).
All things considered, and for what follows next, be it Austroasiatic, Taic, or Sinitic, it is reasonable to assert that the Yue entities emerged first, with the others following in succession from that foundational stratum.
The Austroasiatic hypothesis, as a matter of fact, may have served as an afterthought patchwork, filling in every possible crack where Sinitic elements remained hidden and unnoticed among linguistic pockets scattered across continents within a timeframe of approximately 6,000 to 10,000 years ago (a similar estimate might be reached by applying a glottochronological calculation to the lowest percentage of shared basic cognates). It is understandable to find Indian elements in Khmer and Chamic, in ancient words of Sanskrit or Pali origin, since those languages were long under the strong influence of Buddhism and Hinduism, respectively. Such elements appear alien in Vietnamese, however, except in common Buddhist prayers such as 'MôPhật', a shortened form of 'Nammô AdiđàPhật' ('Namo Amitabha'), which is assumed to convey a highly localized context.
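The glottochronological estimate mentioned here conventionally rests on the Swadesh-Lees formula t = ln C / (2 ln r), where C is the proportion of cognates two languages share on a basic word list and r is an assumed retention rate per millennium (about 0.86 is commonly cited for the 100-word list, about 0.805 for the 200-word list). As a minimal sketch under those assumptions, the calculation can be written as below; the function name and the sample percentage are hypothetical illustrations, not figures from this survey.

```python
import math


def glottochronology_years(shared_fraction: float, retention_rate: float = 0.86) -> float:
    """Estimate separation time in years with the Swadesh-Lees formula.

    shared_fraction: proportion of shared basic-list cognates, 0 < C <= 1.
    retention_rate: assumed fraction of the list retained per 1,000 years
        (hypothetical default 0.86, the rate often quoted for the 100-word list).
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    # t = ln C / (2 ln r) gives millennia; both logarithms are <= 0,
    # so the ratio is non-negative. The factor 2 reflects independent
    # loss in both daughter languages.
    millennia = math.log(shared_fraction) / (2 * math.log(retention_rate))
    return millennia * 1000.0


# A hypothetical 20 percent shared-cognate rate:
print(round(glottochronology_years(0.20)))
```

With r = 0.86, a hypothetical 20 percent cognate rate yields roughly 5,300 years, and lower percentages push the date further back, which is why the lowest shared percentages produce the deepest estimates. The constant retention rate is itself an assumption of the method.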
What we observe today as the status quo in Vietnamese linguistic classification is the result of a long trajectory of competing hypotheses. These either aim to (1) nullify existing theories of Austroasiatic Mon-Khmer or Sino-Tibetan origin from opposing viewpoints—for instance, China's official institutions classify Cantonese, a Yue language, as part of the Sino-Tibetan family based on its Sinitic etyma—or (2) construct new frameworks atop the same hypothetical foundations, leveraging modern methodologies. One such example is the Austro-Thai hypothesis proposed by Benedict (1975), which builds upon similar premises as the Austroasiatic Mon-Khmer model.
In response to earlier theories, Austroasiatic proponents advanced the view that aboriginal populations—retrospectively labeled as Mon-Khmer—were the original inhabitants of the Red River Delta, rather than Yue-Daic resettlers. According to this model, subsequent migratory waves during the Han colonial period brought Tai-Kadai speakers from present-day Lower Laos and racially mixed Sinitic-Yue groups from South China. These later arrivals intermingled with the indigenous populations, producing new ethnolinguistic communities that gradually dispersed across Southeast Asia.
From this Mon-Khmer substratum, the Austroasiatic linguistic family is said to have emerged and expanded northward and westward throughout the Indo-Chinese peninsula. Mon-Khmer speakers are credited with introducing foundational vocabulary to local populations, including those who spoke early forms of Vietic. Even after cognates were later identified between Vietnamese and modern Cambodian etyma, Austroasiatic theorists continued to maintain that Vietnamese originated from Mon-Khmer linguistic roots (see Nguyễn Ngọc San, 1993).
Genetic studies conducted by Vietnam's DNA research institutions further complicate this narrative, though. Recent findings indicate that Vietnamese, Thai, Daic, Yao, Hmong, Mon, Khmer, and southern Chinese populations share similar genetic markers, suggesting a more intertwined ethnolinguistic heritage than previously acknowledged.
Socially and academically, many individuals tend to follow prevailing beliefs, especially when those beliefs are widely accepted and institutionally reinforced. For newcomers, it is often easier to adopt the Austroasiatic Mon-Khmer classification of Vietnamese, which has become one of the most dominant theories of genetic linguistic affiliation. Yet linguistically, the postulated Austroasiatic languages themselves, as mentioned earlier, evolved from a common Taic-Yue source, one that also gave rise to the Yue daughter languages spoken by ethnic groups in China South. This Taic-Yue lineage could plausibly extend to Tai-Kadai, and even to the Austronesian and Polynesian divisions, all considered branches of the broader Taic linguistic family.
The Austroasiatic theorists, in constructing their Mon-Khmer hypothesis, applied Indo-European methodologies to Vietnamese without fully engaging with its historical and cultural context. As seen in A. Meillet and Marcel Cohen’s Les Langues du Monde (1952), the effort to position Mon-Khmer as a foundational linguistic family did not require deep engagement with Vietnamese or Chinese linguistic traditions. To illustrate this methodological imposition, let us consider a hypothetical case in the Amazon jungle.
Imagine Western linguists arriving to survey two remote Amazonian tribes in an effort to determine their linguistic affiliation. Applying the same logic once used to reframe Vietnamese origins, they approach the task with a mid-19th-century colonial mindset. Upon discovering that speakers in village B share a handful of basic words with those in village A, surveyed previously, they proceed without historical context. Instead of investigating deeper cultural or genealogical ties, they take an academic shortcut: they invent a label such as “Root A” and classify both languages under this newly coined family, assuming that the tribes themselves lack awareness of their linguistic heritage and will simply accept the label imposed on them. In doing so, they exclude local scholarly communities from the process of classification and impose a framework shaped more by external assumptions than by lived reality.
This mirrors the approach taken by Western scholars and missionaries in 17th-century Annam. Confronted with the complexity of the Chinese-based Nôm script, they bypassed Chinese altogether and devised a Romanized orthography for Vietnamese. This system, complete with its own grammar, was tailored to the needs of the largely illiterate population and served the missionaries' evangelical objectives. They believed they had resolved a millennia-old problem that Annamese scholars had failed to address, unlike their "smarter" Korean and Japanese counterparts, who had successfully developed phonetic systems to complement Chinese ideograms. In essence, the Vietnamese linguistic and cultural mindset had long been shaped within the hardened mold of a Chinese intellectual tradition that overshadowed every cultural aspect of the country.
It is important to recall that similar Romanization efforts in China ultimately failed. Western missionaries were met with widespread resistance, compounded by high illiteracy and entrenched cultural norms. This failure highlights the limitations of Western intervention in deeply rooted linguistic systems. By contrast, the Latinization of Vietnamese succeeded only with the support of the French colonial administration, which institutionalized the Romanized script.
The Annamese case, then, saw the successful imposition of a Romanized orthography that bypassed the complexities of Nôm and classical Chinese script, which had left a vast popular base illiterate. However, this was not a natural linguistic evolution; it was a colonial shortcut. Ironically, the outcome in Vietnam proved more transformative than in China, despite the latter's longer engagement with foreign missionaries.
The Austroasiatic Mon-Khmer classification, meanwhile, emerged as a technical construct at the turn of the 20th century. Western scholars, many of whom lacked proficiency in Chinese, coined the term with strategic intent. "Austro-" was used to denote "south," while "Asiatic" signaled a continental linguistic scope. This allowed them to frame a language family that ostensibly originated in the southern regions of Asia, including South China, as distinct from the north.
By doing so, they effectively created a hypothesis that encompassed nearly every language spoken across Southeast Asia and South China, citing shared lexical roots, even suggesting that Chinese itself borrowed from these sources. In the process, they sidestepped the historical role of Chinese influence and even dismissed terms like "Sino" and "Sinitic" as politically motivated labels, particularly in the classification of Cantonese and Fukienese within the Sino-Tibetan family.
This maneuver, much like the hypothetical case of Amazonian tribal languages discussed earlier, reflects a broader pattern of Western linguistic theorization: one that often privileges convenience and conceptual neatness over historical depth and cultural specificity.
In the real world, linguistic theories, like languages themselves, are subject to change. Their volatility mirrors the dynamic nature of speech communities and the evolving tools used to study them. For example, Zhuang and Daic languages were long classified directly under the Sino-Tibetan family just like other Sinitic lects before being reassigned to the Tai-Kadai language family. This reclassification underscores the provisional nature of linguistic taxonomy, especially in contrast to the relative stability found in the natural sciences.
The Austroasiatic hypothesis, likewise, remains inconclusive. Until every foundational issue is resolved, it must be treated as a working model rather than a definitive account. This stands in contrast to the Indo-European framework, which has achieved broad scholarly consensus and left behind a robust legacy of analytical tools used to trace the origins of languages such as Pali, Sanskrit, Greek, Latin, Germanic, Baltic, and Gaulish.
In the case of our Sinitic-Vietnamese study, the initial objective was to reclassify the Vietnamese language into its rightful sub-family, as its historical and linguistic lineage implies. To achieve this, academic consensus must acknowledge the existence of an ancient Yue language family, one that can be verified through historical records. The ancient phonetic form rendered as "Jyet," or possibly "Bjyet," is recognized in modern Mandarin as "Yue," and appears in classical sources such as the Erya (爾雅), which was used for diplomatic communication during the Spring and Autumn Period. This is further evidenced by the continuity of the major Yue-Sinitic dialects (Cantonese, Fukienese, and Wu), whose linguistic features tie their Vietic counterparts to a shared Yue origin.
Historically, Zhuang and Daic languages, now classified under the Tai-Kadai (also known as Kra-Dai) family, were once grouped by Western scholars within the Sino-Tibetan framework. Chinese institutions later reclassified them as distinct branches, yet still under the broader Sino-Tibetan umbrella. In the same spirit, our reference to the Cantonese, Fukienese, and Wu dialects is a deliberate attempt to justify regrouping Vietnamese into the same linguistic sub-family, aligning it with languages officially endorsed by Chinese academic authorities as part of the Sinitic branch.
In the contemporary battle over linguistic truth, China's information apparatus actively edits and counter-edits digital content on platforms such as Wikipedia and Facebook, beyond its own autocratic information bubble, shaping public perception through curated narratives. To finalize any major addition to the Sino-Tibetan family, comparable linguistic analytical tools must be employed. Historical linguists must also recognize the persistent Sinicizing force that has layered Chinese superstrata over Yue substrata, aboriginal elements that remain embedded beneath the surface.
This is not a distant or speculative past. Evidence from classical texts, including the Erya, confirms that Yue linguistic elements predate many archaic Chinese forms that later became foundational to Sinitic languages (see De Lacouperie, 1965). These elements are central to understanding the evolution of the region’s linguistic landscape.
To reconcile tensions between the Sino-Tibetan and Austroasiatic models, our Yue-Sinitic framework adopts the same objective methodologies used by Western Austroasiatic theorists. Rather than reiterating older Sino-Tibetan paradigms, we establish Yue as a foundational stratum, integrating Sino-Tibetan etyma found across diverse dialects, including overlooked varieties such as late Northeastern Mandarin. Semantic shifts such as 順路 shùnlù (VS thuậnlối) versus 順道 shùndào (SV thuậnđường), both meaning 'on the way', illustrate the nuanced lexical dynamics that inform this analysis.
This methodological parallel allows for a deeper exploration of Vietnamese etymology, using the same mechanisms and tools that Austroasiatic theorists applied to Mon-Khmer languages. Many archaic Chinese words of Sino-Tibetan origin remain dormant, preserved in classical texts and archaeological substrata; they long predate the Middle Vietnamese contact with Khmer in the south from which the Austroasiatic Mon-Khmer theory emerged.
It is a historical fact that Vietnamese emigrants to southern territories once inhabited by Chamic peoples of Austronesian Malayo-Polynesian origin only began resettling those lands after the 17th century. Their interaction with Mon-Khmer speakers spans less than 370 years, a relatively recent development in linguistic terms. Middle Vietnamese was not the native language of these regions; it arrived with later migrants who followed the Mekong upstream into areas such as Tonle Sap Lake in Cambodia. In comparative terms, Austroasiatic linguistic claims resemble cultural assertions made by Vietnamese archaeologists who have controversially attributed artifacts from the Sahuỳnh and ÓcEo civilizations to Vietnamese ancestors. These claims, often criticized for their speculative reach, reflect a broader impulse to root national identity in material and linguistic heritage, regardless of historical nuance.
It is mechanical and tedious simply to quote and re-quote the same old Austroasiatic basic etyma from one scholar to another, when their lexical sources were supplied by "seasonal linguists of some summer institute." Those linguists did not know the Mon-Khmer languages under investigation well and relied mostly on translations provided by local informants and interpreters. These were casual translations made without knowledge of etymological linguistics, not true cognates obtained methodically through linguistic rules. In other words, the local guides were not stakeholders at the time and may not have been aware of how significant an imprint their work would eventually leave on the Vietnamese historical linguistic record.
Many of the linguistic claims made within the Austroasiatic camp, particularly those concerning Vietnamese etymology, have relied on a narrow set of basic-word cognates between Vietnamese and Mon-Khmer languages. These wordlists, repeatedly cited since the mid-1960s, were often presented as methodologically sound despite their limited scope. Specifically, it is ill-advised to build a robust linguistic theory by revisiting the same handful of examples, such as the five counting numbers, names of local fruits and flora, or other low-frequency items drawn from regionally specific Mon-Khmer isoglosses.
Amusingly, newcomers to the field have continued this pattern, using these dated lists as springboards for new interpretations while remaining tethered to the same foundational assumptions. The result is a circular methodology: wordlists originally compiled during brief, grant-funded fieldwork in the remote highlands of South Vietnam during the Vietnam War era are recycled and elevated without critical reassessment. Many of the linguists involved, along with their Mon-Khmer guides, were only semi-literate in the languages under study. Consequently, the cited vocabulary, if valid at all, likely reflects archaic or borrowed forms whose phonological integrity has long since eroded.
To move beyond this problematic legacy of linguistic admixture and recent discrepancies, it is essential to inspire a new generation of scholars fresh from academic training to engage with Sinitic-Vietnamese historical linguistics. This requires first dismantling the misclassification of Vietnamese as an Austroasiatic Mon-Khmer language. The theory itself dates back to the early 20th century and has persisted largely due to institutional inertia. Young researchers, influenced by mentors steeped in Austroasiatic frameworks, often find themselves defaulting to the Mon-Khmer model out of familiarity and academic convenience to start with.
Moreover, they continue to rely on outdated data from early fieldwork, which—while pioneering—was riddled with methodological flaws. If we can set aside the bitterness and sarcasm that sometimes accompany theoretical disputes, and if no entrenched interests obstruct critical reassessment, then even flawed past studies may serve as stepping stones toward meaningful breakthroughs.
As we prepare to reinitiate a Sino-Tibetan algorithmic approach to Vietnamese etymology, it is important to affirm that there remains ample room for interpretive freedom within either camp. Whether one leans toward Austroasiatic fieldwork or places trust in Mon-Khmer guides, linguistic competence must be prioritized. Ideally, such guides should be institutionally trained Khmer native speakers fluent in both Vietnamese and a Mon-Khmer language. Even better would be those with prior collaboration experience and familiarity with multiple Mon-Khmer varieties, alongside a strong command of Chinese—especially Archaic Chinese.
This latter qualification becomes crucial when comparing Mon-Khmer wordlists with reliably legible Ancient Chinese forms known to have existed in Vietnamese for millennia. Consider, for example, the twelve animals of the earthly zodiac, which the Khmer also share. The Vietnamese term ‘nămMèo’ aligns with 卯年 Mǎonián, which clearly denotes the "Year of the Cat," not the "Rabbit," as it is often mistranslated directly from Chinese. Such examples underscore the need for deeper philological rigor and cross-referencing with classical Chinese sources.
The early Austroasiatic Mon-Khmer specialists who first compiled Vietnamese-Mon-Khmer cognate lists often lacked both linguistic sensitivity and sufficient proficiency in the languages under study. True mastery of these living languages, ideally at a near-native level, is not merely desirable but essential. More than translation, what is required is a deep "language feeling", a kind of intuitive grasp that allows one to perceive subtle semantic and phonological resonances that only trained historical linguists can detect.
This "feeling for the language" becomes especially evident when examining the thousands of Chinese–Vietnamese cognates that reveal themselves only to a discerning and linguistically attuned mind. These "exploding words," as they manifest in Vietnamese, follow no parallel pattern in Mon-Khmer languages. Only a competent historical linguist can trace the phonological evolution that clarifies the semantic layering of 母 mǔ (SV mẫu; VS mẹ, mái, mợ: "mother," "female," "aunty") in compound forms such as 繼母 jìmǔ (SV kếmẫu, "stepmother"), 母雞 mǔjī (SV mẫukê, "hen"), and 舅母 jiùmǔ (SV cữumẫu, "maternal uncle's wife"). These yield Vietnamese variants like "mẹghẻ" vs. "mẹkế" (stepmother), "gàmẹ" vs. "gàmái" (hen), and "cậumợ" vs. the contracted "mợ" (maternal uncle’s wife). Notably, "mợ" as "Mom" has undergone semantic expansion to encompass broader kinship references, including "uncle and aunt" as addressed by nieces and nephews, and even "parents" in northern dialectal usage.
Each of these Vietnamese forms can be traced back to Old, Middle, modern, or regional Chinese variants, revealing a layered cultural and linguistic inheritance. No Mon-Khmer equivalent exhibits the same depth of semantic nuance or cultural embedding. The distinctions among mẫu, mẹ, mợ, and mái reflect a sophisticated interplay of phonology, kinship semantics, and cultural transmission that is uniquely Sinitic in character.
Furthermore, many of the basic words currently cited as Mon-Khmer–Vietnamese cognates may no longer be credible. Based on new Sino-Tibetan findings presented in this research, several of these items—once thought to be Austroasiatic in origin—now point to deeper roots within Sino-Tibetan etymologies. This reevaluation invites a broader reconsideration of Vietnamese linguistic classification and its historical affiliations. (Shafer, 1966 - 1974. Refer to Sino-Tibetan etyma.)
The author could go on endlessly discussing Vietnamese etymology in relation to Chinese, elaborating on lexical developments across Ancient Chinese and its dialectal variants. He is not a formally trained historical linguist, nor does he claim proficiency in Khmer. Yet when Henri Maspero (Les Langues du Monde, 1952, pp. 582–83) asserted that Mon-Khmer languages constitute the substratum of Vietnamese and that its grammar reflects Thai and Mon-Khmer structures, the author immediately recognized the flaw in that statement and understood what had led Maspero to that conclusion: a lack of deep engagement with Mon-Khmer linguistic realities.
With his command of Vietnamese and Chinese, both at native fluency and with academic grounding in their historical linguistics, the author perceives what many Mon-Khmer specialists, even those with comparable linguistic training, have overlooked. On the one hand, they may competently analyze individual elements and propose Mon-Khmer cognates for Vietnamese equivalents. On the other hand, they do not speak Vietnamese and Chinese with a "feeling," the intuitive grasp possessed by bilingual native speakers who can articulate etymological relationships with precision and insight.
Specifically in the field of historical linguistics concerning Chinese and Vietnamese etymologies, Sinitic-Vietnamese words often trace back to a single etymon that may give rise to multiple Chinese variants. These may appear in differentiated forms, e.g., "quả" (fruit) as 果 guǒ vs. 菓 guǒ, or "đậu" (bean) as 豆 dòu vs. 荳 dòu, with original forms sometimes recycled to convey new meanings. Syllabically, phonological fragments evolve across multiple strata, and knowing only one or two layers of an etymon is insufficient—since the same concept may manifest through distinct phonologies. As readers will later observe, many Chinese and Vietnamese words have evolved through three or more etymological layers, including the phenomenon of 'doublets', that is, words derived from the same source but diverging in form and meaning.
For instance, Chinese 會 huì (SV hội) may have yielded VS forms such as 'hiểu', 'họp', and 'hụi', meaning 'understand', 'meeting', and 'trust fund', respectively. Similarly, 川 chuān, 水 shuǐ, and 江 jiāng are all possibly cognate to Vietic */krong/ or VS 'sông' (river), with further derivatives: 川 chuān for 'suối' (stream) evolving into 泉 quán (creek); 水 shuǐ for 'nước' (water) leading to 江 jiāng for 'sông' (river), which itself is of the same root as 長江 Chángjiāng (Yangtze River).
In these examples, it is no coincidence that many Austroasiatic Mon-Khmer specialists have failed to recognize the formation of polysyllabic doublets derived from Sinitic variants. Their cited Mon-Khmer etyma often appear repackaged from secondary sources, with Vietnamese terms frequently misspelled or mislabeled, even in academic publications, errors that cannot be dismissed as mere typographical oversight. While their contributions to Vietnamese linguistic studies are acknowledged, such omissions render their work incomplete, and at times, methodologically biased.
The presence of Vietnamese cognates in Mon-Khmer and Thai languages does not necessitate a Mon-Khmer origin. These forms may reflect shared proto-roots that penetrated archaic Chinese as well. No finger-pointing is needed, but when examining basic words cited by Henri Maspero (Ibid. 1952, pp. 582–83), we find Vietnamese etyma grouped under Mon-Khmer or Thai roots, e.g., "sông" (river), "rú" (forest), "chim" (bird), "lúa" (paddy), "áo" (shirt) under Mon-Khmer; and "gà" (chicken), "vịt" (duck), "gạo" (rice) under Thai. Yet their Chinese counterparts—江 jiāng, 野 yě, 禽 qín, 來 lái, 襖 ào, 雞 jī, 鴄 pī, 稻 dào—along with doublets such as 水 shuǐ, 粗 cū, 隹 zhuī, 衣 yī, 鷄 jī, 鶩 wù, 穀 gǔ—reveal a deeper Sinitic lineage.
Consider 江 jiāng, which specifically denotes 'river' in China South, as in 湄江 Méijiāng (Mekong, modern 湄公河 Méigōnghé) and 長江 Chángjiāng (Yangtze River), both originating from 三江源 Sānjiāngyuán (Three River Source) in the Tibetan-Qinghai plateau. These do not derive from Cambodia’s Tonle Sap Lake, where 'Tonle' means 'river' and /-krong/ denotes 'city', not 'river'. Amusingly, the Vietnamese name "Sông Hồng" (Red River) is rendered in Chinese as 紅河 Hónghé (SV Hồnghà), not 紅江 Hóngjiāng (SV Hồnggiang), though some Vietnamese even say "Sông Hồnghà", i.e., 紅河江 Hónghéjiāng—mirroring the compound structure of 湄公河 Méigōnghé.
In any case, the term "Austroasiatic" as a linguistic family was coined by Western linguists seeking a swift academic solution to classify Vietnamese without invoking Chinese. Faced with Vietnamese speakers living among Khmer populations in the south, they bypassed the Sinitic influence from the north. At the turn of the 20th century, overwhelmed by the abundance of Sino-Tibetan etyma in Vietnamese, they struggled to assign it a proper classification, much like the unresolved status of East Asian languages until the 17th century (see Lundbæk, Knud. 1986. T.S. Bayer (1694–1738), Pioneer Sinologist).
In sum, it is assumed that early Austroasiatic pioneers coined a new term for the unclassified linguistic umbrella they were holding. From the outset, they may have ignored or remained unaware of the linguistic depth in Vietic, Daic, Mon, and other languages spoken across Vietnam, Thailand, Laos, and South China, many of which, under the Sino-Tibetan framework, had already been grouped under the "Yue languages". This concept later expanded to include the Taic family, encompassing Daic-Kadai and Yue languages such as Vietnamese, Zhuang, Cantonese, Fukienese, and Wu.
Historically, the term "Yue" was rendered through various Chinese characters, i.e., 粵, 戉, 鉞, often referencing axe-like weapons and the tribes who wielded them, as noted in classical Chinese annals. The term "Taic" was a later ad hoc addition, used to encompass the ancient language of the Chu State, considered the progenitor of Daic-Kadai daughter languages and their Yue-related sisters, possibly including Zhuang or Nùng in Vietnamese, ancestral to Cantonese long before its Sinicization (De Lacouperie, Ibid. 1867 [1965]).
One may speculate that the old Yue framework was reshuffled to compile a new theoretical deck—later known as the Austroasiatic linguistic family, encompassing the Mon-Khmer sub-branch. In the absence of oriental philological expertise, the classification of Vietnamese as Austroasiatic Mon-Khmer was likely initiated by a new generation of Western linguistic enthusiasts eager to establish a foothold in historical linguistics, amid the rise of novel methodologies in the humanities. Linguists, of course, may name a newly “discovered” family as they see fit, especially at the turn of the 20th century. And so, the Austroasiatic initiators walked away with "Austro-Asiatic" (AA), unchallenged, having carved a convenient shortcut for themselves and for latecomers in a field that, arguably, already existed under another name.
To challenge the misconception that Vietnamese basic words stem from Austroasiatic Mon-Khmer origins, the author coins the academically grounded term "Sinitic-Vietnamese" (VS) and seeks to restore Vietnamese to its rightful place within the Sino-Tibetan classification, regarding "Austroasiatic" as a misnomer.
In classifying Vietnamese etymological affiliation, the author introduces the term "Sinitic-Vietnamese" to foreground the role of the Sinitic-Yue linguistic sub-family in shaping the modern Vietnamese language. That is to say, the Vietic linguistic entities emerging after the breakup of the Vietmuong sub-family are best understood as Sinitic overlays atop a Yue substratum. It is plausible that the Sinitic stratum itself was an admixture of Yue elements, explaining the deep cognacy between Chinese and Vietnamese lexemes, such as 江 jiāng for VS 'sông' (river), 椰 yé for VS 'dừa' (coconut), and 糖 táng for SV 'đường' (sugar), all of which are considered aboriginal Yue words within this historical linguistic framework.
To clarify terminological confusion, the author prefers "Sinitic" over "Chinese," as the former denotes a genetic blend of Taic-Yue and possibly ancient Tibetan elements, rather than the vague "extinct foreign elements" referenced by Jerry Norman (1988). The Sinitic-Vietnamese etyma, later labeled simply as "Chinese-origin words," reflect Vietnam’s historical overshadowing by the larger Chinese sphere. Moreover, Vietic, conceptually a pre-VietMuong variable within a continuum that stretches from the "Yue" documented in ancient Chinese records to the early formation of Annamese, is grouped within the Sino-Tibetan family.
The term "Sinitic-Yue" enters the discussion to correct the overextension of a purely "Sinitic" canopy. This terminological shift helps avoid ambiguities inherent in the Austroasiatic Mon-Khmer label. For example, while "Sinitic" has been misapplied to exclude Yue, the Austroasiatic designation was originally crafted to encompass languages presumed to originate from regions south of China South. Such framing implies a genetic affiliation rooted in Southeast Asia, leaving little room for alternative interpretations, especially given archaeological artifacts attributed to Mon-Khmer speakers. Yet this does not account for the Yue origin of Sinitic-Vietnamese lexemes such as 'cún' (puppy) and 'lợn' (piglet), which align with Chinese 犬 (quǎn, SV khuyển) and 腞 (dùn, SV đốn), respectively.
Prominent theorists in the Austroasiatic Mon-Khmer tradition have successfully trained generations of Vietnamese studies graduates using Western methodologies rooted in Indo-European linguistics. Rather than perpetuating a "business as usual" stance, it is time to explore a renovative approach to Yue and Sino-Tibetan theorization, as outlined in this survey.
It must be emphasized that the Austroasiatic Mon-Khmer theory, despite its institutional maturity, remains a hypothesis. It is an open-ended framework, subject to valid antitheses as Sinitic-Vietnamese research progresses. The theory continues to evolve, often drifting further from resolution as new scholars expose errors, such as conflating Sino-Vietnamese with Sinitic-Vietnamese forms and drawing flawed conclusions. Western methodology, while logical and systematic, is not infallible. In historical linguistics, there are no absolute maxims. Crucially, the Austroasiatic theory lacks historical documentation to support its claims and has yet to satisfactorily explain the full scope of Sinitic cognacy in Vietnamese. This paper aims to address that gap, revising outdated models in light of new Sino-Tibetan evidence embedded in Vietnamese basic vocabulary.
The Sino-Tibetan school of thought, with its accumulated scholarship, continues to offer etymological, historical, and theoretical value, such as the identification of Sino-Tibetan cognates, reconstruction of archaic forms, theorization of Old Chinese consonantal clusters and triphthongs, and hypotheses on tonogenesis. These foundations pave the way for a robust Sinitic-Vietnamese framework.
As for long-recognized Austroasiatic Mon-Khmer basic words, the author proposes a reevaluation. Take, for example, 'lá' (leaf):
- Mon-Khmer lineage: < 'ha' < hala < *pa (Chamic)
- Chinese lineage: < 葉 M yè, dié, shè, xiè < MC jiap, ɕiap < OC *leb, *hljeb
- Austroasiatic lineage: < Proto-AA *la, Proto-Katuic *la, Proto-Bahnaric *la, Khmer sla:, Proto-Vietic *laʔ, Proto-Monic *la:ʔ, Proto-Palaungic *laʔ, Proto-Khmu *laʔ, Proto-Viet-Muong *laʔ...
This paper presents newly resurfaced evidence with over 420 fundamental etyma, pointing decisively toward Sino-Tibetan origins based on Shafer’s long-standing but underutilized list of Sino-Tibetan etymologies. (Shafer, 1966 - 1974. Refer to Chapter 10 - Sino-Tibetan etymologies.)
We return now to the matter of Sinitic-Vietnamese core vocabulary, which substantiates the etymological affiliation among Vietnamese, Chinese, and the Yue languages, all together forming the foundation of modern Vietnamese. Their historical interconnection, as documented in Chinese records, dates back less than 3,000 years. Ancient Chinese philologists, within their scholarly capacity, were already aware of these etymological commonalities embedded in the Yue linguistic continuum, postulated as proto-Daic, proto-Vietic, proto-Cantonese, proto-Fukienese, etc., with many lexical variants recorded in the monumental Kangxi Dictionary (康熙字典), offering insights into their origins long before Western linguistic constructs such as the Austroasiatic mainstream emerged.
When Austroasiatic theorists introduced the Mon-Khmer origin hypothesis for Vietnamese, they dismissed the pre-existing classification that had grouped Vietnamese alongside other Chinese dialects such as Cantonese and Fukienese within the Sino-Tibetan family, a classification that had evolved independently of state-sponsored linguistic intervention. By coining the term "Mon-Khmer linguistic sub-family" (MK) and embedding it within the broader "Austroasiatic family" (AA), Western linguists effectively sidelined Chinese scholars from advancing Sino-centric theories, despite the wealth of evidence preserved in ancient Chinese rhyme books. Their maneuver, as noted by Lundbæk (1986), appears to have been a strategic detour around the complexities of Sinology, an attempt to bypass the steep learning curve required to engage with Chinese phonological traditions.
This theoretical displacement also obscured the historical role of ancient Yue languages in shaping Chinese lexicons, much as neighboring Mon-Khmer languages influenced Annamese. Many of these Yue-derived forms are buried in the Chinese classics and cataloged in the Kangxi Dictionary, e.g., 簍 (M lóu) for 'rỗ' (basket), possibly a doublet of 籮 (luó), or 帔 (M pèi) and 襣 (bì) for 'váy' (skirt).
Ancient Chinese rhyme books, compiled by native philologists, remained underappreciated in Western linguistics until the early 20th century, when scholars such as Haudricourt, Karlgren, Forest, and Maspero began exploring Chinese historical linguistics. These pioneers recognized the role of Annamese in preserving phonological features of Ancient Chinese. Indeed, Sinitic-Vietnamese phonological values have been instrumental in reconstructing Old Chinese, paralleling the evolutionary trajectories of Cantonese and Fukienese. These features, when examined through the lens of Sinitic-Vietnamese etymology, point unmistakably to Yue origins. (Wang Li 王力, 1948.)
As previously noted, one reason Austroasiatic theorists may have opted to construct a new framework was the convenience of starting afresh—rather than acknowledging the longstanding theorization of Yue roots. Estranged from the world of Chinese historical linguistics, intentionally or not, they overlooked records where Yue languages were clearly identifiable. Regardless of the Mon-Khmer parallels, Vietnamese, Cantonese, and Fukienese undeniably share a common Yue ancestry rooted in ancient China South.
Elements of historical Yue languages once spoken throughout China South are also embedded in other major Chinese lects, including northern Mandarin and southern Wu. Yet Vietnamese—despite its unmistakable Sinitic dominance, absent only the square-script orthography—was reclassified under the Austroasiatic Mon-Khmer hypothesis, a reassignment that does not, however, negate the Sino-Tibetan classification already applied to its dialectal counterparts across China South.
Moreover, there are undeniable cognates across many Sino-Tibetan etymologies. This paper will continue to address these ancestral roots and clarify their linguistic affiliations by examining basic words in Vietnamese that align with Sino-Tibetan etymologies. To expedite this process, certain linguistic premises will be assumed as known to journeyman linguists, such as standard sound change rules: 蒜 suàn (SV toán) ~ VS 'tỏi' (garlic), 鮮 xiān (SV tiên) ~ VS 'tươi' (fresh), or 團圓 tuányuán (SV đoànviên) ~ VS 'sumvầy' ('union'), without further elaboration on conditions such as [s-, x- ~ t-] and [-n ~ -i] for the former two lexemes, or [t- ~ s-] and [y- ~ v-] for the latter word, etc.
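To make the correspondence sets named above concrete, the following is a minimal illustrative sketch in Python, not the author's method. It assumes toneless romanizations with diacritics stripped (suan for suàn, toi for tỏi, and so on), treats only the first and last letter of each form as its initial and final (a simplification that would not handle digraph initials such as zh- or ch-), and uses a hypothetical helper named `matches`. It reports which of the listed conditions, [s-, x- ~ t-], [t- ~ s-], [y- ~ v-], and [-n ~ -i], a given Mandarin ~ Vietnamese pair satisfies.

```python
# Illustrative sketch only: the correspondence sets are those named in the text;
# forms are toneless romanizations (diacritics stripped) by assumption.
INITIAL_SETS = {("s", "t"), ("x", "t"), ("t", "s"), ("y", "v")}
FINAL_SETS = {("n", "i")}


def matches(mandarin: str, vietnamese: str) -> list[str]:
    """Return the listed correspondences that a Mandarin ~ Vietnamese pair satisfies.

    Hypothetical helper: compares only the first letter (initial) and the
    last letter (final) of each form, so digraph initials are not handled.
    """
    found = []
    initial = (mandarin[0], vietnamese[0])
    if initial in INITIAL_SETS:
        found.append(f"[{initial[0]}- ~ {initial[1]}-]")
    final = (mandarin[-1], vietnamese[-1])
    if final in FINAL_SETS:
        found.append(f"[-{final[0]} ~ -{final[1]}]")
    return found


PAIRS = [
    ("suan", "toi"),   # 蒜 suàn ~ tỏi 'garlic'
    ("xian", "tuoi"),  # 鮮 xiān ~ tươi 'fresh'
    ("tuan", "sum"),   # 團 tuán ~ sum (in 團圓 tuányuán ~ sumvầy)
    ("yuan", "vay"),   # 圓 yuán ~ vầy (in 團圓 tuányuán ~ sumvầy)
]

for m, v in PAIRS:
    print(f"{m} ~ {v}: {', '.join(matches(m, v)) or 'no listed correspondence'}")
```

Such a check only tests conformity to the listed conditions; the conditioning environments that the text leaves unelaborated are not modeled.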
Readers will later observe that the same rationale used to justify cognacy between Vietnamese and Austroasiatic languages applies equally to etyma found in Sino-Tibetan languages. The validity of these connections stands on equal footing. That is to say, if certain Vietnamese words are accepted as cognate with Mon-Khmer forms, the same principle applies to their Sino-Tibetan counterparts—depending only on which theoretical framework was adopted first.
In fact, the extent of cognacy is striking. As will be detailed in Chapter 10 on the Sino-Tibetan etymologies, over 420 core items have been identified, each anchored by fundamental lexical evidence. These shared etyma span essential semantic categories (body parts, kinship terms, natural elements, numerals, and basic verbs) that permeate not only the languages of China South and Southeast Asia, but extend into East Asia as well. Their distribution suggests a deeper, regionally integrated linguistic heritage that transcends the boundaries imposed by modern classification schemes.
If the sole criterion for assigning a language to the Sino-Tibetan family were the presence of similar etyma, without accounting for broader linguistic features, then the Sinitic-Vietnamese lexicon alone would suffice to classify Vietnamese as a Sino-Tibetan language. Its intrinsic Sinicized features, structurally and semantically, mirror those of other recognized members such as Cantonese and Fukienese. By the same logic, the Austroasiatic argument that Vietnamese stands in a parallel relationship with the Mon-Khmer languages, based primarily on basic lexical items, rests on an equally reductive axiom. This observation reiterates a point previously discussed: theoretical classification often hinges more on initial framing than on comprehensive linguistic evidence.
Anthropologically, aside from Haudricourt’s influential model of tonogenesis, the Austroasiatic camp has yet to produce historical records detailing how Khmer linguistic elements could have evolved into Vietnamese forms under its proposed framework. Culturally, Austroasiatic specialists have tended to marginalize Mon-Khmer entities from the broader narrative of Annam’s cultural synthesis, a synthesis deeply infused with Confucianism, Taoism, and Buddhism. These traditions, foundational to Vietnamese identity, are characteristic of Sinitic-centric languages and notably absent from neighboring Mon-Khmer linguistic environments.
Historical records indicate that Annam first came into contact with the Champa Kingdom, an Indianized polity that later adopted Islam, during the Eastern Han Dynasty, as documented in Chinese annals. Champa, located south of Annam’s border, occupied the territory once held by its precursor state Lâmấp (Linyi 林邑, c. 197–750 A.D.). The Champa Kingdom endured from the 8th to the early 18th centuries, serving as a geographic buffer between Annam in the north and the Khmer polities in the south, both before and after Annam’s emergence as a sovereign state in 939.
In an unprecedented study, Michel Ferlus (2012) posits that the ancient Annamese language may have exerted influence westward, spreading from northeastern Annam to southwestern regions and even into India. This diffusion, he suggests, accounts for loanwords found in Mon-Khmer and Munda languages within the Austroasiatic framework, likely transmitted along trade routes active between the 3rd and 8th centuries. His theory exemplifies the value of interdisciplinary approaches in linguistics and history, particularly the latter, which remains a conspicuous omission in the Austroasiatic Mon-Khmer hypothesis. (M).
To set the record straight, whether one subscribes to the Austroasiatic theory or not, it matters little whether the proto-Vietic languages in ancient times originated from the Yue stock or the Austroasiatic Mon-Khmer family, as posited by Western-trained linguists (K). What truly matters is the holistic composition of Vietnamese as a living language—an organic totality in which all attributes function together as an integrated system. To better grasp this point, consider English as a parallel case. Historically, English has absorbed foreign elements with remarkable openness, layering them atop a relatively modest native core. When we examine modern English, we do not isolate its Anglo-Saxon, Welsh, Scots, Gothic, or Germanic foundations from its Norman, Romance, and Greek influences. Instead, we recognize English as a unified linguistic entity—one shaped by both native and foreign contributions.
By analogy, Vietnamese must be approached with the same comprehensive lens as described above. Its linguistic truth is not confined to any single ancestral lineage, but revealed through the way the entire system presents itself in its modern form: a dynamic, multifaceted language shaped by centuries of cultural and lexical convergence.
In analyzing the traits of a living language, we must also consider the racial composition of its speakers, many of whom may not speak the indigenous languages of the regions they inhabit. In Vietnam, the 3,260-kilometer S-shaped geopolitical map was formed incrementally over two millennia. The Kinh majority emerged through intermarriage with local populations such as Chams, Mon, Khmer, Chinese (especially Teochew), among others, but did not adopt the local languages of the annexed territories. This is evident in the Vietnamese–Khmer dynamic, where Khmer roots were imposed upon ancient Annamese populations, yet the Vietnamese language remained distinct. Vietnamese has never been a "pure" language; etymologically, it is a hybrid Sinitic language.
Globally, there are well-known cases of creole or outlander languages becoming mother tongues. Haitian Creole, a French-based language spoken by descendants of African slaves, and English, spoken by Jamaicans, Bahamians, and other Afro-Caribbean communities, are examples where linguistic affiliation does not align with genetic ancestry. Analogously, although Vietnamese absorbed Mon-Khmer basic words through contact with local groups (the White and Black Tai, Daic, Zhuang, Chamic, Mon, and Khmer peoples), the Khmer language spoken by modern Cambodians had no linguistic affiliation with ancient Annamese prior to the 10th century. The aboriginal language spoken by the Trưng Sisters and their contemporaries, who rose up against the Han Chinese 2,000 years ago, was certainly distinct from modern Vietnamese as spoken in Hanoi today. Thus, the racial-linguistic foundation of Vietnamese was uprooted in prehistoric times, and Mon-Khmer contributions remain limited to loanwords.
Regarding its hybrid nature, Vietnamese is structurally a composite language built on a phonological model of {(C)+V+(C)} sequences. It blends a Yue core with later Sinitic overlays, which themselves may contain earlier Taic substrata, ancestral to the mother tongues of China South and beyond, including languages once spoken by the Chu populace. Over the past 800 years, following Annam’s annexation of neighboring states, Vietnamese also absorbed Chamic and Mon-Khmer lexicons, though these contribute only marginally to its etymological inventory. Many of these are basic words, which Austroasiatic specialists have emphasized in formulating the Mon-Khmer hypothesis.
In fact, attempts by Austroasiatic theorists to align Mon-Khmer languages with Vietnamese have focused on elaborating basic-word cognates. This approach is necessary but insufficient.
Grouping languages based solely on basic vocabulary overlooks deeper structural and historical factors. At best, such languages may be categorized as distant affiliates from a prehistoric past, shaped by early contact through trade and barter.
Over the last two millennia, Chinese components have merged into Vietnamese with documented historical continuity. Therefore, linguistic classification must consider not only basic vocabulary but also the unique traits embedded in each word: its phonological DNA, so to speak, along with its tonal system, syllabicity, and morphosyntactic structure. These features are so distinctive that few Austroasiatic elements match Vietnamese across all dimensions. Such intrinsic commonalities are absent even between most Sino-Tibetan languages (e.g., Tibetan vs. Chinese), let alone in Austroasiatic Mon-Khmer, for the simple reason that languages change over time.
Terminologically, the term "Sinitic-Yue" is grounded in the historical concept of "Yue" (越), as recorded in Chinese annals. The term "Viet" is avoided here due to its potential misnomer status and phonetic ambiguity, despite its resemblance to ancient pronunciations like /wjat/ or /jyet/, as in Cantonese 粵 /jyut6/. "Yue," like "Sinitic," is adopted for its academic precision, representing all descendants of the ancient Yue. This designation helps counter arguments that Vietnamese and Chinese share linguistic features such as tones, syllabic segments, etc., only due to superficial proximity. In fact, the term "Yue" has long been used to classify Southern Chinese dialects with shared origins.
Under the premise that "Sinitic-Yue" constitutes a legitimate linguistic entity, this paper demonstrates that Vietnamese is structurally and functionally akin to a Chinese dialect. Its etymology, tonal system, phonology, syllabicity, lexical stems, morphemic suffixes, grammatical markers, classifiers, particles, and instrumental prepositions all align with Sinitic norms.
In addition, Vietnamese morpho-syllabic stems independently generate a vast localized vocabulary, e.g., 訂婚 dìnghūn ~ 'đámhỏi' (marital engagement), 嫁娶證 jiàqǔzhèng ~ 'giấygiáthú' (marriage certificate), etc. These linguistic traits are interchangeable with those of Cantonese and Fukienese, and together they form an integral whole. For now, pending the detailed analysis that follows, Vietnamese can tentatively be grouped with the southern Chinese dialects within the Sino-Tibetan family.
With this recognition, the author proposes a distinct linguistic class: the Sinitic-Yue branch, as outlined in the previous chapter. This branch cascades within the Sinitic sub-family and stands on par with other Yue-rooted languages, including Cantonese and Fukienese, which may be classified simultaneously under both Sinitic and Yue.
Speculatively, this concept may be extended to encompass the linguistic roots of indigenous languages spoken by ethnic groups descended from the Yue 越 (or 粵), known historically as BǎiYuè (百越, or SV BáchViệt), or Namman (南蠻, "Southern Barbarians") in ancient Chinese records. These Yue descendants include the Zhuang (壯族, Tráng or Nùng in Vietnamese), the largest minority in China South today, as well as the Dai people of North Vietnam, Laos, and Thailand. Their languages have been innovatively classified as “Austro-Thai” by Benedict (1975), further supporting the interconnectedness of the Sinitic-Yue linguistic continuum.
As for the term "the Austroasiatic linguistic family," for the Austroasiatic Mon-Khmer theorization to retain its merit within our comparative framework, it must be situated within historical contexts that accommodate both sides of the narrative, ours and theirs. The entire 'Austro-' perspective, a legacy of the previous century, has fostered the idea that ancient Yue speakers may have descended from a common ancestral population originating in Southeast Asia or even the southern hemisphere.
However, recent archaeological discoveries have introduced new complexities. Human skeletal remains dated to over 40,000 years ago—far older than the 10,000-year-old specimens found in southern Indonesia—have been unearthed just 50 kilometers southwest of present-day Beijing. This northern locus suggests the possibility that the earliest Asian populations may have originated in the northern sphere (北), thereby offering a potential link to what is referred to as 'proto-Tai' and its Taic affiliations with the arrival of nomadic proto-Tibetan groups from the southwestern corridor.
The term Taic (including proto-Daic and proto-Yue) refers to the indigenous racial stock that once inhabited the southern region of present-day China and may have diversified into numerous distinct ethnic groups. If the historical BaiYue (百越, or "One Hundred Yue Tribes" as commonly referenced) recorded in early Chinese annals are anthropologically accurate, as postulated by Terrien de Lacouperie (The Languages of China Before the Chinese, 1887), they likely encompassed the ancestral populations of the Zhuang and Daic peoples (泰族) known today.
These groups have also been associated with other native communities across China South, occupying a vast expanse below the Yangtze River since prehistoric times. Similar to the Austroasiatic hypothesis, this theorization is built primarily on analogy and inductive reasoning drawn from historical records. The region south of the river's lower basin includes Anhui and Hubei provinces, extending eastward to modern Jiangsu, where the Wu dialect is spoken, and further south through Hunan and Guangxi provinces, ultimately reaching the Red River Delta in northern Vietnam.
Figure 1: Map of the ancient states in China
Source: Multiple public-domain sources on the internet
So far, we have not yet incorporated 'the mystic foreign people in Bashu State (巴蜀, SV Bathục) of ancient Sichuan' into the broader racial framework. However, archaeological excavations have uncovered artifacts suggesting that these now-extinct populations once possessed a highly advanced civilization in the remote past. Unfortunately, there is no definitive evidence linking the Bashu people of southwestern Sichuan to other ethnic groups across China South, including the postulated proto-Taic populations.
While the Chu subjects cannot be directly identified with either the ancient Bashu or proto-Tibetan peoples, historical records indicate that the Chu populace was ethnically mixed, comprising ancient Taic elements, the forebears of both the Daic and Yue branches. These proto-Taic groups contributed to the formation of the Qin-Han populations around the 2nd century B.C. The proto-Taic lineage also served as the ancestral foundation for the Taic-speaking subjects of Chu 楚 and the Yue populations of Zhou 周, Wu 吳, and Yue 越. These groups eventually intermingled with the earlier inhabitants of the Qin State 秦國 (778–207 B.C.), forming the ethnocultural basis of what would become the Chinese people. (Y)
In other words, ethnologically, the proto-Taic people had already diverged into smaller branches during prehistoric times, evolving into distinct ethnic groups before the Qin's invasion. As a matter of fact, each emerging tribal lineage would later govern one of the seven Yue polities, as recorded in Chinese historical sources spanning at least two millennia prior to the unification of the Qin Empire under Qin Shihuang (秦始皇), the first emperor of a unified "China," and subsequently, the Han Dynasty.
Following Qin's total victory and the consolidation of its power, the subjects of the six other states in the eastern part of what is now China, namely Chu 楚, Zhao 趙, Qi 齊, Wei 魏, Yan 燕, and Han 韓, which had previously existed as vassal states during the Warring States period of the Eastern Zhou (403–221 B.C.), were absorbed into the Qin polity. These populations, direct descendants of "the proto-Chinese" {XYZ} who had migrated from the southwest, formed the demographic base of the Qin Dynasty (秦朝, 221–206 B.C.). They subsequently merged with the Taic-descended populations of the former Chu State to establish the Han Dynasty (漢朝, 206 B.C.–220 A.D.).
Figure 3: Map of the historical ancient Yue states
Source: Multiple public-domain sources on the internet
Chinese migration from the mainland to Vietnam followed the same pattern, repeating itself among both the indigenes {2YMK} and the emigrants {4Y6Z8H} who had previously lived, or had long resettled, in the northern part of today's Vietnam around the Red River Delta Basin before the foot soldiers of the Han Empire, who also included Yue populations from the states that had earlier fallen under the Qin Dynasty, came to invade the ancient Vietnamese northern lands. At the same time, war-ravaged immigrants followed them, and together these new settlers mixed with the locals to form the population of ancient Annam.
In other words, all subjects of the Qin — biometrically postulated as "the Early Chinese" {X2Y3Z4H} — were fused within the 'racial melting pot' of the first unified empire, blending with the populations of the six other conquered states to form what came to be known as "the Chinese" {X4Y6Z8H}, a designation later adopted by the West to refer to "China." This newly consolidated entity became the "united states of the Middle Kingdom" (中華 Zhonghua) following the rise of the Han Dynasty, which ruled for the next 406 years and laid the foundation for all successive dynasties thereafter.
During the course of Chinese territorial expansion, many early native Yue groups—most notably the Zhuang, Dong, Yao, and Miao (known as Mèo in Vietnam and Hmong in Laos)—resisted the assimilative pressures of Han cultural integration (Sinicization) and retreated into mountainous enclaves. Over time, descendants of those who remained in isolation but refused collaboration with Han authorities were gradually displaced, migrating southward out of China South into Giaochỉ 交趾 (Jiaozhi; later renamed 交州 Jiaozhou, the region historically known as Annam 安南). These migrants were later intermixed with successive waves of racially diverse immigrants arriving from the north.
Upon reaching the outer frontier, Han conquerors and colonists, initially sojourners, were often compelled to settle permanently in these territories to fulfill imperial China's 'national policy', which was still being enforced as late as 787 during the Tang Dynasty (Bo Yang, Zizhi Tongjian, Vol. 56, p. 83). These new settlers inevitably intermarried with local populations {2YMK}, either due to the limited availability of Han women or through integration with other waves of mixed-stock Han immigrants {X4Y6Z8H} from both China North (華北 Huabei) and China South (華南 Huanan), who followed the military expeditions into the frontier prefecture of Annam.
From this convergence emerged the local "Kinh" {4Y6Z8HMK} people, also known in modern Chinese as Jing 京 ethnicity, i.e., Vietnamese. This migratory and integrative pattern continued over the next two millennia, extending into the present day with over one million Chinese immigrants since 1995. For example, new Chinatowns have proliferated around industrial zones across Vietnam in the past two decades. The descendants of these settlers multiplied over generations, gradually becoming part of the national population and forming the majority of the Kinh demographic in contemporary Vietnam.
The following table is designed to accommodate the Austroasiatic family without compromising the prominent position of the Yue polities and their own Yue lects, which stand on par with the former.
Table 1: Outline of the isoglottal languages in China South
- 1.0 Taic Languages
  - 1.1 Austroasiatic Linguistic Family
    - 1.1.1 Mon-Khmer Languages
  - 1.2 Yue Languages
    - 1.2.1 Zhuang Language
    - 1.2.2 Daic Language
    - 1.2.3 Miao Languages
    - 1.2.4 Maonan Language
    - 1.2.5 VietMuong Languages
      - 1.2.5.1 Muong Dialects
      - 1.2.5.2 Vietic Language
    - 1.2.6 Proto-Cantonese (NanYue)
    - 1.2.7 Proto-Fukienese (MinYue)
    - 1.2.8 ... etc.
  - 1.3 Proto-Sinitic Languages
  - 1.4 Sinitic-Yue Languages
    - 1.4.1 Ancient Annamese
    - 1.4.2 Sinitic-Vietnamese
    - 1.4.3 Vietnamese
    - 1.4.4 ... etc.
- 2.0 Sino-Tibetan Linguistic Family
  - 2.1 Archaic Chinese
    - 2.1.1 Old Chinese
  - 2.2 Ancient Chinese
    - 2.2.1 Chinese Dialects (Fukienese, Wu Dialects, etc.)
  - 2.3 Early Middle Chinese
  - 2.4 Middle Chinese
    - 2.4.1 Cantonese Dialects
    - 2.4.2 Sino-Vietnamese
    - 2.4.3 ... etc.
  - 2.5 Early Mandarin
  - 2.6 Mandarin
    - 2.6.1 Northwestern Mandarin
    - 2.6.2 Putonghua
    - 2.6.3 Northeastern Mandarin
    - 2.6.4 Southwestern Mandarin
    - 2.6.5 ... etc.
  - 2.7 Cantonese
    - 2.7.1 Guangzhou Dialect
    - 2.7.2 Taishan Dialect
  - 2.8 Fukienese
    - 2.8.1 Xiamen Dialect
    - 2.8.2 Hainanese Dialect
    - 2.8.3 Chaozhou Dialect
  - 2.9 Wu Dialects
    - 2.9.1 Wenzhou Dialect
    - 2.9.2 Shanghainese Dialect
  - 2.10 ... etc.
Under this arrangement, the languages of "the Austroasiatic linguistic family" {1.1} (an anthropological label within a symbolically weighted hierarchy) would have formed out of the Taic languages {1.0} some 6,000 years ago, long before the emergence of the Western Zhou (西周) Dynasty. In other words, they all stemmed from an ancestral proto-Taic linguistic form {1.0}, supposedly spoken by the so-called "larger Taic indigenous people," which eventually evolved into the Yue linguistic forms {1.2}, including the speech of the Zhuang, the Dai, the Miao, the Maonan, the Vietmuong, etc. {1.2.1, 1.2.2, 1.2.3, etc.}, while other branches diverged into the Mon-Khmer languages now universally grouped under "the Austroasiatic linguistic family" {1.1.1, 1.1.2, 1.1.3, etc.}.
During the reigns of the Zhou kings, Taic glosses {1.0} also found their way into, and merged with, Archaic Chinese (ArC) {2.1} and Old Chinese (OC) {2.1.1}, as well as the Ancient Chinese (AC) {2.2} of the Later Han, after Chinese broke off from the Sino-Tibetan stock {2.0} and evolved independently (see Brodrick 1942, Norman 1988, Wiens 1967, FitzGerald 1972; cf. the Tibetan and Sinitic linguistic cluster as opposed to the Mon-Khmer and VietMuong cluster). Variants of this early form of OC {2.1.1, 2.1.2, 2.1.3, etc.} were later carried south by 'Han' foot soldiers and emigrants all the way to the Annamese land ("Tonkin"), where they gradually blended with the Vietic language {1.2.5.2} after it had separated from the Viet-Muong group.
Symbolically, and in a broader sense, the Austroasiatic languages {1.1} may stand on the same footing as the Yue languages {1.2}, with overlapping properties, or may even denote the same grouping, since both fall under the Taic stage {1.0} that preceded the emergence of the historical Yue languages {1.2}. The implication of the concept of the historical 'Yue' is that it does not treat Vietnamese as having a direct genetic affinity with the Mon-Khmer sub-family, which is the core claim of the Austroasiatic hypothesis {1.1}; the concept of "Austroasiatic" is therefore subsumed in a 'union' with "the Yue languages." The Austroasiatic linguistic family, as postulated by its theorists, is the ancestral form of the Mon-Khmer languages that gave birth to proto-Vietmuong and later to Vietnamese, as commonly stated by modern linguists. In other words, in their view, all are descendant languages under the larger ancestral Austroasiatic family, and Vietnamese descended directly from the Mon-Khmer branch. That view is misleading.
In a broader sense, if we begin with the premise that the Vietnamese language originated from the Taic family, the same framework would logically extend to other linguistic groups such as Zhuang, Daic, Miao, etc. {1.2.1, 1.2.2, 1.2.3...}. Within this view, it may be postulated that the Vietmuong sub-family diverged from the Yue mainstream several centuries ago, giving rise to the Vietic language, evidenced by residual Muong linguistic features embedded in early Annamese. Such a postulation would conveniently position the Viet-Muong group under the Austroasiatic linguistic umbrella {1.1}, placing it on par with other Mon-Khmer daughter languages {1.1.1, 1.1.2, 1.1.3, etc.}, including Vietnamese. This alignment would help account for the commonly cited Mon-Khmer basic words, while simultaneously preserving the etymological continuity of ancestral Yue forms {1.2} within the Taic languages mainstream {1.0}.
This approach is further justified by the relative ease of identifying commonalities between Vietnamese and Daic-Kadai languages, as opposed to the more distant Munda languages of India (see Henri Maspero's "Les Langues Mounda" in Les Langues du Monde, 1952, pp. 624–25).
As for the cognateness of basic vocabulary between Vietnamese and Mon-Khmer languages, their etymological connections do not align diachronically with the Sinitic synchronizing patterns under examination here.
From a developmental standpoint, any Chinese lexical traces found in Mon-Khmer languages, if present at all, are relatively recent, likely introduced within the last 300 to 800 years, and plausibly transmitted via trade routes through North Vietnam. Per Ferlus,
"By the period of 3rd-8th centuries, an ancient land trade route linked North Vietnam to the Gulf of Thailand. The circulation of traders and travelers along this route has left cultural and linguistic influences of Ancient China as well as Ancient Vietnam (under Chinese rule) through the Khmer area. (1) Some Chinese words, few but highly significant, were borrowed into Khmer, and later passed in Thai, (2) The names of animals of the duodenary cycle in Ancient Vietnamese were borrowed by the Khmer and are still used today, and (3) The syllabic contrast /Tense ~ Lax/ of Middle Chinese was transferred, with various effects, in Vietic, and thence in Katuic and Pearic." (Michel Ferlus, Linguistic evidence of the trans-peninsular trade route from North Vietnam to the Gulf of Thailand (3rd-8th centuries). 2012.)
Loanwords from Muong and Vietnamese in a contemporary setting may likewise have found their way into the Mon-Khmer wordlists cited in Chapter 9 on the Mon-Khmer etymologies.
In the case of Vietnamese and its early formation, the historical backdrop begins with the defeat of the resistance wars against Chinese incursions. The freedom fighters followed native Muong groups who resisted Han rule and retreated into mountainous and remote southern territories. As a result, the ancient indigenous Vietmuong languages, originally spoken by Yue natives inhabiting regions from China South to the Red River Basin in northern Annam, eventually bifurcated into proto-Vietic and proto-Muong branches.
Such a proposition could help explain why the Muong language appears to contain more lexicon close to Mon-Khmer beyond what its speakers share with Vietnamese. There is no contradiction in this triangular interconnection if we consider a parallel phenomenon between Chinese and Tibetan linguistic structures. Despite their genetic affinity, these are two distinct languages, especially in terms of core lexical inventories and grammatical architecture.
Meanwhile, the lingua franca of those who remained in lowland and coastal settlements and cooperated with Han occupiers underwent a process of fusion with evolving forms of Ancient Chinese. These Chinese variants already contained Taic-Yue lexical admixtures, inherited from the Chu State and later from the Yue of the NamViet polity, as previously discussed. This fusion was further reinforced by the arrival of Han colonists and successive waves of emigrants from China South to ancient Annam following its annexation in 111 B.C.
Food for a future doctoral thesis in linguistics: "Could Vietnamese be the result of the 'pidginization' of some form of Chinese vernacular beginning in the Han Dynasty?" By that time, the early ancient Vietic language had already absorbed a substantial layer of Old Chinese vocabulary, particularly from vernacular Mandarin, and likely began to take shape following its divergence from the Vietmuong group. In effect, under the Han Empire, Vietnamese evolved around a Chinese nucleus. Ancient Chinese elements were adopted and repurposed as lexical raw material by early Vietnamese speakers to coin new terms in the emerging Vietic language, which was used both by native Annamese and by later Chinese resettlers.
Han colonial agents, including administrative officials and stationed soldiers, were gradually assimilated into the local population through intermarriage with native women. Over centuries of sustained Sinicization, they also adopted local customs. This prolonged period of cultural fusion is reflected in the presence of Ancient Chinese etyma in Vietnamese, supporting the argument for deep lexical integration. Examples include: vuquy (于歸 yúguī, 'bridal nuptial'), goábụa (寡婦 guǎfù, 'widow'), trờinắng (太陽 tàiyáng, 'sunshine'), trăngrằm (月圓 yuèyuán, 'full moon'), cửasổ (窗戶 chuānghù, 'window'), xecộ (車子 chēzǐ, 'carriages'), among others. These terms were widely used and embedded in the speech of common people, nurturing the genesis of ancient Annamese. Their prevalence and semantic depth underscore the plausibility of cognacy between Ancient Chinese and Vietnamese basic vocabulary, which far exceeds, in both quantity and degree of integration, what is found in the Austroasiatic Mon-Khmer languages.
The newly emerged Annamese was characteristically unique, shaped by the habitual speech patterns of its speakers, particularly what may be termed Yue grammar, typified by the syntactic structure {modified + modifier}. This pattern parallels the grammar of Zhuang and other Daic languages and stands in contrast to the syntactic models of Munda or Mon-Khmer languages (see Henri Maspero, Les Langues du Monde, 1952).
Vietic speech was further enriched and reshaped by descendants of racially mixed Yue-Han immigrants who arrived in successive waves, as recorded in Chinese historical sources, during the period when Annam (then Giaochỉ County (交趾郡) of the Greater Giaochâu Prefecture (交州)) remained under Han rule. Their language reflected a blend of earlier Vietic forms and various Han dialectal pronunciations from different periods. Lexical residues such as Bụt = 佛 Fó (Buddha), bụa = 婦 fù (wife), khơi = 海 hǎi (sea), buồng = 房 fáng (room), giường = 床 chuáng (bed), tủ = 櫝 dú (bedhead case), đũa = 箸 zhú (chopsticks), thìa = 匙 chí (spoon), etc., are foundational Vietnamese words and highly plausible cognates of their Chinese counterparts across time. (See Wang Li's 安南譯語 'Annamese translated glosses', Bùi Khánh Thế in Appendix I, Nguyễn Tài Cẩn's Nguồn gốc Hình thành Cách đọc Âm Hán-Việt 'Origin of the formation of Sino-Vietnamese pronunciation', and Appendix H.)
While the question of whether the Mon-Khmer affinity of Vietnamese is valid remains open to debate, the primary rationale here is to challenge the Austroasiatic theory that posits a Mon-Khmer origin for the Vietnamese language. What this inquiry truly centers on is the cognateness of fundamental Vietnamese vocabulary items that appear across various Mon-Khmer languages. Such an argument inevitably raises questions like "who borrowed what from whom?" and similar lines of inquiry.
Amusingly, a significant portion of these same etyma also turn out to be cognate with Chinese and other Sino-Tibetan languages. These are what we designate as Sinitic-Vietnamese vocabulary. The phenomenon is largely attributed to centuries of trade and migratory contact (cf. Ferlus 2012), contact that continued well into the late 18th century.
Nonetheless, there are no definitive historical records documenting these affiliations, only a limited set of basic words carried over from prehistoric speech. The new findings presented in this research are intended to remain open for further scholarly discussion and investigation.
III) On the Relativity of Historical Phonology and the Limits of Reconstruction
In the etymology of Vietnamese doublets (words sharing a common root), there must have been an incubation period before the derived forms emerged as cognate pairs. At first glance, many of these items appear to be "pure Vietnamese" or indigenous. On closer examination, however, they may descend from a pre- or non-Chinese substrate, here designated as [X].
For example:
- VS sông (river) <~ { Taic ~ [X] ~ Yue } ~> 江 jiāng (C 'river')
- VS sọ (cranium) <~ { Taic ~ [X] ~ Yue } ~> 首 shǒu (C 'head')
This stands in contrast to the later development of the doublet 頭 tóu, which is also SV đầu (head). Interestingly, for the same Chinese character 首 shǒu, the corresponding ancient Vietnamese form is VS trốc. The SV đầu 頭 tóu, as found in the disyllabic compound 頭腦 tóunǎo, gives rise to both SV đầunão (headquarters) and VS đầunậu (ringleader), the latter of which means 'headquarters' in Chinese.
Some etyma may appear speculative but reveal layered cognacy upon analysis. For instance:
- VS ngà 牙 yá (SV nha) for C 'tooth' vs. V 'tusk'
- VS răng ('tooth') cf. OC */run/ vs. 齡 líng (C 'age')
(See Tsu-lin Mei’s discussion on 牙 in Appendix G.)
In other cases, historical perspective helps determine the chronological order of emergence. Consider:
- VS lẽsống (raison d’être) ~ SV lýtưởng 理想 lǐxiǎng
- SV sinhhoạt (生活 shēnghuó, 'living activities') ~ VS cuộcsống (life), where 生 shēng corresponds to VS sống and 活 huó to VS cuộc.
Additional doublets include:
- VS bậnviệc (busy) ~ 忙活 mánghuó
- VS loliệu (handle) ~ 料理 liàolǐ
Each character—生, 活, 忙, 想, 料, 理—carries multiple etymological meanings and aligns closely with the semantic content of its Vietnamese counterpart. Notably, in Japanese, 料理 also conveys the concept of 'cooking'.
In contrast, Vietnamese etyma in relation to Mon-Khmer languages rarely allow for such layered analysis. Direct basic cognates are difficult to establish, with the exception of a few items, such as numerals from one to five, that are assumed to be of Mon-Khmer origin.
A blanket one-to-one model of sound change, applied syllable by syllable and phoneme by phoneme to the surveyed etyma, is unrealistic: a single Chinese word need not correspond to only one cognate in the target language. Exceptions abound, yet recognizable patterns do emerge, such as the initial shifts /s- > t-/, /sh- > th-/, and /th- > t-/, or syllable-level shifts like /san/ > /tam/, /sui/ > /tuy/, /shui/ > /thuy/, etc. Such tabulation methods are valid in Sinitic-Vietnamese historical linguistics, but they apply equally to foreign lexical elements absorbed into Vietnamese from French, English, and other non-Yue languages, including Austroasiatic Munda and Austronesian. Chinese–Vietnamese phonological correspondences also exhibit unique deviations, such as /l- ~ s-/, /l- ~ d-/, /b- > s-/, /s- > r-/, /h- ~ t-/, /j- ~ m-/, and others.
These irregularities underscore the need to distinguish genuine cognates from coincidental lookalikes, e.g., English 'six' vs. VS sáu, or 'cut' vs. VS cắt. Similarly, Khmer numerals from one to five may align with Vietnamese forms either by coincidence or as loanwords, much like the subset of animal names in the cyclic zodiac table. For regional languages, the challenge of determining shared linguistic ancestry extends beyond Chinese–Vietnamese parallels to include Malay 'mata' vs. VS mắt (eye), Chamic ni ~ VS nầy (this), nớ ~ VS đó (that), tê ~ VS đấy (there), etc. In many cases, individual analysis is unavoidable.
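As a minimal sketch of the tabulation method mentioned above, the following Python checks whether a Mandarin–Sino-Vietnamese pair follows one of two regular initial correspondences cited in this section, /s- > t-/ and /sh- > th-/. The table, the deliberately tiny SV initial inventory, and the function names are illustrative assumptions, not the author's method. Syllables are written without tone marks, finals are ignored, and any pair outside the table is simply reported as unmatched, i.e. left for individual analysis.

```python
# Hypothetical sketch: only two of the correspondences named in the text are
# tabulated here (Mandarin pinyin initial -> Sino-Vietnamese initial).
# Syllables are toneless; finals are not checked at all.
REGULAR_INITIALS = {"sh": "th", "s": "t"}
SV_INITIALS = {"th", "t"}  # assumed, illustrative SV initial inventory


def onset(syllable, inventory):
    """Return the longest listed initial that begins the syllable, or None."""
    for candidate in sorted(inventory, key=len, reverse=True):
        if syllable.startswith(candidate):
            return candidate
    return None


def fits_regular_pattern(mandarin, sino_vietnamese):
    """True only if the Mandarin initial maps to the SV initial via the table."""
    source = onset(mandarin, REGULAR_INITIALS)
    if source is None:
        return False  # not covered by the table: needs individual analysis
    return onset(sino_vietnamese, SV_INITIALS) == REGULAR_INITIALS[source]


for pair in [("san", "tam"), ("sui", "tuy"), ("shui", "thuy"), ("luo", "la")]:
    print(pair, fits_regular_pattern(*pair))
```

Matching the longest initial first keeps "shui" from being read as s- + "hui". A match shows only that the initials fit a listed pattern; it does not by itself establish cognacy, which is the point made above about lookalikes such as 'six' and sáu.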
While foreign elements are excluded for methodological clarity, Old Chinese phonology must be deliberately considered when examining Chinese–Vietnamese etymology. Both languages exhibit irregular sound change patterns that support plausible cognacy. Specialists in Chinese historical linguistics have long embraced this approach, recognizing that languages evolve over time. With more than half a dozen Old Chinese reconstruction models now available—each developed by distinguished scholars—the question arises: which framework best serves the Sinitic-Vietnamese study?
Selecting a single reconstruction model is not straightforward. One may be tempted to favor a particular scholar whose system aligns with a pre-established hypothesis. For instance, the author has repeatedly resisted the temptation to rely solely on Pulleyblank's 1984 reconstruction, despite its compelling labialized finals, which mirror modern Vietnamese closed rounded finals, e.g., không /k'owngw1/ for 空 kōng (empty) and học /hɔwkw8/ for 學 xué (study), with a precision no Chinese dialect replicates. Many more cases remain to be explored.
Unless otherwise noted, the author will present a modified reconstruction based on multiple sources, integrating and adapting various models. This approach may disappoint readers expecting a conventional module on Old Chinese phonology. Instead, this paper offers an unconventional presentation of variably ancient sound values, given that every reconstructed sound has been debated and no single value is universally accepted. This reflects the dynamic nature of sound change across time and place.
Take 羅 luó (net), which has yielded SV la and VS rọ, chài, lưới, chàilưới. Old Chinese reconstructions include:
(1) *la (Coblin, 1983)
(2) *jraih (Norman, 1957)
(3) *lâ (Karlgren, GSR:6, 1957)
(4) *lar (Li, 1976)
(5) *raj (Schuessler, 1987)
(6) *la (Zhou, 1973)
I opt for /jraih/, as this vocalism plausibly yields chài (net-fishing), and other variants of 羅 luó, including SV la and VS lưới. For 籮, which shares the phonetic stem 羅 (cf. 維 wéi), I retain the Mandarin luó, aligning with VS rỗ (basket) and rọ (bamboo net). These Vietnamese doublets may be late loanwords from vernacular Mandarin and are thus classified as Sinitic-Vietnamese etyma of Chinese origin.
In the case of 羅 luó as VS lưới (net), the phonetic similarity suggests another loan from an earlier vernacular Mandarin form meaning "sieve"—or possibly the reverse. Given that ancient northern Chinese were primarily nomadic and less adept at aquatic practices, while southern Chinese excelled in net-fishing, the compound chàilưới likely reflects a southern innovation. The form /jraih/ may also underlie chài, making it an older variant embedded in the compound structure.
This revision of reconstruction methodology is not contradictory. Each model represents a variation from a shared ancient root in Old Chinese. In this case, we may assume a common origin in [lwo].
Phonetically, all these variants could plausibly derive from /jraih/—the most convincing sound value when compared across reconstructions. This root links modern Chinese luó and VS chài (net). The archaic form [X], closely cognate to modern Vietnamese, gave rise to chàilưới, a compound formed through synonymous syllable pairing. Like many Vietnamese two-syllable compounds, both chài and lưới function as verbs and nouns.
The sound change analysis of chàilưới follows my revised reconstruction approach, which treats ancient Chinese phonology flexibly, recognizing that sounds shift across time and geography. However rigorous each of the many models proposed by renowned linguists may be, no single version can claim absolute precision about how a character was pronounced in a specific locality centuries ago.
Historical phonology is inherently relative. Sound changes may occur internally within a language, and a given phonetic value may be valid in one time and place but not in another. Reconstruction models, therefore, are best understood as suggestive frameworks, representative of hypothesized sound systems within specific localities and historical periods. For any given Chinese character, multiple versions likely existed, each with its own interpretation of archaic pronunciation. A plausible reconstruction merely reflects the most widely accepted value derived from a particular dataset, often based on rhyme books, phonetic glosses, or Buddhist canons.
Take 羅 luó (net), for instance. Its pronunciation varies across modern Chinese dialects, even within the same subdialect. For the core phonetic value of 羅 in 籮 luó, we may assume a base form */xxx/, which evolved into Mandarin luó and yielded multiple Vietnamese doublets for "basket": rỗ, sọt, rõ, rọ, rá. These sound changes likely occurred across different temporal and spatial settings. Notably, the Middle Chinese value of 羅 during the Tang Dynasty was /la1/, as in 羅漢 Luóhàn ("arhat"), corresponding to SV Lahán.
To further illustrate, consider 車 chē. In the 後漢書 Hòu Hànshū, 車 is glossed as [tɕy] ("車 讀若 居"), suggesting a reading akin to 居 jū (SV cư). Today, Chinese chess players still refer to the 車 piece as [tɕy], reinforcing this phonetic continuity. The ideograph 居 carries the phonetic component 古 gǔ (SV cổ), which has given rise to Vietnamese terms related to 'carriage': cộ, xe, xecộ, cỗ, cỗxe—all derived from 車 chē (SV xa). These interpretations, drawn from various scholars, represent relatively approximate reconstructed */X/ values.
Additional evidence is found in quartets of 車 chē such as 輂 jù, 輋 jù, 檋 jù, 梮 jù, which correspond to VS cộ, as in 輂車 jùchē ~> VS xecộ (carriages). This compound formation—"車 chē (xe)" + "車 chē (cộ)"—conveys the modern concept of 'automobile'. This model of vocabulary development parallels the case of chàilưới (net-fishing), where chài aligns with /jraih/ and is synonymous with lưới. Together, they form a compound construction. Diachronically, chài predates lưới, which in turn predates SV la and Mandarin luó. The latest sound value likely reflects Middle Chinese pronunciation, which synchronically gave rise to Vietnamese forms like rỗ, rọ, etc.
Of course, reconstructed compound forms must adhere to certain constraints. For instance, 山水 is equivalent to SV sơnthuỷ and VS nonsông, but hybrid variants like sơnsông or nonthuỷ cannot be formed under the {VS+SV} or {SV+VS} model unless organically adopted by speakers.
As Axel Schuessler (1987, p. xi) aptly noted, “all of them are hypotheses… most of them contain one or other idea which I believe ought to be taken into consideration when attempting to retrieve the Old Chinese language.” This proposition underpins our framework for reconstructing relative sound values across time and space. Each form, symbolized as */xxx/ and its doublet /XXX/, is drawn from diverse sources, rebuilt or adapted to match all plausible etyma in both Chinese and Vietnamese.
Rather than reinventing the wheel, we adapt the results of renowned specialists to suit the needs of Sinitic-Vietnamese studies. This is the most effective way to navigate the complexities of Old Chinese phonology without becoming mired in reconstruction disputes. Relying solely on one author’s model—whether our own or another’s—risks running into problems of phonemic reconciliation and etymological mismatch.
Consider the following examples:
- 季 jì (season) ~ SV quí /qwej1/ ~ VS mùa
- 貴 guì (expensive) ~ VS mắc ~ SV quí /qwej1/
- 活 huó (work) ~ VS việc ~ SV hoạt /hwat7/ ~ 務 wù (SV vụ) ~ 役 yì (SV dịch /zijt7/)
- 時 shí (time) ~ VS khi /k'ej1/ ~ VS giờ /jiə2/
These etyma may appear cognate across Chinese and Vietnamese, yet they resist fitting neatly into any single reconstruction system. A flexible, comparative approach—grounded in historical context and linguistic nuance—is essential for meaningful analysis.
CONCLUSION
This chapter has advanced a reclassification framework situating Vietnamese within a Sinitic-Yue sub-branch of the Sino-Tibetan family, distinct from the Austroasiatic Mon-Khmer hypothesis. The proposed lineage (Taic > Sinitic-Yue > Sinitic > Sinitic-Vietnamese > Annamese > Vietnamese) is supported by over 400 core cognates shared with Chinese, reinforced by historical records and archaeological findings.
Vietnamese literature, deeply Sinicized until the late 20th century, reflects enduring Tang-style aesthetics and classical Chinese imagery. Though modern writing has shifted toward colloquial expression, the linguistic substratum remains unmistakably Sinitic-Yue.
Archaeological data from Chu and Yue states, genetic studies, and classical texts affirm the presence of proto-Taic and Yue elements in Vietnamese, predating Mon-Khmer migrations and challenging the Austroasiatic narrative. The term “Yue” (越, 粵, 戉, 鉞) encapsulates a layered historical identity embedded in Vietnamese etymology.
Methodologically, this chapter critiques the Austroasiatic model for its reliance on outdated fieldwork and speculative etymons. It calls for renewed scholarly rigor—anchored in bilingual competence and historical linguistics—to advance a more accurate, evidence-based classification of Vietnamese within the Sinitic-Yue continuum.
This reclassification not only reframes Vietnamese linguistic identity but also invites a renewed scholarly dialogue on the deeper Sinitic-Yue continuum—one that will be further explored in the next chapter.
ENDNOTES
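(S) The Shang dynasty, also called Yin or Yin-Shang (c. 17th century to c. 11th century BC), was the first dynasty in China with direct, contemporaneous written records. The Shang moved its capital repeatedly in its early period, but for its last 273 years Pan Geng fixed the capital at Yin (present-day Anyang, China); hence the Shang is also called the Yin dynasty, and sometimes Yin-Shang or simply Yin.
In the late Shang period, Chinese history passed from a semi-legendary era into an era of reliable recorded history. The Shang succeeded the Xia dynasty and, compared with the Xia, has yielded much richer archaeological finds.
It was founded after Tang of Shang, leader of the Shang tribe (originally a vassal state of the Xia), led the vassal states in destroying the Xia at the Battle of Mingtiao. It lasted seventeen generations and thirty-one kings; its last ruler, King Zhou of Shang, was defeated by King Wu of Zhou at the Battle of Muye, and the dynasty fell. (Source: https://zh.wikipedia.org/wiki/商朝)
According to a Vietnamese legend recorded in the Lĩnh Nam chích quái (嶺南摭怪), during China's Yin period the Hùng King, having "failed in the rites of court homage," provoked the Yin king into attacking with his army (hence the name "Yin bandits," 殷寇; the Đại Việt sử ký toàn thư (大越史記全書), Ngoại kỷ, Hồng Bàng kỷ, instead records that "there was an alarm in the realm" under the sixth Hùng King). As the great army pressed on the border, a three-year-old boy from Phù Đổng village in Tiên Du district (or Vũ Ninh district) volunteered to lead the Hùng King's army to the Yin front: "brandishing his sword, he advanced, with the royal troops following behind." The Yin king died in battle, and the boy then "shed his clothes, mounted his horse, and ascended to heaven." The Hùng King afterwards honored the boy as Phù Đổng Thiên Vương (扶董天王, "Heavenly King of Phù Đổng") and built a shrine to worship him.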
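However, the modern Vietnamese scholar Trần Trọng-Kim (陳仲金), taking a matter-of-fact approach, pointed out that the legend of an invasion by China's Yin dynasty "is truly mistaken," for the following reasons: "China's Yin dynasty lay in the Yellow River basin, in what are now Henan, Zhili, Shanxi, and Shaanxi, while the Yangtze region was entirely the land of the Man and Yi. From the Yangtze to our northern Vietnam the distance is very great. Even if our country had a king of the Hồng Bàng clan at that time, he undoubtedly had no established order to speak of and was nothing more than a lang chief like those of the Mường people; he therefore had no dealings with the Yin dynasty, so how could a war have broken out between them? Moreover, Chinese historical records make no mention of this. What reason is there, then, to say that the Yin bandits were people of China's Yin dynasty?" Trần Trọng-Kim therefore regarded them merely as "a band of brigands called the Yin bandits."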
(Source: https://web.archive.org/web/http://baike.baidu.com/view/1854748.htm)
[UNLESS LACVIET HAD BEEN PART OF THE ANCIENT CHU STATE(?) While these accounts concern legends of Thánh Gióng, we focus here only on the linguistic aspect of the matter. However, there is evidence that the ancient Vănlang state had already been in contact with the Shang Dynasty, in the form of Shang bronze artifacts found in Hunan Province.] In "Chinese group to bring relic back to Hunan," by Lin Qi:
"A 3,000-year-old Chinese bronze, called min fanglei, will soon return to its birthplace to be reunited with the lid from which it was separated nearly a century ago. The reunion was made possible by a private purchase by Chinese collectors on April 19 in New York. Acclaimed as the 'king of all fanglei', the square bronze, which dates to the Shang Dynasty (c.16th century-11th century B.C), served as a ritual wine vessel. It was excavated in Taoyuan, Hunan province, in 1922." (Source: https://web.archive.org/web/http://www.chinadaily.com.cn/cndy/2014-03/21/content_17366159.htm)
(Remarks between [ ] and the string 'https://web.archive.org/web/' were added by dchph.)
(M) Michel Ferlus, Linguistic evidence of the trans-peninsular trade route from North Vietnam to the Gulf of Thailand (3rd-8th centuries). 2012.
(K) Seeing Mon-Khmer elements in Vietnamese, one can easily conclude that Vietnamese belongs to the Mon-Khmer linguistic family, whether or not it originally shared a common root with the Mon-Khmer languages. However, most specialists of Vietnamese prefer the other view, and this is where the debate began, even though one could still argue that Vietnamese loanwords exist in other Mon-Khmer languages (Ferlus, 2012). See Chapter 8, "The Mon-Khmer Association."
(北) The Tianyuan specimen, a partial human skeleton, was unearthed with abundant late Pleistocene faunal remains in 2003 in the Tianyuan Cave near the Zhoukoudian site in northern China, about 50 km southwest of Beijing. The skeleton was radiocarbon-dated to 34,430 ± 510 years before present (BP) (uncalibrated), which corresponds to ~40,000 calendar years BP. A morphological analysis of the skeleton confirms initial assessments that this individual is a modern human, but suggests that it carries some archaic traits that could indicate gene flow from earlier hominin forms. The Tianyuan skeleton is thus one of a small number of early modern humans more than 30,000 years old discovered across Eurasia and an even smaller number known from East Asia.
(Image caption: the Tianyuan skeleton, unearthed in 2003 in the Tianyuan Cave near Zhoukoudian. Image by GAO Xing.)
Source: DNA Analyses Show Early Modern Human 40000 Years Ago in Beijing Area Related to Present-Day Asians and Native Americans (posted 1/27/2013 Chinese Academy of Sciences)
(Y) As we have learned, at the fall of the Qin Dynasty, Liu Bang (劉邦) and his generals of the resurrected Chu State (楚國) were exiled to the region of Hanzhong (漢中); hence Liu Bang chose the name Han (漢) for his dynasty after he and his troops defeated General Xiang Yu (項羽) of the Chu State.





