Languages: Not Isolated Codes but Living Archives of History
"文以載道, 語以傳世." (Writing carries the Way; language transmits the world.)
by dchph
Ethnic distinctions are more accurately traced through language than through biological race, a principle fundamental to understanding the ancient population structures of Vietnam and southern China. In prehistoric contexts, communities did not define themselves by physical appearance or biological typology, but by the languages they spoke, the customs they maintained, and the cultural patterns transmitted across generations. Language thus functions as a durable historical archive, preserving evidence of long‑term contact, borrowing, and convergence among neighboring groups.
The phonological and semantic strata embedded in Vietnamese reveal that its deepest affinities lie with Sinitic, Yue, and broader Sino‑Tibetan traditions. These connections provide far more explanatory power than frameworks grounded in biological race. Ethnic identity in this region cannot be understood through modern racial categories; it must instead be approached as a dynamic historical process in which language plays the central role in shaping, reshaping, and sustaining communities.
Repositioning Vietnamese within a wider East Asian linguistic sphere is therefore not a matter of methodological preference but a necessary step toward reconstructing Vietnam's population history and cultural identity. The present study outlines the key issues that emerge from this reorientation.
I) Language as cultural memory
From a historical perspective, both the Vietnamese and the Zhuang (Nùng) are recorded as "Bjet" or a form close to "Bod" (cf. 百越 BaiYue, 百姓 Baixing; see Terrien de Lacouperie, 1887). These groups represent the most distinct and well‑attested branches of the ancient Bai‑Yue populations, documented across numerous historical sources.
The descendants of the ancient Bai‑Yue peoples are recorded as various ethnic groups such as the Yue, Tong, Zhuang, Dai, Mon, Miao, Dao, and others. Yet the linguistic development of these communities followed markedly different trajectories.
Take the case of the Zhuang, the largest minority population in Guangxi, China, numbering over 17 million people. Despite heavy Sinitic influence, their language is still classified under modern linguistic frameworks as belonging to the Tai‑Kadai family, a grouping entirely separate from Sino‑Tibetan.
The Zhuang are more an ethnic community than a unified linguistic entity of the kind represented by the Han Chinese or the Vietnamese. In practice, Zhuang speech varieties display extensive non‑genetic influence, incorporating features from Zhuang, Tày (Daic), and Sinitic languages (Lan Hongyin, 1984, pp. 131–138). Most "urban Zhuang" living in major cities such as Nanning, Beihai, and Liuzhou speak Zhuang varieties that have been thoroughly Sinicized. Because of the divergence among regional lects, many Zhuang speakers even struggle to communicate across dialect boundaries.
Vietnamese identity, more than any racial factor, is defined primarily through language, with Sinitic elements playing a central role in unifying the community. Sinitic features such as syllable structure, tonal systems, and semantic development appear throughout Nôm, Sinitic‑Vietnamese, and Sino‑Vietnamese vocabulary. The phonological "melody" of Vietnamese, reflected in the adaptation of foreign toponyms, further reinforces these connections. For example, ancient Cham place names such as 'Vijaya' and 'Kauthara' were softened through Sino‑Vietnamese transformation into Quynhơn (歸仁 Guiren) and Nhatrang (牙莊 Yazhuang).
Meanwhile, Vietnamese is commonly classified today under the dominant Austroasiatic framework as possessing "Mon‑Khmer features". This simply illustrates that the differences lie not in race, but in the history of contact and the structural pathways of linguistic development.
II) Limits of the Austroasiatic framework
People of Khmer origin, by contrast, represent communities that speak Austroasiatic Mon‑Khmer languages and are classified more on the basis of ethnic ancestry than linguistic affiliation. Those belonging to this lineage typically identify themselves through the minority ethnic identity of the upland regions of Vietnam, distinguishing themselves from the Kinh majority and from neighboring Muong‑related groups. A Muong person generally has no difficulty identifying as Vietnamese, whereas a Vietnamese citizen of Khmer descent, even if born and raised among the Kinh and fully fluent in Vietnamese, may not consider themselves ethnically Vietnamese and instead self‑identify as Khmer. This distinction is rooted more in ethnicity than in language. For comparison, Vietnamese of Chinese descent present the opposite pattern: they often identify as Vietnamese after one or two generations.
Bilingual speakers of Khmer and Vietnamese frequently view their ethnic identity as tied to the Khmer community in Cambodia, while simultaneously recognizing themselves as Vietnamese citizens in the legal sense. This dual perspective illustrates the complex relationship between ethnicity, language, and nationality among minority communities in Vietnam.
The formation of Vietnamese place names also reflects linguistic divergence. If Vietnamese and Khmer were genetically related, the Vietnamese would not have needed to create entirely new toponyms such as Sóctrăng for 'Khleang', Càmau for 'Khmaw', Nam Vang for 'Phnom Penh', or Caomiên for modern 'Khmer'. Similarly, they produced adapted forms such as Sàigòn (西岸 Xī'àn, Cantonese /Sajngon/, meaning "West Bank"). By contrast, Sino‑Vietnamese place names like Tâyninh (西寧 Xining) or Bắcninh (北寧 Beining) correspond directly to their Chinese originals, reflecting a tightly aligned linguistic pairing.
Popular Vietnamese wisdom asks whether this is simply "putting the plow before the buffalo" (cáicày đặt trước contrâu) in preparation for the paddy field. The question is pointed: has the Austroasiatic Mon‑Khmer paradigm been constructed first, with the expectation that the data will eventually fall into place? The analogy recalls the familiar chicken‑and‑egg paradox, a reminder of how easily logic can be bent to serve a predetermined narrative.
This folk axiom also echoes an equally dubious assertion made by certain Western grammarians in the early twentieth century, namely, that Vietnamese possessed no grammar until French structures were introduced and adapted, as though the language only "came into being" through colonial intervention. Such a view fundamentally misunderstands the nature of language. Grammar does not create a language, just as a list of words does not constitute one. Languages exist prior to, and independent of, the analytical frameworks later imposed upon them.
The principal strength of the Austroasiatic Mon‑Khmer hypothesis lies in its claim that Vietnamese shares a core layer of basic vocabulary with Mon‑Khmer languages. However, as this study demonstrates, those very core items also appear in Sino‑Tibetan languages. This foundational lexical layer extends far beyond the scope of Austroasiatic basic‑word lists, encompassing both indigenous terms and regional areal vocabulary within the Vietnamese lexicon. Western specialists in the Austroasiatic Mon‑Khmer school recognized this pattern early on and continued to attract new generations of scholars into their framework. The expansion of academic participation in this field further reinforced the Mon‑Khmer classification. Newcomers to Vietnamese historical linguistics often repeat the narratives taught within academic settings, turning linguistic classification into a self‑perpetuating cycle.
If growing interest in this field opens the way for broader discussion, scholars may reconsider the classification of Vietnamese by placing Sino‑Tibetan etyma alongside Austroasiatic Mon‑Khmer word lists. This article introduces new evidence of shared etyma between Sino‑Tibetan and Vietnamese within the basic vocabulary, revealing linguistic relationships that extend beyond the Austroasiatic narrative. (See Vietnamese Parallels with the Sino‑Tibetan languages.)
We must re‑examine the basic‑word lists that Austroasiatic specialists have relied upon for more than a century to construct their hypothesis. In the early twentieth century, in order to strengthen their theory, Austroasiatic pioneers advanced counterarguments against the Sino‑Tibetan classification of Vietnamese, a view that had been widely accepted at the time (Meillet, A., 1952, pp. 526–27). Their method relied primarily on compiling and classifying Khmer vocabulary, what they termed "etymological harvesting" within Mon‑Khmer subbranches such as Bahnaric and Katuic, and then comparing these forms with Viet‑Muong languages such as Mường, Rục, and Thà Vừng. Austroasiatic scholars maintained confidence in their hypothesis, asserting that Vietnamese basic vocabulary aligned closely with Austroasiatic etyma found across numerous Mon‑Khmer lects.
However, once the initial enthusiasm surrounding the Austroasiatic framework subsided, a crucial fact became clear: these basic etyma are unevenly distributed across Mon‑Khmer languages. In other words, some lects preserve similar forms while others do not, suggesting that linguistic diffusion rather than shared genetic inheritance may account for the similarities. In certain cases, this phenomenon may stem from regional contact, particularly among Muong lects.
III) Migration, assimilation, and the formation of the Kinh
From a humanistic perspective, Chinese identity is defined more as a cultural structure than as a racial category. For centuries, the Chinese script has been regarded as the unifying element among the diverse peoples of China, enabling written communication across dialect boundaries. For example, residents of southern China in Jiangxi, Hunan, Guangxi, Sichuan, and Yunnan speak southwestern varieties of Mandarin that differ from the dialects of the west or northeast, such as those of Shaanxi, Shanxi, or Shandong, and from southeastern varieties in Jiangsu or Zhejiang. Yet all of them can read written Chinese, as can speakers of Cantonese, Teochew, Hokkien, and Hainanese and, notably, as could the Annamese in the early modern period. Vietnam itself produced original literary works written in Classical Chinese, such as Chinhphụngâm (Ode of a Soldier's Wife) by Đặng Trần Côn.
However, from a geo‑linguistic standpoint, the process of Sinicization has not been uniform across all ethnic groups within China. Mandarin, for instance, has had very limited influence on groups such as the Uyghurs of Xinjiang, the Inner Mongols, or the Tibetans, even though these regions were incorporated into China centuries ago, a process comparable to the period when Annam was under Han rule.
Vietnamese, meanwhile, is often viewed as a linguistic transformation of an ancient Yue language, owing to its separation from the main Sinitic stream in 939 CE. Yet this view is not absolute. Unlike Cantonese or Hokkien, which are spoken by communities that have remained in their ancestral homelands for millennia, Vietnamese followed its own developmental trajectory. Historically, although ancient Annam was once part of China, its linguistic evolution reflects the demographic composition of this southern region.
After separating from China and becoming an independent polity, the early Vietnamese migrated southward from the Red River region, establishing the state of Annam. They continued expanding southward, leading to the gradual incorporation of Cham and Mon‑Khmer elements, both ethnically and linguistically. Even so, the core population of the kingdom remained the descendants of earlier settlers, whose demographic structure had already taken shape during the period of Han‑ruled Annam. Over time, intermarriage with local groups accompanied territorial expansion, paralleling the assimilation processes carried out by the Qin and Han dynasties among the indigenous peoples of ancient northern Vietnam.
Although these ethnic dynamics are clearly visible, the issue has received little attention in Vietnamese‑focused humanities and linguistics. The Austroasiatic Mon‑Khmer hypothesis has shaped the narrative in such a way that Mon‑Khmer indigenous populations are assumed to have been "Vietnamized", rather than the reverse, a perspective likely influenced by the historical prestige of the Khmer Empire. Yet supporters of this hypothesis seem to overlook key historical events, particularly the arrival of settler groups from southern China. These "southern Han subjects", including soldiers and administrators, became localized and subsequently Sinicized the resident populations.
This process of admixture began in 111 BCE and continued for a thousand years into the Common Era. These very groups formed the core of the Kinh population, who in turn "Vietnamized" subsequent waves of Chinese migrants, including the Minhhương (明鄉人) communities, descendants of Ming loyalists who fled Manchu rule. Hundreds of Ming refugee ships sailed south seeking asylum in Vietnam during the eighteenth century. Some groups initially settled in Cambodia, living among Khmer communities before relocating to Vietnam.
After the twelfth century, the Annamese continued migrating into central and southern regions, settling lands formerly belonging to the now‑defunct kingdoms of Champa and Khmer. These encounters created conditions for intense ethnolinguistic exchange, with communities borrowing vocabulary from one another. Basic Mon‑Khmer words entered Vietnamese through direct contact during this historical period (as Khmer vocabulary is absent from early Vietnamese literary texts).
The entry of Khmer vocabulary into Vietnamese parallels the way Yue words entered ancient Chinese. Evidence of these contacts appears in the uneven distribution of basic Mon‑Khmer vocabulary, present in some branches but absent in others. These words are used along the western mountain ranges of Vietnam, where contact dates back to the early phases of territorial expansion.
Despite the historical significance of these migration waves, linguists rarely discuss them. Instead, scholarly focus has centered on classifying Vietnamese within the Austroasiatic Mon‑Khmer family, a classification shaped more by academic fashion than by rigorous historical analysis. This framework relies primarily on identifying basic lexical correspondences between Mon‑Khmer and Vietnamese, emphasizing similarities such as the numerals from one to five.
From an ethnological perspective, this reflects historical precedent: migrant groups from southern China arrived only after they had already been Sinicized, and when they reached ancient Annam, it was itself a Sinitic‑speaking region under Han rule. Wherever they went, they encountered Han populations; to put it colloquially, "everywhere they went, they met Chinese." The presence of Mon‑Khmer vocabulary in Vietnamese represents an Austroasiatic layer far smaller than the extensive Sino‑Vietnamese and Sino‑Tibetan strata reflected in Hán‑Nôm.
In reconstructing Vietnamese through a Sinitic comparative framework, humanistic factors (history, culture, and regional phonology) must be weighed carefully, for they shape Vietnamese ethnic identity. Today, Vietnam has 54 officially recognized ethnic groups, and the minority communities among them each have their own lect (such as Hmong, Tày, Nùng, Cham, and various Mon‑Khmer languages). These linguistic divisions exist independently of whether their ancestors once belonged to the ancient NamViet Kingdom. For example, the Li (黎族) people of Hainan share genetic ties with the Cham of central Vietnam, reflecting their Austronesian origins. Yet they had no direct connection with ancient Annamese populations until after the twelfth century, when Cham groups began intermixing with resettled Annamese. This raises doubts about attributing Cham words such as ni and nớ ("that, there") to the Chinese character 那 (nà).
IV) Vietnamese within the Sinitic–Yue continuum
The internal development of Vietnamese reflects the general branching patterns of the ancient Bai-Yue languages. Old Chinese itself diversified into numerous regional lects, and Vietnamese did the same. As the most recent example shows, only two decades of North-South division (1954–1975) were enough to produce noticeable differences. This demonstrates that linguistic divergence is a natural process, not an anomaly in the case of Vietnamese.
After 939 CE, Vietnamese separated from the orbit of Sinicization and developed independently, unlike Cantonese, Hokkien, Hainanese, or Taiwanese, which remained within the Sinitic linguistic sphere. Vietnamese therefore retains Sinitic-Vietnamese features while also maintaining its own structural distinctiveness, an outcome consistent with its history of contact and branching.
Meanwhile, Austroasiatic discourse often assumes that Sinitic features in Vietnamese are merely the result of Sinicization. Yet this argument overlooks the very process of deep Sinicization that transformed Cantonese and Hokkien into "Sinitic dialects". If Sinicization could restructure the entire syllabic and tonal systems of indigenous languages, then the presence of Sinitic–Vietnamese features in Vietnamese is not unusual but a natural consequence of long‑term contact.
Despite more than a millennium of territorial expansion, Vietnamese has maintained a high degree of internal unity: the North shows stronger Sinitic-Vietnamese influence, the South absorbed Cham–Khmer elements, yet mutual intelligibility across regions remains robust. This pattern aligns with the branching model of Sinitic–Vietic languages rather than the fragmentation typical of Mon‑Khmer languages.
The full formation of Vietnamese as we know it today likely unfolded over roughly 1,900 years. The Old Vietnamese phase began in 111 BCE and concluded in 939 CE, and the process continued thereafter as Middle Vietnamese emerged as an independent linguistic entity. The year 939 CE thus marks the point at which Vietnamese diverged from the Sinicizing trajectory that shaped Cantonese and Hokkien, languages that remained within the Han cultural sphere.
In the development of contemporary Vietnamese, Austroasiatic Mon‑Khmer elements can be attributed to two distinct periods: the prehistoric era, which supplied the base layer of the Yue linguistic strata, and the twelfth century, when Annam expanded south of the 16th parallel and encountered the descendants of early Yue resettlers, who are now generally known as Austroasiatic Mon-Khmer. Linguistically speaking, however, just as Cham influence can be isolated in the historical study of Vietnamese, Mon‑Khmer influence played only a limited role in the early stages of the language's development.
Historical evidence strongly reinforces this view. Over many centuries, waves of Han migrants, including infantrymen, newly appointed or exiled officials, and refugees, left the southern regions of China to settle permanently in Annam. These groups continued to assimilate into local communities long after Annam gained independence from China. Notably, this process is still going on today: since the 1990s, impoverished Chinese laborers have continued to migrate into Vietnam, and even after their work contracts or visas expire, many choose to remain, marry locals, establish livelihoods, and raise families.
During nearly a century of French colonial rule, linguistic reforms left a lasting imprint on the Vietnamese writing system. The Latin‑based Quốcngữ script spread widely as the intellectual elite led a movement to replace the traditional Chinese script in the early twentieth century. This radical transformation marked a decisive break from the Sinicized literary cycle, altering the semantic and syntactic structures that had been grounded in Classical Chinese texts prior to the seventeenth century. Today, Vietnamese in both its spoken and written forms has been modernized with a new degree of precision and logical structure, thanks to its active adoption of Western linguistic mechanisms such as thesis‑driven composition, complete sentence structure, and punctuation, all while retaining a vast Sinitic lexicon.
When Annam gained independence in 939 CE, its territory was confined to the rice‑growing region surrounding the Red River delta, essentially what is now northern Vietnam. History shows that this region had belonged to the kingdom of Nanyue, part of Lingnan in southern China, roughly 300 years before 111 BCE. The Austroasiatic hypothesis of a genetic relationship between Vietnamese and Mon‑Khmer faces a major challenge: the central and southern regions of Vietnam south of the 16th parallel were only incorporated into Annam after the twelfth century. This territorial expansion occurred through warfare and political concessions from the kingdom of Champa (192–1832 CE). The Cham, speakers of an Austronesian language, had built a powerful and long‑lasting state in what is now central Vietnam, serving as a buffer between the early Vietnamese and Mon‑Khmer communities. Thus, the claim that Vietnamese speakers, that is, the Kinh people, have an ancestral relationship with Mon‑Khmer is problematic, because contact between the two groups occurred far later than Austroasiatic scholars propose.
When the Annamese groups now known as the Kinh resettled further south, they carried with them a mixed ancestry: they were descendants of the indigenous populations of northeastern Vietnam and of Han migrants from southern China. These groups historically included people from the states of Chu (楚), Wu (吳), Yue (越), Min (閩), and other Bai‑Yue polities recorded since the Western Zhou. Indeed, after the Han annexed NanYue in 111 BCE, waves of Han migrants poured into Annam, part of what is commonly described today as the Chinese people's search for a "good earth". They intermarried with local women and settled permanently, partly because the journey back to their homeland was too far. Their descendants continued migrating deeper into the regions that later became central and southern Vietnam, beginning in the twelfth century and continuing into the early sixteenth.
Linguistically, it is the contact between southern Annamese settlers and speakers of Mon‑Khmer lects in the newly annexed highlands and lowlands that likely introduced more Mon‑Khmer vocabulary into Vietnamese. These interactions occurred mainly in the Central Highlands along the Trường Sơn range and in the Mekong delta, where Khmer populations were concentrated. Thus, the presence of Mon‑Khmer vocabulary in Vietnamese is primarily the result of language contact, not genetic inheritance.
By the same reasoning, the Austroasiatic school further emphasizes Mon‑Khmer influence in classifying Vietnamese. Geographically, Mon‑Khmer speakers did inhabit the Mekong Basin long before Annamese settlers arrived, yet proponents of the hypothesis often assume that the earliest Vietnamese were originally Mon‑Khmer, overlooking evidence of large‑scale Han migration. In fact, whether Vietnamese truly belongs to the Austroasiatic family depends on historical perspective. Historically, Austroasiatic claims of a shared genetic origin between Vietnamese and Mon‑Khmer remain speculative, especially given the long north‑to‑south migration of the Vietnamese. In the literary record, moreover, Khmer‑derived vocabulary is largely absent from classical Vietnamese literature.
A comparison may be drawn between these Austroasiatic claims and the ethnonationalist theories of certain Vietnamese scholars who assert cultural ownership over artifacts excavated from the Sahuỳnh and Óc-Eo cultures. These regions flourished under Cham polities long before Annamese expansion. Nationalist scholars, eager to construct an unbroken Vietnamese lineage, boldly attribute these artifacts to their own ancestors, even though the original indigenous populations had long disappeared.
By the same theoretical reasoning, Austroasiatic linguists have attempted to trace Vietnamese back to a prehistoric Mon‑Khmer stage, seeking scholarly support for Mon‑Khmer etyma. Since the twentieth century, many Vietnamese scholars have asserted ancestral inheritance over Đôngsơn bronze drums, discovered across a vast area stretching from southern China to New Guinea in eastern Indonesia (but completely absent from the Khmer cultural realm). These artifacts are often attributed to the ancestors of modern Vietnamese, despite the fact that evidence of Vietnamese mastery of their casting techniques is limited, if not nonexistent. Strikingly, few Vietnamese scholars connect these drums to the Zhuang (Nùng), who still use similar bronze drums in ritual ceremonies in northwestern Vietnam and in their autonomous region in Guangxi, China.
Again, the central question persists: who were the true creators of the exquisite bronze drums found in Guangxi, northern Vietnam, and New Guinea in eastern Indonesia? Were the drums found on remote Indonesian islands brought there by ancient Yue migrants, including ancestors of today's Vietnamese people, who moved southward from China? Or do they belong to Austroasiatic communities that spread across Southeast Asia thousands of years earlier? Ethnonationalist Vietnamese scholars repeatedly cite, like broken records, the HouHanshu (後漢書, Annals of the Later Han) to argue that Han armies destroyed the indigenous culture of the ancient Vietnamese, as recorded in the campaigns of General Ma Yuan, when LạcViệt bronze drums were confiscated and melted down to cast bronze horse statues. On this account, the forebears of the Vietnamese people were abruptly cut off from the Bronze Culture.
The geographical distribution of bronze drums places both Vietnamese and Zhuang within the same cultural sphere. Historically, as noted, Zhuang communities have continued using bronze drums in ritual contexts. Interestingly, though, Zhuang folklore clearly describes the origins of their bronze‑drum tradition, whereas the Viet‑Muong group has no corresponding cultural memory. If one accepts ethnonationalist claims about Vietnamese heritage, then the Vietnamese must be the descendants of the bronze‑drum makers; yet this argument becomes contradictory when they simultaneously claim inheritance of the Óc-Eo and Khmer cultural legacy, a tradition in which bronze drums are entirely absent. Is this, then, a case of the winner taking it all? Who are the legitimate descendants of the creators of the bronze drums? Both spatially and temporally, this contradiction demands explanation.
As noted, the Austroasiatic Mon‑Khmer hypothesis disregards historical sequence and implicitly asserts that the Vietnamese originated from indigenous populations inhabiting a vast southern territory around 6300 BCE, long before Annam emerged as an independent polity. These hypothetical Austroasiatic ancestors are assumed to have spoken ancient Mon‑Khmer lects, and the model of Vietnamese is adjusted to fit this framework. Yet this classification fails to account for the overwhelming presence of Sinitic etyma in Vietnamese, which shaped the language's structure in ways fundamentally different from traditional Mon‑Khmer languages.
In broader comparison, Vietnamese is not a language that simply borrowed wholesale from Sinitic, as many other languages have done with dominant linguistic sources, for example, Bulgarian absorbing Slavic elements. Instead, the way Vietnamese developed shares certain features with the "French Creole" model in Haiti, making it a kind of "Sinitic dialect," similar in pattern to Cantonese: a language with a minimal indigenous core overlaid by linguistic strata that are entirely cognate with Chinese in every respect. To understand this more clearly, consider how many national languages around the world, such as Spanish in Latin America, do not originate from indigenous languages but from colonial influence. A similar situation applies to Mandarin, which is used by indigenous populations in Taiwan and Singapore despite its external origins. More importantly, ancestral Bai‑Yue elements in Vietnamese existed before Sinitic linguistic formations emerged, as evidenced by the diplomatic language Yayu (雅語) used in interstate communication during the Eastern Zhou.
Although the Austroasiatic hypothesis is now widely accepted and Vietnamese has been classified within the Mon‑Khmer family, its basic vocabulary has never been systematically reviewed alongside Sino‑Tibetan etyma to address the core questions of this long‑neglected debate. Most Austroasiatic and Sino‑Tibetan specialists have never recognized the existence and degree of correspondence between more than 400 Sino‑Vietnamese etyma and Sino‑Tibetan structures, which the author of this study identified on the basis of Shafer's (1974) Sino‑Tibetan research. This study employs hundreds of such etyma to examine Sino‑Tibetan roots beyond the traditional Austroasiatic Mon‑Khmer framework.
Austroasiatic scholars may be surprised when they examine the Sino‑Tibetan basic‑word lists presented in this series of studies. Moreover, the rejection of the Austroasiatic Mon‑Khmer hypothesis is based not only on linguistic evidence but also on archaeology, anthropology, cultural scholarship, and, importantly, history. Historical records state that the ancestors of the Bai-Yue peoples originated in the wild‑rice region around Lake Dongting in Hunan Province, China, thousands of miles from the Indochinese peninsula, with a documented antiquity of at least 3,000 years. Culturally, Vietnamese mythology asserts that the Vietnamese are "Children of the Dragon and Fairy," and they were also once regarded as descendants of the Flame Emperor (炎帝) or Yellow Emperor (黃帝), a legend likewise claimed by the Han, much like the "Dragon" motif. Both traditions seem to reinforce a shared Bai‑Yue heritage, suggesting that ancient Yue peoples may have worshipped crocodiles (perhaps symbolizing "dragons"?), a cult absent in Mon‑Khmer culture (see Terrien de Lacouperie, 1887).
Historically and culturally, descendants of pre‑Qin states such as Chu (楚), Wu (吳), Yue (越), and other polities in the Hunan region continue to commemorate Qu Yuan (屈原) on the fifth day of the fifth lunar month (the Dragon Boat Festival), honoring his sacrifice in resisting Qin (see Trần Trọng Kim, Việt-nam Sử-lược; Ngô Sĩ Liên, Đại-Việt Sử-ký; Bo Yang, Zizhi Tongjian, 1983, Vol. 1).
Whether or not modern Vietnamese are the true descendants of the bronze‑drum makers, they often identify themselves with the metallurgical tradition of bronze casting. However, they are careful not to claim ownership of the Hindu temple ruins of the Cham along the central coast, recognizing their distinct historical origins.
Vietnamese ethnonationalist narratives parallel the Austroasiatic Mon‑Khmer hypothesis, which situates Vietnamese linguistic and cultural history within a broader Southeast Asian sphere. This perspective places Vietnam within large cultural zones, including the Khmer Empire, the region's dominant power before the eleventh century. Austroasiatic theorists are encouraged to believe that Mon‑Khmer communities, whose cultural traces appear throughout Southeast Asia, were the ancestors of the Vietnamese. Yet from a historical standpoint, no Khmer artifacts or millennia‑old cultural remains in central Vietnam bear any connection to ancient Annamese populations.
Over three centuries of settlement waves interacting with local Khmer communities introduced additional Mon‑Khmer elements into Vietnamese. On one hand, indigenous Mon‑Khmer groups in Vietnam naturally retained their own languages, surviving in remote mountainous regions. On the other hand, the Kinh population concentrated in fertile lowlands, the Red River Basin (the Tonkin Delta), the central coast, and the Mekong Basin, with the largest settlement waves occurring roughly 320 years ago. Vietnamese identity formed through intermarriage between local women and migrant men, creating family structures grounded in Confucian values. Over many generations, a mixed population emerged as Vietnam continued expanding southward, accompanied by significant demographic growth and long‑term territorial consolidation.
As inconsistencies in the Austroasiatic Mon‑Khmer hypothesis continue to surface, historical analysis must take precedence over speculative prehistoric models. It is crucial for Austroasiatic scholars to recognize that indigenous Mon‑Khmer communities who withdrew into remote highlands never held administrative roles in the Annamese state. The historical Annamese population, living in lowland regions and urban centers governed by China for roughly 1,050 years until 939 CE, are the true ancestors of today's Kinh, regardless of lineage; as long as they spoke the same language, they were Kinh.
The Austroasiatic school has long maintained that Mon‑Khmer elements coexisted with Vietic elements, regardless of whether the former shared Bai‑Yue origins. They also assume that both groups originated from prehistoric Tai communities. The presence of modern Mon‑Khmer minority groups, likely descendants of indigenous populations from neighboring parts of Indochina, suggests that they may once have been dominant in their own territories (Nguyễn Ngọc Sơn, 1993).
V) Archaeology, heritage, and the politics of identity
Ancestral Yue elements in Vietnamese predate the formation of later Sinitic linguistic entities, as suggested by the diplomatic language Yayu used during the Eastern Zhou era. Terms such as Taic, Yue, Daic, Vietic, Muong, Annamese, Kinh, and Vietnamese each correspond to distinct historical periods and should not be treated as interchangeable labels for all cultural achievements. National pride, however, often encourages retrospective claims of ownership. Communities frequently attribute ancient artifacts, such as bronze drums, to their own ancestors, even when archaeological and historical evidence suggests otherwise. Similar patterns appear in Chinese scholarship, where southern cultural artifacts have been retroactively assigned to northern dynasties.
The discovery of Đôngsơn drums in New Guinea, in eastern Indonesia, further supports evidence of ancient trade networks. These findings allow Austroasiatic scholars to align aspects of their narrative with Yue‑based theories, as both frameworks demonstrate inclusivity despite originating in distinct historical periods. This convergence is particularly significant for Vietic entities, both racially and linguistically, whose history spans more than 3,000 years and is rooted in early Chinese references to the "Southern barbarians."
The question "Who are the Vietnamese?" must therefore be grounded in historical evidence rather than modern sentiment. Han Chinese society itself emerged from a fusion with Yue peoples. Early Muong groups, descendants of Yue populations after the Viet‑Muong split, retreated into mountainous regions rather than assimilate under Han rule, preserving a more aboriginal lineage. Those who remained intermarried with southern Chinese migrants, forming the demographic core of what would become the Kinh majority.
The formation of Vietnamese as a distinct linguistic and cultural entity likely unfolded over roughly 1,050 years, from the Han annexation of Nam Việt in 111 B.C. to the establishment of independence in 939 A.D. This long process produced Middle Vietnamese, which diverged from the Sinicization that shaped Cantonese and Fukienese, both of which remained within the Sino‑sphere. Austroasiatic Mon‑Khmer components in Vietnamese can be situated in two periods: remote antiquity or the 12th century, when Annam expanded south of the 16th parallel. Just as Chamic elements can be analytically separated from earlier Vietnamese development, Mon‑Khmer components exerted limited influence on the language's earliest evolutionary stages.
Historical evidence strongly supports this interpretation. Over many centuries, waves of Han immigrants (soldiers, officials, exiles, and refugees) migrated from southern China and settled permanently in Annam. These settlers continued to integrate into local communities long after Annam's political separation from China. Remarkably, this pattern persists today, as Chinese migrant laborers continue to resettle in Vietnam.
During the nearly 100 years of French colonial rule, linguistic reforms profoundly reshaped Vietnam's writing system. The Romanized script gained widespread acceptance as intellectuals in the early 20th century sought to replace the traditional writing system. This transformation marked a decisive break from the Sinitic cycle, altering semantic and syntactic structures inherited from classical Chinese texts. Modern Vietnamese now incorporates Western linguistic mechanisms, such as structured topics, complete sentences, and punctuation, while retaining a vast vocabulary of Chinese origin.
When Annam gained independence in 939 A.D., its territory was confined to the rice‑growing regions of the Red River Basin in present‑day northern Vietnam. Historically, this region had been part of the Nam Việt Kingdom in southern China roughly 300 years before 111 B.C. The Austroasiatic theory of a Mon‑Khmer genetic affiliation with Vietnamese is challenged by the fact that central and southern Vietnam, territories south of the 16th parallel, were incorporated only after the 12th century. This expansion resulted from warfare and political concessions from the Champa Kingdom (192-1832 A.D.). The Chams, an Austronesian people, maintained a powerful state that served as a geographic buffer between ancient Vietnamese and Mon‑Khmer populations. Thus, the hypothesis of a deep ancestral connection between Vietnamese and Mon‑Khmer groups is problematic, as sustained contact occurred far later than Austroasiatic theorists propose.
By the time Annamese settlers expanded southward, they were already of mixed heritage, descended from early northeastern Vietnamese aborigines and Han immigrants from southern China. These settlers included people from Chu (楚), Wu (吳), Yue (越), Min (閩), and other Yue‑related states recorded during the Western Zhou period. After the Han Empire annexed Nam Việt in 111 B.C., Han settlers migrated en masse into Annam, intermarrying with local populations. Their descendants continued migrating into what would later become central and southern Vietnam from the 12th to the early 16th centuries.
Linguistically, interactions between southern Annamese settlers and Mon‑Khmer speakers in newly acquired mountainous and delta regions contributed to the absorption of Mon‑Khmer vocabulary. These contacts occurred primarily in the Central Highlands along the Trường Sơn Range and in the Mekong Delta, where Khmer populations were concentrated. The presence of Mon‑Khmer vocabulary in Vietnamese therefore reflects linguistic contact, not genetic inheritance.
Austroasiatic theorists have long emphasized Mon‑Khmer influences in Vietnamese classification. Yet whether Vietnamese belongs within the Austroasiatic family depends on whether the analysis is historical or geographical. Historically, Austroasiatic claims remain speculative, especially given Vietnam's prolonged northern migrations. Geographically, Mon‑Khmer speakers inhabited the Mekong Basin long before Annamese settlers arrived, but geographic precedence does not establish genetic affiliation. Proponents often assume that the earliest Vietnamese were Mon‑Khmer, overlooking extensive evidence of Han migration and Yue ancestry.
A parallel emerges between these Austroasiatic claims and Vietnamese nationalist narratives that assert cultural ownership over artifacts from the Sahuỳnh and Óc‑Eo civilizations. These regions flourished under Chamic monarchs long before Annamese expansion. Nationalist scholars, eager to construct an unbroken Vietnamese lineage, often claim these artifacts as ancestral creations, despite the disappearance of the indigenous artisans who produced them.
Taken together, archaeological, historical, and linguistic evidence reveals that Vietnamese identity emerged from a long process of Yue heritage, Han migration, and later contact with Chamic and Mon‑Khmer populations, not from a primordial Austroasiatic origin.
Austroasiatic linguists have long attempted to trace Vietnamese linguistic ancestry to prehistoric Mon‑Khmer origins, assembling comparative word lists to support Mon‑Khmer cognacy. In parallel, early 20th‑century Vietnamese scholars began asserting ancestral ownership over Dongsonian bronze drums, discovered across vast regions of Southeast Asia. These drums were widely credited as creations of the forefathers of modern Vietnamese, despite limited evidence regarding their manufacturing techniques. Surprisingly, few Vietnamese scholars have connected these drums to Zhuang communities, who continue to use similar instruments in northwestern Vietnam and southern China.
This raises a fundamental question: who were the actual creators of the bronze drums found across Southeast Asia? Were they introduced by migrating Yue peoples from the north, or produced by Austroasiatic groups who spread across the region thousands of years earlier? Nationalist scholars often cite the Book of the Later Han, which records General Ma Yuan melting captured LạcViệt drums into bronze horses. Yet the geographical distribution of bronze drums places both Vietnamese and Zhuang within the same cultural sphere, and only the Zhuang preserve a continuous ritual tradition involving these drums. Zhuang folklore even recounts the origins of their bronze drum practices, whereas the Viet‑Muong possess no equivalent cultural memory.
If we accept such a nationalist claim at face value, the Vietnamese would be heirs to the bronze drum tradition. But this logic becomes inconsistent when the same narratives simultaneously assert a Khmer heritage whose cultural footprint is undeniably vast. So who should be considered the legitimate descendants of the Yue? Both spatially and temporally, this issue requires clarification.
The Austroasiatic Mon‑Khmer hypothesis further complicates matters by disregarding historical chronology. It implicitly asserts that the Vietnamese descend from aboriginal populations who inhabited southern territories around 6300 B.C., long before Annam emerged as an independent polity. These hypothetical Austroasiatic ancestors are assumed to have spoken archaic Mon‑Khmer languages, with Vietnamese retroactively fitted into this framework. Yet this classification fails to account for the overwhelming presence of Chinese lexical influence in Vietnamese, which shaped its structure in ways fundamentally distinct from Mon‑Khmer languages.
Vietnamese is not simply a language composed of borrowed Chinese vocabulary, comparable to Bulgarian with its Slavic elements. Rather, it resembles a creolized Sinitic form, parallel to Haitian Creole's relationship to French: a "false Chinese dialect" in the same sense that Cantonese is sometimes described as a "false Mandarin." Many national languages worldwide, including Spanish in Latin America, did not originate from indigenous roots but from prolonged colonial influence. Likewise, Mandarin is spoken by indigenous Taiwanese and native Singaporeans despite its foreign origins. Crucially, ancestral Yue elements in Vietnamese predate the formation of Sinitic linguistic entities, as evidenced by the Yayu (雅語) diplomatic language used for interstate communication during the Eastern Zhou era.
Although the Austroasiatic hypothesis has been widely accepted, its foundational word list has never been systematically reviewed alongside Sino‑Tibetan etymologies. Specialists in Austroasiatic and Sino‑Tibetan studies have remained unaware of the extent to which over 400 Sinitic‑Vietnamese etyma align with Sino‑Tibetan linguistic structures. This study therefore seeks to examine Sino‑Tibetan etymologies beyond those traditionally classified within the Austroasiatic framework.
Austroasiatic theorists may be surprised by the Sino‑Tibetan basic word lists presented here. The critique of the Austroasiatic Mon‑Khmer hypothesis rests not only on linguistic evidence but also on archaeology, anthropology, history, and philology. Recorded history indicates that Vietnamese forebears originated from the north, far removed from the Indo‑Chinese peninsula, dating back at least 3,000 years. Vietnamese mythology reinforces this northern orientation: the Vietnamese are described as "Offspring of Dragons and Deities" (Con Rồng Cháu Tiên) and were once considered descendants of the Yellow Emperor (黃帝) or the Flame Emperor (炎帝 Viêmđế), a legend also embraced by the Chinese. Both traditions suggest a shared Yue heritage, possibly linked to alligator worship, an ancient practice absent among Mon‑Khmer cultures (Terrien de Lacouperie, 1887). Culturally, descendants of pre‑Qin states such as Chu (楚), Wu (吳), and Yue (越) continue to commemorate the poet Khuất Nguyên (屈原, Qu Yuan) during the Duanwu Festival (端午節), honoring his resistance to Qin domination. Vietnamese historical texts, including Trần Trọng Kim's Việt‑nam Sử‑lược and Ngô Sĩ Liên's Đại‑Việt Sử‑ký, preserve similar traditions.
Whether or not modern Vietnamese are direct Yue descendants, they frequently identify with the prestigious bronze‑working tradition that spread across Southeast Asia and into the Indonesian archipelago. Yet they have refrained from claiming ownership of Chamic Hindu temple ruins along the central coast, recognizing their distinct historical origins.
Vietnamese nationalist enthusiasm often aligns with academic narratives crafted by Austroasiatic theorists, who situate Vietnam within a broader Southeast Asian civilizational framework. This perspective places Vietnam within the cultural orbit of the Khmer Empire, which dominated the region prior to the 11th century. Austroasiatic followers have been blindly led to believe that Mon‑Khmer speakers, having left cultural remnants across Southeast Asia, were ancestral Vietnamese. Historically, however, none of the Khmer ruins or artifacts discovered in central Vietnam are connected to early Annamese populations.
Over three millennia, successive waves of settlers encountered Khmer groups in situ, contributing to linguistic exchange and the absorption of Mon‑Khmer elements into Vietnamese. Indigenous Mon‑Khmer speakers retained their own languages, which persist among minority communities in remote mountainous regions. Meanwhile, the Vietnamese Kinh majority remained concentrated in arable lowland areas of the Red River Basin, along the central coast, and the Mekong Basin, with major settlement waves occurring as recently as 310 years ago. Vietnamese identity formed through intermarriage between indigenous foremothers and immigrant men, producing family structures rooted in Confucian values. Over generations, this process created a racially mixed population that expanded southward through demographic growth and territorial consolidation.
As inconsistencies within the Austroasiatic hypothesis continue to surface, historical analysis must take precedence over speculative prehistoric timelines. Austroasiatic scholars must recognize that indigenous Mon‑Khmer speakers, having retreated into remote mountain regions, never played a governing role in Annam's statehood. The historical Annamese, inhabitants of the Chinese‑administered Annam Prefecture for roughly 1,050 years prior to independence in 939 CE, constitute the true ancestral lineage of today's Kinh majority.
The Austroasiatic camp maintains that Mon‑Khmer linguistic elements coexisted alongside Vietic ones, and that both groups ultimately derived from prehistoric Taic origins. Contemporary Mon‑Khmer minority communities in Vietnam, likely descendants of aboriginal settlers from neighboring Indo‑Chinese territories, may once have been dominant in their native regions (Nguyễn Ngọc Sơn, 1993). But their presence does not establish a genetic linguistic affiliation with Vietnamese.
The method of wet‑rice cultivation practiced by many Mon‑Khmer groups today likely spread southward from regions just below Dongting Lake (Độngđìnhhồ) in Hunan Province, an area traditionally regarded as the ancestral homeland of the Yue people some 3,000 years ago. Over time, wet‑rice agriculture expanded into the mountainous zones inhabited by Daic and Zhuang communities, suggesting that paddy cultivation existed in these regions long before its widespread adoption by Mon‑Khmer populations. This agricultural pattern reinforces the broader historical narrative that cultural innovations radiated southward from Yue‑dominated territories.
China South, home to the descendants of the ancient Yue, who had Taic roots and established the Chu State, later contributed significantly to the population of the NamViệt Kingdom. These regions likely served as the point of origin for ancestral Vietic groups before their migration into what is now northern Vietnam. Immigrants from southern China included not only refugees and exiles but also officials, soldiers, servants, and others who accompanied Han colonial expansion. As noted earlier, these groups intermarried with earlier resettlers, producing a racially mixed demographic profile, a classic example of anthropological assimilation.
Over generations, the descendants of these Yue emigrants completed their southward migration and permanently settled in Annam, laying the foundation for its emerging sovereignty. Early generations communicated using either their mother's or father's language, or a hybrid of both, eventually developing a distinct local speech. As successive generations moved further south, their descendants continued to identify as "people of the southern Yue" (Việtnam), whether through cultural assimilation or as a declaration of political autonomy from China. This population ultimately evolved into the Kinh majority, who today speak the Vietnamese national language.
Taken together, archaeological, linguistic, and historical evidence demonstrates that Vietnamese identity emerged from the admixture of Yue foundations, Han migration, and later contact with Chamic and Mon‑Khmer populations, a true case of "E Pluribus Unum", and not from a primordial Austroasiatic origin. The question remains: where do Austroasiatic elements fit within the broader Yue‑based framework supported by historical evidence?
Although the Austroasiatic Mon‑Khmer model proposes speculative prehistoric connections, Mon‑Khmer speakers may also trace aspects of their lineage to the same Taic roots associated with early Annamese populations, possibly originating from a southern Yue branch. They may share ethnic ties with Zhuang or Daic groups, who are historically credited with creating bronze drums. More specifically, they may be linked to the Maonan (毛南族) of China South, a group potentially related to ancestral Mon peoples. This hypothesis aligns with the cultural significance of bronze drums but excludes artifacts from the Óc‑Eo and Sahuỳnh civilizations, which were created by indigenous populations distinct from early Vietnamese settlers and predate the emergence of Chamic peoples, likely related to the Li minorities of Hainan Island.
The earliest Annamese resettled into the central coastal corridor relatively late, around the 12th century. In contrast, dominant Mon‑Khmer speakers had inhabited the Indochinese region for more than 6,000 years, distinguishing them from later Vietnamese settlers. Vietnamese remain a distinct and major group, separate from the 53 indigenous minority groups in Vietnam today. Among these minorities are southern Mon‑Khmer speakers, referred to by the former French colonists as "Montagnards", who continue to live on their ancestral lands. These groups only came into contact with late‑arriving Vietnamese settlers within the last few centuries. They inhabit regions along the Cambodian border, spanning the western highlands and plateaus, and extending into the southern Mekong Basin, territory annexed by Vietnam from Cambodia during the 16th century.
This, again, raises two central questions regarding Vietnamese origins. First, do the Vietnamese descend from a branch of the Yue, or are they Austroasiatic descending from the Mon-Khmer branch? A difficult reality is that, despite nationalist claims, modern Vietnamese, including their Muong cousins, may not be direct Yue descendants. Unlike the Zhuang, who continue to use bronze drums in sacrificial ceremonies, Vietnamese nationalist narratives linking their origins to the Yue often reflect wishful thinking reinforced by collective belief. Second, scholars often overlook the possibility that ancestral religious practices, particularly the belief in ever‑present ancestral spirits offering protection and blessings, may have originated abroad as early as 5,000 years ago (Dong Zuobin, 1933; Wu Qichang, 1934; Fu Sinian, 1934).
This hypothesis becomes especially relevant when examining southern populations living in Vietnam's recently annexed territories. These groups had no direct connection to later immigrants from the north, who resettled and intermingled with earlier inhabitants, contributing to the gradual genetic transformation of the Annamese population. This regional transmutation laid the foundation for what ultimately became the Vietnamese nation, a process that aligns with the formula {4Y6Z8H+CMK} (see section VII below).
Ethnically, the descendants of these populations, the modern Vietnamese, now live atop archaeological sites where cultural artifacts, including bronze drums, have been unearthed. These relics have been found not only in China South, the ancestral homeland of the Yue, but also across Southeast Asia as far as eastern Indonesia.
Figure 5 - Dongson Bronze Drums found in Indonesia
(Source: http://en.wikipedia.org/wiki/Dong_Son_drum)
x X x
Table 1 - Dongson bronze drums
Đôngsơn drums (also called Heger Type I drums) are bronze drums fabricated by the Đôngsơn culture in the Red River Basin (Tonkin Delta) of northern Vietnam. The drums were produced from about 600 BCE or earlier until the third century CE and are one of the culture's finest examples of metalworking.
The drums, cast in bronze using the lost-wax casting method, are up to a meter in height and weigh up to 100 kilograms (220 lb). Đôngsơn drums were apparently both musical instruments and cult objects. They are decorated with geometric patterns, scenes of daily life and war, animals and birds, and boats. The latter alludes to the importance of trade to the culture in which they were made, and the drums themselves became objects of trade and heirlooms. More than 200 have been found, across an area from eastern Indonesia to Vietnam and parts of China South.
The earliest known drum, found in 1976 at Wanjiaba (万家坝) in Chuxiong Yi Autonomous Prefecture, Yunnan, China, dates to about 2,700 years ago. Bronze drums are classified into the bigger and heavier Yue (粤系) drums, which include the Dong Son drums, and the Dian (滇系) drums, with 8 subtypes in all, and are purported to have been invented by Ma Yuan and Zhuge Liang. But the Book of the Later Han says that Ma melted the bronze drums seized from the rebel Lạc Việt in Jiaozhi into a horse.
The discovery of Đôngsơn drums in New Guinea is seen as proof of trade connections, spanning at least the past thousand years, between this region and the technologically advanced societies of Java and China South.
In 1902, a collection of 165 large bronze drums was published by F. Heger, who subdivided them into a classification of four types.
VI) Structural parallels between Vietnamese and Chinese
Vietnamese is the outcome of a combined linguistic inheritance: an underlying substratum of ancient Bai‑Yue (Yue) features overlaid with an early Sinitic (pre‑Sinitic) superstratum, shaped by the long historical and cultural processes described above. Much like the formation of the Southern Min (Hokkien) and Yue (Cantonese) branches, Vietnamese is not historically directly tied to any prehistoric Austroasiatic framework. Regardless of whether the ancestors of the ancient Vietnamese knew how to cast and use bronze drums, the ethnic composition of the modern Vietnamese‑Kinh population only took shape during the later centuries of the Han‑dominated period of Annam. Even if one accepts the timeline that places the proto‑Tai peoples, ancestors of the Bai‑Yue groups, of which the ancient Vietnamese were one branch, before the appearance of bronze drums, that moment still predates the arrival of Austroasiatic populations in the Indochinese peninsula by 6,000-4,000 years. This period lies outside the scope of the present study, which focuses on the development of Vietnamese and its speech community from 111 BCE onward.
Regarding the relationship between Vietnamese and Sinitic, a relationship that parallels the ethnic formation of the Vietnamese people, one may argue that if ancient Vietnam had not gained independence from the Southern Han (南漢) and had remained within the Sino‑sphere after 939 CE, Vietnamese would likely have been classified as another Sinitic dialect, just as Hokkien or Cantonese are today.
The divergence among Vietnamese, Hokkien, and Cantonese, along with their dialects such as Amoy, Hainanese, Teochew, and Taishanese, suggests that all developed from a pre‑Bai‑Yue linguistic stage. After 111 BCE, when the Han annexed the vast territory of NamViet (NanYue), those lects separated from their original substrate and underwent significant Sinicization as they became fully incorporated into Han territory. Yet even today, regardless of dynasty, this southeastern region of the Chinese mainland is still called "NanYue." Meanwhile, Annam, ancient Vietnam in the north, remained under direct Chinese rule for roughly 1,050 years until achieving independence in the mid‑tenth century.
To imagine the state of Vietnamese during the period of Han rule, one may consider a hypothetical scenario: if Fujian and Guangdong provinces had separated from China at the same moment, then Vietnamese, Fujianese, and Cantonese might all be regarded today as three independent "post‑Han" languages. This thought experiment highlights the continuous Sinicization of Vietnam, even after political separation, a process similar to that experienced by the northern Bai‑Yue polities.
Humanistic linguistic evidence indicates a shared ancestry between the Vietnamese and the inhabitants of Hunan in southern China. Lexically, many basic Vietnamese etyma still preserve traces of ancient Bai‑Yue:
- con ↔ 子(仔) in Fujianese (Amoy, Southern Min) /kẽ/
- mợ ↔ 母 mǔ in Hainanese /maj2/
- biết ↔ 明白 in Hainanese /(ming2)bat8/, Amoy /mɓat7/
- soài ↔ 檨 in Hokkien /swãj4/
- dê ↔ 羊 in Teochew /jẽw1/
- gàcồ ↔ 雞公 in Hainanese /kōj1koŋ1/
- gàmái ↔ 雞母 in Hainanese /kōj1maj2/
Whereas Hokkien and Cantonese became fully Sinicized and classified as Sinitic dialects after the Han annexation of Nanyue, the Vietnamese spoken in Jiaozhi Commandery (交趾), when it became one of the nine commanderies of the Western Han, followed its own trajectory. Unlike the northern regions, Vietnamese developed within a mixed community of indigenous Bai‑Yue populations, Han officials, and Han soldiers. After independence, successive waves of southward resettlement continued to introduce external influences into Vietnamese.
Archaeological discoveries complicate ethnonationalist claims about indigenous artifacts, but linguistic relationships can be traced through stable patterns of development. When Sinitic languages spread southward around 100 BCE, the Kingdom of Champa, known in Chinese sources as Lâmấp, located south of Annam, was under strong Indian cultural influence, lacked continuous contact with the north, and had lost ties with its ethnic relatives, the Li people of Hainan. Cham groups frequently clashed with the Khmer after abandoning Hinduism and adopting Islam.
The incorporation of Champa into Annam in the eighteenth century left only limited traces in Vietnamese: aside from place names, only a small number of Cham loanwords survive, such as mặttrời ('the sun'), mặttrăng ('the moon'), or u ('mother'), ni ('this'), nớ ('that') in the Huế dialect, and even these remain debated.
Beyond geography and anthropology, the phonotactic and tonal features found only in Chinese and Vietnamese, including disyllabicity and the predominance of consonant‑initiated vocabulary, reinforce the Sinitic–Yue model. This stands in sharp contrast to how non‑tonal Altaic languages like Korean and Japanese borrowed from Chinese: their loanwords were restructured in Hanja and Kanji to fit their own phonological systems.
The linguistic proximity between Vietnamese and Chinese appears in semantics, tones, classifiers, prepositions, conjunctions, and syntactic structures, all reflecting a shared linguistic heritage. Additionally, Annam possessed the Nôm script. Before the Nôm system (based on Chinese characters) was used to transcribe the vernacular, the original Chinese script itself was used to record Nôm words, the indigenous language, as well as local products and place names. These coexisted with Chinese: one for administration, one for the vernacular. For example, "Nôm" (喃) and "Nam" (南) were used for Nồm; "tử" (子) and "tý" for con and chuột; "xú" (丑) and "sửu" for xấu and trâu; "đák" for nước (淂); "tơ" and "ty" (絲) for silk‑related terms, etc. For this reason, the present study does not attempt to revive fossilized etyma herein, which may simply be regional remnants of an Austroasiatic layer, a domain long defended by the Austroasiatic school.
Beyond spatial and temporal factors, Sinitic cultural influence, especially Confucianism, directly shaped Vietnamese phonological change, including taboo avoidance and euphemism. Words homophonous with the names of rulers or revered figures were often avoided or altered, such as lời or lãi replacing Lợi (利) in the name of King Lê Lợi, or Huỳnh instead of Hoàng (euphemism for Lord Nguyễn Hoàng). Thus, phonological shifts must be central in examining Bai‑Yue roots in Vietnamese, as sound changes over time reflect deeper linguistic patterns.
Linguistic truth belongs to those who notice what others overlook and persist in defending their conclusions, even when they diverge from orthodox Sino‑Tibetan classifications. Grammar must also be examined, even though it is the most rapidly changing component of language. A 2017 study published on Phys.org, The myth of language history: Languages do not share a single history, emphasizes this volatility. For example, Vietnamese often forms words through an inverted order {root + modifier}, unlike Sinitic lects, yet still preserves similar syllable‑morpheme structures. Many ancient terms appear in both languages with parallel grammatical constructions, such as Hoanam (華南) or Thầnnông (神農), instead of the inverted modern Chinese forms Nánhuá (南華) or Nóngshén (農神). Despite differences in syllable order, especially in phonology and grammar, Vietnamese and Chinese remain closely linked semantically. This is evident in many Middle Chinese terms preserved in Vietnamese as reversed compounds:
- đảmbảo vs. bảođảm (擔保 dānbǎo ↔ Sino‑Vietnamese @ 保擔 bǎodān)
- áiân vs. ânái (愛恩 ài’ēn ↔ SV @ 恩愛 ēn’ài)
- ônhiễm vs. hoen-ố (污染 wūrǎn ↔ SV @ 染污 rǎnwū)
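The reversal pattern in the list above can be checked mechanically. The following is only a minimal sketch: the character pairs are copied from the three examples, while the helper name `is_reversed_compound` and the restriction to two-character compounds are assumptions made for illustration, not part of the study's method.

```python
# Illustrative sketch: the pairs come from the list above; the helper
# name and the two-character restriction are assumptions.

def is_reversed_compound(first: str, second: str) -> bool:
    """Return True when `second` is `first` with its two characters swapped."""
    return len(first) == 2 and len(second) == 2 and first == second[::-1]

# (first form, second form) as they appear in each list entry
PAIRS = [
    ("擔保", "保擔"),
    ("愛恩", "恩愛"),
    ("污染", "染污"),
]

# Map each first form to whether its partner is an exact reversal.
RESULTS = {first: is_reversed_compound(first, second) for first, second in PAIRS}

if __name__ == "__main__":
    for first, second in PAIRS:
        print(first, second, RESULTS[first])
```

A check of this kind only confirms that the written forms mirror each other; it says nothing about which order is historically older.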
Sino‑Vietnamese etymology must be distinguished from the natural sciences, where measurement tools are standardized. Linguistic methods based on Indo‑European models are unsuitable for tonal languages. Thus, one cannot expect cognates from different language families to exhibit identical phonological forms. A word is often considered a loan if its phonology closely matches the source language as in Sino‑Vietnamese, which reflects Middle Chinese pronunciation with high fidelity. By contrast, Mon‑Khmer cognates show phonological similarity across many unrelated languages, increasing the likelihood of accidental resemblance. This contradicts a linguistic principle: the more two words in two languages resemble each other phonetically, the less likely they are to be genetically related. This pattern becomes especially clear when comparing tonal and non‑tonal languages, for example, Vietnamese chồmhỗm ("to squat") and Khmer /chorahom/ versus Chinese 犬坐 quǎnzuò ("to squat like a dog"). Who borrowed from whom in this case?
Taken together with comparative etymological analysis, the evidence indicates that Vietnam’s linguistic‑historical position lies within the Sino‑Tibetan framework rather than the Austroasiatic Mon‑Khmer model. The continuity of shared features between Vietnamese and Chinese, compared with the much greater distance between Chinese and Korean or Japanese, further strengthens the argument that Vietnamese belongs to the Sino‑Tibetan family.
VII) Challenges and accessibility of Vietnamese antiquity
Anthropologically, the racial admixture that shaped the Vietnamese closely parallels the evolutionary processes that produced the Han Chinese. Expressed symbolically, proto‑Chinese groups {X}, originating from the Tibetan regions of southwestern China, intermingled with proto‑Yue aboriginals {YY}, likely Taic‑speaking peoples who formed the majority population of the Chu State. Their interaction, roughly at a ratio of 1 to 2, or X/2Y, gave rise to the indigenous Yue populace {ZZZ}, inhabiting states such as Shu, Wu, and Yue. Over time, their mixed descendants were classified as Han {HHHH}, represented symbolically as 3Z4H. Under the Han Dynasty, these groups unified within the Middle Kingdom, effectively a "United States of Qin", marking the transition from Qin subjects to Han Chinese.
Thus, the racial composition of the Han Chinese, expressed as {X2Y3Z4H}, emerged through the fusion of proto‑Chinese (X), proto‑Yue (YY), indigenous Yue (ZZZ), and Han (HHHH). A similar process shaped the Vietnamese. Their ancestry evolved from proto‑Yue {YY} and later Yue {ZZZ} to proto‑Vietic {YYZZZ}, forming the early Annamese population symbolized as {2Y3Z+4H}. Over centuries, this population transformed into modern Vietnamese, represented as {4Y6Z8H+CMK}, where {C} denotes Cham influence and {MK} represents Mon‑Khmer contributions. This pattern mirrors the demographic formation of Fukienese and Cantonese populations, which were shaped by similar fusion processes before and after 111 B.C.
Consequently, the Austroasiatic formula may be tentatively expressed as {6YCMK}, contrasting with the modern Vietnamese formula {4Y6Z8H+CMK}. These symbolic models encapsulate the historical processes that forged distinct yet interconnected racial and cultural identities.
As later chapters will elaborate, the development of Vietnamese progressed in parallel with the racial composition of its speakers ({4Y6Z8H+CMK}). Historically, when Qin armies advanced southward, native Yue inhabitants {2Y3Z2H} from the Động Đình Hồ region of present‑day Hunan migrated en masse to the Red River Delta. This movement led to intermixing with indigenous groups, including the Muong and the peoples associated with the Phùng Nguyên Culture (c. 2000-1500 B.C.) (2). In subsequent periods, earlier resettlers {2Y3Z2H} intermingled with newly arrived Yue groups {4Y3Z2H}. Later, the ancestors of the Viet‑Muong {4Y3Z2H} fled into the southwestern mountains in response to Han invasions beginning in 208 B.C., placing their linguistic heritage in direct contact with Mon‑Khmer speakers {4Y+MK}. This historical interplay helps explain why certain Viet‑Muong dialects exhibit phonological proximity to Mon‑Khmer languages.
Symbolically, if Yue entities were expressed numerically to represent the proportions of racial blending that shaped ancient Annamese populations, a plausible model might assign weighted values of {2Y3Z4H}. This construct draws upon historical records, including census data showing population growth from 400,000 to 980,000 across the Han prefectures of Giaochỉ, Cửuchân, and Nhậtnam within a century (111 B.C.–11 B.C.). Additional accounts record that between 15,000 and 30,000 unmarried women from the NamViệt Kingdom were forcibly married to Qin soldiers during the brief Qin Dynasty (Lu Shih‑Peng, 1964).
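The weighting and the census figures cited above can be made concrete with a little arithmetic. The following is only an illustrative Python sketch, not part of the original analysis. It assumes that the numbers in {2Y3Z4H} may be read as relative weights and that the census interval is exactly 100 years. The helper names parse_formula, shares, and annual_growth_rate are hypothetical, introduced solely for this example.

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a symbolic formula such as '2Y3Z4H' into {'Y': 2, 'Z': 3, 'H': 4}.

    Assumption: each single capital letter is one component and the number
    in front of it is its weight; a letter with no number counts as 1.
    Multi-letter labels such as 'MK' are NOT handled by this sketch.
    """
    weights: dict[str, int] = {}
    for number, symbol in re.findall(r"(\d*)([A-Z])", formula):
        weights[symbol] = weights.get(symbol, 0) + (int(number) if number else 1)
    return weights


def shares(weights: dict[str, int]) -> dict[str, float]:
    """Normalize raw weights so that they sum to 1."""
    total = sum(weights.values())
    return {symbol: weight / total for symbol, weight in weights.items()}


def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Average compound growth rate per year between two population counts."""
    return (end / start) ** (1 / years) - 1


# Symbolic model for the ancient Annamese population, read as weights 2:3:4.
annamese = shares(parse_formula("2Y3Z4H"))

# Census figures from the text: 400,000 rising to 980,000 over an assumed 100 years.
rate = annual_growth_rate(400_000, 980_000, 100)

for symbol, share in annamese.items():
    print(f"{symbol}: {share:.1%}")
print(f"implied average growth: {rate:.2%} per year")
```

Under these assumptions the shares come out to about 22% for Y, 33% for Z, and 44% for H, and the census figures imply average growth of roughly 0.9% per year. The growth figure is a compound average only; by itself it says nothing about how much of the increase came from natural growth and how much from immigration.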
Since antiquity, Muong‑speaking communities in mountainous regions have borrowed loanwords from Khmer or Kinh speakers during trade or in pursuit of prestige, integrating Mon‑Khmer terms into the broader Vietnamese linguistic mainstream. This mutual exchange facilitated the transmission of essential vocabulary across linguistic boundaries. These interactions persist today, as northern migrants continue to resettle in the western Central Highlands. Observing speech patterns in Muong villages of Hoàbình Province or Mon‑Khmer communities in Gialai and Kontum provides direct insight into this ongoing linguistic integration.
In practice, Vietnamese Kinh speakers in lowland areas rarely borrow vocabulary from Montagnard groups for concepts already present in their own language. Even among their close Muong relatives, lexical redundancy often negates the need for borrowing. More often, the reverse occurs. Additionally, some shared words may result from coincidence rather than direct borrowing. Examples include:
- chồmhỗm "squat" = Khmer /chorahom/
- chòhõ "stand" = Khmer /ch ho/
- tầmvong "stick" = Khmer /dm boong/
- rùmbeng "fuss" = Khmer /rm poong/
- hầmbàlàng "mix" = Khmer /ʔhm blang/
(Nguyễn Ngọc San, 1993, p. 45)
Austroasiatic theorists focus primarily on demonstrating shared linguistic roots between Mon‑Khmer and Vietnamese, rather than examining the sociolinguistic interactions that may explain these similarities. This approach underpins their classification of Vietnamese within the Austroasiatic family. At the same time, they have largely overlooked comparisons between Vietnamese and Sino‑Tibetan etymologies, likely due to limited awareness of potential affiliations. Their analytical framework also neglects structural parallels between Vietnamese and Chinese. This is where the second approach, etymological analysis, becomes essential.
The Austroasiatic classification compensates for discrepancies in Vietnamese by linking its development to other Viet‑Muong languages. This assumption implies a common ancestral root within the broader Yue linguistic family of southern China. Yet given the extensive migrations and historical shifts that shaped Vietnamese linguistic evolution, this classification remains incomplete without accounting for its deep Sinitic connections.
Basic word cognacy across languages often reflects linguistic contact rather than direct genetic affiliation. Indo‑European numeral systems offer a familiar example: September and October originally denoted the seventh and eighth months, yet now correspond to the ninth and tenth due to later calendar reforms. Such semantic drift illustrates how surface similarities can obscure deeper historical processes.
The Austroasiatic Mon‑Khmer classification was strategically constructed to encompass remnants of Indo‑Chinese languages spoken in isolated communities across Vietnam's western highlands south of the 16th parallel. It also incorporates dialectal enclaves further north in southern China, below the Yangtze River Basin, dating to prehistoric periods. This framework remains flexible, expanding to accommodate linguistic features that do not fit neatly elsewhere, including Daic or Zhuang elements.
Methodologically, Austroasiatic specialists adapted Indo‑European comparative models, appearing rigorous to novice researchers, to advance Mon‑Khmer etymological studies. In practice, Mon‑Khmer basic words represent the primary Austroasiatic layer that entered Vietnamese much later, particularly after independence and subsequent territorial expansion. As a result, nearly all Viet‑Muong dialects originating from the Red River Delta have been mapped onto southwestern Mon‑Khmer languages spoken in regions that did not historically belong to Vietnam before the 12th century. The assertion of Austroasiatic roots in Vietnamese thus traces a linguistic heritage to a population that had not yet emerged, the later Kinh people. Historically and linguistically, ancient Viet‑Muong resettlers had no direct affiliation with Mon‑Khmer speakers before the 2nd century B.C., nor with the Khmer Kingdom, which developed centuries later around the 10th century.
Conclusion
The historical development of the Vietnamese language cannot be understood through a single interpretive lens. It is a stratified palimpsest, combining indigenous Vietnamese heritage, Sinitic influence, successive waves of southward expansion, and centuries of regional contact. Its vocabulary, phonology, and syntax preserve the memory of ancient migrations, demographic shifts, and cultural convergences. Vietnamese is not a collection of isolated signs, but a living archive of history – constantly moving, absorbing, and transforming over time.
The lexical, phonological, and syntactic layers retain the imprint of ancient population movements and ethnolinguistic change. Vietnamese is therefore not an isolated code, but a living body of historical evidence, continually evolving and reshaping itself.
"Languages are not isolated codes but living archives of history."
This principle captures the central argument of the chapter: linguistic classification must be grounded in historical reality, not retrofitted to speculative prehistoric models.
References:
Aitchison, Jean. 1994. Language Change: Progress or Decay? Cambridge University Press.
Alves, Mark J. 2001. "What's So Chinese About Vietnamese?" In Papers from the Ninth Annual Meeting of the Southeast Asian Linguistics Society, edited by Graham W. Thurgood, 221–242. Arizona State University, Program for Southeast Asian Studies.
Alves, Mark J. 2007. "Categories of Grammatical Sino‑Vietnamese Vocabulary." Mon‑Khmer Studies 37: 217–229.
Alves, Mark J. 2009. "Loanwords in Vietnamese." In Loanwords in the World's Languages: A Comparative Handbook, edited by Martin Haspelmath and Uri Tadmor, 617–637. De Gruyter Mouton.
An Chi. 2016–2024. Rong chơi Miền Chữ nghĩa, Vols. 1–5. Ho Chi Minh City: NXB Tổng hợp TP HCM.
An Chi. 2024. Từ nguyên. Ho Chi Minh City: NXB Tổng hợp TP HCM.
Baxter, William H. III. 1991. "Zhou and Han Phonology in Shijing." In Studies in the Historical Phonology of Asian Languages, edited by William G. Boltz and Michael C. Shapiro. Amsterdam: John Benjamins.
Benedict, Paul. 1975. Austro‑Thai Language and Culture. New Haven: HRAF Press.
Karlgren, Bernhard. 1957. Grammata Serica Recensa. Stockholm: Museum of Far Eastern Antiquities.
Karlgren, Bernhard. 1960. "Tones in Archaic Chinese." Museum of Far Eastern Antiquities 32: 113–142.
Karlgren, Bernhard. 1964. "Loan Characters from Pre‑Han Texts II." Museum of Far Eastern Antiquities 36: 1–106.
Kelley, Liam C. 2012. "The Biography of the Hồng Bàng Clan as a Medieval Vietnamese Invented Tradition." Journal of Vietnamese Studies 7 (2): 87–122.
Nguyễn, Đình‑Hoà. 1966. Vietnamese‑English Dictionary. Tokyo: Charles E. Tuttle Company.
Nguyễn, Tài Cẩn. 1979. Nguồn gốc và Quá trình Hình thành Cách đọc Âm Hán Việt. Ho Chi Minh City: NXB Khoa học Xã hội.
Nguyễn, Tài Cẩn. 2000. Giáo Trình Ngữ âm Lịch sử Tiếng Việt. Ho Chi Minh City: NXB Giáo dục.
Pulleyblank, E. G. 1984. Middle Chinese: A Study in Historical Phonology. Vancouver: University of British Columbia Press.
Shafer, Robert. 1966–1974. Introduction to Sino‑Tibetan, 4 vols. Wiesbaden: Otto Harrassowitz.
Sidwell, Paul. 2010. "The Austroasiatic Central Riverine Hypothesis." Journal of Language Relationship 4: 117–134.
Taylor, Keith Weller. 1983. The Birth of Vietnam. Berkeley: University of California Press.
Wang, Li. 王力. 1948. HanYueyu Yanjiu 漢越語 研究. Lingnan Journal, Vol. 9, Issue 1 (Jan. 1948).
Zhou, Zumo. 周祖謨. 1991. 中原音韻. Zhongyuan Yinyun. Beijing: Beijing Daxue Chubanshe.
FOOTNOTES
(1)^ English translation of quoted text:
The Shang Dynasty, also known as Yin or Yin‑Shang (c. 17th century BCE to
c. 11th century BCE), was the first dynasty in China to leave direct and
contemporaneous written records. In its early period, the Shang court
relocated frequently; for its final 273 years, after King Pan Geng moved
the capital there, it was seated at Yin (present‑day Anyang, China). For this
reason, the Shang Dynasty is also called the Yin Dynasty, and is sometimes
referred to as Yin‑Shang or simply Yin.
In the late Shang period, Chinese history transitioned from a
semi‑legendary era to one supported by reliable historical documentation.
The Shang succeeded the Xia as the next dynasty in Chinese history, and
compared with the Xia, it has far richer archaeological evidence.
The Shang state was originally a vassal domain under the Xia. After the
Shang chieftain Tang led allied vassal states to defeat the Xia empire in
the Battle of Mingtiao, he founded the dynasty. Over seventeen generations
and thirty‑one kings, the last ruler, King Zhou of Shang, was defeated and
killed by King Wu of Zhou in the Battle of Muye. (https://zh.wikipedia.org/wiki/商朝)
According to Vietnamese legends recorded in
Lĩnhnam Chích Quái, during China's Yin period, the sixth Hùng King incurred an invasion by
the Yin ruler because he "failed to perform the rites of court tribute."
This invading force was called the "Yin raiders." Meanwhile, the Đại Việt Sử ký Toàn thư
(Outer Annals, Hồng Bàng Chronicle) records that during the reign of the
sixth Hùng King, "there was unrest within the realm." As the enemy army
approached, a three‑year‑old boy from Phùđổng Village in Xianyou County
(or Wuning County) volunteered to take up arms. Leading the Hùng King's
troops to fight against the Yin army, he "brandished his sword and
advanced, with the royal troops following behind." The Yin ruler was slain
on the battlefield, and the boy immediately "cast off his garments,
mounted his horse, and ascended to heaven." Thereafter, the Hùng King
honored the child as "Phùđổng Thiênvương" and established a shrine for
worship.
However, the modern Vietnamese scholar Trần Trọng Kim, adopting an
evidence‑based approach, argued that the legend of a Shang invasion of
Vietnam "is entirely erroneous." He reasoned: "The Shang Dynasty of China
was located in the Yellow River Basin, today's Henan, Zhili, Shanxi, and
Shaanxi. The entire Yangtze region at that time was inhabited by
non‑Chinese tribes. From the Yangtze to our northern Vietnam is an
extremely long distance. Even if our country had a Hồng Bàng ruler at the
time, there would have been no established institutions; he would have
been no more than a local chieftain of a Mang tribe. Therefore, he would
have had no contact whatsoever with the Shang Dynasty. How could a war
between them have arisen? Moreover, Chinese historical records contain no
mention of such an event. On what basis, then, can one claim that the 'Yin
raiders' were people of the Chinese Shang Dynasty?"
Thus, Trần Trọng Kim regarded the so‑called "Yin raiders" simply as "a
band of marauders who happened to be called by that name."
[Unless Lacviet had been part of the ancient Chu State (?). While the
accounts above concern the legend of Thanh Giong, we focus here only on the
linguistic aspect of the matter. There is, however, evidence that the
ancient Vănlang state had already been in contact with the Shang Dynasty:
Shang bronze artifacts of the 10th century B.C. have been found in Hunan
Province.] In "Chinese group to bring relic back to Hunan," Lin Qi writes:
"A 3,000-year-old Chinese bronze, called min fanglei, will
soon return to its birthplace to be reunited with the lid from which it
was separated nearly a century ago. The reunion was made possible by a
private purchase by Chinese collectors on April 19 in New York. Acclaimed
as the 'king of all fanglei', the square bronze, which dates to the Shang
Dynasty (c.16th century-11th century B.C), served as a ritual wine vessel.
It was excavated in Taoyuan, Hunan province, in 1922."
(2)^ Cultures through archaeological relics:
- Phùng Nguyên culture (2,000–1,500 B.C.)
- Đồng Đậu culture (1,500–1,000 B.C.)
- Gò Mun culture (1,000–800 B.C.)
- Đông Sơn culture (1,000 B.C.–100 A.D.)
- Sa Huỳnh culture, Iron Age (1,000 B.C.–200 A.D.)
- Óc-Eo culture (1–630 A.D.)
- The Gò Mun culture (c. 1,100–800 B.C.) was a culture of Bronze Age Vietnam during the Hong Bang reigns.

