Sinitic-Vietnamese: The Mon‑Khmer Misclassification

Restoring the Yue–Sinitic Foundations of Vietnamese

"名正言順 – When names are correct, understanding follows."

– Analects

by dchph

I) Introduction

The linguistic classification of Vietnamese has long stood at the crossroads of competing scholarly traditions, ideological commitments, and historical contingencies. For more than a century, debates over its origins have oscillated between two dominant frameworks: the Austroasiatic Mon‑Khmer hypothesis and the Sino‑Tibetan perspective. Each has generated its own intellectual lineage, methodological assumptions, and interpretive blind spots. Yet the prevailing narrative – shaped by colonial scholarship, nationalist sentiment, and the algorithmic biases of the digital age – has overwhelmingly favored the Austroasiatic model, often at the expense of deeper historical and linguistic evidence.

This chapter reopens the question of Vietnamese origins by tracing the shifting intellectual climate surrounding its classification. It examines how political motivations, academic inertia, and the proliferation of online misinformation have shaped contemporary understanding. At the same time, it introduces new methodological tools for identifying Sinitic‑Vietnamese cognates that have remained obscured within the Austroasiatic framework. These findings, grounded in Sino‑Tibetan historical phonology and supported by more than four hundred reconstructed etyma, challenge long‑standing assumptions and invite a reassessment of Vietnamese within a broader Sino‑Tibetan continuum.

The discussion unfolds as a continuous narrative, moving from the conceptual weaknesses of the Mon‑Khmer hypothesis to the historical realities of Yue ethnolinguistic identity, the sociopolitical forces that shaped Vietnamese scholarship, and the deeper historical logic that underpins Vietnamese ethnogenesis. The chapter culminates in a synthesis of linguistic, historical, and archaeological evidence demonstrating that Vietnamese emerged from Yue‑Taic foundations enriched by Sinitic contact – not from a Mon‑Khmer substratum.

With this foundation in place, the chapter now turns to the first major fault line: the rise of the Austroasiatic Mon‑Khmer hypothesis and the conceptual distortions it introduced.

II) The Mon‑Khmer blunders

This section reexamines the Austroasiatic Mon‑Khmer hypothesis through an anthropological and historical lens, with particular emphasis on the Yue foundations of Sinitic‑Vietnamese (VS). The analysis integrates newly identified basic cognates within Sino‑Tibetan (ST) etymologies, offering a revised understanding of Vietnamese linguistic development. The Austroasiatic family can only be meaningfully connected to the Yue if it includes populations originating from the Yangtze River basin (揚子江). When reframed through Sino‑Tibetan insights, these etymological findings illuminate the origins of basic Vietnamese words long misclassified as Mon‑Khmer. The significance of this reframing becomes increasingly evident as the chapter unfolds. (1)

Historical grounding is indispensable to any rigorous linguistic theory. Yet the widely circulated Austroasiatic Mon‑Khmer framework lacks such grounding. It is neither supported by a comprehensive historical perspective nor substantiated by concrete records. Despite this, Austroasiatic advocates have long enjoyed academic prominence, while Sino‑Tibetan proponents – though relying on written documentation – continue to face institutional resistance.

For Mon‑Khmer theorists, the framework rests primarily on a narrow set of basic lexical items, some transcribed from oral speech and others reconstructed from early substratal forms. Yet the number of selectively identified loanwords represents less than 0.5 percent of the roughly 80,000 Sinitic‑Vietnamese entries potentially present in the Vietnamese lexicon – a negligible fraction of the language’s total vocabulary. While Vietnamese does exhibit some cognates with Austroasiatic Mon‑Khmer, it also reveals substantial overlap with Sino‑Tibetan etyma. Given the limited lexical support for the Austroasiatic narrative, its long‑standing dominance may prove unsustainable.

In contemporary discourse, the term Austroasiatic is often used interchangeably with Mon‑Khmer, though the former encompasses a broader geographic and conceptual scope. The hypothesis, which implies a southern origin, proposes that ancestral Austroasiatic populations migrated both northward and southward from a homeland in Southeast Asia. These migrations are believed to have crossed now‑submerged land bridges into South Asia, including India, and into southern China, while also extending toward present‑day island regions.

However, the theory remains speculative. Archaeological evidence suggests that populations reaching Oceania likely traveled by sea rather than land. Whether future research will expand the hypothesis to include Austronesian and Austric classifications – potentially linking Polynesian and Malaysian populations across the South Pacific – remains an open question.

Recent scholarship has reevaluated the Austroasiatic homeland, situating it around the Mekong River region (Sidwell 2010). Sidwell argues that Austroasiatic peoples could not have originated from the same racial stock as the Yue, whose historical habitat lay in South China below the Yangtze River. Prior to Sidwell’s revision, earlier theories had proposed Yunnan Province as the Austroasiatic homeland – an area now recognized as the primary territory of the ancient Yue, who expanded southward.

For the indigenous populations of the southern regions, Paul Benedict (1975) introduced a distinct linguistic branch under the designation Austro‑Thai, representing an alternative model separate from traditional Austroasiatic classifications and suggesting a potential link between Austroasiatic and Tai‑Kadai languages. (2)

Regardless of whether Vietnamese origins are framed through a Yue or Austroasiatic lens, it is important to recognize that the cultural artifacts unearthed in southern Vietnam predate the arrival of late Vietic‑speaking populations. These relics cannot be considered part of the ancestral heritage of the later inhabitants who came to dominate the region. As some readers may already infer, the Kinh majority emerged from racially mixed origins, primarily descending from Sinitic‑Vietnamese speakers shaped during the early phases of Han colonization. For this reason, the ancient Austroasiatic Mon‑Khmer populations of Indochina cannot be directly linked to the prehistoric Yue ancestry of the Vietnamese further north. (3)

It was not until the sixteenth century that Annamese Kinh groups began resettling the Mekong River Delta, migrating from the north. Long before this movement, ancestral Vietic speakers had already undergone significant linguistic and ethnic transformation through sustained contact with Yue and Han populations. These changes were later compounded by fusion with Mon‑Khmer and Chamic communities during westward and southward migrations. Upon reaching the southern tip of Càmau Cape, they encountered the southern Khmer people. The Mon‑Khmer influences found in Vietnamese culture today are therefore relatively recent, layered onto an existing cultural framework over the past ten centuries.

This historical trajectory precludes the possibility of the Vietnamese being direct descendants of prehistoric Austroasiatic Mon‑Khmer populations, particularly in linguistic terms. While genetic analysis may offer further insight into these ancestral relationships, such biological inquiries fall outside the scope of this linguistic investigation.

Etymologically, recent findings counter Austroasiatic claims by demonstrating that the Vietnamese lexical base shares more than 400 fundamental words with the Sino‑Tibetan family. These roots span a vast geographical terrain across southern Asia, forming a robust foundation for reconsidering the classification of Vietnamese.

The Austroasiatic Mon‑Khmer hypothesis, which attempts to reconstruct prehistoric Yue – or, more broadly, Austroasiatic – lexicons, lacks historical substantiation and has produced notable linguistic inconsistencies. This framework has often served as the foundation for subsequent theories, requiring counterarguments to challenge its claims. In contrast, the Sino‑Tibetan hypothesis is supported by ancient Chinese records. Historical annals acknowledge the existence of the Bai Yue tribes, among whom certain groups are identified as ancestral pre‑Viet‑Muong peoples, including those from ÂuLạc (歐雒) and LạcViệt (雒越).

One of the most significant shortcomings of the Austroasiatic hypothesis is its inability to demonstrate a historical relationship – through ancient Khmer scripts – between basic Vietnamese words and Austroasiatic Mon‑Khmer cognates. This stands in stark contrast to the careful reconstruction of Sinitic roots in Sinitic‑Vietnamese etyma, supported by extensive documentation in Chinese characters across multiple linguistic periods. For example, the four basic human needs, ăn, ngủ, đụ, ỉa, align well with Chinese characters of equivalent meaning: 唵 ǎn (SV àm, "eat"), 卧 wò (SV ngoạ, "sleep"), 屌 diǎo (SV điệu, "copulate"), 屙 ē (SV a, "defecate").

III) The Yue foundations and the misreading of Vietnamese prehistory

The limitations of the Austroasiatic framework become even clearer when we examine how early theorists misinterpreted the Yue world. From the outset, the classification Austroasiatic was a conceptual misstep. Early Mon‑Khmer scholars possessed limited knowledge of ancient Vietnamese and Chinese history, and Sinology itself was still an emerging discipline in the seventeenth century (Lundbæk 1986). These scholars overlooked the historical Yue peoples and their polity known as LuóYuè (雒越, SV LạcViệt), dismissing them as folklore despite extensive documentation in Chinese annals and classical texts.

Whether due to the difficulty of studying the Yue or an inability to reconcile ancient Vietnamese history with Mon‑Khmer linguistic ancestry, Austroasiatic pioneers struggled to correlate the LạcViệt with other Yue groups – ŌuYuè (歐越, ÂuViệt), XīYuè (西越, TâyViệt), MǐnYuè (閩越, MânViệt), and WúYuè (吳越, NgôViệt). Collectively known as the Hundred Yues (百越, BáchViệt), these groups encompassed the Chu State, the Kingdom of NamViệt (南越), and numerous related polities. Within this broader historical framework, the placement of Austroasiatic and Mon‑Khmer peoples remains ambiguous.

The proto‑Viet‑Muong speakers – equated with the LạcViệt – undeniably existed. Their presence requires theorizing dialectal forms of an ancestral Yue language that laid the foundation for the Vietic branch. To navigate this complexity, Austroasiatic specialists adopted a simplified approach: they equated proto‑Vietic forms with Austroasiatic ones, relying on cognate etyma in the Viet‑Muong subbranch that happen to align with modern Mon‑Khmer languages (Parkin 1991).

Historically, however, Austroasiatic theorists face significant challenges in reconstructing ancient linguistic forms that plausibly reflect the connections among the LạcViệt, the BaiYue, and Old Chinese. These interactions – dating back more than three millennia – left enduring imprints on the Vietnamese lexicon.

Consider Vietnamese cầy and chó, historically linked to 狗 gǒu (SV cẩu) and 犬 quǎn (Western Sichuan Mandarin /co1/), both meaning "dog." Their derived disyllabic forms further reinforce cognacy with Sino‑Tibetan roots:

犬坐 (quǎnzuò) → chồmhỗm ("to squat")
犬牙 (quǎnyá) → răngkhểnh ("canine tooth")
小狗 (xiǎogǒu) → cầytơ ("puppy")
犬子 (quǎnzǐ, "pup") → concún ("puppy‑dog")

The form chồmhỗm, often attributed to Khmer chorohom (Nguyễn Ngọc San 1993), is merely coincidental. As later chapters demonstrate, such etyma – once emphatically grouped within the Mon‑Khmer classification (see Mei Tsu‑lin, What Makes Chinese So Vietnamese, Appendices) – are now validated as belonging to the Sino‑Tibetan family.

Recorded history strongly supports the Yue theory of an ancestral Yue language and its descendant speakers. Jerry Norman (1979) referred to this speech as a foreign, extinct language. Ancient Chinese classics describe the Yue people speaking archaic Yue forms, coexisting alongside what were likely early Taic languages spoken by subjects of the Chu State (楚國).

A notable example is The Yuèrén Song (越人歌), recorded in the Yue language in the sixth century B.C. by Ejun Zizhe (鄂君子皙, Ngạcquân Tửtích). Chinese linguists have analyzed its lyrics to recover fragments of the Yue lexicon. Additionally, Liu Bang (劉邦), founder of the Western Han Dynasty, likely spoke a subdialect of the Chu language, as he and his followers had been subjects of the Chu State.

Austroasiatic theorists lack historical records to substantiate their claims and cannot demonstrate how the Mon‑Khmer framework fits into the prehistory of the Viet state – i.e., the LạcViệt – especially when viewed within the broader Yue historical context.

At the same time, one must acknowledge the limitations inherent in postulating a Taic‑Yue or proto‑Yue language. Yet these limitations pale in comparison to those of the Austroasiatic hypothesis, which lacks historical evidence altogether. After migrating from South China into the Indo‑Chinese peninsula, the Mon peoples established a southern Mon‑Khmer homeland that later became a geographic pivot extending northward toward the Viet‑Muong group. This movement preceded the subsequent waves of Yue‑mixed Han infantry who followed Han colonial administrators into ancient Annam.

It is therefore plausible that the LạcViệt ancestors of the Vietnamese spoke an archaic form of Yue – possibly a proto‑Viet‑Muong speech derived from a Taic ancestral language. Ironically, Austroasiatic theorists inadvertently incorporated such linguistic ancestry into their reconstructions of Vietnamese origins, even though it does not align with modern Mon‑Khmer languages.

A prehistoric Taic language likely served as the ancestral root of proto‑Mon‑Khmer. As linguistic evolution progressed, the Mon‑Khmer and Viet‑Muong branches diverged. The Mon‑Khmer lineage gradually shifted away from archaic Taic forms, while the Viet‑Muong branch integrated elements of Sinitic linguistic fusion through interactions with ancient emigrants from South China. Initially regarded as guest settlers regardless of social status, these migrants intermarried with local populations, ultimately forming the majority ethnic group now known as the Kinh – distinct from minority groups such as Mon‑Khmer speakers.

This dynamic reveals a deeper pattern: the Austroasiatic Mon‑Khmer narrative often mirrors claims made by certain Vietnamese scholars who assert national ownership over excavated artifacts found in annexed territories. Such claims frequently frame indigenous relics as ancestral heritage despite the absence of direct lineage. Similarly, the Austroasiatic theory relies heavily on linguistic and anthropological assumptions grounded in minority populations historically exiled to mountainous regions.

The analogy extends further. It is akin to branding modern American citizens as direct descendants of indigenous American Indians, or equating Taiwanese identity exclusively with Austronesian or Daic‑Han roots – despite both nations having less than three centuries of recognized history.

Any credible theory regarding the origins of languages – whether Indo‑European or Sino‑Tibetan – requires historical validation, often reinforced by written records, as seen with Latin, Greek, Pali, or Sanskrit. Without historical substantiation, such theories remain hypothetical. Both prehistoric and documented historical periods shape linguistic evolution, influencing whether languages endure or disappear. In essence, history is the foundation of both a nation and its language.

By contrast, the Austroasiatic Mon‑Khmer hypothesis lacks direct historical evidence and remains largely speculative, relying primarily on reconstructed basic lexicons. While its proponents have devised plausible classifications and methodologies, the theory notably omits any substantive connection to Chinese linguistic development. This study adopts linguistic principles from the Austroasiatic hypothesis to examine the structural framework that shaped its evolution. The strength of that hypothesis lies in methodological ingenuity – using limited sets of basic words, substituting historical records with archaeological evidence, and incorporating preliminary DNA analyses of Vietnamese Kinh populations where applicable.

Much like structuralist theories of sound change – impersonal, mechanical, and strictly formal – the methodological framework applied in Austroasiatic studies could theoretically be expanded to construct hypotheses for other languages. Using the same tools and methodologies, one could formulate a linguistic model for any language, even one without historical reference, and present it as a legitimate framework.

It is undeniable that Western methodologies have significantly advanced linguistic research, yielding breakthroughs across multiple families. Beginning with Indo‑European studies, these methods later extended into Sino‑Tibetan research, contributing innovations such as the reconstruction of Old Chinese phonology in the early twentieth century. The Austroasiatic Mon‑Khmer hypothesis emerged from this same wave of Western inquiry. Early scholars – including Maspero in the 1940s and Thomas in the 1960s – introduced Vietnamese lexical roots based on comparative Mon‑Khmer word lists. Their work gained prominence by identifying Mon‑Khmer‑Vietnamese cognates that conveniently aligned with structuralist frameworks addressing sound change and tonal genesis, reinforcing dominant Western paradigms of the time.

Under such influential conditions, the Austroasiatic Mon‑Khmer perspective on Vietnamese origins gained widespread acceptance, largely due to institutional backing. In academia, following the prevailing consensus often appears to be the safest path. Much like their Sinologist predecessors, Western‑trained Vietnamese scholars – emerging from nearly a century of French colonial rule – generally conformed to new rationalizations, whether under external pressure or voluntarily. Consequently, local specialists frequently aligned themselves with the Austroasiatic camp to ensure their research received recognition, avoiding the obscurity that befell many overlooked studies in earlier decades.

IV) Academic inertia and the politics of Vietnamese classification

For many scholars, adherence to the Austroasiatic paradigm has been less a matter of intellectual conviction than of academic survival – an effort to remain aligned with mainstream scholarly circles. Yet most researchers entrenched in this framework have produced few genuinely groundbreaking insights, remaining caught in a scholarly merry‑go‑round. Escaping this paradigm is a challenge only they themselves can resolve.

A reconsideration of the geographical foundations of Austroasiatic theories reveals striking contrasts with the historical realities of northern Vietnam. Austroasiatic models typically situate their origins in the southeastern portion of the Southeast Asian peninsula, where the Mekong Basin meets the sea. In contrast, northern Vietnam maintained deep historical ties to South China. These regions were once home to ancient Yue speakers – LuoYue, OuYue, and related groups documented extensively in Chinese historical texts – who later intermingled with early Han resettlers following the annexation of the NamViệt Kingdom into the Han Empire in 111 B.C. Linguistically, the ancient Vietnamese language and certain Chinese dialects developed in parallel through comparable processes of racial and cultural blending.

By 939 A.D., it is highly plausible that the ancient Annamese population possessed bilingual proficiency, conducting official affairs in Middle Chinese (MC) while maintaining colloquial speech in a Sinitic‑Yue mixed language. This hybrid tongue – referred to here as Ancient Annamese – would have been intelligible to metropolitan subjects within the Nam Hán State, encompassing Guangdong and Guangxi.

Despite methodological shortcomings, Austroasiatic theorists did introduce a form of linguistic instrumentalism into their studies. Their work produced a catalog of over one hundred basic Vietnamese words that appear to share cognates with Mon‑Khmer languages. However, based on recent Sino‑Tibetan findings presented later in this research, many of these words likely emerged from linguistic contact with Mon‑Khmer speakers residing in remote mountainous regions. Such contact likely dates back to a distant period when Mon‑Khmer and Viet‑Muong speakers – displaced by Han expansion – remained in their ancestral homelands.

Over time, both local inhabitants and northern settlers were absorbed into a colonial society, forming the emergent majority now recognized as the Kinh. Linguistic convergence likely occurred during interactions between Annamese and Mon‑Khmer speakers as Vietnam expanded southward beyond the 16th parallel after the twelfth century, culminating in the late eighteenth century, when Vietnamese settlement reached Cà Mau on the Gulf of Thailand. Linguistic exchanges and borrowings were therefore inevitable, as evidenced by Chamic lexical elements in the central Huế subdialect.

Through ongoing territorial expansion and intermarriage between settlers and indigenous populations, Mon‑Khmer elements were gradually integrated into the Vietnamese vocabulary.

The homelands proposed for the major Southeast Asian language families lie in the same general region as that of Vietnamese. Merritt Ruhlen, in The Origin of Language (1994), outlines the Austric linguistic family and its classification, situating Austroasiatic, Miao‑Yao, Daic, and Austronesian within a broader Southeast Asian linguistic continuum. His analysis underscores the deep antiquity of agricultural migrations from the Yellow River and Yangtze basins, which spread southward into Vietnam and Thailand by 5,000 B.P. and eastward into Taiwan, the Philippines, and Oceania.

Ruhlen’s broader framework highlights two essential stages in historical linguistics:

Classification (Taxonomy) – defining language families before reconstruction begins.

Comparative method (Reconstruction) – addressing sound correspondences, homelands, and proto‑form evolution.

He critiques twentieth‑century Indo‑Europeanists for reversing these analytical levels, insisting that reconstruction alone could determine classification. This inversion led to theoretical stagnation, where anything beyond the obvious was deemed outside the comparative method’s scope.

This methodological flaw is precisely what Austroasiatic theorists perpetuated in the Mon‑Khmer hypothesis concerning Vietnamese. Their classification efforts relied heavily on reconstructed basic words and sound‑change patterns, yet lacked engagement with historical documentation and failed to incorporate Sino‑Tibetan etymological continuity.

As linguistic research advances, reassessing the Austroasiatic Mon‑Khmer hypothesis through comparative analysis – paired with historical records – is essential for a comprehensive understanding of Vietnamese linguistic ancestry. The methodology applied by Austroasiatic theorists was rooted in rigid mechanical paradigms, often modeled after mathematical formulas derived from Indo‑European linguistic schools – approaches historically insufficient for Southeast Asian linguistic reconstruction. These early methods lacked substantial evidence regarding the people, their language, and their homeland. Consequently, they failed to establish the language family prior to engaging in comparative analysis – an inversion of proper historical linguistic methodology, as Ruhlen emphasized.

This brings the discussion to a deeper question: if the Mon‑Khmer hypothesis lacks historical grounding, what historical logic does align with the evolution of Vietnamese? To answer this, the chapter now turns to the role of naming, state formation, and ethnolinguistic identity in shaping Vietnamese self‑understanding.

V) Historical naming, state identity, and the Yue-Vietnamese continuum

Historical names play a decisive role in distinguishing origins, timelines, and cultural affiliations. In academic discourse, naming conventions shape the very frameworks through which linguistic classifications are constructed. The term Sinitic, for example, designates an entity that did not yet exist in antiquity, whereas Yue refers to an earlier, distinct ethnolinguistic group that predated the southward movement of northwestern and northeastern resettlers. These migrants intermarried with native Yue populations, eventually forming the entity later recognized as Sinitic. A comparable transformation unfolded further south, in the region of present‑day northern Vietnam, giving rise to the people who would eventually be known as Vietnamese.

Geopolitically, the historical name Vietnam emerged long after the term Annamese. Yet Austroasiatic theorists constructed a linguistic narrative in which Austroasiatic predates the Mon‑Khmer epoch, followed by Viet‑Muong, which then evolved into modern Vietnamese – an assertion lacking concrete historical substantiation. (4)

As Confucius observed, 名正言順 – "When names are correct, understanding follows." The Austroasiatic camp appears not to have considered that the ancient state names associated with "Vietnam," its people, and its language only materialized after 939 A.D. During this period, the polity was known as Nhà Ngô (吳朝 Wúcháo, the Ngô Dynasty), a designation bearing no direct relation to the nominal state of Annam. Over the following centuries, the nation gradually evolved into an independent polity through successive changes in state names.

Notably, during this era, the historical Nam Hán Kingdom (南漢 NánHàn, "Southern State of Han"), which encompassed coastal stretches of present‑day Guangdong and the northwestern portion of northern Vietnam, adopted an intriguing nomenclature. King Liu Yan (劉龑) initially named his newly founded state ĐạiViệt (大越 DàYuè, "The Great Viet") before adopting the enduring title NamHán (南漢), as documented in Chinese historical records. This shift reflects the demographic composition of the population itself, in which Việt and Hán identities had become integrated.

The name ĐạiViệt would later become synonymous with ancient Vietnam beginning with the Lý Dynasty (1009–1225). Interestingly, 大越 DàYuè appears multiple times in Chinese history. One notable instance occurred in 895, during the decline of the Tang Dynasty, when Dong Chang (董昌) declared himself king and established 大越羅平國, centered on 越州 Yuezhou, in what is now Shaoxing (紹興), Zhejiang Province.

Figure 1 - The Southern Han (917-971 A.D.)
Source: https://en.wikipedia.org/wiki/Southern_Han

History is the soul of a nation and her language.

Further reinforcing this historical continuity, today’s Guangdong Province retains its ancestral state designation NamJyut Kwok (南越國, SV NamViệtquốc), a reminder of the region’s Yue origins.

In Annam, successive dynasties adopted varying state names (quốchiệu 國號) throughout antiquity. Linguistically, the categories Austroasiatic Mon‑Khmer, Viet‑Muong, Vietic, and even Vietnamese are modern scholarly constructs used to describe the independent Annam of the tenth century. At that time, its territorial boundaries extended only to what is now Hà Tĩnh Province and did not yet encompass the southern‑central region.

Within the Austroasiatic Mon‑Khmer theoretical framework, Vietnamese is conveniently aligned with each state name assigned in later historical periods, projecting its origins backward into prehistory. Yet the intrinsic linguistic nature of pre‑Vietnamese – before evolving into modern Vietnamese – was not identical to what Austroasiatic theorists classify as proto‑Vietnamese.

Contemporary discourse on Vietnamese often shifts toward analyzing distinct linguistic influences, such as Chamic elements embedded in regional subdialects like that of Huế. Demonstratives and related forms such as ni, nớ, mô, tê, ri, rứ, and chừ have been theorized to originate from Chinese influence, among other sources. This underscores the reality that discussions of Vietnamese encompass disparate linguistic strata. The Vietnamese people and language of the tenth century likely bore little resemblance to the Austroasiatic Mon‑Khmer linguistic enclaves referenced in early twentieth‑century scholarship.

These assumptions are further challenged by the earliest forms of Nôm vocabulary preserved in fifteenth‑century texts such as Phật thuyết Đại Báo Phụ mẫu Ân trọng Kinh (Buddhist Canon on Returning Favors to One’s Parents).

The Austroasiatic Mon‑Khmer hypothesis relies heavily on a curated list of basic words purported to determine Vietnamese linguistic origins. Yet such an approach is limited in its ability to reflect historical reality – particularly given that the name Vietnam was first officially adopted as a state designation only in 1804, under King Gia Long of the Nguyễn Dynasty.

This brings the narrative to its final movement: the broader historical and linguistic landscape in which Vietnamese emerged – a landscape shaped not by Mon‑Khmer ancestry, but by Yue‑Taic foundations enriched through centuries of Sinitic contact.

VI) Taic-Yue foundations and the historical logic of Vietnamese

By the time Vietnamese resettlers reached the southern territories annexed from Cambodia in the sixteenth century, their interaction with Khmer communities was limited. The ethnic composition of these regions today reflects roughly equal proportions of three major groups: Vietnamese Kinh, Chinese Teochew, and Khmer. If the Vietnamese word chồmhỗm ("squat") resembles Khmer chrohom, such similarity is unsurprising given geographic overlap. Yet chồmhỗm can also be traced to 犬坐 (quǎnzuò), a term recorded more than two millennia ago in the Zuǒzhuàn (左傳), revealing a deeper etymological lineage that predates Mon‑Khmer influence.

Labels such as Austroasiatic, Mon‑Khmer, Viet‑Muong, and Vietic are modern constructs applied retrospectively to historical linguistic entities that remain largely theoretical. Contemporary census figures – listing Khmer and Chinese minorities as 1.37% and 0.78% of Vietnam’s population – obscure broader historical realities: early Chinese immigrants gradually assimilated into the Kinh majority, while Khmer communities have been legally and socially integrated as Vietnamese nationals since the early 1960s.

From a historical perspective, Austroasiatic specialists often disregard Vietnamese history, likely because no historical evidence supports their linguistic hypothesis. The southern territories of modern Vietnam once belonged to the Khmer Kingdom, yet events in Khmer history bear no direct connection to ancient Annam.

Figure 2 - CHINA 2500 B.C.-1500 B.C.

(See Time maps of China - Source: http://www.timemaps.com/history/china-1500bc)

The Yellow River region of China, under the rule of the Shang Dynasty (1766–1122 B.C.), marks the beginning of a long succession of Chinese dynasties documented in written history. This flourishing Bronze Age civilization featured some of East Asia’s earliest true cities, each housing tens of thousands of inhabitants.

While the Shang kings likely exercised direct authority over only portions of the region, their influence extended across a much larger expanse of northern and central China. Subordinate lords and tribal chiefs, ruling their own territories independently, nevertheless recognized the Shang dynasty’s overarching sovereignty.

A system of writing, an early form of the modern Chinese script, was already in use during this period, alongside advanced bronze-working techniques. Shang craftsmen produced exceptionally refined bronzes, regarded among the finest in world history.

Cultural influences radiating from the Yellow River region introduced more advanced material traditions to the Yangtze River basin, fostering population expansion. A distinctive, non-literate yet materially sophisticated culture was emerging in this area.

To situate Vietnamese linguistic evolution within a broader historical framework, we must return to the Yue groups who once inhabited the territorial domain of the NamViệt Kingdom (南越王國). These include:

LạcViệt (雒越, LuóYuè) and ÂuLạc (甌雒, ŌuLuó): widely regarded as ancestral populations of early Vietnamese.
TâyViệt (西越, XīYuè): considered precursors to Cantonese and Teochew‑speaking communities.
ĐôngViệt (東越, DōngYuè): proto‑Fukienese groups associated with the region now known as Fujian.

This progression of Yue linguistic strata illustrates how Annam remained deeply interwoven with China’s historical and linguistic legacy even as it gradually charted an independent course.

Historical records suggest that both before and after 111 B.C., the Yue tribes likely spoke mutually intelligible variants of a common ancestral Yue language. This included the speech of neighboring communities in the Chu State (楚國), whose linguistic patterns may reflect early Taic affiliations. Liu Bang (劉邦), founder of the Western Han Dynasty, and his followers were originally subjects of Chu. Their ancestors likely spoke an archaic Daic language belonging to the broader Taic family, which gradually diverged into distinct forms during the Warring States Period, culminating in the Qin unification under Qin Shi Huang (秦始皇).

Following the Han conquest and annexation of NamViệt (南越) in 111 B.C., its inhabitants likely retained mutual intelligibility with neighboring Yue communities. Over time, linguistic divergence increased as geographic distances widened. The territorial expanse of ancient NamViệt included parts of northeastern Vietnam, notably the Han prefecture of Giaochâu (交州), which later transitioned into the protectorate of Annam, known as the Pacified South.

From this multilingual environment, early forms of Vietnamese and Cantonese emerged from Taic‑Yue foundations, gradually incorporating Sinitic elements through successive periods of Han rule. These linguistic trajectories show minimal overlap with Austroasiatic Mon‑Khmer classifications. Even the Daic ancestry of early Cantonese populations diverges from Austroasiatic models. While the Mon‑Khmer framework has contributed methodologically, it fails to align with the documented historical processes that shaped the evolution of the Annamese language.

In contrast, the Sino‑Tibetan classification aligns more closely with historical narratives. It traces linguistic convergence between ancient Vietnamese and Middle Chinese (MC), suggesting bilingual proficiency among Annamese populations by the tenth century. Language contact between Annamese and Cantonese likely persisted within the NamHán State, which encompassed today’s Guangdong, Guangxi, and parts of northern Vietnam.

Ultimately, it is difficult to reconcile Austroasiatic Mon‑Khmer elements with the historical development of Annamese – a language whose formative evolution reflects closer affinities with Sinitic‑Yue transitions than with Mon‑Khmer derivations.

Historically, China – referred to as the Middle Kingdom (中國 Zhōngguó) – functioned as a central state among smaller vassal entities. Today, it operates as a union of multinational regions under centralized governance. Within this framework, regions such as Tibet, Inner Mongolia, Xinjiang, and the Daic‑Kadai areas of Guangxi, along with Hong Kong and Taiwan, retain distinct historical identities regardless of linguistic or ethnic composition.

Within China’s borders, most Sinitic languages are classified as dialects of the broader Sinitic family. Consider Annam, which once served as a Chinese prefecture. Hypothetically, if Canton were to separate from China and evolve into an independent state, it could eventually resemble Vietnam or Taiwan – an outcome consistent with historical patterns. For example, Hainanese, spoken on Hainan Island, is linguistically related to MinNan, introduced by Fujianese settlers during the Han Dynasty. Yet, as with Teochew, speakers of these linguistic cousins often struggle to understand one another despite shared historical roots.

Understanding Vietnam’s development requires recognizing its emergence from a breakaway prefecture of Greater China. Had Vietnam remained part of China, there would be no debate over whether its population spoke a Sinitic language, similar to Cantonese or Fukienese – both of which fall under the Sino‑Tibetan classification.

By contrast, the Austroasiatic Mon‑Khmer perspective lacks historical grounding, not only linguistically but also in relation to the former Khmer Kingdom, which developed independently of Vietnam’s trajectory. Politically, no aspect of Khmer history aligns with the narrative of ancient Vietnam under imperial Chinese influence.

After a millennium of Chinese colonial rule, it is notable that the primary language spoken in Vietnam did not evolve into a Sinitic language simply because the region separated from mainland China. Instead, it developed into full‑fledged Vietnamese, with Middle Vietnamese emerging as an independent linguistic entity around 939 A.D.

Vietnamese linguistic evolution may be likened to that of English. Just as Greek and Latin lexical components enriched the Anglo‑Saxon foundation of English within the Indo‑European family, so too did Sino‑Tibetan and Sinitic elements merge with the Yue substratum, shaping Vietnamese into its distinct form.

Conclusion

The "rainwash effect" illustrates how entrenched narratives about Vietnamese origins can be rinsed clean through renewed scrutiny. The discovery of hundreds of Sinitic‑Vietnamese cognates demonstrates that Vietnamese shares structural and lexical affinities with Chinese that cannot be explained by Mon‑Khmer affiliation alone. While Austroasiatic models have long dominated, they falter when confronted with the disyllabicity, tonality, and deep etymological parallels that bind Vietnamese to the Sino‑Tibetan continuum.

This reclassification is not merely a matter of linguistic taxonomy; it reshapes our understanding of Vietnam’s cultural and historical trajectory. By challenging inherited frameworks and resisting algorithmic bias, this study calls for a more balanced approach – one that acknowledges both contact and genetic affinity. The Rainwash metaphor thus becomes a call to scholars and readers alike: to clear away the residue of ideology and rediscover Vietnamese as a language forged at the crossroads of Yue, Taic, and Sino‑Tibetan traditions.

References

Aitchison, Jean. Language Change: Progress or Decay? Cambridge University Press, 1994.

Alves, Mark J. "What’s So Chinese About Vietnamese?" In Papers from the Ninth Annual Meeting of the Southeast Asian Linguistics Society, edited by Graham W. Thurgood, 221–242. Arizona State University, 2001.

Alves, Mark J. "Categories of Grammatical Sino‑Vietnamese Vocabulary". Mon‑Khmer Studies 37 (2007): 217–229.

Alves, Mark J. "Loanwords in Vietnamese" In Loanwords in the World’s Languages: A Comparative Handbook, edited by Martin Haspelmath and Uri Tadmor, 617–637. De Gruyter Mouton, 2009.

An Chi. Rong chơi Miền Chữ nghĩa (Vols. 1–5). Ho Chi Minh City: NXB Tổng hợp, 2016–2024.

An Chi. Từ nguyên. Ho Chi Minh City: NXB Tổng hợp, 2024.

Anderson, I. J. "Some Fossil Mammal Localities in Northern China" The Museum of Far Eastern Antiquities 14 (1945): 29–43.

Anttila, Raimo. Historical and Comparative Linguistics. Amsterdam/Philadelphia: John Benjamins, 1989.

Bai, Tiao‑Zhou. "集韵聲類考" Bulletin de l’Institut de Historique et Philologique 3, no. 2 (1928): 159–238.

Bai, Tiao‑Zhou. "關中聲調實驗錄" Bulletin de l’Institut de Historique et Philologique 4, no. 4 (1934): 447–488.

Baldi, Philip (ed.). Patterns of Change, Change of Patterns: Linguistic Change and Reconstruction Methodology. New York: Mouton de Gruyter, 1991.

Barker, Milton E. "Viet‑Muong Tone Correspondences" In Norman Zide (ed.), Studies in Comparative Austroasiatic Linguistics. The Hague: Mouton, 1966.

Benedict, Paul K. Austro‑Thai Language and Culture (With a Glossary of Roots). HRAF Press, 1975.

Baxter, William H. III. "Zhou and Han Phonology in Shijing" In William G. Boltz and Michael C. Shapiro (eds.), Studies in the Historical Phonology of Asian Languages. Amsterdam: John Benjamins, 1991.

Bình Nguyên Lộc. Nguồn gốc Mã Lai của Dân tộc Việt Nam. Los Alamitos: Xuân Thu, 1987 [orig. Saigon: Bách Bộc, 1971].

Bloomfield, Leonard. Language. New York: Henry Holt, 1933.

Bo Yang. Zizhi Tongjian (Modern Chinese edition, 72 vols.). Taipei: Yuan‑Liou Publishing, 1983–1993.

Bo Yang. "醜陋的中國人" (The Ugly Chinaman). Taipei: Yuan‑Liou Publishing, 1985.

Bodman, Nicholas C. "Proto‑Chinese and Sino‑Tibetan" In Frans Van Coetsem et al. (eds.), Contributions to Historical Linguistics. Leiden: Brill, 1980.

Boltz, William G. "Old Chinese Terrestrial Names in Saek" In William G. Boltz and Michael C. Shapiro (eds.), Studies in the Historical Phonology of Asian Languages. Amsterdam: John Benjamins, 1991.

Boodberg, Peter A. Selected Works of Peter A. Boodberg. Compiled by Alvin P. Cohen. Berkeley: University of California Press, 1979.

Breton, Roland J.‑L. Geolinguistics: Language Dynamics and Ethnolinguistic Geography. Ottawa: University of Ottawa Press, 1991.

Brodrick, Alan Houghton. Little China: The Annamese Lands. London: Oxford University Press, 1942.

Buck, Pearl. Impératrice de Chine (Imperial Woman). Paris: Le Livre de Poche, 1992 [orig. 1956].

Bùi Thanh Khiên. Nghệ thuật Nói lái. Ho Chi Minh City: NXB Tổng hợp, 2017.

Bynon, Theodora. Historical Linguistics. Cambridge: Cambridge University Press, 1977.

Camus, Albert. L’étranger. Paris: Gallimard, 1942.

Cao Xuân‑Hạo. Tiếng Việt Văn Việt Người Việt. Ho Chi Minh City: NXB Trẻ, 2001.

Chao Yuan‑Ren. "The Non‑Uniqueness of Phonemic Solutions of Phonetic Systems" Bulletin de l’Institut de Historique et Philologique 4, no. 4 (1933): 363–398.

Chao Yuan‑Ren. "Tone and Intonation in Chinese" Bulletin de l’Institut de Historique et Philologique 4, no. 2 (1933): 119–134.

Chen Guo‑hong. 成語辭典 (Chinese Idioms Dictionary). Hunan: Yuelu Chubanshe, 1988.

Chen Yin‑Ke. "李唐氏族之推測" Bulletin de l’Institut de Historique et Philologique 3, no. 1 (1928): 38–48.

Chen Yin‑Ke. "李唐氏族之推測後記" Bulletin de l’Institut de Historique et Philologique 3, no. 4 (1928): 511–516.

Chou Fa‑Kao et al. 漢字古今音彙 (Hanzi Gujin Yinhui). Hong Kong: Chinese University of Hong Kong, 1973.

Chou Fa‑Kao. "Monosyllabics of Chinese Reconsidered" Tsing‑hua Journal of Chinese Studies 14, no. 1–2 (1982): 105–110.

Coblin, W. South. "Notes on Western Han Initials" Tsing‑hua Journal of Chinese Studies 14, no. 1–2 (1982): 111–132.

Coblin, W. South. A Handbook of Eastern Han Sound Glosses. Hong Kong: Chinese University Press, 1983.

Coetsem, Frans Van and Linda Waugh (eds.). Contributions to Historical Linguistics. Leiden: Brill, 1980.

Cohen, Alvin P. Selected Works of Peter A. Boodberg. Berkeley: University of California Press, 1979.

Darwin, Charles. On the Origin of Species. London: 1859 [150th Anniversary Edition, Bridge Logos Foundation, 2009].

De Lacouperie, Terrien. The Languages of China Before the Chinese. London: 1887 [Taiwan reprint, 1966].

Delinger, B. Paul. "The Ch’ung Niu Problem and Vietnamese" Tsing‑hua Journal of Chinese Studies 11, no. 1–2 (1979): 217–227.

Ding Bangxin (ed.). 中國語言學論集 (Collection of Surveys of Chinese Linguistics). Taipei: You Shi Wenhua, 1977.

Ding Shan. "宗法考源" Bulletin de l’Institut de Historique et Philologique 4, no. 4 (1934): 399–416.

Dong Zuo‑Bin. "殷曆中幾個重要問題" Bulletin de l’Institut de Historique et Philologique 4, no. 3 (1933): 331–353.

Dong Zuo‑Bin. "譠" Bulletin de l’Institut de Historique et Philologique 4, no. 2 (1933): 159–174.

Dragunow, A. "對於中國古音重訂的貢獻" Bulletin de l’Institut de Historique et Philologique 3, no. 2 (1928): 295–307.

Drake, F. S. (ed.). Symposium on Historical Archaeological and Linguistic Studies on Southern China, South‑East Asia and the Hong Kong Region. Hong Kong: Hong Kong University Press, 1967.

Đào Trọng Đủ. Traugiồi TiếngViệt (Vietnamese Revisited). Toronto: Quê Hương, 1983.

Đỗ Hoàng Diệu. Lưng Rồng (Bóng Đè và những truyện mới). Hanoi: Nhã Nam, 2018.

Jeffers, Robert J. & Ilse Lehiste. 1979. Principles and Methods for Historical Linguistics. London and Cambridge: The MIT Press.

Karlgren, Bernhard. 1957. Grammata Serica Recensa. Stockholm: Museum of Far Eastern Antiquities.

Karlgren, Bernhard. 1960. “Tones in Archaic Chinese.” Museum of Far Eastern Antiquities 32: 113–142.

Karlgren, Bernhard. 1964. “Loan Characters from Pre‑Han Texts II.” Museum of Far Eastern Antiquities 36: 1–106.

Kelley, Liam C. 2012. “The Biography of the Hồng Bàng Clan as a Medieval Vietnamese Invented Tradition.” Journal of Vietnamese Studies 7 (2): 87–122.

Nguyễn, Đình‑Hoà. 1966. Vietnamese‑English Dictionary. Tokyo: Charles E. Tuttle Company.

Nguyễn, Ngọc San. 1993. Tìm hiểu về Tiếng Việt Lịch sử. TP HCM: NXB Giáo dục.

Nguyễn, Tài Cẩn. 1979. Nguồn gốc và Quá trình Hình thành Cách đọc Âm Hán Việt. Ho Chi Minh City: NXB khoahọc Xã hội.

Nguyễn, Tài Cẩn. 2000. Giáo Trình Ngữ âmlịch sử Tiếng Việt. Ho Chi Minh City: NXB giáodục.

Pulleyblank, E. G. 1984. Middle Chinese: A Study in Historical Phonology. Vancouver: University of British Columbia Press.

Shafer, Robert. 1966–1974. Introduction to Sino‑Tibetan, Vols 1-4. Wiesbaden: Otto Harrassowitz.

Sidwell, Paul. 2010. “The Austroasiatic Central Riverine Hypothesis.” Journal of Language Relationship 4: 117–134.

Taylor, Keith Weller. 1983. The Birth of Vietnam. Berkeley: University of California Press.

Wang, Li 王力. 1948. HanYueyu Yanjiu 漢越語研究. Lingnan Journal 9 (1), January 1948.

Zhou, Zumo 周祖謨. 1991. Zhongyuan Yinyun 中原音韻. Beijing: Beijing Daxue Chubanshe.

FOOTNOTES

(1)^ The Austroasiatic (Austro-Asiatic) languages, in recent classifications synonymous with Mon-Khmer, are a large language family of continental Southeast Asia, also scattered throughout India, Bangladesh, and the southern border of China. The name Austroasiatic comes from the Latin words for "south" and "Asia", hence "South Asia". Among these languages, only Khmer, Vietnamese, and Mon have a long-established recorded history, and only Vietnamese and Khmer have official status (in Vietnam and Cambodia, respectively). The rest of the languages are spoken by minority groups. Ethnologue identifies 168 Austroasiatic languages. These form thirteen established families (plus perhaps Shompen, which is poorly attested, as a fourteenth), which have traditionally been grouped into two, as Mon-Khmer and Munda. However, recent classifications have abandoned Mon–Khmer as a taxon, either reducing it in scope or making it synonymous with the larger family.

Austroasiatic languages have a disjunct distribution across India, Bangladesh and Southeast Asia, separated by regions where other languages are spoken. They appear to be the autochthonous languages of Southeast Asia, with the neighboring Indic, Tai, Dravidian, Austronesian, and Tibeto-Burman languages being the result of later migrations (Sidwell & Blench, 2011). (Source: https://en.wikipedia.org/wiki/Austroasiatic_languages)

(2)^ The first proposal of a genealogical relationship was that of Paul Benedict in 1942, which he expanded upon through 1990. This took the form of an expansion of Wilhelm Schmidt's Austric phylum, and posited that Tai-Kadai and Austronesian had a sister relationship within Austric, which Benedict then accepted. Benedict later abandoned Austric but maintained his Austro-Tai proposal. This remained controversial among linguists, especially after the publication of Benedict (1975) whose methods of reconstruction were idiosyncratic and considered unreliable. For example, Thurgood (1994) examined Benedict's claims and concluded that since the sound correspondences and tonal developments were irregular, there was no evidence of a genealogical relationship, and the numerous cognates must be chalked up to early language contact.

However, the fact that many of the Austro-Tai cognates are found in core vocabulary, which is generally resistant to borrowing, continued to intrigue scholars. There were later several advances over Benedict's approach: abandoning the larger Austric proposal; focusing on lexical reconstruction and regular sound correspondences; including data from additional branches of Tai-Kadai, Hlai and Kra; using better reconstructions of Tai-Kadai; and reconsidering the nature of the relationship, with Tai-Kadai possibly being a branch (daughter) of Austronesian. (Source: https://en.wikipedia.org/wiki/Austroasiatic_languages)

(3)^ See Ilia Peiros's Some Thoughts on the Problem of the Austro-Asiatic Homeland

(4)^ The name "Việtnam" [viə̀tnaːm] is a variation of "NamViệt" (南越 Nányuè; literally "Southern Việt"), a name that can be traced back to the Triệu Dynasty of the 2nd century B.C. The word Việt originated as a shortened form of BáchViệt (百越 BǎiYuè, "Bod"), a word applied to a group of peoples then living in southern China and Vietnam. The form "Vietnam" (越南) is first recorded in the 16th-century oracular poem Sấm Trạng Trình. The name has also been found on 12 steles carved in the 16th and 17th centuries, including one at Bao Lam Pagoda in Haiphong that dates to 1558.

Between 1804 and 1813, the name was used officially by Emperor Gia Long. It was revived in the early 20th century by Phan Bội Châu's History of the Loss of Vietnam, and later by the Vietnamese Nationalist Party. The country was usually called Annam until 1945, when both the imperial government in Huế and the Vietminh government in Hanoi adopted Vietnam. Since the use of Chinese characters was discontinued in 1918, the alphabetic spelling of Vietnam is official. (Source: https://en.wikipedia.org/wiki/Vietnam)

Monday, January 12, 2026
