Migration, Integration, and Nationalism in Historical Context
by dchph
Vietnam is uniquely remembered for having repelled three Mongol invasions launched by the heirs of Genghis Khan, the same Mongol power that shattered the Song Dynasty and established the Yuan Dynasty (元朝) on Chinese soil, a regime that endured for nearly a century.
In this context, nationalism refers to the indomitable spirit of the Vietnamese people and their hard‑won independence—a spirit they have consistently defended. This fervent nationalism has shaped their anthropological identity, especially their national language. It helps explain why many Vietnamese reject genetic affiliation with the Chinese and question aspects of the Austroasiatic theory, instead affirming an ancestral connection to the Yue, a non‑Chinese lineage, an interpretation steadfastly upheld by patriotic Vietnamese scholars.
I) National identity and linguistic formation
In an ethnically diverse society, elements assimilated into the Vietnamese melting pot emerge distinctly as Vietnamese, regardless of whether a person is of Chinese, Chamic, or Khmer descent. The history of the nation known as Vietnam is a chronicle of the descendants of those who arrived either as conquerors or as refugees fleeing hunger and oppression from the north. Their long southward journey, culminating at the tip of the Indo‑Chinese peninsula, spanned nearly ten centuries, during which they waged continuous wars against external enemies to the north and south, beginning as early as 939 A.D., in the relentless pursuit of national sovereignty.
Vietnamese history is shaped not only by resistance wars but also by ongoing patterns of immigration and emigration, much like China's. Taiwan offers a parallel: successive waves of Chinese migrants from the mainland settled over generations, while hundreds of thousands of Vietnamese women married into Taiwanese families. This long‑standing exchange continues today.
The Vietnamese people are also descendants of racially mixed immigrants from southern China. These groups included refugees fleeing war‑ravaged regions, as well as outcast proletarians from newly affluent provinces. Notably, Ming loyalists escaping execution after the Manchu conquest and the founding of the Qing Dynasty (1644-1912) contributed to Vietnam’s migratory mosaic. This is reflected in the prevalence of Chinese surnames among the Kinh majority.
In the twenty‑first century, Vietnam continues to receive immigrants from its northern border with China, including economically disadvantaged laborers and so‑called technical workers, many of whom, critics argue, form a Chinese fifth column after overstaying their visas. Regardless of origin, many Chinese emigrants from inland provinces along the northern frontier have, over time, come to identify as Vietnamese. Since the 1990s, over one million new migrants from mainland China have settled permanently in Vietnam, often through marriage into Vietnamese families, a trend well documented at annual gatherings of Chinese expatriates.
The formation of the Kinh majority was shaped not only by immigration but also by domestic emigration. Hanoi, much like Shanghai, underwent significant demographic shifts as its original residents relocated – some moving south during the great migration of 1954-55, others departing overseas after the Vietnam War ended on April 30, 1975. As middle‑class urban dwellers left in search of opportunities abroad, their absence was gradually filled by incoming villagers, who arrived as new migrant laborers to occupy the growing vacancies in the city.
Taken together, these demographic shifts reveal that modern Vietnamese identity, and the Vietnamese language, cannot be traced solely to Mon‑Khmer origins. Instead, contemporary Vietnam reflects a complex mosaic of ancestry. Its citizens are primarily of mixed Chinese descent, tracing back to the ancestral Yue of Zhou Dynasty vassal states and the Yue‑influenced Han of the Chu region more than 2,100 years ago. They also carry genetic contributions from native Mon, Chamic, and Khmer populations from the twelfth century onward, along with more recent admixtures, such as the Amerasian children born to American servicemen during the Vietnam War (1965-1975), who by 1975 numbered over 50,000 within South Vietnam’s population of 20 million. This extensive intermingling underscores the profound racial mixing that defines Vietnam.
Linguistically, Austroasiatic theorists have pointed to Mon‑Khmer basic words in Vietnamese as evidence for their theory. Yet this evidence is uneven: for example, the Mon‑Khmer words found among the numerals from one to five do not align with the Vietnamese numerals from six to ten, and they bear no genetic relationship to the core vocabulary. Like any living language, Vietnamese has absorbed a wide range of loanwords over time, including those from Daic, Thai, and Malay, as well as English and French, alongside contributions from the Austroasiatic family.
Statistically, the rate of foreign lexical infiltration in Vietnamese remains modest. Even the decade of active American presence during the Vietnam War failed to significantly reshape the language, leaving only a small set of persistent English terms – hello, okay, bye‑bye, number‑one, one‑two‑three, snack‑bar, cowboy, (bus)boy, hippy, and jeep – in stark contrast to the enduring Sinitic influence.
The situation became somewhat farcical when certain French institutions sponsored Vietnamese scholars to publish works on French influence in Vietnamese, including one that argued for a French origin of select Vietnamese words (Cao Xuân‑Hạo, 2001). Had the French colonial presence in Annam lasted longer, more French loanwords might conceivably have entered mainstream usage. As it stands, French loanwords, remnants of the 96‑year colonial legacy that ended on July 20, 1954, number only several hundred, roughly 400, in Vietnamese. Common terms in some Vietnamese circles, such as moi (I), toi (you), monsieur (mister), and madame (madam), along with various modern grammatical constructions, do not reflect a deep‑rooted etymological bond.
This stands in contrast to entrenched Chinese‑derived words in Vietnamese, such as anh (兄 xiōng, SV huynh, 'brother'), em (俺 ǎn, SV am, 'younger sibling'), chị (姊 zǐ, SV tỷ, 'sister'), cô (故 gū, SV cô, 'miss'), and mẹ (母 mǔ, SV mẫu, 'mother'), as well as the many modern Chinese loans that remain popular today, such as bảotrọng (保重 bǎozhòng, 'take care'), đảmbảo (擔保 dànbǎo, 'guarantee'), thịphạm (示範 shìfàn, 'demonstrate'), đạocụ (道具 dàojù, 'props'), and giaođãi (交待 jiāodài, 'to brief').
II) Yue ancestry and the Vietnamese identity
Anthropologically, in addressing the origin of Vietnamese etymology, the author advances an independent argument grounded in data analysis to counter the claims put forth by the Austroasiatic linguistic camp, which he regards as having introduced a distracting agenda into the debate. Advocates of this camp approach the issue from a southern geospheric perspective, focusing on regions where the Austroasiatic boundary intersects with the Austronesian substratum, particularly among Chamic populations in the Indo‑Chinese peninsula, and extending across the archipelagos of Malaysia and Indonesia, the western islands of the Philippines, and Taiwan, formerly known as Formosa.
Why did Austroasiatic theorists group the Vietnamese language into the Mon‑Khmer branch in the first place? The hypothesis took root largely because Mon‑Khmer populations dominated the Indo‑Chinese peninsula and permeated deeply into local demographics. Additionally, this hypothesis emerged during the "gold rush" era of historical linguistics in the late nineteenth century, when Western linguists had yet to hear of the Yue people and their linguistic legacy. By contrast, Mon‑Khmer speakers in Southeast Asia were associated with the grandeur of the ancient Khmer Empire, a past that inspired admiration and envy. This led to the creation of the Viet‑Muong subdivision within the Austroasiatic Mon‑Khmer subfamily as scholars sought connections among these groups.
In response, the author firmly establishes the theory that the Vietnamese people descend primarily from ancient Yue ancestry in southern China, having intermixed with Han settlers during the millennium of Chinese domination following 111 B.C. As the Annamese polity expanded southward into what is now central Vietnam, further admixture occurred with Austronesian Chamic and Austroasiatic Mon‑Khmer populations. Consequently, the modern Vietnamese population reflects a racially composite lineage shaped by centuries of migration, integration, and cultural synthesis.
This position stands as the author's antithesis to the Austroasiatic Mon‑Khmer theorists, who argued that the Sinicization of indigenous Mon‑Khmer people in ancient Annam was the true process that produced Vietnamese identity. That viewpoint largely ignored the recorded history of the Yue, considered ancestors of early Annamese populations, who had advanced further south and bridged the anthropological gap leading to modern Vietnamese fusion. According to the Austroasiatic camp, the intermingling of Mon‑Khmer groups with Chinese resettlers during the period of Chinese colonial rule was the origin of the Vietnamese. They claimed that Mon‑Khmer peoples from the Indo‑Chinese peninsula were the direct ancestors of modern Vietnamese. Crucially, the factors behind any 'Vietnamization of the Mon‑Khmer' appear to have been overlooked, possibly because the timeframe in which Mon‑Khmer groups purportedly arrived in the Red River Basin, already inhabited by Daic populations, remains vague.
Archaeological findings in central Vietnam further affirm that the inhabitants prior to these migrations bore no ancestral connection to the Vietnamese. Historically, early Vietnamese emigrants ventured into the southern Indo‑Chinese peninsula only after the twelfth century, where they first mixed with the Chamic people. This mixing was facilitated by the concession of two Chamic prefectures, Châu Ô and Châu Lý, to the Trần Dynasty in 1306, in exchange for the royal interracial marriage between King Chế Mân of Champa and the Vietnamese princess Huyềntrân Côngchúa. This explains why Vietnamese communities appear only late along the central coastline and in the southwestern regions.
Intriguingly, the Austroasiatic hypothesis aligned neatly with domains historically attributed to the Yue in ancient Chinese annals, a coincidence that blurred distinctions between Yue and Austroasiatic entities. Austroasiatic Mon‑Khmer theorists discreetly adopted this notion while sidestepping the complexities of Sinitic‑Vietnamese linguistics. It was simpler to identify a set of basic words shared by Mon‑Khmer and Vietnamese and then draw conclusions about shared roots, rather than confronting more intricate etymological challenges.
For many Western‑educated scholars, it proved formidable to delve deeply into ancient Chinese classics to uncover the etymological roots of Vietnamese. While many excelled in proto‑Chinese, Old Chinese, and Middle Chinese phonology, that expertise fell short in the case of Vietnamese, both historically and in contemporary studies.
It was not until the early twentieth century that Sinology became an established discipline, and even then, few scholars could confidently substantiate the connection between Chinese and Vietic roots. Renowned linguists such as De Lacouperie, Maspero, Haudricourt, Shafer, Forrest, and Karlgren were among the select few whose work pointed to Sinology as a vital key for understanding Vietnamese etymology. Without deep knowledge of Chinese language and history, no one could reliably offer a comprehensive view of Vietnamese linguistic origin.
Despite these competing frameworks, the broader picture can be synthesized by integrating the perspectives of Yue and Austroasiatic Mon‑Khmer into one concept, the Bod (Terrien De Lacouperie, 1887). It is conceivable that Indo‑European theorists deliberately substituted the term Yue with Austroasiatic in order to reframe aboriginal Yue entities along a continuum that aligned with established historical linguistic models. This interpretive shift, whether intentional or methodological, echoes earlier typological depictions found in the works of T. D. Lacouperie (1887) and R. A. D. Forrest (1948).
Geographically, by substituting the terminology Austroasiatic with the Yue (Bod or BáchViệt), the author traces the movements of early indigenous Yue emigrants – LuoYue (雒越), OuYue (歐越) or Xi'Ou (西甌), and MinYue (閩越) or Dong'Ou (東甌), as well as racially mixed groups like the Qin‑modified Shu (巴蜀 BaShǔ, BaThục), Yue‑modified Chu (楚 Chǔ, Sở), Yue‑modified Han (漢), Hakka (客家 Kèjiā, Cácchú), Hokkien, Hainanese, Cantonese – from China South to northern Vietnam and across Southeast Asia. These groups advanced southward, resettled, and intermingled with native inhabitants along their journey, and in the case of Vietnam, fused with Chamic and Mon‑Khmer peoples.
In a sense, this process is encapsulated in the official name Việtnam, officially adopted in 1804. This designation can also be read as a reverse form of NamViệt, meaning 'the Việt of the South', often misinterpreted as 'to surpass in the south' or 'advance southward'. Such connotations highlight the migratory pattern of the ancestral Yue, whose emigration from China South became more pronounced around 300 B.C. in response to Qin expansion (Lu Shih Peng, 1964).
Figure 1 - Map of the historical ancient proto-Chinese migratory routes
Source: Multiple sources on the internet
III) Southward migration and Yue ethnogenesis
The author’s perspective on the southward geo‑spherical migration of the Yue, originating from a northern axis and radiating toward the southern hemisphere, can be expanded without invoking competing theories regarding Austronesian origins. Austronesian dispersal spans the eastern hemisphere over a timeline of 3,000 to 4,000 years, as supported by historical records. (1) This framework aligns with archaeological evidence indicating that the Yue were not the exclusive creators of bronze drums. Such artifacts have also been unearthed in the Shu State (蜀國) of Sichuan and across parts of Indonesia. In these regions, Austronesian interpretations have informed alternative hypotheses, including the Austro‑Thai theory. Fundamentally, all southern migratory trajectories appear to originate from northern sources.
Practically speaking, the Austroasiatic hypothesis overlooks alternative perspectives on proto‑Yue presence, which extended as far northeast as the Yangtze River and up to the Yellow River basin. For example, proto‑Yue groups were present in the ancient Lu State (魯國) within Shandong Province (山東), as suggested by the broader ethnological framework of the Taic‑Yue stock originating from the Chu State (楚國) near present‑day Hubei (湖北) and Anhui (安徽). Vietnamese legends also recount that their earliest ancestors emerged from the Dongting Lake area (洞庭湖) in Hunan Province (湖南), south of Hubei. Together, these regions form a contiguous zone representing the ancestral domain of the Taic stock.
The author’s framework for both the Yue and Austroasiatic theories is consistent with ancient Chinese legends and history. Tribes of the Taic‑Yue spread eastward and westward, contributing to the racial composition of the pre‑Qin (先秦) era. Evidence includes early human fossils discovered in ancient Sichuan, where the Bashu State (巴蜀) was once located. These tribes collectively introduced new cultural elements to the pre‑Han populace, with differences marked by evolving names. Notably, the first monarch of the Han Dynasty, Liu Bang, and his generals and followers were originally subjects of Chu (楚). Had Xiang Yu (項羽), the Hegemon‑King of Western Chu, defeated Liu Bang in the decisive battle, the dynasty might well have been named Chu rather than Han.
After the Han forces defeated Chu, subjects within the Han Empire’s periphery gradually came to identify as Han people (漢人 Hànrén), a process that took considerable time. This marked the emergence of the Chinese Han from a racially mixed population composed of pre‑Han peoples and Taic‑Yue descendants. These included groups from six ancient states conquered and unified under Qin rule in 221 B.C. Chu’s subjects were primarily Taic‑Yue descendants, who in turn gave rise to the Southern Yue tribes (百越 BǎiYuè, SV BáchViệt, 'Bod') through historical stages spanning the Zhou, Qin, and Han periods. (2)
In essence, Vietnamese ethnogenesis reflects a layered process: rooted in ancient Yue ancestry from southern China, subsequently intermixed with Han settlers during a millennium of Chinese rule beginning in 111 B.C. As the Annamese advanced into central and southern Vietnam, further admixture occurred with Austronesian Chamic and Austroasiatic Mon‑Khmer populations. The result is a modern Vietnamese demographic profile shaped by centuries of migration, integration, and cultural synthesis.
The demographic evolution of ancient Annamese populations initially paralleled that of other Southern Yue‑descended groups, including the Cantonese (粵), Fukienese (閩越, 'Hokkien'), and WuYue (吳越). Yet this resemblance proved short‑lived. The Vietnamese trajectory diverged under prolonged Chinese domination, spanning from 235 B.C. to 939 A.D., punctuated only by brief episodes of autonomy. Following the twelfth century, the emergent Annamese polity began a sustained southward expansion beyond the 16th parallel, gradually consolidating its territorial reach over the next 1,080 years. This arc culminated in 1989, when Vietnam withdrew from Cambodia (formerly Kampuchea) and restored its pre‑1979 borders.
Figure 2 - The distribution of indigenous languages before the Vietnamese
Map of the Austroasiatic languages per the Austroasiatic view
Source: Multiple sources on the internet
The nature of a people's mother tongue, as commonly perceived, often reflects their racial composition, and vice versa. The Austroasiatic Mon-Khmer hypothesis for Vietnamese appears to align with this notion. A playful way to frame this theory is to liken the Vietnamese language to the product of a "forced marriage" between Mon-Khmer and Chinese influences. From an anthropological standpoint, the prolonged colonization of early Annamese populations might reflect a dynamic of role reversals: the "guests" (early Kinh settlers) ultimately became the new sovereign majority, while the indigenous natives assumed subordinate roles in their own land, newly annexed into a foreign state.
To picture life in the resettled land, separate from mainland China, let us envision a "what-if" scenario. Imagine a family of new homeowners moving into a residence previously inhabited by others. While settling in, the new occupants discover cultural artifacts buried on the property. The head of the household could easily claim ownership of the artifacts, but it would be dishonest to present them as ancestral heirlooms, treasures passed down by their forebears. Meanwhile, their descendants adopt new surnames, such as Phạm or Trần, except for those of Chamic or Khmer heritage, marked by surnames like Chế or Thạch. This illustrates how Vietnamese identity absorbed not only Chinese-origin surnames but also names rooted in Chamic or Khmer lineage.
Linguistically, a nation's language does not always reflect the tongue spoken by its ancestors. Analogous phenomena exist worldwide: for instance, modern French is distinct from the Gaulish language of ancient France, and people in former French colonies like Morocco or Haiti continue to speak French, albeit with distinctive local accents. If the Austroasiatic view, which treats language as heritable along racial lines, were to hold water, Vietnamese speakers would need to be "racially pure" Mon-Khmer, or at least comparable to the Muong linguistic stock. This standard does not align with the evidence, any more than it would for Cantonese and Fukienese, which remain grouped within Chinese dialectology despite their divergence. To enforce such a standard would risk undermining broader notions of national identity, particularly for larger nations such as China.
It is also worth recalling that modern Vietnamese as a fully formed language did not emerge until after Vietnam gained independence from China in the tenth century.
IV) Contact, cognates, and competing frameworks
Etymologically, the commonalities in certain basic words can be explained as the result of linguistic contact. Vocabulary from one Mon‑Khmer language spilled into Mường subdialects, which in turn influenced Vietnamese speech. This was facilitated by geographical proximity, particularly in mountainous regions further south, where aboriginal populations retreated in the face of Chinese occupation and Sinicization. Even though mutual intelligibility between Việt and Mường waned long after their split, Mường speakers have remained anthropologically and culturally connected to the Kinh as neighboring kin.
Additionally, shared basic words spread between Vietnamese and Mon‑Khmer languages through everyday activities: trade, bartering, agricultural exchanges, handicraft production, and shared farming practices. In other words, even as the Kinh collaborated with Chinese occupiers, they also maintained ties with other ethnic communities within their territory. This interaction bridged linguistic gaps between Vietnamese and Mon‑Khmer. Such encounters trace back to prehistoric times, beginning with the first wave of Mon‑Khmer speakers moving into the Red River Delta from southwestern Lower Laos (Nguyễn Ngọc‑San 1993: 43).
Methodologically, Austroasiatic linguists grouped related basic etyma spanning many Mon‑Khmer languages into a broad spectrum of mixed elements. Yet some of these Mon‑Khmer basic words in Vietnamese also have cognates in Chinese and Sino‑Tibetan languages; this shared layer is referred to here as Sinitic‑Vietnamese. Many fundamental etyma in Vietnamese reveal roots in Yue‑related languages such as Cantonese, Teochew, Hainanese, and Fukienese, as well as Sino‑Tibetan sources, further complicating the Austroasiatic hypothesis. Austroasiatic theorists appear to have grouped these elements under the Mon‑Khmer umbrella without addressing their potential origin elsewhere, while the pervasive influence of Khmer often served as an unchallenged foundation for their claims.
The Austroasiatic theory, with its Mon‑Khmer subfamily as focal point, also sought to dismiss the older Sino‑Tibetan theory, which posits an alternative root for Vietnamese. The issue of linguistic affiliation thus involves not only Austroasiatic Mon‑Khmer versus Yue, but also Sinitic‑Vietnamese versus Sino‑Tibetan frameworks. This dynamic is further complicated by the vast number of Sino‑Tibetan cognates in Sinitic‑Vietnamese and the unique linguistic features shared between Vietnamese and Chinese. While it may be simpler to accept that ancient Annamese developed from a Yue foundation layered upon a Taic base, the claim that certain basic vocabulary items in Việt‑Mường subdialects could be loanwords from neighboring Mon‑Khmer languages aligns with the understanding that these languages formed part of a broader family spanning southern China centuries ago.
In any case, whether or not Vietnamese belongs to the Sino‑Tibetan family, Austroasiatic Mon‑Khmer theorists remain focused on genetic classification, proposing that Austroasiatic Mon‑Khmer is the mother language that gave rise to Vietnamese. Meanwhile, the Sino‑Tibetan camp highlights the Sinitic affinity of Vietnamese, tracing its historical foundations back approximately 3,000 years — a timeline notably absent in the prehistoric Austroasiatic Mon‑Khmer framework.
Regarding the timeframe in historical linguistics and linguistic affiliations, Merritt Ruhlen, in The Origin of Language (1994), quotes Hans Henrich Hock:
"We can never prove that two given languages are not related. It is always conceivable that they are in fact related, but that the relationship is of such an ancient date that millennia of divergent linguistic changes have completely obscured the original relationship.
"Ultimately, this issue is tied up with the question of whether there was a single or a multiple origin of Language (writ large). And this question can be answered only in terms of unverifiable speculations, given the fact that even with the added time depth provided by reconstruction, our knowledge of the history of human languages does not extend beyond ca. 5,000 B.C., a small 'slice' indeed out of the long prehistory of language." (Hock 1986:566).
V) Framing the Sino‑Tibetan perspective
In his work, of which this article is a part, the author explores both dimensions of inquiry, genetic affinity and historical setting, and demonstrates how more than 400 fundamental lexical items in Vietnamese align with Sino‑Tibetan etymologies. These cognates are reinforced by a discussion of Chinese linguistic peculiarities embedded in Vietnamese, substantiating the core argument of the Sino‑Tibetan theory advanced in this study.
The author establishes new methodological foundations for approaching the Sino‑Tibetan Sinitic‑Vietnamese framework. In contrast to the accelerated information‑gathering capabilities of the Artificial Intelligence era, his research originates in the pre‑internet age and is grounded in traditional scholarship. Findings were documented the old‑fashioned way: through direct engagement with printed books, hundreds of them, examined page by page and line by line. Each insight was manually recorded on index cards, extracted from a vast corpus of publications. As of 2025, only about one third of these titles have been entered into the bibliography, with compilation still ongoing, a time‑consuming but meticulous task.
Over more than two decades, the author accumulated over 20,000 research notes by the year 2000, just as the internet‑information era reached full momentum. These notes were not merely archived but deeply internalized, forming a durable cognitive framework that continues to shape his analytical process. Rarely needing to revisit the original cards, he has mentally constructed from this foundation a comprehensive perspective on the structural and semantic essence of Vietnamese etymology, recovering a linguistic heritage long receded from scholarly view.
With this substantial body of evidence, he now systematically assembles corroborative data to support the argument that the majority of cited Sino‑Tibetan Sinitic‑Vietnamese lexical items can be traced to at least one cognate in Chinese, thereby reinforcing historical and linguistic continuity between the two traditions.
Digitally, the author has entered the electronic era and continues to advance the project through incremental releases on his website, prioritizing mobile accessibility and modular presentation. This format ensures broad public access while preserving editorial clarity and semantic precision, allowing the research to scale without compromising methodological rigor.
The overarching goal remains eventual publication in mainstream print, aimed at advancing a thesis that invites renewed linguistic inquiry into the Sino‑Tibetan continuum. Central to this thesis is the proposition that modern Sinitic lects emerged as branches shaped primarily by the fusion of Taic‑Yue substrata with proto‑Tibetan, culminating in proto‑Chinese and the layered pre‑Qin–Han linguistic strata across ancient China. By analogy, the integration of Yue elements into the Annamese sphere during the Qin and Han periods likely contributed to the genesis of early Vietnamese, crystallizing around the tenth century.
Perceptually, regardless of the final contours of this narrative, the author asserts that languages must be approached as holistic, living systems, understood in their full contemporary complexity rather than reduced to their earliest reconstructible stages, whether 3,000 or 5,000 years prior. This view parallels our treatment of English: not merely as an Indo‑European relic, but as a dynamic amalgam of Anglo‑Saxon, Germanic, Norman, Romance, Latin, Greek, and other influences that have shaped its present form.
Methodically, this paper’s arrangement is designed to engage both novices in Vietnamese historical linguistics and specialists in Chinese and Vietnamese philology. Learners of Vietnamese with a solid understanding of Mandarin (M), known in Vietnamese as Quanthoại (QT, sometimes referred to as tiếng Quanhoả or 官話 Guanhua), and now officially named 'Putonghua' (普通話), alongside a foundational grasp of historical linguistics, will find this study particularly enriching. Some explanations may seem overly detailed or repetitive to experts already familiar with widely recognized points, while occasional gaps may challenge general readers; these choices aim to strike a balance that caters to diverse audiences. Introductory resources on historical linguistics are provided in the bibliography. (3)
This article seeks to establish rapport with readers and clarify the author's perspective ahead of the forthcoming articles in the study, forging an academic connection with those seeking insight into the linguistic origins of Vietnamese. It does not present itself as a formal scientific paper replete with data tables and statistical modeling; rather, it offers a narrative exploration of the language, framed by the Vietnamese subtitle Ýthức Mới Về Nguồngốc TiếngViệt, whose nuances carry the sense of "New reconciliation of theories on the origin of Vietnamese", to say the least.
VI) Toward a deeper engagement with Sinitic‑Vietnamese etymology
For readers who remain skeptical yet intrigued by the etymological ties between Vietnamese and Chinese, it may be worthwhile to await the full book publication of this research. Printed works often invite deeper engagement and a more sustained openness to complex arguments. It is unlikely that the full contours of this inquiry will be absorbed in an online format. In print, however, the material may be approached with greater impartiality, supported by quotable evidence, contingent, of course, on the author’s success in securing a reputable publisher.
Human nature tends to favor ideas that resonate with instinctive beliefs. To fully appreciate the insights offered here, readers ideally should possess a foundational understanding of the historical interplay between Vietnam and China. That said, newcomers are welcome, provided they bring sincere curiosity to the subject. At its core, this research is a retelling of history, a compelling linguistic and cultural narrative that may captivate both seasoned scholars and engaged lay readers alike.
Shared interests build trust and belief, akin to the solidarity of believers in the same tradition. A new theory typically begins with foundational premises, facts, quotations, supportive evidence, rules, paradigms, analogies, and logic. From these, we adopt a shared perspective, accepting them as a basis for further discussion. For instance, if we propose that 雞 (jī, SV kê) = VS gà ('chicken') alongside 蛋 (dàn, SV đản) = VS trứng ('egg') are cognates of indisputable origin, both likely stemming from Yue roots long predating Chinese, then proof need not be belabored. Once accepted as premises, our focus shifts to examining whether the bird originated from the south or the north, or even to the age‑old question of whether the egg predates the chicken.
From a historical linguistics standpoint, language contact between two groups typically results in the dominant language assimilating the less dominant one over time. The details hinge upon factors such as military prowess, population size, and cultural sophistication. For example, the language of a conquering population, following an extended period of bilingualism, ultimately becomes adopted by the subjugated group (Robert J. Jeffers et al. 1979: 142).
In the Sinitic‑Vietnamese context, the process of Sinicization in Annam spanned centuries. The author has no intention of debating detractors who contest his arguments, nor of recruiting followers among those resistant to the Sinitic‑Vietnamese theory. Historical linguistics does not yield absolute truths; it is shaped by interpretation and perspective. Beliefs are often deeply entrenched, guided by instinct or predisposition. Predictable reactions such as "Chinazi propagandist!", "Wikipedia sources are unreliable!", "Bogus!", etc., are part of the discourse in the AI era. To such readers, the author encourages disengagement rather than discord.
How did the author arrive at this juncture in his etymological exploration? Admittedly, he is not formally trained as a historical linguist specializing in Vietnamese Linguistics. Yet his journey began with exposure to foundational linguistics courses taught by three towering figures: Professors Nguyễn Tài Cẩn, Hoàng Tuệ, and Bùi Khánh Thế, renowned scholars at the former Saigon University in the late 1970s. What began as academic curiosity soon evolved into a lifelong devotion to the study of Vietnamese etymology and its Sinitic underpinnings.
The author vividly recalls his first assigned project under Professor Hoàng Tuệ: an inquiry into the term tiếng ('sound') in Vietnamese. This deceptively simple word encapsulates a constellation of meanings: sound, morpheme, syllable, word, and language. His comparative research into its Chinese counterpart 聲 (shēng, SV thanh) proved transformative. The semantic breadth of 'shēng', especially its appearance in expressions like 蠻聲 (Mánshēng), referring to tiếngMôn in the Shaozhou Tuhua (韶州土話) dialects of Guangdong, Hunan, and Guangxi, revealed profound linguistic resonance across cultural boundaries.
For the author, tiếng and 聲 have become the Đạo (道 Dào, 'the Way') through which the vaults of Sinitic‑Vietnamese etymology are unlocked. This guiding principle has propelled him to examine other Chinese characters whose meanings stretch far beyond conventional semantic domains. Such is the enchantment of language: a system at once rigid and fluid, historical and living.
VII) Advocating the Sinitic hypothesis
Why does the author advocate so confidently for a Sinitic hypothesis while remaining skeptical of Austroasiatic models? The interplay between Chinese and Vietnamese is intricate, and he approaches it with both scholarly rigor and tongue‑in‑cheek candor; his hypothesis is rooted in long observation and lived experience. Few scholars are willing to wade deeply into these debates, which often resist definitive resolution. Linguistic affiliation theories modeled on Indo‑European paradigms tend to falter when applied to Austroasiatic contexts. Dissenters from orthodoxy are sometimes dismissed as “uninformed,” yet linguistics is a field where science, history, and human insight converge. Progress often comes from those bold enough to challenge prevailing narratives.
Fueled by enduring fascination, the author has spent decades immersed in self‑directed study of Vietnamese and Chinese historical linguistics. His efforts culminated in the painstaking construction of an online dictionary of Nôm words of Chinese origin, an annotated repository of Sinitic‑Vietnamese etyma, built one entry at a time.
Over the past thirty years, his exposure to Chinese has deepened through both scholarship and personal life. His mastery of modern Mandarin (Putonghua) has been shaped by daily conversations with his Chinese‑native wife, extensive reading of Chinese literature, and regular consumption of Chinese media, from satellite broadcasts to contemporary dramas. This sustained immersion sharpened his insights into the etymological ties between Chinese and Vietnamese. What captivates him most is the striking proximity, beyond mere lexical overlap, between modern Mandarin expressions and their Vietnamese counterparts in everyday usage. These parallels, observed in colloquial speech and sitcoms, reinforce his conviction that the linguistic bond between the two languages runs deeper than traditional Chinese linguistics often acknowledges. This affinity also surfaces in classical Chinese novels dating back to the twelfth century, suggesting a long‑standing intertextual and intercultural dialogue.
The author believes that any Vietnamese scholar fluent in modern Mandarin and equipped with an etymological lens would recognize the validity of this perspective. Yet he cautions against reducing Vietnamese to a mere Yue‑descendant variant, akin to Cantonese, Fukienese, Zhuang, or Daic. These languages, shaped by centuries of Chinese rule, have undergone extensive Sinicization and are often subsumed within the Sino‑Tibetan classification, especially the modern Kadai‑Daic lects. Vietnamese, however, resists such simplification. Its historical trajectory and linguistic architecture demand a more nuanced and independent recognition.
Admittedly, this endeavor is not without monotony. The author often wonders why he has committed so deeply to this pursuit. There is no material reward awaiting him. Who, after all, truly cares whether a Chinese etymon is of Yue origin, or vice versa? Regardless of the outcome, Vietnamese will likely continue to be classified under either the Austroasiatic or the Sino‑Tibetan family. Yet as long as he retains the energy and passion to press forward, his invitation stands: let us continue this journey together, until the day he can no longer do so.
Like a pilgrim in search of sacred revelations, the author approaches this etymological journey with wonder and resolve. Each discovery, whether breakthrough or setback, deepens his understanding of the Vietnamese linguistic landscape. Years of exploring Chinese historical linguistics have fueled his curiosity about China’s linguistic past. This experience, akin to the fascination English learners feel when delving into Greek, Latin, and Romance languages, has broadened his grasp of Vietnamese etymology while enriching his knowledge of Chinese itself.
In earlier stages of his research, he accepted the prevailing view among Vietnamese specialists that emphasized the Mon‑Khmer connection. That perspective, however, belongs to the past. With time, experience, and sustained inquiry, he cultivated a more nuanced understanding. As he delved deeper, he observed that Vietnamese shares more linguistic commonalities with Sinitic languages than Sino‑Tibetan languages share with other members of their own family. These parallels extend beyond basic vocabulary into expressions and structural features that historical linguists use to establish genetic affiliation.
The Austroasiatic school focuses primarily on shared elements between Vietnamese and Mon‑Khmer languages. Yet it overlooks key findings in Sino‑Tibetan etymology, particularly basic words that appear consistently in both Chinese and Vietnamese over time, which Austroasiatic theorists often attribute to Mon‑Khmer origins. Set against the many Vietnamese words that align etymologically with Sino‑Tibetan, the proposed Mon‑Khmer connections lack essential linguistic features, including disyllabicity and tonality, hallmarks shared by Vietnamese and Chinese. These traits overwhelmingly outweigh the evidence presented by Mon‑Khmer proponents within the Austroasiatic camp.
If Vietnamese linguistic characteristics were systematically tabulated and compared in detail alongside those of Chinese historical linguistics, it would become evident that Vietnamese is, in many respects, a modified form of Chinese. This conviction has driven the author to sort through these complexities and compile this work over more than twenty‑five years. The Sinitic‑Vietnamese theory he proposes is not based solely on a comparative list of over 400 fundamental cognates with Sino‑Tibetan etymologies, elaborated in the chapter addressing Sino‑Tibetan. It is also supported by extensive evidence from anthropology, archaeology, and historical records.
Due to the scarcity of historical documentation, Austroasiatic specialists are often left to speculate solely on the basis of linguistic sound change rules. Consequently, their focus has shifted toward comparative analyses of Mon-Khmer basic words, many of which have gradually been reclassified as belonging to other linguistic families. In the absence of conclusive linguistic proof, they have sometimes redirected their attention to neighboring languages, a tendency that has undermined the validity of their arguments concerning Vietnamese-Chinese etymological connections.
Consider, for example, the Vietnamese word "vịt" ('duck'). It lacks cognates in Mon-Khmer languages. To address this, Austroasiatic scholars have proposed a connection to the Thai word เป็ด /pĕd/, despite knowing that Thai descends from the Daic languages, which in turn originate from the Taic family, that is, the same lineage that gave rise to the Yue languages, ancestors of the ancient Viet-Muong language!
If, however, we hypothesize an etymological link between "vịt" and Chinese 鴨 yā (SV áp), historical records may offer supporting evidence. Dong Zuobin (董作賓, 1933), in Discussing Tan (《譠》), p. 162, references a location in the Tan State of the Shang Dynasty (present-day Shandong Province) called 武原城 Wǔyuánchéng (Vietnamese: Thành Vũnguyên). Locals referred to it as 鵝鴨城 Éyāchéng (Vietnamese: Thành Nganvịt, literally 'Citadel of Ducks and Geese'), likely because 武原 and 鵝鴨 sounded alike in their dialect at the time.
This historical dimension of Sinitic etymology, exemplified by cases like "vịt" and "ngan" (also "ngỗng"; both can be used to reconstruct words such as 源 yuán, rendered as "nguồn" and "dòng"), underscores the depth of the Chinese-Vietnamese linguistic connection, an area where the Austroasiatic Mon-Khmer hypothesis falls short.
Built upon the historical framework outlined above, and expanded in subsequent chapters, this research offers a comprehensive account of how the modern Vietnamese language evolved, both diachronically and synchronically, and of how it relates to other Sinitic segments within the Sinitic sub-branch of the Sino-Tibetan linguistic family.
Another key contribution from the author is a clarification regarding the term 'Sinitic', used here as a practical convention to denote elements associated with Chinese linguistic and cultural domains. As previously noted, 'Chinese' is not an ethno-religious designation like 'Jewish', but rather a cultural construct, akin to 'American'. From a linguistic standpoint, the concept of 'Chinese' as the language of 'China', understood as a unified polity, emerged only after the establishment of the Qin Dynasty, when a variant of proto-Tibetan was layered atop preexisting Taic-Yue substrates. The language we now call Chinese was named after this political consolidation, and thus carries a distinct historical trajectory intertwined with the evolution of the Middle Kingdom.
The term "Sinitic", or "Chinese" in broader usage, derives from the unification of ancient states into the Qin Empire, an event comparable in scope to the formation of the European Union in modern times. During this period, vestiges of indigenous Taic-Yue linguistic elements permeated the emerging imperial lexicon, whether acknowledged by Chinese linguists and Sinologists or not. Southern Yue languages, including Cantonese, Fukienese, and Wu, have since been institutionally classified within the Sino-Tibetan linguistic family, often through official imperial decree. Lexicographically, 'Chinese' has come to encompass all these dialects as part of a unified linguistic identity (see Tang Lan, 1965, p. 184).
Had history unfolded differently, say, if the Chu state had triumphed over both Qin and Han in their decisive campaigns, China might today be known as "Chu". Historical records suggest that Chu was a Daic-Yue polity, likely of Taic origin, ancestral to the Daic and Zhuang peoples. Its population may have spoken a variant of an ancient 'Chu-nese' language, rather than the form now conventionally labeled 'Chinese'. Likewise, had the NamViệt Kingdom succeeded in overtaking the Han Empire, the dominant term might have become 'Việt', or /Jyut6/, or 'Yue' as pronounced in Mandarin (see Lu Shih-Peng, 1964; Bo Yang, 1983–93).
In essence, 'Chinese' is not a fixed ethno-racial identity, but a cultural construct confined within the evolving boundaries of the Chinese polity. Its name may shift with regimes, but its linguistic continuity transcends nomenclature. These hypotheticals underscore the contingency of naming: terminological conventions are shaped by historical victors and political consolidation, while the deeper linguistic substrate often endures across dynastic transitions. For instance, during the Manchu Qing dynasty (1644–1912), the polity was officially designated as "Qing", yet its linguistic and cultural core remained recognizably Chinese, because the Manchus themselves became part of it.
Consider further the hypothetical in which Imperial Japan had won World War II and rebranded the Middle Kingdom as "Đại ĐôngÁ" (Great East Asia). In such a scenario, the term 'Sinitic' might have been supplanted by an entirely different designation, perhaps 'not-X'. This thought experiment illustrates how linguistic nomenclature is often a product of convenience, necessity, and power, rather than intrinsic linguistic reality. When juxtaposed with Taic-Yue or Austroasiatic Mon-Khmer frameworks, such naming conventions can obscure deeper continuities. Ultimately, the linguistic essence transcends the labels imposed upon it.
Regarding the integrity of this survey, the author affirms that it is an original and human-authored work, especially in light of its digital format as an ongoing research project. AI serves only as a tool for final proofreading and surface-level editing. Without the author's creative intellect and scholarly commitment, this work would not, and could not, exist.
It is worth acknowledging the skepticism some readers express toward academic studies published exclusively online, often dismissing them as "bogus" or likely AI-generated. While digital formats offer undeniable advantages in accessibility and scalability, concerns about reliability and longevity remain valid. Online works are subject to constant revision, and their long-term availability is far from guaranteed. Over time, websites may vanish from search indexes due to inactivity, or disappear altogether when hosting services lapse or accounts go unpaid.
In this context, the author's decision to publish incrementally online reflects both necessity and intent: to share findings in real time while preserving the human voice behind the research. The enduring value of this work lies not in its format, but in the originality of its insights and the rigor of its methodology.
This document should be regarded as a prelude to the forthcoming printed edition. In the realm of linguistic inquiry, no conclusion is ever truly final, and this research is no exception, regardless of whether it ultimately appears in bound form. The author maintains that readers are generally less inclined to read an online publication in its entirety than they would a physical volume acquired at considerable expense.
In practice, the author approaches the reference works cited in the bibliography with similar reverence, though, as previously noted, the bibliography remains incomplete. Hundreds of titles, meticulously arranged across his personal bookshelves, are consulted with care and deliberation, forming the intellectual scaffolding upon which this study rests.
This paper adopts a nontraditional approach by not devoting an entire section to exhaustively listing all sound change rules, natural or conditioned, between Sinitic-Vietnamese and Chinese loanwords. Such comprehensive treatments, as exemplified in Nguyễn Tài Cẩn's studies of the Sino-Vietnamese sound system (1979, 2000, 2001), are often expected in research of this scope. Instead, readers will encounter a synopsis of phonological patterns illustrated through examples and concise commentary. The emphasis is placed on irregular or distinctive sound correspondences, such as the ¶ /y- ~ b-/ pattern: 由 "bởi" (because), 油 "béo" (greasy), 邮 "bưu" (post), 柚 "bưởi" (pomelo), 游 "bơi" (swim), all pronounced /yóu/ in Mandarin. Another example is 公母 (gōngmǔ), which corresponds to Vietnamese expressions such as "trốngmái" (male and female), "sốngmái" (life-or-death struggle), or "vợchồng" (husband and wife).
Should this work later prove to have academic value, specialists in specific fields, such as lexical data tabulation and categorization, can undertake the task of establishing possible sound change patterns and formulating their corresponding rules. This type of endeavor is extraordinarily detailed, if not inherently complex, given that frequency-dependent sound changes tend to occur in synchrony and are often irregular, rather than uniformly systematic as observed in, for example, Germanic languages. While such phenomena are not uncommon, these irregularities are particularly pronounced in the Sinitic-Vietnamese context.
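The kind of tabulation envisioned here can be illustrated with a minimal sketch, assuming that correspondences are counted only between orthographic initials (pinyin on one side, Quốcngữ on the other), with finals and tones ignored. The helper names `onset` and `tabulate` and the two initial inventories are hypothetical illustrations, not the author's own tool; the pairs are the ones cited in the preceding paragraph for the ¶ /y- ~ b-/ pattern and for 公母 (gōngmǔ). A tally of this kind only shows which correspondences recur in a given list; it does not by itself establish that a pattern is regular.

```python
import unicodedata
from collections import Counter

# Orthographic initials; onset() tries longer ones first (e.g. "ngh" before "ng").
PINYIN_INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
                   "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
VIET_INITIALS = ("ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "th", "tr",
                 "qu", "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p",
                 "r", "s", "t", "v", "x")


def onset(word: str, inventory: tuple) -> str:
    """Return the longest listed initial of `word`, or '' if none matches."""
    # NFD splits tone marks off their vowels, so "gì" still starts with "gi".
    w = unicodedata.normalize("NFD", word.lower())
    for initial in sorted(inventory, key=len, reverse=True):
        if w.startswith(initial):
            return initial
    return ""


def tabulate(pairs):
    """Count (Mandarin initial, Vietnamese initial) pairs across an etymon list."""
    return Counter(
        (onset(pinyin, PINYIN_INITIALS), onset(viet, VIET_INITIALS))
        for pinyin, viet in pairs
    )


# Pairs taken from the examples above: 由, 油, 邮, 柚, 游 (all yóu) and 公母.
PAIRS = [
    ("yóu", "bởi"), ("yóu", "béo"), ("yóu", "bưu"), ("yóu", "bưởi"),
    ("yóu", "bơi"), ("gōng", "trống"), ("mǔ", "mái"),
]

for (mandarin, viet), n in tabulate(PAIRS).most_common():
    print(f"/{mandarin}- ~ {viet}-/: {n}")
```

Run as is, the loop lists /y- ~ b-/ first with a count of 5, followed by the single instances /g- ~ tr-/ and /m- ~ m-/. Longer etymon lists can be passed to `tabulate` unchanged.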
Readers inclined to skim for illustrative examples may freely navigate between sections or pursue areas of personal interest. In doing so, they will encounter scattered yet thematically linked instances throughout the text. However, to fellow scholars, the author offers a word of caution: please avoid quoting passages out of context or drawing conclusions from isolated errors or incomplete datasets. Such imperfections are inevitable in a work still undergoing revision, and occasional typographic lapses may persist. Premature judgments, such as those the author has previously endured, often result in unwarranted criticism. One notable example involved an exploratory link between 將 (jiāng) and Vietnamese "sẽ" ('will'), which was dismissed as "unreliable" and "bogus" by a linguistic forum due to a misalignment between "nướctương" 醬油 (jiāngyóu, 'soy sauce') and "xìdầu" 豉油 (chǐyóu, 'bean sauce'), an error stemming from careless data handling. In such cases, readers may be tempted to infer exceptional sound change rules, such as "jiāng" ~ "sẽ" via the speculative pattern ¶ /j- ~ s-/, /-iang ~ -Ø/. Yet a single misstep does not invalidate the broader inquiry into phonological correspondences.
High-profile etyma requiring detailed treatment may unavoidably occupy substantial space. Exceptional or anomalous cases often resist neat categorization and highlight why enumerating sound change rules can become unwieldy, sometimes warranting independent study. These irregularities, which do not generalize across similar phonological environments, demand careful deliberation. The goal is to equip readers to either interpret such subtleties through conventional linguistic frameworks or explore emergent patterns via unconventional heuristics. Ultimately, this endeavor underscores the speculative nature of historical phonology and the interpretive latitude inherent in linguistic reconstruction.
Rather than presenting exhaustive lists of mechanical sound change rules, often overlooked or unread, we will prioritize engaging case studies and targeted examples. These will illuminate the specific processes by which conclusions regarding Sinitic-Vietnamese etyma have been reached. By venturing beyond the well-trodden paths of frequently cited correspondences, readers are invited to navigate the complexities of sound change and cultivate the analytical tools necessary to extract and apply linguistic rules independently.
While regularity governs most phonological transformations, this research foregrounds examples involving Chinese lexemes in their diverse forms and phonetic variants, many of which have permeated the Vietnamese lexicon since antiquity. This linguistic infiltration spans multiple historical phases, notably the millennium following 111 B.C., when the Annamese region was under Chinese rule until its liberation in 939. During the Ming Dynasty's incursion in 1410, Mandarin briefly reemerged as the official language, playing a prominent role in diplomatic and administrative exchanges with the Chinese imperial court. (4)
Phonetically, there are instances where sound changes have given rise to multiple Vietnamese variants of a single etymon. Comparable cases can be observed in the Go-on and Kan-on readings of individual Japanese kanji. Take 道 dào ('way') as an example. In Vietnamese, we can identify several distinct "readings" that convey different concepts; interestingly, most of them correspond to the range of meanings found in the Chinese equivalents. For instance:
- 'đạo' (way, religion, sect, morals, skill, line),
- 'dạo' (time),
- 'đường' (road, line),
- 'nẻo' (path),
- 'nói' (speak),
- 'bảo' (tell),
- 'tưởng' (suppose), etc.
Each of these Vietnamese words may seem like a translated version of the Chinese word, but this is not necessarily the case. Rather, each derived Sinitic-Vietnamese form is a variant that is cognate with the same Chinese etymon 道. This phenomenon would be easier to recognize if the old Chinese-based Nôm characters were still widely used in Vietnamese writing. Unfortunately, they are not, and the links are further obscured because modern Putonghua syllables are shorter than their Middle Chinese counterparts.
The phonological change rules illustrated in this paper are neither exhaustive nor intended as definitive references. As this research remains a work in progress, it continues to undergo revision and refinement, with plans for a first print edition to reach select university campuses, ideally those with active communities of historical linguists. The methodologies presented here are exploratory and suggestive rather than conclusive, though their foundational principles remain consistent unless explicitly revised.
Given the evolving nature of this study, the demonstrated approaches should be understood as practical models: examples of how the author has applied two innovative etymological frameworks to generate preliminary results. Readers will observe the investigative process used to identify Vietnamese words of Chinese origin (Sinitic-Vietnamese [VS]) and, in turn, gain the tools to replicate this process with confidence and clarity.
These newly developed methodologies have proven effective in uncovering the etymology of Sinitic-Vietnamese words and in formulating tentative sound change rules, tracking transformations between forms, or identifying 'what changes into what.' For instance, this approach underpins the analysis of 道 dào, as previously discussed. Readers will have the opportunity to apply these techniques in later chapters, particularly through the worksheets provided in Chapter 13. They will also encounter a curated selection of Sinitic-Vietnamese etyma, a small but meaningful subset of the broader findings presented in this research. (6)
Caution is warranted when interpreting loanwords among the examples presented. As a general rule, if a Vietnamese word closely resembles its Chinese counterpart in both phonological form and semantic meaning, it is likely a direct loan. Recognizing such cases is essential for distinguishing inherited etyma from later borrowings and for maintaining analytical precision throughout this study.
While the linguistic resemblance between Vietnamese and Chinese will be addressed in greater detail in later chapters, it is worth noting here that their structural and lexical affinities are significantly closer than those observed between Chinese and many other Sino-Tibetan languages. The term Sinitic-Vietnamese (VS), also referred to as HánNôm (漢喃), encompassing both Hán and Nôm strata, is used to denote either Vietnamese words of Chinese origin or cognates shared by both languages that descend from common ancestral roots. Examples include "sông" 江 (jiāng, 'river'), "ngà" 牙 (yá, 'tusk'), and "dừa" 椰 (yé, 'coconut').
Among their shared linguistic features, beyond morphonological and semantic parallels, nearly every linguistic trait present in Chinese finds an equivalent in Sinitic-Vietnamese. Many Sinitic-Vietnamese words are so deeply embedded in Vietnamese usage that they are often mistaken for indigenous Vietic words or regarded as 'pure' Vietnamese. Some are considered quasi-Sino-Vietnamese variants, especially those represented by Nôm characters incorporating Chinese components.
For the Sinitic-Vietnamese etyma investigated here and identified as having Chinese roots, such conclusions are based on holistic alignment with Chinese linguistic attributes. These include phonetic and morphemic structure, phonological and semantic traits, syntactic and lexical parallels, tonal systems, CVC syllabic architecture, and grammatical arrangements in sentence construction.
The closer a Vietnamese word resembles its Chinese counterpart, the more likely it is to be a loanword. However, this research also examines whether resemblance necessarily implies borrowing. For example, the Vietnamese "tếu" 'funny' may be hypothesized as a loan from 笑 xiào (SV "tiếu" 'laugh'), which is cognate with VS "cười". Alternatively, "tếu" may be cognate with 逗 dòu /tow⁴/ 'tease', SV "đậu" /ɗɐw⁶/, where the voiced /ɗ-/ reflects an older development and the unvoiced /t-/ a more recent one. This word may have been reintroduced into Middle Vietnamese via spoken Mandarin, likely during the Ming Dynasty. Readers may compare the contemporary usage of 逗 in Chinese with its appearance in classical literature such as Dream of the Red Chamber (紅樓夢 Hónglóumèng).
With the findings presented in the Sino-Tibetan chapter, including the genetic affinity demonstrated through shared linguistic peculiarities and cognates, it becomes increasingly plausible to reconsider Vietnamese as part of the Sino-Tibetan linguistic family. Such a reclassification could be achieved through the methodologies outlined in this research, which adopt broader and innovative approaches. These can be applied alongside existing tools from Chinese historical linguistics, offering insights into Vietnamese etymology across disciplines such as anthropology, archaeology, and history, particularly regarding the origins and biological composition of the Vietnamese people and their state. The underlying premise is that populations of shared racial ancestry tend to speak variant languages of common origin.
Throughout this paper, each etymon is accompanied by its corresponding Chinese character and pinyin (拼音) transcription to facilitate sound identification. In many cases, the pinyin alone suffices and may be less visually distracting than the character itself, especially when the character is constructed with "giảtá" (假借) or 'loangraph', which requires readers to decipher embedded phonetic codes. A loangraph refers to a Chinese character borrowed solely for its phonetic value and repurposed for a different concept. For example, the Vietnamese "lại" 來 (lái, 'come') may have originally been associated with "lúa" ('paddy, millet, grain'). If loangraphs were transcribed only in pinyin, they might resemble English homophones with divergent meanings such as 'yard', 'glass', 'page', and 'lie'.
Pinyin, the official romanization system of the People's Republic of China for transcribing Mandarin (普通話 pǔtōnghuà, 'common speech'), has gained widespread global adoption, including in Taiwan, which began integrating it nearly three decades ago.
For accurate sound transcription, this study primarily employs the International Phonetic Alphabet (IPA). IPA symbols are used to represent dialectal and archaic pronunciations, as well as precise phonetic values, enclosed in square brackets [xxx], in contrast to approximate phonemic values indicated by slashes /xxx/. This distinction helps clarify subtle phonetic nuances in the cited lexicons.
Examples include:
- 'dung' [juŋʷ1], [jowŋʷ1], [zʊŋʷ1] /zowng1/ (not [duŋ1])
- 'thìn' [t'ɨjn2], /tʰɤjn2/, /tʰɨn2/, /tʰejn2/ (not [thin2])
- 'thu' [t'ʊ1], /thow1/, /tʰʊ1/ (not precisely [thu:1] /thu:1/).
These distinctions are especially relevant in cases involving diphthongs, where comparative analysis depends on capturing fine phonemic variation. For instance:
- 'tin' [tin1], /tin1/, /tɪn1/ (not [tɤjn1] /tein1/) (5)
To streamline typographic presentation, phonetic symbols may be rendered in simplified forms such as [-ow-] and [-ejn], or alternatively as /-ou-/ and /-ein/, when the intended sound values are contextually clear and unambiguous. This convention will be applied consistently across other phonetic environments, with supplementary notes and examples provided throughout the text to ensure clarity and continuity.
In many instances, IPA transcriptions reflect Vietnamese phonetic values more precisely than conventional pinyin does, especially in relation to Chinese character correspondences. For example:
- Pinyin d aligns with [t]
- Pinyin t corresponds to [tʰ] or /th/
- Pinyin r maps to /j/
- Pinyin gu and ku are phonetically realized as [ku] and [kʰu], not [gu] and [ku], respectively
This transcriptional approach parallels the methodology employed by Pulleyblank (1984) in his reconstruction of Old Chinese (OC), where he explored phonetic values ambiguously recorded in classical annals and inscriptions.
To avoid typographic clutter and potential confusion with IPA diacritics, tonal numerals (ranging from 1 to 9) will be appended to each phonetic form. These numerals indicate tonal categories across various Chinese dialects – such as Cantonese (Guangzhou), Fukienese (Hokkien, Fuzhou, Amoy), Teochew (Chaozhou), and Hainanese – as well as other regional languages including Daic, Thai, and Vietnamese. This system ensures both phonetic precision and cross-linguistic comparability.
Tonal numeral symbols are conventionally used in the transcription of Cantonese, Fukienese, and other Chinese dialects to indicate pitch contours and tonal categories. In the case of Vietnamese, tones are annotated following the traditional eight-tone framework – more precisely, a system of four tonal categories bifurcated into upper and lower registers. This structure is rooted in classical sources such as the Guǎngyùn 廣韻, Jerry Norman's Chinese (1988, p. 55), and foundational Vietnamese linguistic studies, notably Nguồn gốc và Quá trình Hình thành Cách đọc Âm Hán-Việt ("The Origin and Transformational Process of the Sino-Vietnamese Pronunciation") by Nguyễn Tài Cẩn (1979, 2001).
The tonal categories are as follows:
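| Register | Level (平) | Rising (上) | Departing (去) | Entering (入) |
|---|---|---|---|---|
| Upper | 1. ngang, no mark: a | 3. hỏi: ả | 5. sắc: á | 7. sắc + -p, -t, -c, -ch: át |
| Lower | 2. huyền: à | 4. ngã: ã | 6. nặng: ạ | 8. nặng + -p, -t, -c, -ch: ạt |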
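The eight categories can be assigned mechanically from Quốcngữ spelling, as the following minimal sketch shows. It assumes a single, standard-spelled syllable carrying at most one tone mark, and treats -p, -t, -c, and -ch as the checked finals that move sắc and nặng into categories 7 and 8. The function name `tone_number` and the ASCII tone labels are hypothetical; the sketch assigns category numerals only and says nothing about actual pitch contours.

```python
import unicodedata

# Tone-bearing combining marks after NFD decomposition. Circumflex (U+0302),
# breve (U+0306) and horn (U+031B) mark vowel quality, not tone, so they are
# deliberately left out.
TONE_MARKS = {
    "\u0300": "huyen",  # grave: à
    "\u0309": "hoi",    # hook above: ả
    "\u0303": "nga",    # tilde: ã
    "\u0301": "sac",    # acute: á
    "\u0323": "nang",   # dot below: ạ
}
OPEN_TONES = {"ngang": 1, "huyen": 2, "hoi": 3, "nga": 4, "sac": 5, "nang": 6}
CHECKED_TONES = {"sac": 7, "nang": 8}  # only these occur before stop finals
STOP_FINALS = ("ch", "p", "t", "c")


def tone_number(syllable: str) -> int:
    """Return the tone numeral (1-8) of one Quốcngữ syllable."""
    decomposed = unicodedata.normalize("NFD", syllable.strip().lower())
    marks = [TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS]
    if len(marks) > 1:
        raise ValueError(f"more than one tone mark in {syllable!r}")
    tone = marks[0] if marks else "ngang"  # unmarked syllables are tone 1
    # Drop all combining marks so the final consonant can be read directly.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if base.endswith(STOP_FINALS):
        if tone not in CHECKED_TONES:
            raise ValueError(f"{syllable!r}: a stop final takes only sắc or nặng")
        return CHECKED_TONES[tone]
    return OPEN_TONES[tone]


for word in ["kê", "gà", "bảo", "sẽ", "trứng", "dạo", "áp", "vịt"]:
    print(word, tone_number(word))
```

The loop prints the numerals 1 through 8 in order, one for each of eight words cited elsewhere in this section.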
The use of tonal numerals will be limited and reserved for cases where clarification is essential, particularly to prevent misinterpretation across Chinese dialects. Tonal values assigned to the same numerical markers often vary significantly between dialects. For example, Mandarin tones (1, 2, 3, 4) differ markedly from those in Cantonese (1, 2, 3, 4), as documented by Wang Li et al. (1953), and diverge further from Vietnamese tonal conventions.
To maintain clarity in Vietnamese phonetic transcription, modern diacritics will be the primary notation system, used alongside IPA symbols, for instance, [à], [ả], [ã], etc., except where such usage risks confusion with IPA phonetic values (e.g., nasalized /ã/). For precise tonal interpretation, readers may consult Quốcngữ diacritics for Vietnamese or Pinyin tone marks for Mandarin (e.g., ā, á, ǎ, à, a), both of which offer distinct tonal representations despite superficial visual overlap.
In select cases, tonal markings will be deliberately omitted. This reflects the author's view that tonal values in many Sino-Vietnamese and Sinitic-Vietnamese forms, like their Chinese dialectal counterparts, have undergone extensive historical shifts. These tonal evolutions, often cyclical and unpredictable, lack a universally reliable rule for reconstruction. In some instances, tones may even revert to their original contours as they existed at the time of lexical absorption into Vietnamese. Such tonal fluidity is well attested in Chinese historical phonology, alongside other systemic changes such as shifts in initial consonants and syllabic finals (see Chao Yuen-Ren, Tone and Intonation in Chinese, 1933, pp. 119–134).
Phonemically, Vietnamese initial and medial consonants exhibit a range of articulatory values that are not always transparently reflected in the orthography. For instance, the following correspondences are commonly observed:
- b- → [ɓ]
- đ- → [ɗ]
- ch- → [ʨ]
- kh- → [kʰ]
- ph- → [pf]
- r- → [ʐ]
- th- → [tʰ]
- tr- → [ʈ]
- nh- → [ɲ], occasionally rendered as ɲ-, jn-, or nh- depending on typographic or contextual constraints.
Similarly, vowel clusters such as -uy and -iê are more accurately transcribed in IPA as [wej] and [iə] than as [wi] and [ie], which follow the orthographic approximation rather than the actual phonetic realization. Vietnamese spelling conventions often obscure these distinctions, particularly before final consonants. (7)
To ensure clarity and consistency, final consonants will be transcribed using the following IPA representations:
- -p → [p]
- -t → [t]
- -ch → [jt]
- -c → [k]
- -nh → [jŋ]
In cases of labiovelar articulation, especially after a rounded vowel (e.g., o-, ɔ-) or a glide medial (-w-), the following transcriptions will be used:
- [-kʷ] → -kw, -wk, or -kʷ
- [-ŋʷ] → -wŋ, -ŋw, or -ŋʷ
The velar nasal /ng/ will be rendered as either [ŋ] or [ng], contingent on its phonetic environment and the need for typographic clarity. These conventions will be applied systematically throughout the text to maintain phonological precision and editorial coherence.
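The final-consonant conventions, including the labiovelar variants, can likewise be expressed as a small rule: look up the plain value, then substitute the labiovelar form for -c and -ng when a rounded vowel or a -w- glide precedes. In this Python sketch the caller states that condition with a boolean instead of having it inferred from the spelling; the names and that simplification are assumptions made for illustration.

```python
# Plain IPA values for orthographic finals, per the conventions above.
FINALS = {"p": "p", "t": "t", "ch": "jt", "c": "k", "nh": "jŋ", "ng": "ŋ"}

# Labiovelar values used after a rounded vowel (e.g. o-, ɔ-) or a -w- glide.
LABIOVELAR = {"c": "kʷ", "ng": "ŋʷ"}


def transcribe_final(final: str, after_rounded: bool = False) -> str:
    """Return the IPA transcription of an orthographic final consonant.

    `after_rounded` must be supplied by the caller; detecting rounded
    vowels or medial glides from the spelling is outside this sketch.
    """
    if final not in FINALS:
        raise ValueError(f"no convention listed for final -{final}")
    if after_rounded and final in LABIOVELAR:
        return LABIOVELAR[final]
    return FINALS[final]


if __name__ == "__main__":
    print(transcribe_final("c"), transcribe_final("c", after_rounded=True))
    print(transcribe_final("ng"), transcribe_final("ng", after_rounded=True))
```

Where typographic constraints rule out superscripts, the ASCII alternatives mentioned above (-kw, -ŋw) could be supplied through a second table without changing the rule itself.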
Subsequent chapters elaborate on all preceding elements and extend each example through polysyllabic grouping across Chinese, pinyin, and Vietnamese. This includes:
- Detailed correspondences with Middle Chinese finals and tonal categories
- Chronologically layered borrowing trajectories
- Diagnostic markers of Yue substratal influence
- A comprehensive polysyllabic lexicon, indexed by Chinese characters, pinyin forms, and Vietnamese equivalents
The overarching objective is to produce a synthesis that is ready for publication – methodologically rigorous, typographically exact, and fully transparent in its analytical claims.
Conclusion
The long debate over Vietnam’s ethnolinguistic origins reveals that no single framework, whether Austroasiatic or Sino‑Tibetan, can fully account for the complexity of its ancestry. The Austroasiatic school rightly emphasizes the presence of Mon‑Khmer minorities and substratal vocabulary, but it underestimates the depth of Yue inheritance and the overwhelming Sinitic overlay that defines the modern lexicon. Conversely, the Sino‑Tibetan perspective highlights the dominance of Sinitic‑Vietnamese elements, yet risks obscuring the indigenous contributions of Chamic, Khmer, and earlier Taic populations.
What emerges instead is a composite picture: Vietnamese identity is the product of successive waves of migration, assimilation, and cultural layering. The Kinh majority descends largely from Sinicized Yue emigrants who intermingled with local populations in the Red River Delta, while later centuries brought Chamic, Khmer, and Teochew refugees into the fold. Archaeological finds and loanwords show that these groups contributed material culture and vocabulary, but the structural backbone of Vietnamese remained Yue‑derived and Sinitic‑integrated.
Culturally, Vietnam absorbed and reinterpreted traditions from its neighbors, from the duodenary zodiac cycle to ritual festivals, while maintaining a distinct national spirit forged in resistance to repeated Chinese incursions. The contrast between Chinese identity, defined as cultural rather than racial, and Vietnamese identity, defined by resilience, sovereignty, and the preservation of ancestral memory, underscores the divergent paths taken after the fall of NamViệt and the independence of Annam in 939 CE.
Thus, Vietnamese must be understood as a language and a people of dual heritage: Yue substratum and Sinitic overlay, enriched by Austroasiatic and Austronesian contact, and consolidated through centuries of political struggle. This layered ancestry explains both the shared features with Cantonese and other southern lects, and the unique divergences that mark Vietnamese as a distinct entity. In the end, Vietnam's national identity rests not on purity of origin but on the creative synthesis of multiple traditions, sustained by a collective will to preserve cultural integrity across millennia.
FOOTNOTES
(1)^ Western theories often overlook historical Yue linguistic and cultural facts, favoring new constructs over existing knowledge. Many Western scholars have hesitated to engage deeply with older historical sources, particularly those requiring proficiency in Chinese, leading them to invent frameworks from scratch rather than building on established research.
(2)^ "Bod" is simply another form of "Bak", as in 百姓 Baixing and 百越 BáchViệt or BaiYue. As Lacouperie argues (ibid.; see Chapter 9), "Bak was an ethnic and nothing else. We may refer as a proof to the similar name, rendered however by different symbols, which they gave to several of their early capitals, PUK, POK, PAK, all names known to us after ages, and of which the similarity with Pak, Bak, cannot be denied. In the region from where they had come, Bak was a well-known ethnic, for instance, Bakh in Bakhdhi (Bactra), Bagistan, Bagdada, etc. etc., and is explained as meaning 'fortunate, flourishing'." See also the same author's discussion of the pre-Chinese and the Chinese, quoted in Chapter 6 (Lacouperie, ibid., pp. 116–119), on the ancestral Bak of the early Chinese as opposed to the pre-Chinese.
(3)^ A few key points for general readers before delving further into this work:
- Time commitment: This research is intended for publication in print format and is not suited for cursory browsing on the internet. Be prepared to invest ample time in engaging with its content.
- Conceptual framework: If the introductory chapter feels dense or difficult to grasp, do not be discouraged. If you are eager to learn, consider a simplified perspective: treat Austroasiatic as a linguistic branch stemming from pre-Yue Taic languages, and build your understanding from that premise. Alternatively, you may begin with the assumption that Yue, distinct from both Sinitic and Austroasiatic, serves as the foundation for this discussion. This approach clarifies why Austroasiatic classifications tend to be retroactive, tracing a circuitous route from south to north.
- Navigating Austroasiatic research: Do not let the overwhelming amount of Austroasiatic information online intimidate you. Much of it reiterates the same interpretations drawn from similar sources. Scholars in the Sino-Tibetan linguistic circle (focused on Yue studies) understand the limitations of such analyses. The author assumes that if you have read this far, you align with the Sino-Tibetan perspective; otherwise, you would likely not have had the patience to engage with these discussions, let alone with the equivalent of hundreds of printed pages ahead. To maintain clarity, avoid reactive engagement with Austroasiatic arguments, as it often leads to distraction rather than progress.
Linguistic insights for different audiences:
- For language learners: Much like the thrill of tasting "phở" for the first time, learners may be intrigued to discover that "phở" is etymologically cognate with 粉 fěn (SV "phấn", meaning 'noodle'). This root has branched into several Vietnamese words, including "phấn" 'chalk', "bún" 'noodle', "bột" 'flour', and "bụi" 'dust', all tracing back to the same origin. (See Han-Viet.com.) Similar SV ~ VS pairs include:
- 雞 jī (SV "kê" ~ VS "gà", 'chicken')
- 蛋 dàn (SV "đản" ~ VS "trứng", 'egg')
- 蒜 suàn (SV "toán" ~ VS "tỏi", 'garlic')
- 打 dǎ (SV "đả" ~ VS "đánh", 'strike')
- 公 gōng (SV "công" ~ VS "ông", 'mister') ~ 翁 wēng (VS "ông", 'old man', etymologically linked to "lông", 'feather, hair')
However, for general readers, digesting these etymological connections requires time and effort. Explanatory elaborations may help, but some assumptions should be accepted as foundational premises without excessive scrutiny, such as the correspondence between 打 dǎ and "đánh". Further phonetic details, like its association with 丁 dīng (SV "đinh", 'young man'), would only add complexity. 丁 dīng also gave rise to words like 釘 dīng (SV "đinh", 'nail') and 打包 dǎbāo, which corresponds to Vietnamese "đóngbao" 'to package'. Readers may, of course, question whether "trai" 'young man' originated from 丁 dīng, but such inquiries extend beyond the immediate scope of this work.
(4)^ Austroasiatic Interpretations of Sino-Vietnamese Usage: Although unproven, this perspective is noteworthy as it provides Austroasiatic scholars with a rationale for the widespread use of Sino-Vietnamese words in daily Vietnamese speech. Their argument suggests that these words were adopted into common usage through linguistic evolution rather than being inherently native expressions belonging to speakers of the same language.
(5)^ Phonological insights for Chinese Philologists: Chinese philologists may find value in examining subtle articulation discrepancies in Vietnamese, which could offer solutions to complexities such as chongniu (重紐, rime doublets) and phonemic division patterns (I, II 等, first and second class distinctions) in Middle Chinese historical phonology.
(6)^ The Singular 'They': Regarding pronoun usage, the author acknowledges that the singular they is increasingly recognized as a practical alternative to "she," "he," or "s/he" in various contexts. The Washington Post formally adopted this usage in its stylebook in December 2015, and the U.S. Examiner followed suit on September 22, 2016. Furthermore, they was named Word of the Year by the American Dialect Society in 2015.
(7)^ For guidance on approximate pronunciation in modern Vietnamese, consult Vietnamese-English Dictionary by Nguyễn Đình-Hoà (1966) or Nguyễn Văn Khôn (1967).

