Sunday, April 6, 2025

Chapter 6 - The Chinese Connection

Executive Summary

  1. The Chinese intruders
    This chapter reassesses the origins of early Chinese civilization and its expansion into southern territories, including present‑day Vietnam. Drawing on Terrien de Lacouperie and subsequent scholarship, it argues that the foundational Chinese population, Bak tribes from western Asia, were cultural intruders rather than indigenous to the Yellow River basin. Alongside Altaic and Turko‑Tartaric groups, they gradually infiltrated southern China, displacing or assimilating native Yue populations. The resulting fusion of peoples and traditions laid the groundwork for early dynasties and shaped the trajectory of the Chinese language, itself a hybrid formed through centuries of interethnic contact. This hybridity extended into Vietnam through successive waves of migration and imperial expansion.

  2. The languages of China before the Chinese
    Lacouperie advanced the provocative notion of "pre‑Chinese languages of China", a vast and complex substratum preserved only in fragmentary Chinese records. He argued that southern Mon and Taic languages profoundly influenced early Chinese syntax, phonology, and semantics, introducing features such as SVO word order, tonal development, and classifier usage. Works like the Erya and Yang Xiong’s Fangyan preserve thousands of regional terms, many non‑Chinese in origin, reflecting centuries of contact and migration. Sino‑Vietnamese further illustrates this legacy, preserving archaic Chinese sounds while coexisting with vernacular Vietnamese.

  3. Linguistic evolution through colonial history: The case of Vietnamese
    Vietnam’s linguistic history is inseparable from its millennium under Chinese rule (111 B.C.–939 A.D.). During this period, Chinese settlers and officials introduced a vast array of Sinitic vocabulary, producing both formal Sino‑Vietnamese terms and colloquial expressions. Vietnamese vocabulary thus became layered: Yue substrata coexisting with Chinese borrowings. This complexity challenges simplistic Austroasiatic classifications. Vietnamese is not a creole but a grafted language, rooted in Yue foundations and enriched by Sinitic influence. Later episodes, such as the Ming occupation and Hồ Dynasty reforms, further illustrate the interplay of language, identity, and power.

  4. Prelude on the Sinitic etyma
    The chapter concludes by introducing a new etymological methodology for uncovering camouflaged Sino‑Vietnamese cognates. Through sound change analysis and morphemic decomposition, hundreds of Vietnamese words are shown to have plausible Sinitic origins. Vietnamese emerges as a linguistic tree with Sinitic branches grafted onto Yue roots, challenging Austroasiatic narratives and inviting a reevaluation of Vietnam’s place within the Sino‑Tibetan family.

x X x

China has long stood as both neighbor and counterpart to Vietnam, a presence that is at once formative and contested. To ask what China represents, and how its people should be understood in relation to Vietnam, is to confront questions of identity, inheritance, and resistance. The historical and linguistic bonds between the two civilizations run deep, yet the degree to which Vietnamese has been shaped by Chinese traditions remains a source of unease. For many Vietnamese speakers, acknowledging this influence seems to challenge the strength of national identity, even as the evidence of shared linguistic and cultural strata is undeniable.

Vietnam’s past cannot be separated from its long and intricate entanglement with China. Efforts to construct a purely independent narrative by selectively emphasizing certain cultural elements risk obscuring the fuller truth. This chapter therefore approaches the subject through three interwoven themes: identity, as nations define themselves in relation to one another; language, as a record of contact, borrowing, and adaptation; and power, as conquest and resistance leave their imprint on speech as well as on history.

A credible account of Vietnam’s origins must rest not on romanticized legends of an unbroken 4,000‑year lineage, but on rigorous, evidence‑based scholarship. By examining the linguistic record alongside historical testimony, this chapter seeks to illuminate how Vietnam’s voice emerged from centuries of dialogue, conflict, and exchange with China — a voice at once distinct and deeply marked by its closest neighbor. (K)

In parallel, the Austroasiatic Mon‑Khmer theory of Vietnamese‑Khmer affiliation has largely been constructed on lexical comparisons drawn from southern vocabulary. While this framework has gained considerable traction, its foundation rests on a relatively narrow subset of linguistic data, and thus warrants closer scrutiny and reevaluation within a broader historical and comparative context. (W).

This survey extends prior research by integrating historical documentation with linguistic evidence to reassess the development of the Vietnamese language. It cautions that when the historical complexities embedded in etymological studies are overlooked, younger scholars risk retracing familiar paths without advancing the discourse. Persistent methodological challenges, within both traditional and modern frameworks, continue to shape interpretation, often allowing ideological bias to color claims of objectivity.

In this chapter, the author approaches the central questions from a historical vantage point, presenting evidence to support several key assertions:

  • Geo‑political dynamics have contributed to the neglect and distortion of historical and linguistic records, largely as a result of nationalist resistance to Chinese influence.

  • Enduring antagonism toward perceived Chinese hegemonism has compromised impartial analysis of Vietnamese linguistic origins, particularly in their earliest stages.

  • Vietnamese etymology, when evaluated against a Sinitic framework, reveals a dominant Chinese imprint that continues to shape the language’s evolution and must be taken seriously in any theory of affiliation.

If advocates of the Austroasiatic Mon‑Khmer theory ground their claims in speculative reconstructions of events thousands of years past, then counterarguments rooted in historical and anthropological evidence deserve equal weight. One such perspective draws on Darwinian principles of natural selection: the presence of core Chinese linguistic elements in Vietnamese may reflect a long‑standing process of racial and cultural intermingling. This includes intermarriage between indigenous Vietnamese populations and successive waves of northern settlers, namely the Chinese. Such interactions, extending back to prehistoric times, have left a profound and enduring imprint on the Vietnamese language.

Language, like biology, follows its own evolutionary trajectory. To fully understand Vietnamese, it is necessary to theorize the origins of the Yue people, to trace the development of the Chinese languages, and to weigh the affiliations proposed for Vietnamese, whether Sinitic within the Sino‑Tibetan family or Mon‑Khmer within Austroasiatic. In the 18th century, Sir William Jones famously identified structural and lexical commonalities among Sanskrit, Greek, and Latin, leading to the formulation of the Indo‑European language family. His mastery of 28 languages enabled him to recognize that these similarities were not coincidental but inherited from a shared ancestral source, later termed Proto‑Indo‑European (Merritt Ruhlen, The Origin of Language, 1994, p. 27).

As Merritt Ruhlen later observed, Jones’s insight that languages evolve through descent with modification anticipated Charles Darwin’s evolutionary theory by more than seventy years. This parallel between linguistic and biological evolution underscores the importance of comparative methodology in tracing language origins and affiliations (Ruhlen, ibid., p. 28).

Darwin himself later affirmed that linguistic classification, like biological taxonomy, hinges on shared vocabulary and structural features. In The Descent of Man (1871), he noted that if two languages exhibit extensive similarities in words and construction, they are likely to have originated from a common source, even if they differ in some respects. Unfortunately, this foundational principle has been largely overlooked by historical linguists in recent decades.

Today, interdisciplinary studies linking genetics, linguistics, archaeology, and evolutionary biology offer promising avenues for understanding the origins and spread of human languages. Ruhlen emphasized Darwin’s foresight in The Origin of Species (1859), where Darwin envisioned that, given a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the world’s languages. If all extinct languages and all intermediate, slowly changing dialects were included, such an arrangement would, in Darwin’s words, be "the only possible one."

Guided by this evolutionary framework, the next section explores the theorization of the Yue as cultural and linguistic predecessors of the Chinese, that is, as part of the pre-Chinese populations of ancient China, drawing on both anthropological and linguistic evidence to illuminate their role in shaping the Vietnamese language.

I) The Chinese intruders

In The Languages of China Before the Chinese (London, 1887; reprinted Taiwan, 1966), Terrien de Lacouperie proposed a provocative theory regarding the origins of early Chinese civilization. He suggested that the foundational Chinese nucleus consisted of about sixteen Bak tribes originating in western Asia, specifically southwest of the Hindu Kush. These Bak leaders, according to Lacouperie, were more culturally advanced than the nomadic horsemen of the northern steppes, who are now recognized as belonging to the Altaic Turco-Mongolic lineage (see Peter A. Boodberg, 1979).

Influenced by the civilizations of Susiana, an offshoot of Babylon, the Bak tribes had acquired knowledge in the arts, sciences, governance, and early forms of cursive writing. Around 2300 B.C., they migrated into the Yellow River basin, accompanied by Altaic groups from the north, and encountered indigenous southern populations. For centuries, these early settlers established themselves in what is now Gansu and Shaanxi provinces, particularly near the latitude of Taiyuan City. Their southward expansion was initially blocked by entrenched northern forces, notably the Jung tribes and the Xiongnu (匈奴), described in Chinese records as formidable barbarians. This period coincides with the reign of King Shun (舜, 2043-1990 B.C.), who inherited southwestern Shaanxi from his predecessor, King Yao (堯, 2146-2043 B.C.).

Upon arrival, the pre-Chinese groups gradually dispersed across the region, infiltrating aboriginal communities and asserting control over vast territories. Simultaneously, northern infiltrators continued to push southward, sometimes allying with indigenous tribes in rebellion or under nominal allegiance to the emerging Celestial authority. Resistance to assimilation was met with suppression or displacement, often forcing native groups further south.

Unlike the isolated tribal populations found along the Tibetan frontier, in Taiwan, or the Philippine archipelago, the majority of inhabitants in the Indo-Chinese peninsula were originally from China proper. As Lacouperie noted, "The ethnology of the peninsula cannot be understood separately from the Chinese formation", a reciprocal relationship that shaped both linguistic and cultural development. The Chinese language, while dominant, absorbed elements from aboriginal tongues, which were distinct from the Altaic or Turko-Tartar dialects of earlier northern occupiers. Instead, the early Chinese linguistic lineage aligned more closely with the Western or Ugric branch of the Turanian family, particularly with dialects such as Ostiak.

During the Xia Dynasty, around 2000 B.C., the language of the conquering Chinese began to intermingle with that of the indigenous populations as they advanced southeastward toward the Yangtze River Delta. This fusion marked a critical phase in the evolution of Chinese linguistic identity, shaped by both conquest and cultural exchange.

"[..]The aboriginal tribes, of the Flowery Land, with whom the Chinese Bak tribes, advancing through the modern Kansuh to South Shensi, fell into contact, did not receive them all in the same way. Some were friendly from the beginning, others objected to their advance, and the same thing occurred over and over again in the course of their history. Small and unimportant at first, the Chinese had no other superiority than that of their civilization. In their advance they had to make their way through the native settlements, either by amicable arrangements and interminglings, or, in case of need, by war and conquest, with the help of the friendly tribes. They used to establish advanced posts and military settlements, around which their colonists could take shelter when required by the hostile dispositions of the native populations among which they were interspersed. As a rule, in the history of their growth and development, the advance of their dominion was preceded by the settlements, always increasing, of colonists in the coveted region. It was their constant practice to drive away their lawless people, outcasts and criminals, who with the malcontents and the travelling merchants paved the way to the future official extension. The non-Chinese communities and states were in this way always gradually saturated with Chinese blood. This policy was never long departed from, even when in later times their power was sufficiently effective to permit a more effective way of bringing matters to a short conclusion.

Under the pressure of the Chinese growth by slow infiltration or open advance, the Pre-Chinese populations gradually retreated southwards; some of them were absorbed by intermingling; others, satisfied with the Chinese yoke, lost slowly their individuality, and formed part of the Chinese nation. Others were entrapped to the same end by the insidious process of the Chinese government, which, bestowing on their chiefs titles of nobility and badges of office, thus made them, sometimes against their secret will, Chinese officials. Light taxes and a nominal recognition of the Chinese suzerainty were only required from them as long as the government of the Middle Kingdom did not feel itself strong enough to ask more and overcome any possible resistance. But those of the Pre-Chinese who objected altogether to the Chinese dominion were thus gradually compelled to migrate away, either of their own will and where they chose and could, or, as was the case in later times, in such provinces or regions left unoccupied by the Chinese for that very purpose. Numerous were the tribes who were gradually led to migrate out of China altogether, as we have had many occasions to show in the course of this work.

The gradual submission of the Pre-Chinese was a very long affair, which began with the arrival of the Chinese Bak tribes, and has not yet come to an end, though the finish is not far at hand. For long the Chinese dominion was very small, and later on, when very large on the maps and in appearance, it was, as a matter of fact, effective only on a much smaller area. The advanced posts on the borders of the real Chinese domain used to give their names to regions sometimes entirely unsubdued, though the reverse has long seemed to be the case, because all the necessary intercourse between the independent populations and the Chinese government passed through the Chinese officials of these posts, specially appointed with great titles of office, for that purpose."

(Lacouperie. Ibid. pp. 106-108)

Table 1: The Chinese intruders

"It is not one of the least interesting results of modern researches in oriental history and philology that the Chinese should now be known as intruders instead of aborigines in their own country. This blunt statement must, however, be qualified, as the modern Chinese are a hybrid race, and their speech is a hybrid language, both of which are the outcome of interminglings between the immigrants from the north-west and north and the previous occupiers of the soil belonging to different races, and especially to the Indo-Pacific ones.

This better knowledge, for the benefit of the philosophy of history, was brought about by a closer examination of their early traditions, a rigorous identification of the geographical names mentioned therein and in the course of their history, and the study of many historical statements and disclosures about the non-Chinese races actually settled within the borders of China proper, clumsily arranged under the heading of foreign nations, in the Chinese Dynastic Annals.

The early Chinese intruders and civilizers were the Bak tribes, about sixteen in number, who arrived on the N.W. borders of China not long after the great rising which had taken place in S.W. Asia at the beginning of the twenty-third century B.C. in Susiana. Their former seat was within the dominating influence of the latter country, as they were acquainted with its civilization, a reflex of the Babylo-Assyrian focus."

(Lacouperie. Ibid. pp. 113-114)

Table 2: The other intruders

"Numerous were the tribes and races who, for the same reasons as the Chinese Bak tribes, or attracted by the wealth and civilization of the latter, forced their way into China, imperiling the existence of its government, often superseding it altogether over a part or over the whole of the country, and afterwards disappearing, not however without leaving traces of their sway in the civilization, the language, and the population.

The Jungs, who had partly preceded the Chinese, the Teks, the Kiangs, etc., have been already mentioned in this work as having contributed to swell the ranks of the malcontents and banished Chinese families, as well as those of the aboriginal tribes, in pre-Chinese lands. Now we must refer more particularly to those of the intruders who have exercised an influence of some importance either politically or in civilization.

The oldest intruders of this class were the Shang 商, whose name suggests that they were traders, while their traditions indicate a western origin near the Kuen-lun range, and perhaps a parentship with the Jungs. They appear on the N.W. of the Chinese settlements since the beginning of and in the sixteenth century [B.C.]; they upset the Hia dynasty, took possession of the parts of Shensi, Shansi, and Honan then occupied by the Chinese, driving the Hia [夏 Xia] towards the coast.

The Tchou 周, formerly Tok, who drove away the Shang-Yn dynasty [殷 Yin], established their brilliant rule over the Middle Kingdom in 1050 B.C.; some of them had lingered on the Chinese borders in Shensi for several centuries. They were, most probably Red-haired Kirghizes, and were not apparently without Aryan blood among them. It seems so, from the fact that they were acquainted with some notions derived from the Aryan focus of culture in Kwarism, which they introduced into China, and that several of the explanations added to the Olden texts of the Yn-King by their leader Wen-wang were certainly suggested by the homophony of Aryan words.

The Ts'in 秦, or better Tan [SV "Tần"], as formerly pronounced, formed an important state on the west of the Chinese agglomeration. It grew from the tenth century to the third B.C., when, having subdued the six other principal states of the confederation, its prince founding the Chinese Empire, declared himself Emperor in 221 B.C. Their nucleus was not Chinese, and made of Jung tribes who absorbed gradually many Chinese families from inside, and also Turko-Tatar tribes from its outside borders, the limits of which are not well known. This state was a channel through which passed, or a buffer preventing the passage of, any intercourse of the west with the Middle Kingdom."

(Lacouperie. Ibid. pp. 123-125)

 

This chapter will delve further into the historical foundations of the preceding hypothesis, offering a preliminary review of key evidence that supports the author's argument regarding the development of Sinitic-Vietnamese etymology. The aim is to trace how historical events and cultural interactions contributed to the linguistic formation of both Chinese and Vietnamese.

One focal point is the documented contact between ancient China and the BáchViệt (百越 BaiYue), a collective term for the Yue peoples, referred to by Lacouperie as “all the outside-borders” populations. These interactions date back to the Yin Dynasty (殷代, known in Vietnamese as ĐờiÂn), specifically during the period between 1718 B.C. and 1631 B.C., when hostilities between the Yin and the Yue were recorded.

By this time, the Yin civilization had already diverged from its earlier Tibetan-Bak roots and had begun resettling in the northwestern regions of present-day Gansu and Shaanxi. Archaeological and geological findings, including research published as recently as August 2016, have supported the historical existence of the Xia (夏) and Shang (商) dynasties, successors to the Yin, suggesting that the Yin may already have been a fully established state during this era. (周)

Figure 1: Yin-Xia-Shang-Zhou timeline



The findings, published in the journal Science, may help rewrite history because they show not only that a massive flood did occur but also that it occurred around 1920 B.C., several centuries later than traditionally thought.

This image highlights the variable timelines for the start of the Xia Dynasty according to traditional Chinese culture, the Xia-Shang-Zhou Chronology Project and the flood that was newly identified and dated by Wu et al. (Credit: Copyright © Carla Schaffer/AAAS)

From an etymological perspective, the interchangeability of the phonetic initials /d-/ and /j-/ in Old Chinese (OC) vocables provides compelling linguistic support for the Vietnamese legend of Thánh Gióng (Saint Gióng). This heroic figure, said to have led an army against Yin invaders, is recorded in Chinese historical texts under the name 董 (Dǒng, SV Đổng). The Vietnamese forms Gióng or Dóng (/jɔŋ⁵/) are phonetically cognate with the Old Chinese pronunciation of Dǒng (/toːŋʔ/), suggesting a shared linguistic ancestry.

This appellation may reflect remnants of proto-Taic speech patterns once spoken by descendants of the ancient Yue peoples who inhabited southern China prior to the rise of the Chinese dynasties. While the term could be extended to imply Austroasiatic Mon-Khmer affiliations, such a linguistic interpretation risks misrepresenting the historical and phonological context. Instead, the name Gióng may serve as a linguistic artifact, preserving traces of pre-Sinitic vernaculars that shaped early Vietnamese identity.

Following the decline of the Xia and Shang dynasties, successors to the Yin, the Zhou Dynasty (周王朝) emerged, ushering in a new era shaped by ethnolinguistic fusion. North China (華北) saw the integration of Rong (Jung) and Turko-Tartaric nomadic tribes, who established states such as Zhao (趙), Wei (衛), Liang (梁), and Liao (遼). These populations, along with the subjects of the Central Plains vassal states Qin (秦), Lu (魯), Qi (齊), Yan (燕), and Han (韓), formed the demographic foundation of a unified China under Qin rule in 221 B.C.

Prior to unification, interstate communication relied on Yayu (雅語), a lexicon of regional dialects, and Wenyanwen (文言文), or classical Chinese, as diplomatic tools. It is important to note that linguistic uniformity did not exist; northern and southern states spoke markedly different languages. 

Southern polities such as Chu (楚), Wu (吳), and Yue (越), along with others such as XiYue (西越), DongYue (東越), MinYue (閩越), WuYue (吳越), LuoYue (雒越), OuYue (甌越), and Yuechang (越常), are believed to share a common linguistic ancestry rooted in what this study refers to as the Taic family. This proposed lineage stands in contrast to the modern Sino-Tibetan classification, which, by design, encompasses contemporary Chinese lects.

The Yue linguistic sub-family, as hypothesized here, runs parallel to the Austroasiatic Mon-Khmer branch in terms of structural and historical development, though it diverges notably from the Daic-Kadai classification. This distinction invites a reevaluation of linguistic affiliations in southern China and mainland Southeast Asia, particularly in light of historical migration patterns and cultural convergence.

In the early 20th century, Western linguists grouped a number of southern languages, including the Mon-Khmer languages spoken across Southeast Asia, under the Austroasiatic family. However, archaeological and linguistic evidence from southern Indochina suggests that the languages of that region, such as Chamic, were unrelated to the speech of the Annamese newcomers who arrived after the 13th century. By then, the Annamese were already speaking a Yue-derived language heavily infused with Chinese elements, likely resembling vernacular Mandarin more than any Mon-Khmer tongue. This analysis is grounded in historical records, not prehistoric speculation.

As the Spring and Autumn and Warring States periods (770–221 B.C.) came to an end, many defeated populations fled southward. Among them were Yue peoples, recorded in Chinese texts using characters such as 鉞, 粵, and 越, who had long established their own states. Northern Chinese elites often referred to these groups derogatorily as NamMan (南蠻), or "Southern Barbarians." Over time, the Yue tribes evolved into modern ethnic minorities such as the Dai (傣), Zhuang (壯), Yao (瑤), Miao (苗), and Mon (猛 or 毛南), each with distinct linguistic trajectories.

The Chu State (楚國), notably, was populated by people of Taic descent (原始 傣族). After its defeat by Qin, its population was absorbed into the Qin Empire. In 208 B.C., the Yue states from MinYue (閩越) to northern Vietnam’s Giaochỉ (交趾) came under the rule of King Zhao Tuo (趙佗, Triệu Đà), a former Qin general. These territories later formed the NamViệt Kingdom (南越 王國), which lasted nearly a century before being annexed by the Han Empire in 111 B.C.

The Han Empire’s demographic landscape was shaped by diverse populations originating from the former Chu and Qin states, along with the annexed Nam Việt territories. Consequently, the Han people inherited a notable Yue ethnic influence. It is particularly significant that Liu Bang (劉邦), the founding emperor of the Han Dynasty, and many of his key officials were once subjects of Chu. This historical detail is reiterated to highlight the ethnic and regional foundations of the Han Empire and its broader societal makeup.

Following the Han annexation of Nam Việt, portions of what is now northern Vietnam were reorganized as Jiaozhou Prefecture (交州). During the Tang Dynasty, this region came to be known as the Pacified Southern Protectorate (安南都護府, Annam Đôhộphủ). As Han influence expanded, indigenous Viet-Muong communities—descendants of the Luo Yue (雒越, Vietnamese: Lạc Việt)—resisted cultural assimilation. Many Muong retreated to mountainous areas, while those who remained in the Red River Delta gradually formed the Kinh (京族, Jingzu) majority. The Kinh identity emerged through intermarriage between Yue-influenced Han settlers and local aboriginal groups. The character "京" (Kinh), meaning "metropolitan people," has long reflected their self-perception and cultural continuity.

From 208 B.C. onward, the Qin language (秦, Tần) began influencing the indigenous Vietic tongue, contributing to the formation of early Sinitic-Vietnamese vocabulary. Later, in 186 A.D., Viceroy Sĩ Nhiếp (士燮) mandated the use of Han Chinese over the native Yue language, deepening the linguistic integration. Continued dynastic shifts and colonial policies, especially during the Ming occupation in the 15th century, further shaped Middle Vietnamese. Over time, the linguistic divergence between Muong and Kinh grew so pronounced that they became mutually unintelligible.

In sum, the linguistic legacy of northern intruders and pre-Chinese settlers in southern China profoundly shaped the histories of both the Chinese and Vietnamese languages. As Lacouperie concluded, the influence of the northern intruders in particular was considerable:

"The influence of the Turko-Tatar races has been considerable. Several of them [...] belong to olden times. For several centuries after the Han period, ignorant Tatar dynasties have ruled over parts of Northern China. The Sien-pi, cognate to the Coreans, have produced the dynasties of the Former Yen, 303-352 A.D.; the After Yen, 383-408 A.D.; the Western Yen, 385-394 A.D.; the Southern Yen, 398-410 A.D.; the Southern Liang, 397-414 A.D.; the Western Tsin, 385-412 A.D.

The Hiung-nu Turks have produced the dynasties of Northern Liang, 397-439 A.D., of the Hia, 407-431 A.D. in W. Shensi (to be distinguished from the later Si-Hia), and afterwards the Northern Han, in 951-979 A.D.

The Tchao Turks produced the dynasties of the Former Tchao, 304-329 A.D., and After Tchao, 319-352 A.D.

The Si-fan have produced the dynasties of Tcheng in Szetchuen, 301-346 A.D.; of the Former Tsin, 390-395 A.D., After Tsin, 384-417 A.D., both in Shensi. The Tobat Tatars, who produced the great dynasty of the Northern Wei, 386-532 A.D., belonged to the same group. They were apparently acquainted with the Syriac writing, at least about 476-500 A.D., and they had a court language of their own, in which their ruler Wan-ti at that time (in 486 A.D.) ordered that a translation of the Hiao king or 'Book of filial piety' should be made. Its use was not abolished before 517 A.D.

The rule of the Northern Wei extended over the whole of Northern China, with a few regional exceptions in the proximity of the Yang-tze Kiang. Later on, that of the Mongol dynasty of the K'itan or Liao, 907-1202 A.D., was restricted in the north-east. In the north-west, the Si-Hia or Tangut dynasty ruled from 982 to 1227, until it was swept away by the Mongols. [...] The Kin or Jutchih, the ancestors of the present Mandshu dynasty, ruled over a larger area than the N. Wei, from 1115 to 1234 A.D. The Mongol Yuen dynasty established by Kubila'i-Khan in 1271, and which lasted until 1367, was the first to rule over the whole of China; its great power did more for the homogeneity of the Middle Kingdom than any previous effort. And at last, in 1644, the Mandshu Ta Tsing dynasty established its sway all over the Empire [...]

These various dynasties brought each of them their own language, as their names suggest, and restricted as it was in its use to the court and soldiery, its influence was in every case limited, though by no means unreal, as shown by the alteration of pronunciation and the introduction of words in the official dialect. With regard to the [...] Mandshus, their presence has hurried on the phonetic decay of the Peking Mandarin dialect, now the official language, on the path of hissing and hushing the sounds, where it had entered since the days of the Yuen Mongols. Their small number, and their habit of living somewhat apart from the population, restrict the influence of the soldiery, which is felt only in the proximity of the post-towns over the empire, by the introduction of a few terms in the vernaculars."

(Lacouperie. Ibid. pp. 127-129)

As previously discussed, the subjects of the ancient Chu State, including Liu Bang (漢高祖 劉邦), founder of the Han Dynasty, and the generals who helped him establish the Han Empire, spoke a language with Taic roots. Correspondingly, the foundational vocabulary of Sinitic-Vietnamese reveals notable connections to the Daic-Kadai language family, which includes dialects still spoken today by the Tày ethnic groups of northern Vietnam. The linguistic survey presented here highlights the author’s recent findings, alongside earlier discoveries, which point to glossarial remnants of proto-Taic elements embedded in pre-Sinitic language, predating even Archaic Chinese. These elements are particularly evident in the Minnan dialects of the MinYue (閩越) region, corresponding to modern-day Fujian Province, and show clear affiliations with core vocabulary from the Yue aboriginal language. (Refer to Chapter 7 for the Tày word list.)

II) The languages of China before the Chinese

Long before the emergence of Chinese proper, a constellation of indigenous languages thrived across the regions south of the Yellow River, extending into the Red River Basin. In the framework adopted here, these are treated as branches of the broader Taic family (Taic‑Shan, Taic‑Dai, Mon‑Tai, Mon‑Paluang, and others), languages that ultimately gave rise to Dai, Yue, Austroasiatic, Mon‑Khmer, Viet‑Muong, and early Vietnamese. This section examines the ancient Taic‑Yue languages that preceded the rise of Sinitic, their speakers, and the hybridized descendants that were later classified as "Chinese dialects."

For languages of uncertain affiliation, it is unsurprising that Southeast Asian linguists have sometimes described them as "mixed," "hybrid," or even "generic." Yet no language is truly "generic" in the sense of an artificial language such as Esperanto. Afrikaans, Albanian, Haitian Creole, and Vietnamese alike are natural languages with deep historical roots. Vietnamese, in particular, has long been classified by the Mon‑Khmer school as Austroasiatic, largely on the basis of its core vocabulary cognate with Mon‑Khmer forms.

Genetic affiliation, however, is rarely straightforward. Language A may share a portion of its lexicon with its neighbor B, which in turn overlaps with C, C with D, and so forth. At a distance, language Z may display scattered resemblances with A, B, and C without necessarily being genetically related to any of them. Such patterns recall the intriguing resemblances sometimes noted between distant Asian and American Indian languages: for example, California’s Lake Tahoe and China’s "Tàihú" (太湖, SV Tháihồ), both denoting a "large body of water."

An anthropological‑linguistic scenario may help to frame the Vietnamese case. Let us posit Vietnamese (the ancient Annamese or Vietic tongue) as a descendant of an ancestral Y (Yue), itself a branch of T (Taic). This same T also gave rise to X (Zhuang), making ancient Vietnamese and Zhuang linguistic cousins, both distantly related to Z (Zhou). Z was later subsumed by Q (Qin), and together these lineages evolved into a composite XYZ (see Chapters Two and Six on the genetic components of Chinese and Vietnamese). From this amalgam emerged H (the Han peoples) and S (the Sinitic languages). Surrounding them were numerous now‑extinct languages — A, B, C, D — whose traces survive only in scattered vestiges.

Figure 2 - Linguistic ancestry diagram

                

                T (Taic)
                   │
     ┌─────────────┴─────────────┐
     │                           │
   Y (Yue)                     X (Zhuang)
     │
     └───> V (Vietnamese / Vietic, ancient Annamese)

Z (Zhou) ─────────┐
                  │
                  └──> absorbed by Q (Qin)
                           │
                           └──> composite XYZ
                                   │
                     ┌─────────────┴─────────────┐
                     │                           │
                   H (Han peoples)             S (Sinitic languages)

Other extinct neighbors: A, B, C, D … (scattered vestiges)

Through centuries of intermingling, conquest, migration, and integration:
   Y + T + Z + S + intermediaries (P, R, Q) 
        ↓
      K (Kinh, "mutated" Vietic‑Yue)
        ↓
      V (Modern Vietnamese)
      
      

It is hypothesized that the proto‑Taic speakers gave rise to the Yue aboriginals, who once occupied vast stretches of pre‑Chinese territory, ranging from the northern Yangtze basin to the coastal regions of present‑day Zhejiang and Jiangsu. From this substratum, Vietnamese basic vocabulary may have drawn directly on elements of T, Z, and S, while at the same time exerting influence on its southern neighbors, including the Austroasiatic Mon‑Khmer languages. Such diffusion unfolded over centuries of sustained contact — marked by submission, migration, trade, warfare, annexation, and integration — as populations gradually moved southward.

Although Vietnamese and the Sinitic languages may not be genetically affiliated in the strictest sense, they nonetheless share a kinship through common ancestral cousins and intermediate carriers (P, R, Q), forged in the crucible of conquest and domination. Across long centuries and vast spaces, this process culminated in the emergence of the Kinh (K), a transformed lineage of earlier Vietic‑Yue peoples, who ultimately became the modern Vietnamese (V) we recognize today.

Table 3 - On the Pre-Chinese Aboriginal Taic Linguistic Family

The Taic linguistic family examined in this study corresponds to what Terrien de Lacouperie, in The Languages of China Before the Chinese (London, 1887; Taiwan reprint, 1966), described as the Mon‑Taic dialects. According to Lacouperie, these were the pre‑Chinese aboriginal dialects spoken across ancient China. Drawing on both historical traditions and legendary accounts, he sought to establish the affiliations among Taic, Chinese, and Yue, as summarized below.

The Pong, also known as the Pan‑hu (盤瓠) race, held a predominant position in Central China, south of the Yellow River, at the time when the early Chinese, or Bak tribes, migrated into the region. Their leader, remembered as Pong, became the subject of numerous legends. He was said to have settled in northeastern Sichuan and western Henan, where he maintained friendly relations with the Chinese from the outset. Indeed, he reportedly aided them in resisting incursions from the Jung and Naga peoples advancing from the northwest. Many tribes later claimed descent from him, and some continued to venerate his memory. Their collective name, Ngao, meaning "powerful," eventually evolved into the ethnonym Yao.

The Pan‑hu race was considered a branch of the Mon peoples from the southwest, who had occupied large parts of China prior to the arrival of the Chinese, that is, before the twenty‑third century B.C. From this branch, and through intermingling with northern Kuenlunic (崑崙 Kūnlún) tribes, the Taic or Shan‑Siamese populations are thought to have emerged. Over time, some of these groups migrated southward under pressure from Chinese expansion, spreading into Indo‑China and forming several distinct states.

The Pan‑hu language itself is not directly attested but is inferred from the dialects of the tribes descended from it. Its most notable feature was its ideology, in Lacouperie's sense of word order, which he described as nearly opposite to that of the Kuenlunic languages. The oldest remnants of this speech were preserved by Chinese writers of the Han Dynasty, particularly in the Annals of the Eastern Han. Earlier traces appear in still older works, though there they are cited only with geographical markers, leaving scholars to infer the identity of the speakers. By contrast, in the Han sources the words are explicitly attributed to the Yao of the Pan‑hu race, a precision that, as Lacouperie emphasized, makes all the difference.

(Lacouperie. ibid. pp. 38-39)

Ethnologically, beyond what the same author discussed in Chapter 8 regarding the pre-Chinese and the Chinese, Lacouperie (ibid, pp. 116-119) argued, concerning the Bak ancestors of the early Chinese as opposed to the pre-Chinese, that

"[...] the chief characteristic of these affinities between the early civilization of the Chinese 4000 years ago and the much older focus of culture of South-West Asia is that they are obvious imitations and borrowings. They have nothing original in themselves, and bear in the face that they do not come from common descent. They present the usual imperfectness unequally combined with a complete identity on some points and others which are always the accompaniment of acquisitions obtained through a social intercourse of protracted length, and not from a casual teaching and learning from books and scholars.

The name Bak [百] (now Peh), of the original Chinese immigrants, meant 'flourishing, many, all,' and also 'hundred.' But it has not the last meaning in such expressions as Peh sing 'all the surnames,' Peh kuan 'all the officials,' Peh Liao, same meaning, Peh Yueh [百越 BáchViệt] 'all the outside-borders,' etc., where no possible reference can be made to any precise number, since these various items comprise several hundreds, as in the case of the first three, or only a few, as in the last case. All through the Shu-King [書經] or Canon Book of History, it is employed as a whole though undetermined number. And as a matter of fact, the well-known expression Peh sing, above quoted, which appears from the beginning of Chinese history, and about which so many baseless speculations have been set forth, has never meant the hundred surnames, as was wrongly presumed, and this for several reasons. The supposition that Peh sing meant 'the hundred surnames' (or families) was based on the fact that the Peh Jia sing or 'the hundred (?) family names,' which includes some 460 names, was only compiled under the Sung dynasty, i.e. after A.D. 960, when the number had increased largely and much beyond its original figure. But this admitted, the regular use of the family names does not go back much beyond the time of Confucius (B.C. 551-479), and when this list of surnames is carefully sifted, we do not find more than about sixteen surnames dating as far back as the beginnings of the Chinese in China; this small number, however, being only reached if we include a few family names quoted in the early traditions, and disappearing afterwards. Therefore, as the term Peh sing, 1 i.e. the 'Bak Surnames,' existed among the Chinese from the outset as an appellative for themselves, the word Peh, old Bak, could have, not the meaning of 'hundred,' but perhaps that of 'all, numerous, flourishing,' as stated above, should it have been still understood.
And the meaning 'hundred,' which originally was apparently said bar, was only a homonymous sound in the limited phonetic orthoepy of the Chinese, expressed by the same symbol because of the similarity of sound, real only for them.

Bak was an ethnic and nothing else. We may refer as a proof to the similar name, rendered however by different symbols, which they gave to several of their early capitals, PUK, POK, PAK, all names known to us after ages, and of which the similarity with Pak, Bak, cannot be denied. In the region from where they had come, Bak was a well-known ethnic, for instance, Bakh in Bakhdhi (Bactra), Bagistan, Bagdada, etc. etc., and is explained as meaning 'fortunate, flourishing.'

Another ethnical name no less important is that which is now read 夏 Hia, also sha, in several ideo-phonetic compounds, and which was the proper appellative of one of the leading tribes of the immigrants when settled in 'a little bit of territory in the N.W.' It became the name of the Chinese people. The Ku-wen spellings tell us that its original full form was something like Ketchi, Ketsu, Ketsi, Kiitche, Kotchi, etc., which are all graphical attempts at rendering the exact name with the clumsy acrologic and syllabic system of the time being. We may take Kütche as an average of all these variants. Now this name is so much like that of the Kashshi on the north-east of Mesopotamia that, without suggesting in any way a relationship of some kind between the two peoples, there may have been an affinity of names from a common meaning suitable to both.

An analysis of the aforesaid book of the family surnames, the Peh kia sing, shows their number to be made up, besides the original names, of native appellatives brought in sometimes by the entrance of native tribes into the Chinese community, but principally from the native names of regions bestowed upon Chinese subjects as fiefs and territorial grants. Even the princely names taken by the early Chinese leaders in the Flowery Land were borrowed from those of native regions, as they conquered them. But an examination of all these proper names, tribal and geographical, would carry us much beyond the limits of the present work.

We have little to say here of the early language of the Chinese Bak tribes, and its subsequent evolution and development into several important dialects, as the matter is somewhat precluded by the object of the present work. We allude elsewhere to some of its characteristics and to the formation of its ideology (§§ 20-26) and tones (§§ 117, 230). The explanation of the gap now existing between the book-language 2 and the vernaculars requires some long explanations and demonstration much beyond our scope here. The following scheme, however, gives the list of the most important languages, dialects, and subdialects, with an indication of the probable dates of their branching off. It is the first attempt which has hitherto been made at classifying them, and thus far must be looked upon with regard to the relative position of several dialects and subdialects as provisional. A great deal of work and investigation remains to be done before such a classification can be completed. The total number of dialects and subdialects, hiang fan or local patois, etc., has been roughly estimated to be somewhat similar to that of the days of the year (360), and though they are not likely to affect the general lines of the classification below, it may be useful not to forget that the total figure of the names entered therein is only one-ninth of the general number."


Figure 3 - General Historical Scheme of the Chinese Family of Languages



1 Pak was written in Ku-wen with the old form of  Pei with  Ke (mod. hia) placed over, or  Kao placed below and read P-k [sic]. In Ta-tchuen [大篆] style Pak sing were written sometimes as a single word  sing over and  Buk or Muk, or an old form of  Pak under. In modern writing 百姓.

2 A misconception as to the real character of the Chinese language, at first known in its fictitious book form written with ideographic symbols, now syllabic, and supposed to be genuine and spoken; combined with another misconception as to the non-historical and mnemonic value of the 1720 pseudo-roots of the Hindu Brahmans analysing their Sanskrit; both misconceptions — understood as justifying a theory of an early period of monosyllabic roots, while, as a matter of fact, these are generally late in the history of language, — have misguided the greater number of philologists until the present time, and have for long hindered the progress of the science of language. Our predecessors have erroneously built a logical monosyllabism from the monosyllabisms of writing, of decay, and of elocution, the only ones which have ever existed.


On the linguistic data available in the late 19th century, Lacouperie (ibid. pp. 3-5) noted that

"the languages mentioned in these pages are not all of those, or the representatives of those, which were spoken in the Flowery Land when the Chinese made their appearance in that fertile country some four thousand years ago. The Chinese have only occupied it, slowly and gradually, and their progressive occupation was only achieved nominally during the last century [i.e., the 18th century]. Some portions of the S. and S.W. provinces of Kueitchon [sic], Szetchuen, Yunnan, Kuangsi and Kuangtung are still inhabited by broken and non-broken tribes, representatives, generally cross-bred, mixed and degenerated, of some former races who were once in possession of the country. Therefore the expression pre-Chinese languages of China implies an enormous length of time, which still continues, and which would require an immense study should the materials be available.

Unhappily the data are of the most scanty description. They consist of occasional references given reluctantly and contemptuously during their history by the Chinese themselves, who were little disposed to acknowledge the existence of independent and non-Chinese populations in the very midst of their dominion. Though they cannot conceal the fact that they are themselves intruders in China proper, they have always tried the use of big words and large geographical denominations, which blind the unwary readers, to shield their comparatively small beginnings. Such indications can be obtained only by a close examination of their ancient documents, such as their histories, annals, and the local topographies, where, in the case of the annals, they have to be sought for in the sections concerning foreign countries; an arrangement somewhat startling, though not unnatural when we consider the real state of the case from a standpoint other than the views entertained by the ancient sinologists on the permanence and the ever-great importance of the Chinese nation. But the Chinese, though careful to inscribe in one or another part of their records all that occurred between themselves and the aboriginal tribes, and all that they could learn about them, were not enabled to know anything as to the events, linguistical and ethnological, which took place beyond their reach. So that displacements of the old races, as well as the arrival of new ones, have taken place in the regions non-Chinese, now part of China proper. Foreign linguistic influences have also been at work, and of these we have no other knowledge than that deduced from the traces they have left behind them which enable us to disentangle their peculiar characteristics."

Syntactically, regarding the southern linguistic influence on the Chinese language, Lacouperie (ibid, pp. 16-17) states that

"the postposition of the genitive to its noun, which occurs not unfrequently in the popular songs of the Book of Poetry, where it cannot possibly be looked upon as poetic licence, belongs to an influence of different origin, and is common to the Mon and Taic languages." [...] "And for the position of the object to the verb, and the syntactical order of [ Subject+Verb+Object ] standard, in contradistinction with the unadulterated indices of the Ural-Altaic, which it formerly possessed, there is no doubt that the Chinese language was indebted to the native languages of the Mon, and subsequently to the Taic-Shan formation." [..]

"The phonesis, morphology, and sematology of the language bear, also, their testimony to the great influence of the native languages. The phonetic impoverishment and the introduction and growth of the tones as an equilibrium to make up deficiencies from wear and tear, are results of the same influence. In the process of word-making, the usual system of the postplacing particles for specifying conditions in space and time common to the Ugro-Altaic linguistic alliance has been disturbed in Chinese, and most frequently a system of preplacing has been substitute for the older one. And finally, in the department of sematology, we have to indicate, also, as a native influence on the language of the Chinese, the habit of using numeral auxiliaries, or segregative particles, otherwise classifers, which, if it has not been altogether foreign to the older state of the language, would not have taken the important place it occupies in the modern dialects."

"The vocabularies which, contrary to the usual habit, have not been the first considered have come at one pace with the preceding alternations. The loan of words have been intensive on both sides, native and Chinese, and reached to a considerable amount."

These linguistic characteristics, described from the Chinese standpoint, can be placed in historical perspective. During the Zhou Dynasty (1050-255 B.C.), the State of Chu, a non-Chinese civilization, was one of the greatest powers of all; its territory extended across Anhui, Hubei, and Henan provinces, with shifting, ill-defined borders all around. East of Chu lay the non-Chinese states of Wu and Yue, which by about 584 B.C. covered the modern provinces of Jiangsu and Zhejiang; Wu was later conquered by Yue in 473 B.C. Toward the end of the 4th century B.C., the philosopher Mengzi (Mencius) noted that the Chu 'barbarians' spoke a shrieking language unlike that of the people of the Qi State in today's Shandong Province. The names of the kings of Wu and Yue likewise have a decidedly non-Chinese appearance, and these and other non-Chinese states required interpreters in their dealings with the machinery of the Chinese government. (Lacouperie. Ibid, pp. 20-21).

Today we have at hand the Erya (爾雅), which contains hundreds of local vocabularies. On the one hand, it is thought to have served as a common tool for communication among the states of ancient China; on the other, it was believed to have been an interstate diplomatic language. In fact, the Erya was a dictionary, traditionally dated to the Zhou period, that collected common words, including non-Chinese ones, with explanations and many double-words arranged in pairs, a characteristic feature of the Taic-Shan languages that is also common in the Shijing (詩經) or Classic of Poetry. Indeed, "it contains many words which do not seem to have ever been used in any Chinese text properly so called. They are regional words borrowed from other stocks of vocables, and they could be expressed in Chinese writing only by the use of homonyms as phonetic exponents. [..] There are no less than 928 words or about one-fifth of general stock, which do not appear anywhere else than in the Erh-ya." (Lacouperie. Ibid, p. 23)

Lacouperie, however, considered the most important work to be the Fangyan (方言 'Dialects') by Yang Xiong (楊雄, 53 B.C.-18 A.D.), in whose time much attention was paid to local words. Before Yang Xiong, other scholars had compiled collections of thousands of local words, which he used and adapted in his own work of up to 9,000 words, arranged by subject and drawn from 40 regions corresponding to parts of modern Hebei, Anhui, Hubei, Hunan, Jiangsu, Zhejiang, Guangdong, Guangxi, Sichuan, etc., within modern China proper; many of these regions were Chinese only in name, and others not Chinese at all. Later generations added more items, bringing the total to 12,000 words. The words in this remarkable work represent the collection of several centuries, so the state names it records do not all belong to Yang Xiong's own time, e.g., 南越 NanYue, 貴州 Guizhou, 湘 Xiang, and even the State of Jin (晉國), which was destroyed and partitioned in 436 B.C. by the states of Han 韓, Wei 魏, Zhao 趙, etc. (Lacouperie. Ibid, pp. 25, 29).

Because the collection spans several centuries, the Chinese characters attached to the recorded words were pronounced differently in each era, a serious matter to consider.

"This is made apparent by this fact, that differences of pronunciation are often indicated by symbols whose sounds have for long been homonymous. However, the best means to start with, and subjected to the least proportion of ulterior modifications, are the sounds preserved in the Sinico-Annamite, the most archaic of the Chinese dialects. The only preservation to be made, is that the hardening and strengthening which this dialectal pronunciation indication goes perhaps beyond the mark, and that half of its strength might be due to local peculiarity of the dialect."
(Lacouperie. Ibid, p. 29)

By "Sinico-Annamite," termed "Sino-Vietnamese" (SV) in this paper, Lacouperie meant not only the Sino-Vietnamese vocabulary but also a scholarly language that he regarded as a Chinese dialect, like Cantonese or Fukienese.

"Two languages are used in Annam. One employed by the literati only is pure literary Chinese, with the old sounds of the Ts'in [秦 Tần ] period attached to the written characters. It is the Sinico-Annamite, this very dialect, which, with necessary allowance for decay and self divergence, rightly deserves the qualifications of the most archaic of the Chinese dialects.

It is a curious fact that its existence was not, in the minds of many scholars, separated from that of the other language, the vernacular Annamese or Cochin-Chinese, which belongs, as recognized by J. R. Logan, and though full of Chinese idioms, to the same family, as the Mon or Peguan[?]."
(Lacouperie. Ibid, p. 54)

By the time his book was published in 1887, Lacouperie (Ibid. p. 55) noted, three writing systems were in use in Annam: (1) chữNho (字儒), (2) chữNôm (字喃), and (3) chữQuốcngữ (字國語). Their characteristics have been discussed earlier in this paper, and Sinologists and Vietnamese specialists describe them in much the same way.

All said, the author would like to point out that many of the vocabularies listed as "Mon-Taic" by Lacouperie barely find plausible cognates in modern Vietnamese, even though Lacouperie related the non-Chinese ethnology of the country, whose people regard themselves as descendants of the Fairy Dragon (龍種), to the Mon-Taic races. This lineage begins with King Kinhdương (京陽王, Jingyang Wang, or SV 'Kinhdương Vương'), Jingyang being a place name near the capital of Qin in Shaanxi. King Kinhdương was the son of a prince and a girl of the race of the immortals (the race of Peng 彭 or Panhu 盤瓠, ancestors of the Taic race, as previously mentioned); hence the phrase 'conrồngcháutiên' ('children of the Dragon and Immortal race') in the Vietnamese legends. King Kinhdương married a wife from Độngđình Lake (洞庭湖 Dongtinghu, in Hunan Province), who also belonged to the Dragon race.

"King Lak-Long [Lạclong Quân (雒龍君)], the issue of this union, was the first of a series of eighteen rulers, the last of whom ended in 207 B.C. At the rate of twenty-five years a reign, the highest average possible, these speculative data lead to circa 800 B.C. as the probable date of these beginnings, which therefore would have taken place when the state of Ts'u [ 楚 Chu (Sở) ] in Hupeh and Hunan S. was in full prosperity."

"The boundaries of the kingdom of these early Annamese rulers were, according to the tradition, on the east the sea, on the north, Tung ting lake, on the west Pa and Shuh, both names for Szetchuen, with one ruler whose reign of fifty years that ended in 202 B.C. when the third dynasty begins. The latter is no less than that founded by the successor of Jen Hiao [ 任囂 (Nhâm Ngao) ], Tchao T'o [趙佗 (Triệu Đà)], a rebel Chinese [秦 Qin (Tần)] general who established his sway all over the maritime provinces of the south, extending from Fuhkien to Tungking [東京 (Đôngkinh) or 'Tonkin', North Vietnam ]; which lasted with 5 rulers until 112 B.C., when it submitted to the Chinese dominion, which, however, was merely nominal in some parts, and not at all established on the east. It was recognized from that date, with the exceptions of three years (39 - 42 A.D.), until 186 A.D., when a native king, Si-nhip [士攝 (Sĩ Nhiếp), known as the Han's viceroy in Vietnam's early history, though ], ruled for 40 years. It was this king who introduced the Chinese literature, and prohibited the of the use of phonetic writing [?] hitherto employed by the Annamite."

(Lacouperie. Ibid, pp. 53-54)

Having assigned the Chu populations to the Taic aboriginal peoples who gave birth to the Dai-Kadai languages (Taic-Shan, Mon-Shan, or Mon-Taic in Lacouperie's terms) and to the pre-Chinese aboriginal Mon-Khmer languages (termed in this paper Taic-Yue, Yue, Daic, Tai-Kadai, Austroasiatic, Mon-Khmer, etc.), we note that, regarding the latter tribes, Lacouperie states that "the ancestors of the language and civilization of the Annamites, and partially also of their race, must be sought for in Central and Eastern China. We hear from history that the former population of the south, between the Kwangtung [Canton] and Tungking [Tonkin], both, inclusive, were generally displaced by, or intermingled with, half a million of colonists drawn chiefly from the region of modern Tchetkiang [Zhejiang] and its west, by Jen Hiao [Nhâm Ngao] in 218 B.C." (Lacouperie. Ibid, p. 52)

Indeed, with regard to the Mon-Shan affiliation, Lacouperie cited a number of aboriginal languages, especially that of the "Paloungs" (勃弄 'Po-lung', 'Palaung'), a language of the Mon-Talaing family whose speakers were settled in northwest Yunnan, a region later conquered by the Nanzhao (南詔) Kingdom of the Shan tribes in the 7th century.

"We have two vocabularies of their speech; one of 200 words collected in 1858 by Bishop P. A. Bigandet, which examined by John Logan, permitted this great scholar to recognize the Mōn-Annam relationship of the language. Another vocabulary was collected by Dr. Hohn Anderson at the time of his expedition in S.W. Yunnan. The latter list of words is less saturated with Shan words than the preceding. The indices of its ideology are 2 4 6 8 VI [ i.e., grammatically word order, e.g., adjectives and genitives follow nouns, etc., being like that of the French language (Lacouperie. ibid. p. 66) ], which confirm the glossarial evidence."

"As we have seen in our foregoing §§ 31-33 the language spoken in Ts'u was not a Chinese dialect. And the statement of Hung k'iü, ruler in Ts'u from 887-867 B.C., saying, 'We are Man-y (i.e., aliens from the Chinese), and we do not bear Chinese names,' is an unnecessary confirmation. The words quoted from the Ts'u Fang yen are easily identified with the Mōn and Taic-Shan vocabularies in equal shares, when they are not simply altered Chinese. And the most frequent phonetic equivalent is that of k or h for a Chinese l, still existing in the modern language."
(Lacouperie. Ibid. pp. 55-56)
To sum up, here are the key findings of the foregoing:
  • Fragmentary evidence: Knowledge of pre‑Chinese languages is limited, preserved mainly in reluctant Chinese records and scattered references.
  • Southern influence: Mon and Taic languages shaped early Chinese syntax (genitive placement, SVO order), phonology (tonogenesis, phonetic reduction), and semantics (classifier system).
  • Lexical borrowing: Intensive two‑way borrowing occurred between Chinese and indigenous languages, leaving a substantial shared vocabulary.
  • Regional records: Works like the Erya and Yang Xiong's Fangyan preserve thousands of local and non‑Chinese terms, reflecting centuries of contact and migration.
  • Chu, Wu, Yue States: Powerful non‑Chinese polities in the Zhou era maintained distinct languages, requiring interpreters in Chinese administration.
  • Sino‑Vietnamese legacy: The so‑called "Sinico‑Annamite" preserved archaic Chinese sounds, functioning as both a scholarly register and a bridge to vernacular Vietnamese.

Words from the "Paloungs" (勃弄 Po-lung) language, a Mon-related language, are cited in this paper as "Palaung," drawn from the 249-word table published by G. H. Luce (1965). (See Chapter 8.)

The author identifies several intriguing parallels in the wordlists cited by Lacouperie, though most of the cognates he proposed were treated as loanwords from "Tai‑Shan" and "Mon‑Taic" aboriginal languages. The essential point is that the relationship between Vietnamese and these pre‑Chinese Mon‑Taic dialects, including Mon‑Khmer, is relatively loose: their cognateness is not as firmly established as that of Chinese dialects such as Cantonese or Fukienese, for reasons discussed in earlier chapters. Lacouperie did note some striking correspondences: for instance, in the Tai‑Shan dialects of the Zhongjiazi (also "Tchung Miao"), the reduplicated form 田丁田丁 tien‑ting tien‑ting aligns with Vietnamese thằng ('servant'), while 媚娘 méiniáng parallels vợlớn ('first wife'). Yet claims by early Mon‑Annamese researchers that one‑third of 28 basic “Tchung Miao” words were cognate with Vietnamese appear overstated. Many forms, such as 阿妹 amiem (SV amuội), 家奴 jianu ~ ngườinhà (SV gianô), 家公 ch’ia kung ~ ôngchủ (SV giacông), and 家婆 ch’ia pu ~ bàchủ (SV giabà), are more convincingly explained as Chinese cognates. Recent Vietnamese scholarship likewise rejects a purely Mon‑Khmer origin, situating many of these items instead within the Dai‑Kadai (Tày‑Thái) sphere (Nguyen Ngoc San 1993).

For non‑specialists, these lists illustrate how words diverged in sound and form as they spread across languages, sometimes surviving only as vocables detached from their original meaning. Over centuries, particularly during Vietnam’s millennium under Chinese rule, vocabularies shifted both diachronically and synchronically, and loanwords became deeply embedded in the lexicon. This complicates efforts to distinguish true indigenous strata from borrowed layers when classifying genetic origins, especially between Chinese and Vietnamese, where forms are so closely aligned.

Acknowledging the existence of "languages of China before the Chinese", the Sinitic‑Vietnamese hypothesis gains support from Luce’s 245‑item wordlist and similar compilations. By separating indigenous elements from borrowed ones, Vietnamese basic vocabulary can be grouped into five categories: (1) words with no Chinese connection, (2) cognates shared with Chinese and Mon‑Daic/Mon‑Khmer, (3) forms more closely aligned with Chinese than with Austroasiatic or Daic‑Kadai, (4) items plausibly cognate only with Chinese and Vietnamese, and (5) fundamental lexemes absent from Mon‑Khmer lists but essential to any language.

Sample items include:

  1. Indigenous: tai (ear), mũi (nose), miệng (mouth), bốn (four), bảy (seven).

  2. Shared with Chinese: mắt (eye), tay ~ shǒu (hand), gạo ~ dào (rice), sắt ~ tiě (iron).

  3. Closer to Chinese: tiếng ~ shēng (sound), lửa ~ huǒ (fire), nhà ~ jiā (home).

  4. Exclusive parallels: goá ~ guǎ (widowed), liềm ~ lián (sickle), sông ~ jiāng (river).

  5. Core lexemes absent in Mon‑Khmer lists: uống 飲 (drink), khóc (weep), cười ~ xiào (laugh), chuối ~ jiāo (banana).

Although Vietnamese basic vocabulary aligns dominantly with Chinese, many forms also appear across neighboring languages. This suggests that shared elements with Mon‑Khmer are better explained as outcomes of prolonged contact, resettlement, and typological convergence, rather than as evidence of a single Mon‑Khmer origin. (L)


III) Linguistic evolution through colonial history: The case of Vietnamese

Determining the origin of a cognate shared between Sinitic-Vietnamese and Chinese is often a complex task. A key challenge lies in classifying a "Sinitic-Vietnamese word" when its etymology may trace back to either a Yue root or Archaic Chinese, especially in cases where the word reflects cognateness with both sources and has evolved into lexical variants or derivatives. For example, the character 牙 (yá), originally a Yue term meaning "tusk," later came to signify "tooth" in Chinese. Such transformations raise important questions about linguistic classification and the layered nature of historical language contact. (See Appendix G: Tsu-lin Mei's "The Case of 'ngà'".)

If a word is of Yue origin, should its Chinese counterpart be classified as a Yue loanword, or rather as a cognate of the same Sinitic-Vietnamese etymon linked to an indigenous "proto-Yue" or "Taic" linguistic family? The question arises not only with 牙 (yá) but also in other instances where native lexical roots diverge yet remain etymologically connected.

Take, for example, the reconstructed indigenous form */krong/, which appears cognate with both Vietnamese sông and Chinese 江 (jiāng; SV giang; Cant. /kong11/), all meaning 'river'. The glyph 江 is deeply embedded in Chinese vocabulary and likely traces its origin to an ancient Yue language of southern China. Interestingly, in modern Khmer, "krong" has shifted to mean "city", as seen in place names like Krong Siem Reap.

Phonologically, as mentioned, the etymological evolution from "krong" into Vietnamese sông, Sino-Vietnamese giang, Cantonese /kong11/, and Mandarin 江 (jiāng) reflects a shared structural pattern. These forms are built around a tonemic framework [C+V(+C)] that underlies both Vietnamese and Chinese lexicons. The tonal and morphemic features of Vietnamese lexemes such as /sowŋ11/ or /səwŋ11/ mirror the ancient root */krowŋ11/, suggesting a deep phonetic continuity.

By analogy, Mandarin 江 (jiāng) may also share ancestral ties with other river-related terms such as 水 (shuǐ, SV thuỷ), possibly of Tibetan tchu origin, and 川 (chuān, SV xuyên), each of which has denoted "river" in various linguistic traditions, consistent with Lacouperie's description above.

This linguistic structure mirrors the way genetic heritage shapes physical identity, prompting questions like “Is she Chinese or Vietnamese?” Metaphorically, the essence lies not in the bio-engineering that grafts Chinese branches onto the Yue tree, producing Vietnamese-like fruits, leaves, and flowers, but in the underlying heredity, as Charles Darwin emphasized in 1859. It is this genetic and cultural interweaving that defined the Taic and Yue-mixed Chu populations, including Sinicized individuals of Yue descent. A notable example is the forced marriage of local Yue women to northwestern Qin infantrymen during the Qin Empire’s southern expansion; many of these soldiers and their families later became subjects of the Han Empire, particularly in Jiaozhi Prefecture in northern Vietnam.

Vietnamese, as a language, has layered Sinitic elements atop a foundation of ancient aboriginal strata, with remnants of indigenous vocabulary still present. Its lexicon is heavily populated with Chinese loanwords in both the Sino-Vietnamese and Sinitic-Vietnamese categories. A subset of the latter evolved from ancient Yue roots, which are also reflected in several southern Chinese dialects such as Cantonese, Fukienese (Hokkien), and Hainanese (see illustrations in the following sections). Over a millennium of Chinese rule, Vietnamese underwent profound Sinicization and became a language rich in Chinese influence, one that Lacouperie regarded as the most archaic Chinese dialect. It is nevertheless not a creole or hybrid language in the strict linguistic sense, unlike Albanian, whose vocabulary Bloomfield (1933) described as composed almost entirely of borrowings with only a few native words remaining. Vietnamese retained a distinct structural and lexical identity.

Interestingly, many Yue-based words existed in ancient Annamese before their doublets re-entered the Vietnamese lexicon through later linguistic channels. Examples include: chuột vs. tý (子 zǐ, SV tử, 'rat'), dê vs. mùi (未 wèi, SV vị, 'goat'), trâu vs. sửu (丑 chǒu, SV sửu, 'buffalo'), mèo vs. mẹo (卯 mǎo, SV mão, 'cat'), ngựa vs. ngọ (午 wǔ, SV ngọ, 'horse'), and heo vs. hợi (亥 hài, SV hợi, 'pig'). Some of these terms are also believed to have Austroasiatic Mon-Khmer origins, as evidenced by Khmer zodiac animal names that likely traveled via ancient trade routes through Annam.

As a side note, this cyclical linguistic phenomenon is comparable to the way Japanese coined terms for modern Western concepts in the early 20th century, such as dânchủ (民主 mínzhǔ, 'democracy') and cộnghoà (共和 gònghé, 'republic'), using Chinese morphosyllables. These terms later re-entered Chinese and eventually Vietnamese, completing a fascinating loop of cultural and linguistic exchange in modern times.

Table 4: Is Vietnamese of Austroasiatic Mon-Khmer or Sino-Tibetan linguistic family?


James Campbell, in Vietnamese Dialects, once mocked my ignorance of linguistics, but he states it best:

"I originally included Vietnamese in this study/website because of the fact its phonological makeup is very similar to Chinese and, indeed, its tonal system matches the Chinese one. Originally I wrote at this site: "Vietnamese is neither a Chinese language nor related to Chinese (It is an Austroasiatic > Mon-Khmer language more closely related to Khmer/Cambodian). Besides having a very similar phonological system, and due to the heavy Chinese influence on the language, it also has a tone system that matches the Chinese one." However, after reading and conducting a bit more research, it appears that Vietnamese affiliation with Việt-Mương, Mon-Khmer, and Austroasiatic, may in fact be a faulty case."

"[...] [Vietnamese] may not be considered a Sinitic language or one of the Chinese dialects, but the Kinh have a lot in common with the Chinese culture, and the language leaves little to doubt. I will not go into great detail about how this is claimed, as a great deal has been posted at some other websites (see below [for study by dchph, the author of this very paper]) and that is not the purpose of this site. However, one can see that Vietnamese shares many traits in common with Chinese: 60-70% Sinitic vocabulary, another 20% of vocabulary is substrata of proto-Sinitic vocabulary, much of the grammar and grammatical markers share similarities with Chinese, along with classifiers. One would find it very difficult to draw similar parallels between Chinese and other Mon-Khmer languages. It seems that after considering all of this, what is left that is Mon-Khmer is actually very little, and probably acquired over time through contact with bordering nations. For example, the numbers are of distinct Mon-Khmer origin, however, used in many compound words, Vietnamese uses instead Chinese roots (as is common in the other Sino-Xenic languages, Japanese and Korean)." (X)

Let us delve further into the historical and geopolitical ties between China and Vietnam, particularly regarding northern Vietnam, which before the 10th century was part of the Chinese imperial domain, often referred to as the Middle Kingdom. When chronicling Vietnam’s early history, historians frequently relied on Chinese sources, especially in foundational texts such as ĐạiViệt Sửký Toànthư (Complete Annals of ĐạiViệt), compiled in 1479 under the Lê Dynasty by court historian Ngô Sĩ Liên at the behest of King Lê Thánh Tông. These records often referenced the region known as Giaochi (交趾, Jiaozhi), and many of the names of states, rulers, places, and peoples were rendered in Sino-Vietnamese terms, such as the Hồng Bàng lineage and the kings Hùng Vương and An Dương Vương, to narrate the origins of the ancient state of ÂuLạc. (A)

In seeking even more archaic layers of Vietnamese history, scholars have turned to folklore and legends that may correspond with early Chinese historical accounts. One notable example is the legend of Thánh Gióng, a mythical hero who is said to have defended the land against invaders from the Yin Dynasty (circa 1718–1631 B.C.), a narrative that intriguingly parallels Chinese records from the same era. (董)

Early Vietnamese history is deeply intertwined with that of China. Prior to gaining independence in 939 A.D., Vietnam’s historical narrative was largely shaped by events recorded in Chinese annals. In fact, ancient Vietnam, referred to as Annam, was never officially recognized as a sovereign state in Chinese historiography. For example, Zizhi Tongjian, compiled by Sima Guang and later translated into modern Chinese by Bo Yang in 72 volumes (1983–1993), treats Annam as an administrative region rather than an independent entity.

During the extended period of Chinese colonial rule from 111 B.C. to 939 A.D., there was only one brief episode of autonomy under the Early Lý Dynasty (544-602 A.D.). Even then, Vietnam was regarded as a vassal state. For much of its early history, Annam was viewed by Chinese authorities as a rebellious prefecture, and even after achieving sovereignty, it largely vanished from Chinese historical records.

Vietnam continued to be considered part of the Chinese imperial system until the late Qing Dynasty. It was not until the Treaty of Tientsin in 1885, under which the weakening Manchu government formally relinquished its protectorate claims over Annam to France, that Vietnam was finally acknowledged by name as a separate nation in official Chinese documentation.

In tracing the historical development of "the Yue of the South," or Vietnam, Chinese historical sources are indispensable. Unless Vietnamese historians choose to disconnect the pre-independence era, the narrative of Vietnam prior to 939, marked by a millennium of northern domination (北屬時期), must be understood through Chinese chronicles. This period began in 218 B.C. when the First Emperor Qin Shihuang (秦始皇) incorporated the region of Giaochi, in what is now northern Vietnam, as a prefecture of the Qin Empire. It later became part of the NamViệt Kingdom under the Triệu Dynasty (207–111 B.C.), and subsequently continued as a Chinese administrative region, known as Giaochâu and later Annam, under successive dynasties, including the Han and Tang. The collapse of the Tang Dynasty in 907, which fragmented China into ten states, created the conditions for Vietnam’s emergence as an independent polity in 939.

Throughout this long colonial period, the history of ancient Annam was treated in Chinese records merely as "local chronicles" (地方誌). Sporadic uprisings and rebellions were expected and routinely suppressed, leaving no room in official Chinese historiography for the notion of sustained resistance. Nevertheless, Vietnamese historians often assert that Vietnam possessed its own historical and literary traditions, including two declarations of independence attributed to the ancestral Southern Yue State (NamViệt). These documents were written in Chinese, as Vietnamese official texts continued to be even after independence, yet they are seen as expressions of a distinct national identity. (I)

Many Vietnamese scholars believe that a significant portion of Vietnam’s historical records has been lost due to centuries of resistance and warfare. Some speculate that when Chinese forces withdrew from the region, they may have taken with them valuable texts and documents from their former colony of Annam. However, it is important to note that Giaochi—another name for the Chinese prefecture of Annam—was never recognized as an independent state in official Chinese historiography. As such, Chinese authorities had little reason to anticipate a full evacuation or to systematically remove cultural artifacts and historical records. For many Chinese officials and settlers, Annam was not a distant colony but a homeland—many had been born and lived there for generations, and the region was expected to remain under Chinese influence for their descendants to continue exploiting. (V) 

In practice, imperial mandarins were often more concerned with material wealth, such as gold taels and precious gems, than with preserving cultural heritage. Military generals, meanwhile, focused on securing their own power bases and estates. This scenario becomes even more plausible when viewed against the backdrop of the political fragmentation that followed the collapse of the Tang Dynasty in 907. During this chaotic period, the once-unified Middle Kingdom splintered into seven major states, each ruled by self-proclaimed emperors or kings. This disunion lasted for 72 years, until 979, creating a power vacuum that allowed Vietnam to assert its independence and begin shaping its own historical narrative.

Around the year 939, while the Chinese mainland was embroiled in violent conflicts among rival warlord factions, the Annam Prefecture, then part of the Qinghaijun Military Zone (清海軍區), stood out as a relatively stable and prosperous enclave. It functioned as a kind of "home away from home", maintaining business as usual amid the chaos. Despite Annam’s de facto sovereignty at the time, Chinese rulers and historians continued to treat it as a renegade prefecture, much like the way Taiwan or even Hong Kong are viewed in certain modern contexts.

This perception was reinforced by the fact that many Chinese colonial officials and their families, appointed by the imperial court of the NamHan State (南漢王國, 917–971), which governed regions corresponding to present-day Guangdong, Guangxi, Hainan, and northeastern Vietnam, chose to remain in Annam rather than return to the increasingly unstable mainland. Notably, the Nam Han regime was heavily influenced by eunuchs, and high-ranking officials were often required to undergo castration to qualify for government service. According to Bo Yang (Vol. 72, p. 160, 1993), the court was populated by as many as 20,000 eunuchs, underscoring the unique political dynamics of the period.

Table 5 - A cultural prelude to the Fourth Chinese domination of Vietnam

Before the Ming invasion, the Hồ Dynasty launched a bold cultural reform movement aimed at affirming Vietnamese identity. During the combined seven-year reigns of Hồ Quý Ly (1400) and his son Hồ Hán Thương (1400–1407), the dynasty actively promoted the use of Vietnamese language and customs, banning Chinese script and administrative practices from official governance. Notably, Hồ Quý Ly traced his ancestry to Zhejiang Province in China—a fact that underscores a broader historical pattern: many Chinese emigrants in Vietnam, including Hồ’s lineage, had long distanced themselves from mainland China and embraced a strong sense of Vietnamese national identity.

This cultural assertion was abruptly disrupted by the onset of the Fourth Era of Northern Domination (Bắcthuộc lần thứ tư), which lasted from 1407 to 1428. After defeating the Hồ Dynasty in 1406–1407, the Ming Dynasty annexed Vietnam as the province of Jiaozhi (Giaochỉ). Unlike the earlier periods of Chinese rule, collectively referred to as Bắcthuộc, which together spanned nearly a thousand years, this fourth occupation lasted only about two decades, ending with the establishment of the Lê Dynasty in April 1428 and marking a new chapter of Vietnamese sovereignty.

Under Ming control, the Hồ Dynasty’s cultural reforms were systematically dismantled. Vietnamese printing blocks, books, and cultural artifacts were confiscated or destroyed, resulting in the near-total disappearance of vernacular chữNôm texts from the pre-invasion period. Historic sites such as the Baominh Pagoda were looted and desecrated. The Ming administration imposed aggressive Sinicization policies, seeking to embed Chinese cultural norms more deeply into the occupied territory and suppress indigenous Vietnamese expression.




Figure 4.1: Jiaozhi (northern Vietnam) when it was under Ming occupation.


Figure 4.2: Administrative division of Vietnam (Jiaozhi) under the Ming Dynasty from 1407 to 1427.

Sinicization process

An entry in the Ming Shilu (明實錄) dated 15 August 1406 records an imperial order from the Yongle Emperor instructing the Chinese army to save and preserve Vietnamese records such as maps and registers. In addition, according to Yueqiaoshu (越嶠書, SV Việtkiệuthư), on August 21, 1406, the Yongle Emperor issued an order to Ming soldiers in Annam:

"Once our army enters Annam, all books and notes, except Buddhist and Taoist texts, including folklore and children's books, should be burnt. The stelae erected by China should be protected carefully, while those erected by Annam should be completely annihilated. Do not spare even one character."

On the 21st day of the 5th lunar month of the following year, Emperor Yongle issued another order to Ming soldiers in Annam:

"I have repeatedly told you all to burn all Annamese books, including folklore and children's books, and to destroy the local stelae immediately upon sight. Recently I heard that our soldiers hesitated and read those books before burning them. Most soldiers do not know how to read, so it is a waste of our time. Now you must strictly obey my previous command and burn all local books upon sight without hesitation."

The Chinese colonists promoted Ming Confucian ideology, bureaucratic practices, and Classical Chinese study among the local Vietnamese people, forcing them to wear Chinese-style clothes. The Ming forbade local customs such as tattooing, unmarried boys and girls wearing short hair, and women wearing short skirts, in order "to change customs in conformity with the north." Cultural incorporation was pursued further, with the new Jiaozhi administration advising the Ming court:

"The Yi people of Annam venerate the law of the Buddha, but do not know to worship or sacrifice the spirits. We should establish altars for sacrifice to the spirits of the wind, clouds, thunder and rain... so that the people become familiar with the way to express gratitude to the spirits through sacrifice."

In 1416, a large number of Confucian schools, Yin-Yang schools, and medical schools were established within the province. Examinations for the local bureaucracy had been formalized in 1411. In 1419, Chinese mourning rites and mourning leave were instituted among the officials of Jiaozhi. For the first time, Đại Việt experienced the sustained influence of Neo-Confucian ideology, which included not only the traditional doctrines of filial piety but also a demand for "activist, state-oriented service" based on officials' absolute loyalty to the dynasty and on the moral superiority of the "civilized" over the "barbarian", a category in which the Ming placed the Vietnamese. Yongle brought Vietnamese students to the National Academy in the Ming capital and appointed more natives to minor local offices in Jiaozhi. The Ming also destroyed, or carried off to the north, many Vietnamese vernacular writings as well as historical and classical texts.

After Vietnam regained independence, the Vietnamese monarch Lê Thánh Tông issued a royal edict in 1474 forbidding Vietnamese from adopting the languages, hairstyles, and clothing of foreigners such as the Laotians, Chams, and Ming Chinese, and abolished the customs the Ming had imposed. The Mongol, Cham, and Ming invasions of the 13th–15th centuries destroyed many important Vietnamese sites, buildings, artifacts, and archives of the Postclassical period.

Source: Wikipedia.org


As has been repeatedly emphasized throughout this paper, the development of the Vietnamese language must be examined through the lens of historical dynastic events. Ancient Annam, as a colony of China, served as a site for resource extraction by Chinese settlers, often at the expense of the indigenous population. Colonial consequences included displacement from ancestral lands and disruption of local economies, rendering native communities minorities in their own homeland. The ruling powers dictated cultural and linguistic norms, often through oppressive policies that sparked resistance. These dynamics significantly shaped the trajectory of the Vietnamese language, as colonial authorities determined what was taught and spoken.

For over 1,600 years, Annam was under strong Chinese linguistic influence. This influence intensified during the 20-year Ming occupation beginning in 1407, when the invaders implemented policies aimed at eradicating local culture, including the destruction of local chữNôm literary works. Ironically, after Vietnam regained independence in 1427, the Later Lê Dynasty monarchs largely returned to the same Sinicized cultural and linguistic practices. Classical Chinese (Wenyanwen, 文言文) remained the official language of scholarship and governance, and many court scholars embraced Chinese literary and colloquial forms wholeheartedly. Indigenous scholarship was often marginalized, and local intellectual contributions were viewed with condescension.

This cultural orientation persisted until the early 20th century, when French colonial influence introduced Western ideas and the French language into national examinations. Alongside classical Chinese, the Romanized Vietnamese script known as Quốcngữ was adopted in 1909 by the colonial government and further institutionalized in subsequent years (1910, 1912, etc.) through bulletins, newspapers, and magazines.

The fascination with Chinese culture extended beyond elite circles, however. As Nguyễn Thị Chân-Quỳnh notes (1995, pp. 110–111), quoting Nguyễn Văn Xuân’s Phongtrào Duytân (1970), even rural villagers continued to place Chinese-scripted papers (ChữNho, 儒字) in sacred spaces as late as 1970, while printed materials in Latin script were sometimes used as toilet paper, a use considered taboo for paper bearing Chinese characters. Today, this legacy is viewed critically by nationalist scholars, who see such cultural deference as a betrayal of Vietnamese identity.

In the pursuit of understanding Chinese etyma that share cognates with Vietnamese basic vocabulary, it is essential for historical linguists, especially home-grown Vietnamese scholars, to reengage with classical Chinese sources. Texts such as the Guangyun (廣韻) and Kangxi Zidian (康熙字典), along with modern Western research on Sino-Tibetan and Old Chinese linguistics, offer valuable insights. Without such scholarship, the presence of northern Chinese dialectal features in Vietnamese, especially in everyday speech, remains poorly understood.

Many Vietnamese words reflect northern Chinese colloquialisms once spoken by the general populace of the Middle Kingdom. Examples include mainày (明兒 míngr, "tomorrow"), lúcnào (牢牢 láoláo, "always"), luônluôn (老老 lǎolǎo, "constantly"), khôngphảisao? (可不是 kěbùshì, "isn't it so?"), chịukhôngnổi rồi! (受不了了! Shòubùliǎo le!, "I can't take it anymore!"), and khôngdámđâu! (不敢當! Bùgǎndàng!, "I don't deserve it!"). These phrases are characteristic of northern Mandarin dialects.

While the precise period when such colloquialisms entered everyday Vietnamese speech remains unclear, what is evident is that Vietnamese has shown a stronger tendency to absorb Mandarin influences than those from other Chinese dialects such as Cantonese or Hokkien. This linguistic affinity reflects historical patterns of contact, migration, and governance that favored exposure to northern Chinese dialects over southern variants.

Historically, most Chinese rulers hailed from northern regions, including those of Altaic Turko-Mongol origin. Their capitals were mostly located in the north, most notably Beijing (北京), despite its harsh climate and frequent dust storms from the Gobi Desert; Nanjing (南京), in the lower Yangtze Basin, was the principal southern exception.

Table 4: Comparative Integration of Chinese in Sino-Centric Vietnam, Korea, and Japan

The enduring distinction between Chinese identity and that of Koreans and Japanese remains evident today. Despite generations of residence, individuals of Chinese descent living in South Korea and Japan are still often regarded as outsiders, highlighting the deep-rooted cultural boundaries that persist across East Asia.

In contrast, Vietnam, though subjected to the longest period of Chinese domination, underwent a profound process of Sinicization that left an indelible imprint on its people and language. Unlike Korea and Japan, which have maintained distinct national identities despite historical Chinese influence, Vietnam absorbed many Sinitic elements into its cultural and linguistic fabric. This contrast is particularly striking given that northern Chinese populations themselves include significant Altaic-origin groups, much like the ethnic diversity found between the Yue and northern peoples. Yet, Korea and Japan resisted assimilation more firmly, while Vietnam’s historical trajectory reflects a more complex and enduring entanglement with Chinese civilization.

Chinese Diaspora in Korea and Japan

In South Korea and Japan, ethnic Chinese communities have historically faced social and legal barriers to full integration. Despite generations of residence, many remain classified as foreigners or permanent residents rather than citizens. This reflects strong national identities in both countries, where cultural and ethnic homogeneity has long been emphasized. Even Chinese surnames and heritage can mark individuals as outsiders, regardless of how long their families have lived there.

Chinese Assimilation in Vietnam

Vietnam presents a contrasting case. Chinese immigrants—especially those who arrived during earlier dynasties—have largely assimilated into Vietnamese society over generations. By the third generation, many Chinese-Vietnamese families are culturally indistinguishable from ethnic Kinh Vietnamese. This is partly due to centuries of shared history, intermarriage, and linguistic blending, especially during periods of Chinese rule and influence.

Vietnamese Surnames and Chinese Origins

Many Vietnamese surnames have Chinese origins. Common surnames like Trần (陳), Lê (黎), Nguyễn (阮), and Phạm (范) are derived from Chinese characters and were adopted during periods of Sinicization. Even surnames found among ethnic minorities, such as Phạm among Chamic people or Thạch among Khmer-Vietnamese, can trace their roots to Chinese linguistic influence.

However, there are exceptions. Some indigenous Vietnamese surnames, especially among ethnic minorities like the Hmong, Muong, or Tay, may not have Chinese etymology, though they are fewer in number.

Cultural Identity vs. Ethnic Lineage

The deeper point here is that cultural identity often diverges from ethnic lineage. A person may carry a Chinese surname but identify fully as Vietnamese, just as someone in Korea or Japan may be ethnically Chinese but culturally distinct. Vietnam’s long history of absorbing and localizing foreign influences, whether Chinese, French, or Cham, has created a uniquely syncretic national identity.

The linguistic convergence between Vietnamese and northern Chinese colloquialisms is likely the result of centuries of interethnic contact, particularly during periods of Chinese colonial rule. Beginning in 111 BCE, waves of Chinese immigrants—including officials, soldiers, and their families—settled in Annam. Over time, many intermarried with local populations, creating a rich tapestry of cultural and linguistic exchange that continues to influence Vietnamese speech today.

Ethnically, approximately 84% of Vietnam’s population belongs to the Kinh majority, commonly recognized as ethnic Vietnamese. This group is believed to be descended from a blend of early Yue populations, who once inhabited a vast region stretching from Lake Dongting in Hunan Province down to northern Vietnam, and, historically, Han settlers who migrated southward during the Han Dynasty. The Kinh also absorbed influences from territories formerly part of the Nanzhao Kingdom (738–902) and later the Dali State (937–1253), both located in what is now Yunnan Province. Vietnam’s present-day northwestern region, home to a significant concentration of Daic-speaking communities, reflects this layered ancestry.

The remaining 16% of the population comprises 53 officially recognized ethnic minority groups (54 groups in all, counting the Kinh). These communities are primarily located in Vietnam’s mountainous northern regions and along the western highlands, extending from north to south. Among them are Mon-Khmer and Cham groups, many of whom inhabit the eastern coastal lowlands, territories historically seized from the former kingdoms of Champa and Khmer between the 12th and early 20th centuries. The Mon-Khmer peoples, often referred to as montagnards, and the Cham minority, descendants of the once-powerful Champa Kingdom, carry legacies of resistance and survival. Notably, the Cham endured devastating persecution in the 19th century following uprisings against Vietnamese rule, particularly under Emperor Minhmạng of the Nguyễn Dynasty.


Figure 5: Map of the Dali State in 1142




(Source: https://upload.wikimedia.org/wikipedia/vi/d/d5/China_11b.jpg)

From an anthropological standpoint, it is possible to distinguish between two major waves of Chinese migration to Vietnam. The first wave consisted of Han settlers who arrived in ancient times and gradually intermingled with indigenous populations, contributing to the formation of the early Kinh ethnic majority. These settlers became part of Vietnam’s foundational demographic, blending Yue and Han ancestry with local aboriginal groups.

The second wave involved more recent Chinese immigrants who arrived after World War II, following Generalissimo Chiang Kai-Shek’s troops into Vietnam to oversee the disarmament of Japanese forces. These latecomers, including Hainanese and Fukienese communities, settled predominantly in lowland and coastal cities such as Huế, Đànẵng, Faifo (Hộian), Tamkỳ, Tamquan, Bồngsơn, Quynhơn, and Tuyhoà. Collectively, they formed the Hoa (華) ethnic minority, which numbered around one million people. During the 1960s, many were compelled to adopt Vietnamese citizenship under policies enacted by the southern government. Over time, these communities became increasingly assimilated into the Kinh majority.

Between these two waves, Chinese refugees, particularly those who fled by sea after the fall of the Ming Dynasty to the Manchu Qing, played a significant role in developing southern Vietnam. They helped establish towns in six newly formed provinces, including Hàtiên, Bạcliêu, and Sàigòn (historically known as 西岸 Xī'àn), regions that had previously belonged to the Khmer Kingdom before the Kinh expansion in the late 18th century.

Further evidence of assimilation can be seen in the transformation of descendants of early Chinese immigrants. Many are no longer classified as ethnic Chinese in official records, having fully integrated into the Kinh majority. This shift accelerated after the mass exodus of Chinese-Vietnamese refugees in 1979, triggered by rising tensions ahead of the Sino-Vietnamese border war. Between 1979 and 1996, approximately 400,000 Chinese-Vietnamese fled Vietnam by boat, eventually resettling in countries such as the United States, Canada, and other Western nations. Those who remained in Vietnam continued to assimilate, contributing to the evolving cultural and ethnic landscape of the nation. (H)  

Let us consider further examples illustrating the integration of Chinese-descended individuals into Vietnam’s Kinh majority. Historical figures such as King Hồ Quý Ly and Governor Phan Thanh Giản, along with countless unsung heroes and ordinary citizens, reflect the deep-rooted contributions of Chinese ancestry within Vietnamese society. In modern times, a notable segment of the Hoa minority—ethnic Chinese in Vietnam—has risen to prominence in the entertainment industry. Celebrities like Trấn Thành, Đàm Vĩnh Hưng, and Quách Thành Danh exemplify this trend, alongside many others in politics and public life.

While the number of renowned figures is statistically small—perhaps one in a million—their visibility underscores the remarkable extent to which descendants of Chinese immigrants have become fully integrated into Vietnamese culture. This phenomenon reflects not only assimilation but also the flourishing of hybrid identities within the broader Kinh population.

Historically, the Kinh people emerged from early waves of Chinese immigrants who followed Han invaders into Annam beginning in 111 BCE. Over the next millennium, these settlers intermarried with indigenous populations, forming the foundation of the Kinh ethnicity (京族). The term "Kinh" originally referred to urban dwellers in the Red River Delta and coastal lowlands, regions where Han colonialists first established military and administrative outposts. Many of these settlers remained permanently, especially after the collapse of the Tang Dynasty in 907 CE.

The concept of a "millennium" is central to Vietnam’s historical evolution. After gaining independence in 939 A.D., the newly sovereign Annam began its own expansionist phase, mirroring the earlier Chinese colonization. Over the next thousand years, Annamese settlers moved southward, annexing territories from weakened neighbors through conquest and resettlement. This expansion led to intermarriage with Chamic and Khmer populations, producing a new ethnocultural blend. The linguistic and racial integration that followed contributed to the development of Southern Vietnamese dialects, which differ markedly from the speech patterns of the northern population shaped over 2,200 years.

By the early 18th century, Annamese settlers had reached present-day Rạchgiá Province, where they intermingled not only with Khmer communities but also with descendants of Chinese refugees led by Marshal Mạc Cửu. These refugees, fleeing the fall of the Ming Dynasty and Manchu rule, were granted resettlement by the Nguyễn monarchs. The resulting mixed populations became indistinguishable from the Kinh majority, both in appearance and cultural identity, likely influenced by the shared tropical climate and centuries of integration.

In sum, the Kinh majority is composed of six primary ethnic stocks: Taic, Yue, Chinese, Daic, Chamic, and Khmer. While Cham and Khmer heritage is celebrated for its monumental cultural contributions, the Chinese component is often overlooked in academic discourse. By the late 19th century, as Annam’s population reached 20 million, historical events had further shaped ethnic identities. The Chamic population, for example, was diminished due to persecution under Emperor Minhmạng, who targeted them for their past support of the Tâysơn rebellion. Many Cham and Khmer individuals reclassified themselves as Kinh to avoid discrimination and violence, especially after their ancestral territories were annexed into Southern Vietnam.

Despite the prestige of Cham and Khmer cultural legacies, Vietnamese scholarship has often sidestepped the Chinese racial influence, perhaps due to political sensitivities surrounding nationalism. Yet, given Vietnam’s thousand-year history as a prefecture of the Chinese empire, such influence is both inevitable and profound. Comparisons with other former colonies, such as Ireland under England, Mexico under Spain, or the assimilation of Yue populations in Guangdong and Fujian into Han Chinese identity, highlight Vietnam’s parallel experience. After centuries under the rule of a dominant neighbor, Vietnam, too, underwent deep assimilation. Recognizing this reality is not only historically accurate but essential to understanding the nation’s complex identity.

Figure 6: Bảngiốc Waterfall over the river


Artistic rendering of the Taic-Yue-Chin-Chamic-Khmer cascades of the modern Vietnamese language.
(Source: modified from a photo of Bảngiốc Waterfall, half of which was taken over by the Chinese)

To make the complex historical and anthropological narrative more intuitive, especially for those with a visual or artistic mindset, let’s reimagine the entire rationalization as a watercolor landscape.

Picture an imaginary map of Vietnam, painted in cascading layers of ink. At the top, a dark hue represents the earliest origins, gradually fading into lighter tones as it flows downward. This image resembles a multi-tiered waterfall, where each cascade symbolizes a phase in the region’s ethnocultural evolution.

The uppermost cascade represents the early Chinese settlers, subjects of the Han Empire, who themselves descended from diverse populations of ancient states like Qin, Chu, Wu, and Yue. Many of these populations, particularly that of the Chu State (楚國, c. 1030–223 B.C.), were Taic or proto-Daic (先傣) peoples, often referred to as "Malay" by Vietnamese scholar Bình Nguyên Lộc in his 1972 work Nguồn-gốc Mãlai của Dân-tộc Việt-nam ("The Malay Origin of the Vietnamese"). His thesis, supported by 20th-century scholars such as Phan Hữu Dật and echoed by Lacouperie’s identification of Shan-Taic origins, suggests that these pre-Yue populations formed the ethnolinguistic bedrock of the region.

As the water flows downward, it becomes muddied, symbolizing the infusion of foreign elements, such as the proto-Chinese nomadic horsemen who conquered the ancient mainland. This mixture continues through successive cascades, blending Han and Yue lineages with other ethnic influences like Cham and Khmer. By the time the stream reaches the bottom pool, representing the Annamese population, it has absorbed a rich array of cultural and genetic elements.

This final pool embodies the ancestral composition of the Vietnamese people and other southern populations in China who also trace their roots to the Yue. The result is a deeply integrated racial and cultural mosaic, shaped by centuries of migration, conquest, and intermarriage.

The theory presented above is grounded in Chinese historical documentation. Naturally, this Sino-centric interpretation may be met with resistance by Vietnamese nationalists, as it challenges deeply rooted beliefs in a racially and linguistically "pure" Vietnamese identity. Such discomfort often arises from differing conceptions of origin and cultural heritage.

In tracing the history of a language and its speakers, one must decide whether to rely on mythological narratives or historical evidence. If the latter is chosen, it becomes evident that the ethnic makeup of Vietnam’s Kinh majority is the result of centuries of interethnic blending, primarily between indigenous Yue populations and Han migrants. This fusion was largely driven by China’s steady southward expansion, which displaced Yue communities and pushed them into new territories.

Ethnographic data from northern Vietnam suggests that this migratory and integrative process persisted well into the 20th century. The demographic evolution closely parallels that of the Han Empire after its annexation of the NamViet Kingdom in 111 B.C., when diverse groups were absorbed into the imperial system. Viewed through this lens, Vietnam’s ethnogenesis reflects a similar arc of cultural amalgamation and historical continuity. (T) 

It is important to recognize that there is no singular entity known as the "Chinese race." Rather, what exists is Chinese culture and the diverse populations who have adopted and contributed to it over time. The people identified as "Chinese," both before and after the Han Dynasty, are in fact of racially mixed origins, descended from various groups across the northern and southern regions of ancient China.

This amalgamation began with the unification of China under Qin Shihuang, the First Emperor of the Qin Dynasty, who established the foundation of what is now known as China. The Qin Empire incorporated:

(a) The populations of six previously independent states, whose ancestral lineages were likely distinct from those of the Qin heartland in present-day Shaanxi;

(b) The original Qin populace, descended from proto-Tibetan nomadic horsemen;

(c) Ancient northern tribes of non-Taic origin from the Shang and Xia periods, including groups with Altaic and Turkic ancestry located in regions such as Shanxi and Shandong;

(d) Southern populations from earlier states that had paid tribute to the Western Zhou kings, whose lineage traced back to Hunan in southern China.

As the Qin Empire expanded southward, it absorbed and intermingled with indigenous Yue communities. This process of integration continued through successive dynasties, enlarging the population with further Yue tribal groups as their territories were annexed into the growing geopolitical entity known as the Middle Kingdom (中國).

Thus, the Chinese identity emerged not from a single racial origin, but from a complex and evolving synthesis of cultures and peoples across a vast and diverse landscape.

Figure 7: Map of the Zhou Dynasty

In the aftermath of the Qin Dynasty’s brief reign, the empire plunged into turmoil. Among the contenders for power, the revived Chu State fiercely contested control. Ultimately, victory went to Liu Bang, the founding emperor of the Han Dynasty, whose rise was supported by a cadre of generals, many of whom were former subjects of Chu. These leaders, like Liu Bang himself, traced their lineage to Taic ancestors of the Yue, underscoring the deep ethnic and cultural continuity between the Chu revival and the early Han establishment.


In Vietnamese, the term Tàuô is believed to have originated as a reference to the Qin (秦, Tần) people, who were traditionally known for wearing black garments. This association gave rise to the compound 秦烏 (Qinwu), rendered in Sino-Vietnamese as Tầnô, a term used by the remnants of other ancient states that were absorbed or destroyed in the Qin conquest.

However, there is also an anecdote from the Yan Danzi (燕丹子), a classical Chinese text that dramatizes events from the Warring States period. Toward the end of that era, Crown Prince Dan of the Yan State was held hostage in Qin. When he requested to return home, the King of Qin mockingly replied, “Only when a crow’s head turns white and a horse grows horns will I allow it.” In despair, Prince Dan looked to the heavens and lamented. Miraculously, the crow’s head turned white and the horse sprouted horns, signs interpreted as omens of fate bending to his will. As a result of this tale, crows came to be referred to as Qinwu (秦烏), literally “Qin crow,” symbolizing the improbable and the prophetic. This story not only reflects the tension between Yan and Qin but also illustrates how myth and metaphor shaped linguistic expressions in classical Chinese culture.

In any case, it is plausible that those displaced populations referred to the Qin invaders with a term resembling Tàuô (/taw²o¹/), which eventually evolved into the Vietnamese word 'Tàu'. Phonologically, the term Tàu emerged through a process of sound sandhi. Specifically, the nasal ending /-n/ of the first syllable was clipped and merged with the rounded vowel /-o-/ of the second syllable, resulting in a glide or semi-vowel /-w/ at the end. Thus, the transformation can be represented as: "Tàu" <~ [/tã-/ + /-wo/], where the contraction of /-wo-/ to /-w/ reflects a natural phonetic shift. Over time, Tàu came to be used as a colloquial and sometimes pejorative term for Chinese people, a usage rooted both in a historical hatred of the Qin and in linguistic evolution.

The Han Empire, as a continuation of the unified Middle Kingdom established under the Qin, expanded its territorial reach and absorbed a vast population drawn from the subjects of previously independent ancient states. Racially and linguistically, the Han identity evolved atop a demographic foundation that had already included the populace of the former Chu State, later enriched by Yue elements from the annexed NamViet Kingdom. From this synthesis emerged the people known as “Han,” encompassing not only the core regions of the empire but also newly acquired territories corresponding to present-day Guangdong and Guangxi provinces. In effect, all inhabitants residing within the Han imperial domain from that point forward were designated as Han, analogous to how individuals born within the United States are considered American, regardless of ancestral origin.

The formation of the Han populace was thus the result of a complex amalgamation: original subjects who had once constituted the multi-state demographic landscape of the Eastern Zhou Dynasty, those absorbed following the collapse of the Qin Empire, and additional groups from north of the Yangtze River blended with Yue communities from the southern reaches of China.

Following the Han Empire’s annexation of the NamViet Kingdom in 111 B.C., the racially heterogeneous Han population from southern China began a sustained migration into the southeastern frontier of the empire, specifically, the northeastern region of present-day Vietnam, where the newly established Giaochỉ prefecture ('Giaochâu') was situated. Among the earliest waves of Han colonists and their accompanying infantry units, many were of BáchViệt (百越 BaiYue) origin, previously residing just south of the Yangtze River (揚子江 Yángzǐjiāng) in areas corresponding to modern Hubei and Hunan provinces. Displaced from their ancestral homelands, these groups were resettled in distant locales such as the Red River Basin (Đồngbằng SôngHồng) of northern Vietnam. For many, the relocation became permanent, largely due to the absence of means or opportunity to return.

It was no secret that Han soldiers were drawn from the poorest of the poor, men with no other means of making a living who therefore joined the army. An old Chinese saying goes, "好男不當兵, 好鐵不打釘" (Good men don't join the army; good iron is not made into nails). The saying is cited here to emphasize that, of the hundreds of thousands of Chinese soldiers who went to colonize ancient Annam, only a few would ever make it back home. As members of the colony's "elite ruling class," they were better off resettling in the fertile land of Annam.

In tandem with the prolonged southward campaign of Han imperial forces, successive waves of exiled officials, their families, and refugees, many fleeing the ravages of war and famine, followed behind. These groups migrated in large numbers and eventually established permanent settlements in the newly annexed territories, which were later formally designated as Annam Đôhộphủ (安南都護府, "Southern Pacification Protectorate Prefecture") under Tang administration, a designation that remained in use until the dynasty's decline.

Gradually, many of these settlers expanded into the lower-elevation agricultural zones of the southeastern basin, particularly in areas corresponding to present-day Vĩnhphúc and Hoàbình provinces in northern Vietnam, where they settled permanently. Over the course of the following millennium, these colonial migrants and their descendants came to form the demographic foundation of the Kinh majority population in the newly independent polity of Annam.

From the outset, the presence of Han colonists in the ancient Annamese heartland exerted pressure on native inhabitants, ethnographically classified as Vietmuong, who were gradually displaced into remote mountainous regions. These communities, now recognized as the Mường ethnic minority, remain concentrated in Hoàbình Province and are counted among Vietnam’s 54 officially recognized ethnic groups, many of whose members continue to reside on the periphery of their ancestral lands.

Meanwhile, indigenous populations who remained in urban centers and fertile lowland townships often cooperated with the Han settlers. The integration of Han newcomers during successive waves of southern expansion took root in these resettled zones, where intermarriage with local women became increasingly common. Over generations, this process gave rise to a racially mixed population born in Annam, individuals historically referred to as Annamites (安南居民), who would become the forebears of the modern Kinh people, often described as the "metropolitans" of the region.

Figure 8: Map of the Han Dynasty


(Source: http://en.wikipedia.org/wiki/Han_Dynasty)

A continual flow of migration out of mainland China, in search of a better life in other countries, has always been part of Chinese history.


In contemporary times, youth raised in newly established Vietnamese immigrant families across Western nations—such as the United States, Germany, and France—exhibit notably robust physiques, largely attributable to improved nutritional intake. Physically, they stand in marked contrast to their parents' generation, growing up taller, sturdier, and with noticeably lighter complexions. These traits reflect the biological inheritance of northern genetic stock, distinct from the phenotypic characteristics associated with Austroasiatic and Austronesian populations.

Specifically, this divergence sets them apart from groups of Mon-Khmer origin, as well as the Cham, who share genetic affinities with the Li minority on China's Hainan Island, and other southern populations such as the Malay, Filipino, Indonesian, and Polynesian peoples. The contrast underscores the complex interplay between environmental factors and inherited genetic lineages in shaping the physical development of diasporic communities.

As a matter of fact, we could state with certainty that more than 99 percent of the Kinh population today bear Chinese surnames. This phenomenon reflects a long-standing pattern of cultural and demographic integration, comparable to the racial assimilation policies implemented during the brief Qin Dynasty, which issued an imperial decree compelling over 30,000 local women to marry its soldiers. Much of this topic has already been addressed in the preceding chapter on political dynamics. In short, the lack of open discourse surrounding this issue stems either from domestically trained scholars adhering to politically correct narratives or from segments of the Vietnamese public who conflate national pride with historical denial.

To better understand this phenomenon through an anthropological lens, one may compare the origins of the Vietnamese people with similar developments in other nations that have followed a comparable trajectory, namely, the formation of multiethnic states irrespective of specific ancestral origins. This is precisely how the Annamese identity began to take shape some 2,225 years ago.

For instance, contemporary Asian history offers instructive parallels: all of Singapore’s prime ministers and every president of Taiwan have ancestral roots in mainland China, many of them in Fujian Province. Yet they proudly identify as Singaporean and Taiwanese, respectively, in ways that align seamlessly with their national identities. Table 6 below illustrates the parallel trajectories of Vietnam and Taiwan. What transpired in Vietnam more than two millennia ago is, in many respects, unfolding in Taiwan today.

Of course, when drawing such analogies, one must account for modern technological factors, such as transportation, communication, and standardized orthography, and exclude them from the comparison. These modern elements, by their very nature, help preserve the consistency of standard pronunciation and inhibit the natural evolution of language over time.

Table 6: Taiwanese Identity

Of the 23 million people in Taiwan, 98% are descendants of ethnic Han Chinese immigrants who migrated from China from the 17th to the 20th century. Of these, around 70% are descended from immigrants from Fujian and identify themselves as Hoklo whilst 15% are Hakka from Guangdong (Canton) and also Fujian. The ancestors of these people were laborers that crossed the Taiwan Strait to work on plantations for the Dutch. It is believed that these male laborers married local aborigine women, creating a new ethnic group of mixed Chinese and aborigine people. It is these descendants who identify themselves as Taiwanese and increasingly reject their identity as Chinese. The reason for this lies to a great extent with the authoritarian rule of the foreign Kuomintang (KMT) which fled mainland China during the Chinese Civil War and set up government in Taiwan. There was martial law that lasted four decades and was discriminatory against the existing inhabitants of Taiwan. Mandarin, a foreign language, was imposed as the national language (國語) and all other languages were made illegal. The harsh rule over Taiwan was lifted in 1988 and began a new era in Taiwanese history when Lee Tenghui, a Taiwanese, became president. The first transition of power from the China-centric KMT occurred in 2000 when Taiwanese Chen Shui-bian of the Democratic Progressive Party won the presidential elections. He made efforts to push for Taiwan independence with statements that there are two nations across the Taiwan Strait; a push for plebiscite on independence; and the abolishment of the National Unification Council. Taiwanese opinion on independence is split between the northern and southern half of Taiwan which interestingly also divides the "mainlander" (外省人) in the north from the "Taiwanese" (本省人) in the south.

Source: http://www.taiwandna.com

Consider the projected number of children born to the more than 180,000 Vietnamese women, recorded as of 2018, who married local husbands in Taiwan over the past 35 years. Excluding the post-1949 arrivals, most of these husbands are descendants of fully Sinicized Fukienese immigrants who settled from mainland China beginning in the 17th century. The population resulting from these unions may now exceed the estimated 900,000 inhabitants recorded in the Han Dynasty’s census of the Giaochâu (交州 Jiaozhou) prefecture some 2,000 years ago. (M) 

In terms of racial composition, the demographic balance between these two populations, ancient Annamites and contemporary Taiwanese, could be considered comparable. The former, of Chinese descent in Annam, were referred to as "Annamites," while the latter are known as "Taiwanese," each speaking a Sinicized variant of their respective languages and incorporating varying proportions of indigenous ancestry. In this context, national identity is shaped less by ancestral origin and more by birthplace. Accordingly, the term "Taiwanese" encompasses both the "mainlander" (外省人) and the "native Taiwanese" (本省人), united in their shared commitment to preserving Taiwan’s de facto sovereignty, even as formal independence remains unrealized, much like Vietnam’s historical trajectory toward nationhood.

Although many Vietnamese carry Chinese ancestry, national pride remains firmly anchored in a long-standing tradition of resistance to foreign domination, particularly in response to successive waves of Chinese imperial expansion. This enduring sentiment is exemplified by Vietnam's remarkable military record, most notably its unprecedented victories over three Mongol invasions during the 13th century (1258, 1285, and 1287–1288), launched by the same Mongols who conquered China and founded the Yuan Dynasty. Over the past millennium, successive generations have made profound sacrifices to preserve the nation's autonomy.

With the exception of Vietnam's historical southward expansion, marked by the gradual annexation of territories formerly belonging to the now-extinct Champa kingdom and segments of the Khmer domain, the country's sustained sovereignty stands as a compelling model for other stateless peoples. Both the Tibetan and Uyghur communities continue to seek the restoration of their respective homelands, which remain under Chinese control. Their contemporary struggle echoes the centuries-long experience of ancient Vietnam prior to its emergence as an independent polity in the 10th century.

From a linguistic standpoint, it is often sufficient to examine the political, cultural, and historical dimensions of a language in isolation, particularly when the languages in question, such as Mon-Khmer and Vietnamese, evolved independently for most of their histories. What once belonged to the Khmer linguistic sphere eventually became part of the Vietnamese domain, and vice versa. This dynamic parallels the historical relationships between Vietnamese and Chinese, and similarly between Taiwanese and Chinese populations.

However, in the field of Sinitic-Vietnamese studies, a more integrative approach is essential, one that draws from anthropology, history, and linguistics, because these domains are deeply interwoven. Without such a framework, it becomes difficult to account for the cognate relationships between certain lexical items in Chinese and Vietnamese, especially those found in intimate or colloquial registers. For instance, unrefined terms for human anatomy and sexual acts such as 'cu', 'cặt' (龜 guī), 'hĩm', 'lồn' (隂 yīn), 'bề' (嫖 piáo), 'đụ', and 'đéo' (屌 diǎo; SV 'điệu', Cantonese diu2, Hakka diau3) show clear etymological parallels. Likewise, refined expressions such as 'ânái' (恩愛 ēn'ài), 'giaohợp' (交合 jiāohé), and 'giaocấu' (交媾 jiāogòu) share common roots.

These lexical correspondences reflect underlying linguistic genomes, deep structural affinities that manifest in distinctive semantic and phonological patterns found only in genetically affiliated languages. Such evidence underscores the importance of a multidisciplinary methodology in fully understanding the historical and cultural forces that have shaped the Vietnamese lexicon.

The linguistic commonalities observed across the various "Chinese" dialects and sub-dialectal variants reveal deeply embedded features that are not found in languages of unrelated families, such as Austroasiatic Mon-Khmer. These shared traits are so intrinsic that they often obscure the true etymological origins of certain forms. Their phonetic and semantic proximity makes it difficult to distinguish whether such terms should be classified as cognates from a common root or dismissed as loanwords due to their closeness.

Take, for instance, the Vietnamese word 'đường' 糖 táng (sugar) and its homophonic counterpart 'đàng' or 'đường' 唐 táng (path). Both derive from Chinese sources, yet their semantic divergence reflects layered etymologies. The former likely originates from a Yue root, given Guangxi's historical association with sugarcane cultivation, while the latter, pronounced /dang2/, is plausibly traced to Middle Chinese. These forms reappear in compound expressions such as 'đáiđường' and 'tiểuđường' 糖尿 tángniào (diabetes), where 尿 niào (SV niệu) aligns with both 'tiểu' and 'đái' ('urinate'). The reduplicated form 尿尿 niàoniào corresponds to colloquial expressions like 'điđái' (with the former being 'baby talk' for urination), while Cantonese 屙尿 /o1niu6/ parallels 'điỉa' (to defecate).

Such examples illustrate a broader pattern of lexical interchange between Vietnamese and Chinese. However, etymological analysis alone cannot definitively determine linguistic affiliation, as many Chinese terms may themselves be of foreign origin. Dialectal distribution must also be considered. In this case, the phonetic variants of 'đường' — [ɗɨə̤ŋ˨] (VS), [ɗaŋ˨] (SV), [t'ɔŋ˨] (Cant.) — suggest a shared lineage. While 唐 táng may be rendered as 'đường' or 'đàng' in Vietnamese, 糖 táng /t'aŋ2/ (sugar) is consistently 'đường' /dɨəŋ2/, not 'đàng' /daŋ2/, which denotes 'road'. The semantic distinction is reinforced by phonological constraints across dialects, such as /t'ɔŋ2/ and /djɒŋ2/ in Central Vietnamese. The etymon 糖 táng, likely of Yue origin, was phonetically transcribed to represent the concept of 'sugar', while 唐 táng (Tang Dynasty) served as a phonetic base for both 'đường' and 'đàng', possibly connoting 'palace path'. Similarly, 道 dào (SV 'đạo') aligns with VS 'đường' through phonological correspondence, as seen in the pattern /-owŋ/ → /-ɒw/, comparable to 'đau' (pain) and 痛 tòng (SV thống).

To frame this discussion analogically, imagine Vietnamese as "English" within the Indo-European (IE) family, and Chinese as comprising the core of the Sino-Tibetan (ST) family. Just as English incorporates etyma from Germanic, Latin, Greek, and Romance sources, Vietnamese integrates lexical material from multiple Chinese dialects. In this analogy, ST plays the role of IE, and Vietnamese assumes the position of English. The relationship between Vietnamese and ST thus mirrors that of English and IE.

Further, if China were not a unified nation but a "United States of the Middle Kingdom," with each province functioning as a sovereign entity akin to pre-EU Europe, then dialects such as Cantonese and Fukienese would be classified as distinct languages. By the same logic, had Vietnam remained under Chinese rule, the Annamese language—still referred to as /a1nam2we5/ in modern Hainanese—would likely be considered a Chinese dialect today.

Methodologically, most Western-trained Sinologists approach Sinitic studies using tools developed for Indo-European linguistics. Yet these frameworks often fall short when applied to Sinitic-Vietnamese studies. For example, inflectional case systems (nominative, accusative, dative), common in Latin, German, or Russian, are virtually absent in Sinitic languages; the main exceptions are reconstructed Old Chinese verb suffixes such as *-s (Bloomfield, 1933, p. 17) and consonant-cluster dimidiation in Archaic Chinese (*GL- ~ *BL-, *GS- ~ *BS-, *GDZ- ~ *BDZ-), as noted by Boodberg (1930) and Cohen (1979, pp. 390–393). During the French colonial period, some linguists even dismissed Vietnamese as "primitive" for lacking grammar, imposing French syntactic structures to compensate for perceived deficiencies in parts of speech, verb conjugation, and tense, features absent in both Vietnamese and Chinese.

To achieve balanced objectivity, one must reconcile the early 20th-century contributions of French scholars like Maspero and Haudricourt, who were largely uninfluenced by nationalism or Chinese cultural pressures, with the Sinitic-oriented work of Vietnamese and Chinese scholars such as Nguyễn Đình-Hoà, Lê Ngọc-Trụ, An Chi, Wang Li (王力), and Chao Yuen-Ren (趙元任). Each operated within the limits of their disciplinary scope: Maspero focused narrowly on a handful of Chinese-derived Annamese words; Haudricourt erroneously dated the emergence of Vietnamese tones to the 12th century; Nguyễn remained confined to Sino-Vietnamese; An Chi appears to trace almost every reduplicative form back to Chinese; Lê attempted to distinguish native Vietnamese from Sino-Vietnamese forms (e.g., 漢 hàn for 'hắn'); Wang echoed Vietnamese scholarship; and Chao scarcely addressed Vietnamese at all. Though these figures are no longer present to challenge contemporary revisions, their foundational work provides a springboard for advancing a revised historical and cultural framework for Sinitic-Vietnamese etymology.

Nonetheless, one cannot disregard the analytical tools of Indo-European linguistics when pursuing historical etymology. Vietnamese scholars such as Bùi Khánh-Thế and Cao Xuân-Hạo have adapted Western methodologies to local contexts, though their contributions often remain mechanical rather than substantive in probing the essence of the Vietnamese language (see Cao Xuân-Hạo, 2009), whereas An Chi postulated etymons by logical reasoning. Such divergent approaches may reflect personal convictions shaped by unresolved nationalistic tensions, as discussed in the preceding chapter on politics, and these issues are central to any meaningful evaluation of the origins of Vietnamese. Ultimately, we may agree that the early linguists laid the groundwork, if not a fixed foundation then at least a movable platform, for continued serious inquiry into the Sinitic-Vietnamese linguistic field, guided by the same spirit of analytical rigor and impartiality.

By critically examining the limitations of prior scholarship in the field, this research introduces locally grounded concepts to address gaps left by earlier authors, particularly in categories where exotic or non-native frameworks have proven insufficient. One such area is tonality. Here, each extant tone may be treated as a morphemic feature embedded within lexemes, functioning analogously to syntactic markers in Western linguistic systems. These tones, understood as pitch-registered phonetic vibrations, can be conceptualized as tonemes, suprasegmental morphemes that differentiate lexical meaning in both Chinese and Vietnamese. For example, tonal variants such as ye1, ye2, ye3, ye4, ye5, ye6, ye7, ye8, etc., illustrate how a single syllable can yield multiple semantic values depending on tonal contour.

Each lexeme embedded with a toneme may be classified as a glosseme or vocable, whether it constitutes a syllable, morpheme, or full word. In English, the closest parallel is intonation, which operates at the phrasal or sentential level (see Moira Yip, 1990). However, English intonation does not alter the core meaning of a syllable like /ye/, even when expressed as 'yea?', 'yeh?', 'yes?', 'yah?', or 'ya!'. While stress, pitch, and intonation may affect nuance, they rarely shift lexical identity. In contrast, tonal distinctions in Chinese and Vietnamese are integral to word formation. Indo-European languages generally lack this feature, and Khmer exhibits only limited tonal behavior.
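The numbered tone labels attached to transcriptions in this chapter (for example 'cộ' /ko6/, 'tự' /tɨ6/, 'học' /hawk͡p̚8/, 'nhập' /njɐp8/) can be read off Quốcngữ spelling, because each toneme is written as a diacritic. The Python sketch below assumes the conventional eight-way scheme in which the six Vietnamese tones map to 1–6 (ngang, huyền, hỏi, ngã, sắc, nặng) and syllables ending in a stop (-p, -t, -c, -ch), which carry only sắc or nặng, become 7 and 8. The function names and the numbering are illustrative assumptions chosen to match the labels cited here, not a notation defined elsewhere in this study.

```python
import unicodedata

# Combining tone diacritics of Quốcngữ as they appear after NFD decomposition.
# Vowel-quality marks (circumflex U+0302, breve U+0306, horn U+031B) are not tones.
TONE_MARKS = {
    "\u0300": "huyền",  # grave accent
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0301": "sắc",    # acute accent
    "\u0323": "nặng",   # dot below
}

# Assumed eight-way numbering (ping 1/2, shang 3/4, qu 5/6, ru 7/8).
OPEN_TONES = {"ngang": 1, "huyền": 2, "hỏi": 3, "ngã": 4, "sắc": 5, "nặng": 6}
CHECKED_TONES = {"sắc": 7, "nặng": 8}
STOP_FINALS = ("p", "t", "c", "ch")


def tone_name(syllable: str) -> str:
    """Return the Vietnamese name of the tone marked on a syllable."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"  # no tone mark: level tone


def tone_number(syllable: str) -> int:
    """Map a Quốcngữ syllable to the assumed 1-8 tone number."""
    name = tone_name(syllable)
    # Drop every combining mark so the bare final consonant can be checked.
    bare = "".join(
        c for c in unicodedata.normalize("NFD", syllable.lower())
        if not unicodedata.combining(c)
    )
    if bare.endswith(STOP_FINALS) and name in CHECKED_TONES:
        return CHECKED_TONES[name]
    return OPEN_TONES[name]


print([tone_number(s) for s in ["cộ", "tự", "học", "nhập"]])  # [6, 6, 8, 8]
```

Because the stop-final test runs on the stripped spelling, 'học' and 'nhập' land in the checked categories while 'cộ' and 'tự', with the same dot-below mark, stay at 6, which is exactly the distinction the numbered labels encode.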

Moreover, conventional sound change laws in Indo-European linguistics, such as Grimm’s Law or the Great Vowel Shift, fail to account for the irregularities observed in Archaic Chinese phonological evolution (see Boodberg, 1930; Cohen, 1979, pp. 363–406). Similarly, sound change patterns between Chinese and Vietnamese cognates often defy systematic classification and must be evaluated on an ad hoc basis. Internal variation within Chinese subdialects introduces further discrepancies. While Sino-Vietnamese loanwords exhibit relatively consistent correspondences (e.g., Mandarin s- to SV t-, as in 三 sān : 'tam', and Mandarin c- to SV th-, as in 草 cǎo : 'thảo'), broader Sinitic-Vietnamese etyma often emerge unpredictably.

In many cases, Sinitic-Vietnamese cognates are discovered serendipitously rather than through rule-based reconstruction. For example, 鼻 bí corresponds to SV 'tỵ' rather than the expected 'bĩ'; 番禺 Pānyú becomes SV 'Phiênngung' rather than 'Phanngu'; 丞相 chéngxiàng aligns with 'thừatướng'; and 民 mín yields SV 'dân'. These irregularities suggest a one-to-many correspondence model, in which polysyllabic forms evolved from monosyllabic roots alongside tonal morphemes, sometimes producing complete phonological shifts.

Consider the case of 書 shū in 教書 jiàoshū ('to teach'), which aligns with 學 xué in the compound 教學 jiàoxué. In Vietnamese, these yield 'dạyhọc' (VS, 'to teach') and 'giáohọc' (SV, 'teacher'). The loss of the final stop /-wkp/ in 'học' /hawk͡p̚8/ facilitates the identification of 書 with 學. Meanwhile, 教師 jiàoshī corresponds to SV 'giáosư', which in turn gives rise to the metathetical form 'thầygiáo'. In contemporary usage, 'giáosư' denotes 'professor' (cf. 講師 jiǎngshī, SV 'giảngsư'), while 'giáohọc' refers to a village teacher. These examples illustrate how identical Sino-Vietnamese morphemes can evolve into distinct semantic roles.

The transformation from Sinitic to Sinitic-Vietnamese forms demands specialized expertise in both languages. Western linguistic frameworks, while methodologically rigorous, often resemble machine code, abstract systems beneath the surface of language-specific applications. Vietnamese and Chinese, as linguistic "apps", require localization by scholars fluent in both traditions. Reconstruction of Vietnamese etyma from Chinese sources thus depends not only on generalized rules but also on deep familiarity with Classical Chinese. The Kangxi Dictionary (康熙字典) remains an indispensable resource for tracing obscure etymons embedded in historical texts.

For instance, 車 chē (VS 'xe') appears in 後漢書 HòuHànshū as 居 jū (SV cư), phonologically linked to 古 */ka:ʔ/, and semantically aligned with 'cộ' /ko6/ (cart). This yields the compound 'xecộ', encompassing both general and specific meanings. The Kangxi Dictionary also lists 'cộ' under variants such as 輂 jù, 輋 jù, 檋 jù, and 轂 gǔ. Dialectal and historical factors influence phonological shifts, even when derived from the same ideographic roots. For example, 'tàu' (boat) may correspond to 刀 dāo, 舠 dāo, or 艘 sōu, while 'đò' aligns with 舟 zhōu. Related forms such as 'đỏ' 彤 tóng (SV đồng) further illustrate semantic layering.

The dictionary also preserves numerous obsolete characters and doublets, multiple forms representing a single concept. For example, 'xanh' (blue) may correspond to 靑 qīng (SV thanh), 清 qīng (SV thanh), 倉 cāng (SV thương), 滄 cāng (SV thương), or 蒼 cāng (SV thương). These variants inform expressions like 'trờixanh' (blue sky), rendered as 青天 qīngtiān or 蒼天 cāngtiān. Modern orthography obscures etymological precision; identifying the correct Chinese character behind a Vietnamese term like 'xanh' requires contextual and historical insight. In some cases, Vietnamese forms recorded in the Kangxi Dictionary may reflect dialectal usage in the Annam Prefecture, though such identifications are often elusive.

From the 10th century onward, a distinct set of characters began to appear exclusively within the Vietnamese domain. This development paralleled Annam’s political separation from the Middle Kingdom and led to the emergence of the Nôm script. These characters were constructed using the same structural principles as Chinese ideographs, adapted by Annamite scholars (Nhà Nho 安南 儒家). Unfortunately, only a small number of 15th-century Nôm texts survived the Ming invasion, leaving gaps in the historical record.

Most Sino-Vietnamese vocabulary consists of literary forms tied to written Chinese characters. These characters reflect a sophisticated system of ideograph-phoneticization, through which many terms were coined or adapted. Over time, less frequently used words fell into disuse and were replaced by vernacular alternatives. Semantic shifts also occurred: for example, "tửtế" 仔細 zǐxì came to mean 'kindness', while "kỹcàng" came to mean 'meticulous'. Other examples include "thấtlạc" 失落 shīluò ('lost') versus "lạcloài" ('at a loss'), and polysyllabic formations such as "lịchsự" 歷事 lìshì ('polite'), "íchkỷ" 益己 yìjǐ ('selfish'), and "khoảngthờigian" 一段時間 yīduànshíjiān ('a period of time'), many of which likely predate the adoption of modern romanized Vietnamese orthography.

Sinitic-Vietnamese etyma have undergone extensive phonological transformation. Colloquial pronunciations often deviate from original forms due to dimidiation and sandhi effects, observable through romanized orthography. For instance, 'rác' may derive from 'rácrưới' < 垃圾 lājī ('trash'), and 'đừng' from 甭 béng < 不用 bùyòng ('do not'). Compound words frequently exhibit reversed syntactic order, with both variants in concurrent use: "bảođảm" vs. "đảmbảo" 擔保 dānbǎo ('guarantee'), "lươngthiện" vs. "thiệnlương" 善良 shànliáng ('kindhearted'), "độcác" vs. "ácđộc" 惡毒 èdú ('vicious'), "thânphụ" vs. "phụthân" 父親 fùqīn ('father'), and "thânmẫu" vs. "mẫuthân" 母親 mǔqīn ('mother').

These examples underscore the depth of Sinitic influence on Vietnamese vocabulary. The close phonological and semantic parallels often lead scholars to classify such terms as Chinese loanwords, though their integration into Vietnamese suggests a more complex origin. Newcomers to the field should examine shared basic words—such as "charuột" 親爹 qīndiē ('biological father') and "mẹruột" 親母 qīnmǔ ('biological mother')—to better understand these linguistic relationships. Readers will encounter both obscure and well-known Sinitic-Vietnamese etyma in this study, including those cited by early pioneers like Maspero and Haudricourt. However, some of their proposed Austroasiatic roots remain unresolved and require further scrutiny before being definitively classified.

Basic lexical items often transcend linguistic boundaries. For example, the word for 'eye' is /mata/ in Malay and 'mắt' in Vietnamese, corresponding to Chinese 目 mù (SV 'mục'). Similarly, 'máu' (blood) may relate to 衁 huāng (SV 'vong'), while the Khmer equivalents are /phnek/ ('eye') and /chheam/ ('blood'). These cross-family cognates suggest that certain core vocabulary may originate from shared ancestral roots, later diffused across Austroasiatic and Sino-Tibetan languages.

At the same time, the presence of Mon-Khmer numerals in Vietnamese has long intrigued scholars. While some view these as foundational, this paper argues that basic words beyond numerals, those tied to daily life and cognition, are more indicative of genetic affiliation. Numeral cognates alone do not determine linguistic lineage. Counterexamples presented in subsequent chapters challenge the prevailing Austroasiatic theory, offering Sino-Tibetan evidence that supports a broader etymological framework. In fact, over 90 percent of Vietnamese common vocabulary may be traced to Sinitic-Vietnamese origins. Readers are encouraged to approach this data critically, not to accept the theory outright, but to prepare for a more informed defense of the Sino-Tibetan perspective if they already hold it.

Controversies surrounding Chinese influence on Vietnamese culture are longstanding. Nationalist sentiment often leads to the downplaying or erasure of historical connections. This resistance is particularly strong among militant nationalists, whose convictions are shaped more by ideology than by historical evidence.

Political agendas have historically shaped linguistic development. Euphemism and taboo have influenced vocabulary choices, such as the substitution of "lợi" (利, 'gain')—the name of King Lê Lợi—with "lời" or "lãi". The author speculates that even 民 mín, rendered as "dân" in Sino-Vietnamese, may have been altered to avoid direct association with Lý Thế-Dân (李世民), the Tang emperor during China's rule over Annam.

In modern times, political directives have encouraged the use of "purely Vietnamese" terms like "xelửa" (train), "tênlửa" (missile), and "máybay" (airplane), despite their Chinese origins. These replaced the earlier Sino-Vietnamese forms "hoảxa" 火車 huǒchē, "hoảtiễn" 火箭 huǒjiàn, and "phicơ" 飛機 fēijī, respectively, which were common in southern Vietnam before 1975. Notably, these Sino-Vietnamese forms entered Vietnamese during the French colonial period and may have been adapted from Japanese translations.

Why should we care whether Vietnamese has been influenced by Chinese? Fundamentally, because that influence lies at the heart of the Vietnamese language, and it is the central focus of this paper. The shaping of Vietnam by Chinese civilization is comparable to the way the Romans, Celts, Angles, and Saxons shaped England in antiquity (Palmer, 1972, p. 356).

It can be stated with confidence that Chinese cultural and historical influence is deeply embedded in the everyday life and language of the Vietnamese people. This influence is not only evident in formal expressions but also in the most intimate and colloquial aspects of speech, including vocabulary related to human relationships and sexuality. The linguistic choices in Vietnamese whether refined or vulgar often mirror Chinese equivalents with striking precision. Terms referring to reproductive anatomy, sexual functions, and related actions are etymologically cognate with Chinese lexicons. Such parallels would not exist without centuries of sustained Chinese-Vietnamese interaction.

To understand this influence more fully, we must consider the historical development of Chinese dialects and how their complexity parallels that of Vietnamese. Chinese is traditionally divided into seven major dialectal groups, all of which trace their origins to Middle Chinese. These groups have diversified into more than 900 sub-dialects across China, as documented by C-C Chang and cited by Moira Yip (1990, pp. 202, 223). Despite their shared ancestry, these dialects are largely mutually unintelligible, not only across groups but often within the same group. For example, Amoy, Hainanese, and Tchiewchow, though all part of the Minnan sub-family, differ significantly in phonology and vocabulary. Their relationship is historical rather than functional, and this fragmentation reflects the broader linguistic landscape that Vietnamese has interacted with and absorbed over time.

While speakers of Yue-based languages identify themselves as descendants of the "Jyut people" (粵) and refer to their linguistic heritage collectively as "Jyut6waa6" (粵話), it is specifically Cantonese speakers who also embrace the designation "Tang people" (唐人 /Tong4jan4/). This self-identification implies that their ancestors were likely among the Tang-dominated populations who migrated en masse into the Guangdong region, gradually displacing earlier native groups, particularly those associated with the historical Giao (交) region prior to the 10th century.

The dialects spoken in this region, shaped by layers of Middle Chinese (MC) phonology and Tang-era linguistic features, evolved atop a Yue substrate and retained distinct characteristics despite Han influence. These dialects came to be known collectively as "Tang language" (唐話 /Tong4waa6/), with the Guangzhou variety eventually emerging as the representative standard. This linguistic identity reflects both historical continuity and cultural pride rooted in the legacy of the Tang Dynasty.

During the same period, as subjects of the Tang Empire until the 10th century, the ancient Annamese acquired Middle Chinese in the same way as the ancestors of today's Cantonese speakers did (Lü Shih-P'eng 呂士朋, 1964). The extensive Middle Chinese vocabulary that later became the foundation of Sino-Vietnamese was layered atop an earlier Sinitic-Vietnamese lexical base derived from Archaic and Old Chinese, dating back to the pre-Qin and Han periods. Together, these two strata of Sinicized vocabulary formed the linguistic core of ancient Vietnamese long before Annam achieved sovereignty. It is no coincidence that Sino-Vietnamese represents one facet of the same Middle Chinese linguistic matrix, alongside other regional varieties such as Táishān (台山), Báihuà (白話), and Pínghuà (平話). The shared features of Sino-Vietnamese and Cantonese reflect their common Tang-era origins, though their paths diverged after Annam’s political break from the Middle Kingdom in 939: while Cantonese continued to evolve within China under the influence of migrants from other Tang prefectures, Annamese developed independently.

After the collapse of the Tang Dynasty in 907, China entered a period dominated by successive northern dynasties, each instituting its own northern dialect as the official language of the imperial bureaucracy. This policy of linguistic centralization continued into the modern era; by 2018, under Xi Jinping’s leadership, national television broadcasts were required to use Putonghua, while regional dialects faced increasing restrictions.

As a result, Cantonese remained largely a regional vernacular, often limited to local communities and informal settings. Among older generations with limited exposure to formal education, reproducing Mandarin phonemes accurately has proven difficult because of interference from native dialectal phonology. This is particularly evident in the articulation of the affricate initials z-, c- (dental), zh-, ch- (retroflex), and j-, q- (palatal), which differ significantly from their Cantonese counterparts.

Vietnamese speakers face similar challenges when learning Mandarin, as phonemic mismatches between the two languages often lead to divergent pronunciations. For instance, the Vietnamese labial /b/ does not consistently align with Mandarin /p/ or /b/, resulting in frequent overcorrections among Vietnamese learners of Putonghua. These phonological discrepancies underscore the broader difficulties faced by speakers of southern Sinitic languages when adapting to the standardized northern speech.

This phenomenon extends to the formation of Sinitic-Vietnamese vocabulary, where diachronic Chinese loanwords are frequently reshaped by Vietnamese phonological constraints. Initials such as d-, t-, th-, g-, and k-, together with doublets such as quý/quới, thì/thời, and tràng/trường, illustrate how historical sound changes in Annam parallel developments in Minnan sub-dialects such as Hokkienese, Amoy, Teochow, and Hainanese. The fusion of Archaic and Old Chinese with these dialects mirrors the broader linguistic evolution that occurred from the Western Han period into the Three Kingdoms era (Wei, Shu, Wu, 220–280 A.D.). It is likely that ancestral Yue languages, such as proto-LuoYue and proto-MinYue, played a formative role in shaping Archaic and Old Chinese itself.

Up to Annam’s separation from the Tang Empire in 907 and its formal independence from the NamHan State in 939, the region experienced historical developments similar to those in Lingnan (嶺南), including modern Guangdong. After the annexation of the NamViet Kingdom by the Han Empire in 111 B.C., early Yue languages in both regions came under heavy influence from Han Chinese. The process of Sinicization continued in both Annam and Canton, with the latter remaining within the Sinic sphere for over 1,180 years. Cantonese thus evolved as a direct descendant of Middle Chinese, while Annamese diverged, expanding southward by the 18th century and resisting northern incursions. Only then did Chamic and Mon-Khmer elements begin to permeate Vietnamese, forming the basis of the Austroasiatic theory’s Mon-Khmer vocabulary layer.

This suggests that the Sinitic-Yue foundation of Vietnamese predates its contact with Chamic and Mon-Khmer languages. In essence, the early history of Annam was a reflection of Chinese statecraft and culture. Anthropological ties between the two regions date back at least 2,300 years, beginning with the Qin conquest of southern China. Chinese historical records typically refer to Annamese uprisings as local rebellions in a southern prefecture. As Nguyễn Thị Chân-Quỳnh noted (1995, pp. 256–66), Samuel Baron—a Dutch merchant of Annamese origin living in Thănglong (Hanoi) in the 1660s—expressed skepticism about Annam’s historical claims of victory over China in his book A Description of the Kingdom of Tonqueen (1685). Much of the historical and cultural information he cited was drawn directly from Chinese sources. Indeed, Vietnamese history books written before the 1960s often read as a mirror of Chinese history, portraying Annam as a miniature southern version of the Middle Kingdom.

Today, readers often struggle to understand Vietnamese literature written before the 18th century. Since the 20th century, modern Vietnamese has undergone significant transformation, heavily influenced by French grammatical structures that introduced new syntactic and semantic forms. Mid-20th-century Vietnamese generations were well-versed in Chinese classics, from the Warring States period to Romance of the Three Kingdoms, and mastered Tang poetic conventions more thoroughly than many contemporary Chinese readers, who have largely lost touch with classical forms. With the adoption of romanized orthography, Vietnamese readers have distanced themselves from classical Chinese, marking an almost complete break with the literary past. Yet interest in Chinese culture persists: younger Vietnamese audiences continue to enjoy modern Chinese television dramas and historical series, much like older generations who appreciated traditional Chinese opera performed in Vietnamese, such as Hátbội.

Sino-Vietnamese vocabulary, derived from Middle Chinese, developed through a process strikingly analogous to how Latin and Greek shaped the lexicons of Indo-European languages such as English and French. However, the comparison reveals a key distinction: while Latin remained largely confined to scholarly and literary domains, much like classical Chinese Wenyanwen (文言文), which persisted in written form until the early 20th century under the Nguyễn Dynasty, Sino-Vietnamese vocabulary has remained actively embedded in both spoken and written Vietnamese. Its semantic and phonological vitality continues to thrive in everyday usage, far surpassing what one might expect from a corpus of historical loanwords.

Unlike Latin, whose influence is largely fossilized, the phonological essence of Middle Chinese survives robustly in Sino-Vietnamese. The modern pronunciations of Sino-Vietnamese words have been remarkably well preserved, shaped by systematic sound change rules within a scholarly framework. These rules closely follow the traditional Chinese Fanqie (反切) method of phonetic notation, which spells a syllable with two characters: the first (upper speller) supplies the initial consonant (Anlaut), the second (lower speller) supplies the final rhyme (Auslaut) and the tone category, and the voicing of the initial determines the tonal register (陰 Yin or 陽 Yang). For example:

  • học (learn) 學 xué: 《唐韻》胡 /ɣo2/ + 覺 /jɔkʷ8/ 切 → {Low /ɣ-/ + High /-ɔkʷ8/ (陽 Yang)}
  • tập (practice) 習 xí: 《廣韻》似 /tɨ6/ + 入 /njɐp8/ 切 → {Low /t-/ + Low /-ɐp8/ (陽 Yang)}

These examples illustrate how Sino-Vietnamese pronunciation aligns with historical Chinese phonological models, preserving tonal and segmental features with precision.
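The Fanqie procedure can be sketched as a small function: the upper speller contributes the initial and, through its voicing, the Yin/Yang register, while the lower speller contributes the rhyme and the tone category. This is a minimal Python sketch under stated assumptions: the transcriptions are simplified and follow the two examples above (覺 is given its Middle Chinese 見 k- initial, which the procedure discards anyway), and the eight-way numbering, 平 ping 1/2, 上 shang 3/4, 去 qu 5/6, 入 ru 7/8, is an assumed convention chosen to reproduce the tone-8 results shown for 學 and 習.

```python
from dataclasses import dataclass

# Assumed base numbers for the four Middle Chinese tone categories;
# a voiced (陽 Yang) initial adds 1 to the base.
BASE_TONE = {"ping": 1, "shang": 3, "qu": 5, "ru": 7}


@dataclass(frozen=True)
class Syllable:
    initial: str          # onset (Anlaut)
    rhyme: str            # final (Auslaut)
    category: str         # one of BASE_TONE's keys
    voiced_initial: bool  # True for the Yang register

    def tone_number(self) -> int:
        return BASE_TONE[self.category] + (1 if self.voiced_initial else 0)


def fanqie(upper: Syllable, lower: Syllable) -> Syllable:
    """Initial and register from the upper speller; rhyme and tone category from the lower."""
    return Syllable(upper.initial, lower.rhyme, lower.category, upper.voiced_initial)


# Simplified, hypothetical transcriptions of the four spellers.
HU = Syllable("ɣ", "o", "ping", True)     # 胡 (匣 initial, voiced)
JUE = Syllable("k", "ɔkʷ", "ru", False)   # 覺 (見 initial, voiceless)
SI = Syllable("t", "ɨ", "shang", True)    # 似 (邪 initial, voiced)
RU = Syllable("nj", "ɐp", "ru", True)     # 入 (日 initial, sonorant)

xue = fanqie(HU, JUE)  # 學 học
xi = fanqie(SI, RU)    # 習 tập
print(xue.initial + xue.rhyme, xue.tone_number())
print(xi.initial + xi.rhyme, xi.tone_number())
```

Keeping the register on the upper speller is the design point: 覺 itself is voiceless, yet 胡覺切 comes out as a Yang entering syllable because 胡 has a voiced initial, which is the "Low /ɣ-/ ... (陽 Yang)" reading given above.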

Importantly, Sino-Vietnamese vocabulary was never restricted to elite or literary circles. Much like Cantonese, it permeated colloquial speech and became indispensable in daily communication. This widespread usage suggests that many so-called "scholarly" terms likely originated from Tang-era spoken language that diffused into the general population of the Giaochỉ prefecture. Without such oral transmission, it would be difficult to explain the ubiquity of Sino-Vietnamese words in everyday Vietnamese.

This integration has led to the creation of new expressions that blend Sino-Vietnamese and derived Sinitic-Vietnamese elements, forming a dynamic part of the modern lexicon. Examples include:

  • tạingoạihầutra ↔ 在外候查 zàiwàihòuchá (on bail)
  • tâmhồn ↔ 心魂 xīnhún (soul)
  • ngọcngà ↔ 玉牙 yùyá (adorable)
  • cànhvànglángọc ↔ 金枝玉葉 jīnzhīyùyè (born into nobility)

These examples reflect not only linguistic continuity but also cultural resonance, affirming the enduring legacy of Middle Chinese in shaping the Vietnamese language.

    From an etymological standpoint, it is entirely feasible to construct a complete Vietnamese sentence using predominantly Sino-Vietnamese vocabulary (words directly inherited from Middle Chinese) by translating each term individually and then reorganizing the terms to conform to Vietnamese grammatical and syntactic conventions. One effective method involves coining new expressions by adapting Sino-Vietnamese terms into more naturalized Sinitic-Vietnamese forms. For instance, instead of the formal SV compound phicơtrựcthăng (直升飛機 zhíshēngfēijī) for "helicopter," one might opt for máybaylênthẳng, a vernacularized construction that reverses the morpheme order, that is, "jīfēishēngzhí", and aligns more intuitively with Vietnamese usage.
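The reordering just described can be sketched mechanically: reverse the order of the morphemes of the Chinese compound, then gloss each morpheme with a vernacular Vietnamese word. This is a minimal Python sketch; the glossary is hypothetical and contains only the four morphemes of 直升飛機 cited above, and the character-by-character reversal is an assumption that reproduces the "jīfēishēngzhí" order given in the text rather than a general rule for all coinages.

```python
# Hypothetical character-level glossary: only the four morphemes of
# 直升飛機 'helicopter' cited in the text (not a real lexicon).
GLOSS = {
    "直": ("zhí", "thẳng"),   # straight
    "升": ("shēng", "lên"),   # rise
    "飛": ("fēi", "bay"),     # fly
    "機": ("jī", "máy"),      # machine
}


def calque_reversed(hanzi: str) -> tuple[str, str]:
    """Reverse the morpheme order, then gloss each morpheme.

    Returns the reordered Pinyin and the vernacular Vietnamese coinage,
    each written as one combined word in this chapter's style.
    """
    reordered = hanzi[::-1]  # 直升飛機 -> 機飛升直
    pinyin = "".join(GLOSS[ch][0] for ch in reordered)
    vietnamese = "".join(GLOSS[ch][1] for ch in reordered)
    return pinyin, vietnamese


print(calque_reversed("直升飛機"))  # ('jīfēishēngzhí', 'máybaylênthẳng')
```

A realistic procedure would need word segmentation and a much larger lexicon, and, as the bảođảm/đảmbảo pairs discussed earlier show, not every compound is reversed in Vietnamese.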

    Another strategy is to link these lexical items with grammatical particles and prepositions, many of which were historically borrowed from Chinese xūcí (虛詞 'function words') to fill gaps in native Vietnamese syntax, which lacked such function words in earlier stages of development (Nguyễn Ngọc San, 1993, pp. 136–142). As Vietnamese evolved, especially under French colonial influence in the early 20th century, its written style began to incorporate structural features from French, including complex sentence construction, clause embedding, and syntactic connectors. This shift was further reinforced by the adoption of Quốcngữ, the romanized script promoted by figures such as Petrus Trương Vĩnh Ký and Phạm Quỳnh.

    By the latter half of the 20th century, the rise of English as a global language introduced additional syntactic models into Vietnamese writing. These included the standard [ Subject + Verb + Object ] sentence structure, along with modifiers, relative clauses, and topic sentences, elements that now also form the backbone of modern Vietnamese prose. Interestingly, as Vietnamese expressions grow longer and more syllabically complex, the proportion of Sino-Vietnamese elements tends to decrease. For example, buộcphải (unavoidably) corresponds to 不得已 bùdéyǐ, while the equivalent SV bấtđắcdĩ preserves the original Chinese morphemes. Similarly, lìabỏxómlàng (to abandon one’s hometown) parallels 離鄉背井 líxiāngbèijǐng, though the Vietnamese form may require reinterpretation because of its polysyllabic structure.

    To illustrate the convergence between Vietnamese and Chinese, one could construct a long Vietnamese sentence using Western syntactic mechanics and embed Sinitic-Vietnamese vocabulary throughout. Each word or phrase could then be matched with its Chinese equivalent, highlighting the shared linguistic architecture and historical continuity between the two languages. This exercise not only demonstrates the adaptability of Sino-Vietnamese within modern Vietnamese grammar but also underscores the deep-rooted parallels in lexical formation and sentence construction across both linguistic traditions.

    1. Modern Vietnamese with many Sinitic-Vietnamese elements: Đến năm mộtchínbảynăm Sàigòn thấtthủ chínhphủ miềnNam bạitrận cảnước rơivào tay quân BắcViệt xâmlược nên anhta buộcphải lái chiếc trựcthănglênthẳng phóngthẳng rakhơi gặpđược một chiếc tàusânbay liền nhảyxuốngbiển được vớtlên cho nhậpvào dòngngười tỵnạn Việtnam lìabỏxómlàng lưulạc tới Đảo Guam lênbờ tạmtrú tại căncứ Hảiquân Mỹ làmthủtục didân đợingày tới Mỹ địnhcư.
    2. Modern Chinese: 當 一九七五年 西貢 失守 南方 政府 戰敗 全國 落入 北越 侵略軍 之 手 他 被迫 駕駛 一架 直升機 直飛 海面 遇到 一艘 航空母艦 立即 跳入 海中 被 救起 後 加入 越南 難民 隊伍 離鄉背井 流落 至 關島 登陸 後 暫住 美國 海軍 基地 辦理 移民 手續 等待 前往 美國 定居.
    3. Chinese Pinyin: Dāng yījiǔqīwǔnián Xīgòng shīshǒu Nánfāng zhèngfǔ zhànbài quánguó luòrù Běiyuè qīnlüèjūn zhī shǒu tā bèipò jiàshǐ yījià zhíshēngjī zhífēi hǎimiàn yùdào yīsōu hángkōngmǔjiàn lìjí tiàorù hǎizhōng bèi jiùqǐ hòu jiārù Yuènán nànmín duìwǔ líxiāngbèijǐng liúluò zhì Guāndǎo dēnglù hòu zànzhù Měiguó Hǎijūn jīdì bànlǐ yímín shǒuxù děngdài qiánwǎng Měiguó dìngjū.
    4. Sino-Vietnamese transcription: Đương nhấtcửuthấtngũniên Tâycống thấtthủ namphương chínhphủ chiếnbại toànquốc lạcnhập Bắcviệt xâmlượcquân chi thủ tha bịbách giáthị nhấtgiá trựcthăng cựctrực hảidiện ngộđáo nhấttầu hàngkhôngmẫuhạm lậptức khiêunhập hảitrung bị cứukhởi hậu gianhập Việtnam nạndân độingũ lyhươngbốitỉnh lưulạc chí Quanđảo đănglục hậu tạmtrú Mỹquốc Hảiquân cơđịa biệnlý didân thủtục đãngđãi tiềnvãng Mỹquốc địnhcư.
    5. English translation: "In 1975, as Saigon collapsed with the defeat of the South Vietnamese government and the whole country fell into the hands of North Vietnam's invading army, he had no choice but to fly his helicopter straight out to sea, where he came upon an aircraft carrier, jumped into the water, and was pulled aboard to join the stream of Vietnamese refugees who had left their homeland, drifting to Guam, where he went ashore and stayed temporarily at a US Navy base, completing the immigration process while awaiting resettlement in the United States."

    The long one-sentence passage above offers a rich field for linguistic analysis. It intentionally incorporates Sinitic-Vietnamese compounds such as máybaylênthẳng and the locally coined tàusânbay, which reverses the order of the ad hoc neologism 機場艇 (jīchǎngtǐng), regardless of whether such phrasing would realistically appear in wartime writing. In fact, such a sentence would likely not have been composed by either a southern or a northern Vietnamese speaker of the war era. Southern Vietnamese speakers would more commonly have used terms such as phicơtrựcthăng (直升飛機 zhíshēngfēijī), or at least máybaytrựcthăng, and hàngkhôngmẫuhạm (航空母艦 hángkōngmǔjiàn), respectively.

    On one hand, these polysyllabic Sinitic-Vietnamese terms were widely used in the North, where speakers tended to favor Sino-Vietnamese vocabulary. The two modern terms mentioned above were coined during a period of educational reform, particularly during campaigns aimed at eradicating illiteracy. On the other hand, due to the political connotations embedded in phrases such as cảnước rơivào tay quân BắcViệt xâmlược (“the whole country fell into the hands of North Vietnam’s invading army”), such a passage could not have been authored by a northerner. The ideological framing would have been incompatible with official narratives.

    The key point here is that translated Sinitic-Vietnamese words are actively in use, and readers should pay close attention to their etymological layers. Even basic words such as anhta, rakhơi, gặpđược, dòngngười, and others carry significance for those interested in tracing genetic affiliations between Chinese and Vietnamese. This is particularly relevant for Austroasiatic Mon-Khmer specialists, who often seek lexical evidence to support broader linguistic theories.

    The Chinese translation of the passage follows a near word-for-word structure, retaining common Sino-Vietnamese terms such as thủtục (procedure) and địnhcư (resettlement), which are widely used by average Vietnamese speakers. The author leaves it to emerging Vietnamese linguists to explore the Sinitic-Vietnamese linguistic features presented here, which reflect both modern usage and ancient roots.

    Polysyllabic Vietnamese compounds are written here in a combined format, as recommended, mirroring the way Chinese block characters are grouped and similar to Korean orthographic conventions. This stylistic choice reflects a modern Vietnamese writing style found in contemporary publications such as Tuổitrẻ, which contrasts sharply with the French-era Namphong magazine of the 1930s. The differences span grammar, vocabulary, and tone.

    Over two decades of division between North and South Vietnam, northern vocabulary remained more Sino-centric, while southern Vietnamese evolved under the influence of Chamic and Mon-Khmer languages, resulting in a more relaxed phonological structure. In today’s digital age, both spoken and written Sino-Vietnamese forms have spread rapidly and uniformly, a transformation unimaginable to scholars of earlier generations, who relied on quill pens and writing brushes, tools reserved for the privileged few.

    Beyond etymology, linguistic peculiarities shared uniquely by Vietnamese and Chinese appear in every category. Lexically, the Sino-Vietnamese class is indispensable; it permeates both speech and writing to such a degree that fluency in Vietnamese is virtually impossible without it. Phonologically, even fluctuating articulations reveal consistent patterns. Lexemic nuclei embedded in Sino-Vietnamese kernels manifest through sound change rules: tràng for trường (長 cháng, 'long'), đàng for đường (唐 táng, 'path'), đảm for đởm (擔 dān, 'carry'), đờm for đàm (痰 tán, 'mucus'), đàn for đờn (彈 tán, 'pluck'), and so on.

    Western Austroasiatic Mon-Khmer specialists have long theorized that both Chinese and Vietnamese are isolating languages, composed of discrete words strung together without inflection or case. Yet grammatical analysis reveals deeper ties. Vietnamese grammar is heavily built on Chinese 虛詞 (xūcí, SV hưtự), that is, function words that serve as particles, adverbs, prepositions, conjunctions, pronouns, classifiers, and even articles. These grammatical markers are essential for coherence and are demonstrably derived from Chinese sources (Nguyễn Ngọc San, 1993, pp. 136–142).

    If even one grammatical word is missing from a sentence, the structure begins to resemble classical Chinese 文言文 (Wényánwén), a terse literary style in which words are juxtaposed with few grammatical connectors. While such classical forms are incomplete by modern standards, they often survive as idiomatic expressions in both Chinese and Vietnamese, further supporting the proposition of shared linguistic ancestry (文).

    In Vietnamese, the only exceptions to the use of xūcí are found in the shortest exclamatory sentences, typically consisting of just one or two words. Even then, such expressions are almost exclusively constructed from words of Chinese origin, for example,

    • 'Vâng.' 行 Xíng. ('Agree.'),
    • 'Xong.' 成 Chéng. ('Deal.'),
    • 'Đúng.' 中 Zhòng. ('Right.'),
    • 'Cút!' 滾 Gǔn! ('Out!'),
    • 'Rồi.' 了 Liǎo. ('Done.'),
    • 'Đi!' 走 Zǒu! ('Let's go!'),
    • 'Được!' 得 Dé! ('Okay.'),
    • 'Đượcrồi!' 得了! Déle! ('That's okay!'),
    • 'Hayghê!' 好極! Hǎojí! ('Very good!'),
    • 'Chúa ơi!' 我主! Wǒ Zhǔ! ('My God!'),
    • 'Trờiơi!' 天啊! Tiānna! ('My Lord!'),
    • 'Vìsao?' 為啥? Wèishá? ('How come?'),
    • 'Vôduyên!' 無聊! Wúliáo! ('Nonsense!'),
    • 'Tạimầy!' 賴你! Làinǐ! ('It's your fault!'),
    • 'Đụmá!' 他媽! Tāmā! ('Fuck you!'),
    • 'Thìralàvậy!' 原來如此! Yuánláirúcǐ! ('So, that is why!'),
    etc.

    One illustrative case is the Vietnamese word mắt ('eye'), which corresponds to the Chinese character 目 (mù) and is also cognate with Hainanese /mat7/; Hainanese is a sub-dialect of Min Chinese descended from the ancient Minyue languages. This connection suggests that mắt is unlikely to derive from the Malay form /mata/; the resemblance may be coincidental, as such overlaps are rare and isolated.

    This example supports the broader observation that Vietnamese shares deep etymological ties with Chinese, particularly through dialectal continuities. We can extend this analysis to other Sino-Vietnamese forms derived from Middle Chinese literary vocabulary, which coexist with Sinitic-Vietnamese derivatives from regional Chinese lects postulated to descend from ancient Yue speech. These forms recur persistently across time and usage, reinforcing the notion of a durable linguistic inheritance, for example,

    • 'Được' 得 dé ('okay'), Hainanese /dewk8/,
    • 'Đi' 走 zǒu ('go'), Hainanese /duj3/,
    • 'Biết' ('know'), Hainanese /bat7/,
    • 'Xơi' 食 shí ('eat'), Hainanese /zha1/,
    • 'Đũa' 箸 zhù ('chopsticks'), Hainanese /duo3/, etc.

    The examples above illustrate that the words and their peculiar usage across linguistic categories are clearly related rather than coincidental, prompting reflection on why such peculiarities are absent among proposed Mon-Khmer ~ Vietnamese cognates.

    When a Vietnamese word closely mirrors the shape and sound of a form in a related Chinese dialect, so closely, in fact, that the morphemic structure aligns, linguists commonly misclassify it as a Chinese loanword. This overlooks the possibility that both forms evolved from a shared root. Such cases include basic vocabulary items, such as Hainanese /mat7/ for 目 (mù, 'eye'), which corresponds to Sino-Vietnamese mục and vernacular Vietnamese mắt. Similarly, Cantonese /tʰaːi³/ for 睇 (dì, 'see') aligns with Vietnamese thấy, while 看 (kàn, 'look') corresponds to the scholarly Sino-Vietnamese khán /kʰan5/. The vernacular Vietnamese coi /kɔj1/ is echoed in the Quảngnam sub-dialect as /kər1/, and in Shanghainese as /kʰə25/. These examples demonstrate that phonetic and phonological proximity alone is insufficient to classify a Vietnamese word as a Chinese loanword.

    This same reasoning applies to true Chinese loanwords, whose prominent phonetic attributes have left a lasting imprint on Vietnamese. However, not all etyma of shared origin are derived from Chinese. Consider classic examples from the southern region: gạo (稻 dào, SV đạo, 'rice'), dừa (椰 yē, SV giả, 'coconut'), đường (糖 táng, SV đàng, 'sugar'), and sông (江 jiāng, SV giang, 'river'). These Chinese–Sinitic-Vietnamese cognates are not loanwords but forms descended from a common root. The reverse is also true: Chinese contains Yue-origin loanwords that resemble the phonology of Sinitic-Vietnamese forms. In other cases, Vietnamese borrowed etyma back from Middle Chinese, resulting in pronunciations that resemble Sino-Vietnamese instead.

    Both lexical classes, namely Sinitic-Vietnamese and Sino-Vietnamese, are products of the same historical linguistic development, reflecting the characteristics of specific dynastic eras or regional speech patterns. This process parallels the evolution of the Minyue languages and the Cantonese Yue sub-dialects. The former derived from Old Chinese of the Han Dynasty, while the latter were shaped by Tang-era popular speech, brought south by migrants from northern China. Unlike these Yue dialects, which have been largely Sinicized and now stand as Chinese lects in their own right, Vietnamese emerged as an independent Yue language. It retains both Sino-Vietnamese and Sinitic-Vietnamese vocabularies as major Chinese-derived strata, yet it has not been Sinicized to the extent that it could be considered a 'Chinese lect'.

    For example, Vietnamese syntax typically follows a [noun + adjective] structure, with the modified element preceding the modifier. This is evident in trờixanh ('blue sky') versus the Chinese compound 蒼天 (cāngtiān, SV "thươngthiên"), and in terms like gàcồ and gàtrống ('rooster'), as discussed earlier.

    These examples highlight Sinitic-Vietnamese words that remain distinct from their Sino-Vietnamese counterparts, though both classes complement each other. The former belongs to an older lexical layer, Old or Ancient Chinese, or to regional dialectal variants that diverge significantly from metropolitan speech. Some Sinitic-Vietnamese words represent the 'lightest' accented version of a dialect, typically spoken by educated urban populations.

    In this way, the development of Vietnam’s national language parallels the history of the Yue people of the NamViệt Kingdom. Fleeing Han invasion, they abandoned ancestral lands in the north, migrated southward, and displaced indigenous populations in their new settlements. Whether through replacement or assimilation, they survived and came to be regarded as descendants of the Southern Yue. They established a sovereign nation in the south called Việtnam ("people of the Southern Yue"), securing independence from the 10th century onward.

    Under constant threat from China, the Vietnamese became expansionists themselves. They eradicated the thousand-year-old Champa Kingdom from the Southeast Asian map in the 18th century and annexed its southern territories. They also occupied southeastern lands previously held by the Khmer. As a result, the Chamic and Mon-Khmer peoples became minorities in their own ancestral lands, much like earlier ethnic groups such as the Daic, Hmong, and Mường.

    Merritt Ruhlen, in The Origin of Language (1994, pp. 172–173), discussing Greenberg's approach to locating the origin of the Bantu language family in Africa by identifying its closest relatives, argued that

    "[i]f the language is widely dispersed, but its closest relative occupies only a small region, the usual historical explanation is that the broadly dispersed language was originally spoken in a much more circumscribed area, side by side with its closest relative, and spread to its present distribution later. This is sometimes referred to as principle of least moves. To see how this principle works, consider the Vietnamese language, which is spoken along the coast of Southeast Asia from China to the southern tip of Vietnam. It is reasonable to assume that this language spread along the coast in one direction or the other, but which, and from where? It so happened that Vietnamese is most closely related to a relatively obscure language known as Muong, spoken by just over 700,000 people in the northern regions of Vietnam, and this fact suggests that Vietnamese originally spread from this northern region southward to its present distribution. The fact that the Vietnamese dialects in the north are more divergent than those in the south – which invokes the Age-Area hypothesis – confirms the hypothesis of a northern origin."

    As they migrated southward, the Southern Yue people carried with them not only their cultural identity but also their linguistic heritage—their mother tongue. These migrants descended from the ancient Yue, whose genetic and cultural composition had increasingly crystallized into a hybrid form symbolized metaphorically as {4Y6Z8H+CMK} (交), enriched by layers of local influence. Prior to the Viet-Muong divergence, groups such as the Daic had already intermingled with Khmer populations, contributing to the emergence of a new Southern Yue identity in the resettled regions. These people would later be known as the Annamese.

    From a linguistic perspective, this "local flavor" aligns with what Leonard Bloomfield (1933, p. 51) described as a dialect area, where sub-dialectal variation is minimal and differences accumulate gradually as one moves farther from the point of origin. Such regions can be visualized as concentric circles, bounded by isoglosses, radiating outward from a linguistic core. Bloomfield's framework of dialect geography helps resolve certain linguistic puzzles, for instance, the striking resemblance between the Mon-Khmer numerals from one to five and their Vietnamese counterparts.

    This pattern of gradual divergence and lexical convergence mirrors similar phenomena in other language families, such as Indo-European. A notable example is Bulgarian, a Slavic language whose vocabulary is heavily composed of foreign loanwords, yet remains structurally consistent with its linguistic lineage. The Vietnamese case, shaped by centuries of migration, contact, and cultural layering, offers a parallel model of linguistic evolution through hybridization and regional adaptation.

    Over the course of two millennia, the evolution of modern Vietnamese has been shaped by a steady southward migration of its speakers, during which the language absorbed numerous local linguistic elements. This process explains the presence of Chamic and Mon-Khmer vocabulary in Vietnamese, particularly in regional dialects such as that of Huế. Words like , , ni, nớ, ri, rứa, and chừ have been identified by several scholars as likely of Chamic origin. Whether this attribution is entirely accurate remains debatable, especially given the existence of equivalent forms in modern Mandarin. Nonetheless, the presence of these features reflects a broader pattern: Chamic and later Mon-Khmer forms entered the Vietnamese lexicon alongside preexisting native elements. This phenomenon resembles the recycling of Sino-Vietnamese vocabulary into Sinitic-Vietnamese forms, for example, as a vernacular counterpart to tử (子 zǐ).

    As Merritt Ruhlen (1994, p. 173) noted, we do not refer to German, Dutch, or Swedish as "Semi-English"; likewise, we should refrain from labeling dialects such as Hokkienese, Amoy, Hainanese, or Cantonese as "Half-Chinese", and by extension, Vietnamese should not be considered a partial derivative of Chinese either. Vietnamese sub-dialects also differ markedly from the sub-dialects of Cantonese or Amoy, which are often mutually unintelligible, for instance, Guangzhou versus Toishanese, or Hokkienese versus Teochow. Vietnamese sub-dialects instead resemble regional variants within a single dialect, akin to the relationship between Haikou and Wenchang in Hainanese, or Fuzhou and Amoy in Minnan.

    To grasp this distinction, consider the English "Good morning" versus the German "Guten Morgen". In terms of "local flavor", Vietnamese sub-dialects, from north to south, can be likened to the seven major sub-dialects of a major Chinese lect. On a tonal scale ranging from "lightest" to "heaviest" glides, northern Vietnamese tonal contours often strike southern ears as sharply accented, much as Mandarin-speaking Taiwanese perceive the Putonghua spoken by native Beijingers, namely Beijinghua, as a heavy "erhua" variant of Mandarin. Although Mandarin has an inventory of only four tones, its northern form is syllabically and tonally further simplified compared with the southwestern Mandarin sub-dialects spoken in Chengdu (Sichuan), Liuzhou (Guangxi), or Yueyang (Hunan), as well as the standard Mandarin used by Taiwanese broadcasters. In short, even within a single dialect, northeastern Mandarin variants differ significantly from their southwestern counterparts, which exhibit heavier southern accents.

    In contrast, Vietnamese dialects (方言, SV phươngngôn) demonstrate a remarkable degree of mutual intelligibility across regions. Unlike the numerous sub-dialects (方言 fāngyán) of major Chinese dialects, which are often completely unintelligible to one another even within the same linguistic family, say Fukienese and Hainanese, Vietnamese regional variants are generally understood nationwide. This cohesion is a direct result of the gradual southward movement of ancient Annamese migrants, who traversed approximately 2,200 kilometers over a span of 2,200 years, an evocative mnemonic of one kilometer per year. Linguistically, this longitudinal migration produced transitional sub-dialects that varied incrementally from one locality to the next, allowing speakers from different regions to comprehend each other with relative ease.

    Dialectal differences in Vietnamese are most prominently expressed through tonal pitch. In the northern regions, where the full eight-tone system is preserved, speech tends to carry a higher pitch and sharper tonal contours, somewhat reminiscent of Cantonese. In contrast, southern Vietnamese, with a reduced six-tone system, exhibits a softer and more relaxed tonal quality, contributing to an overall sense of "lightness". Central Vietnamese dialects, particularly those spoken around Huế and in the rural areas of Bìnhđịnh, stand apart with their deeply concave tonal contours, producing a marked "heaviness" that distinguishes them from both northern and southern varieties. This tonal gradient reflects not only geographic variation but also the layered historical and cultural influences that have shaped the Vietnamese language over centuries.


    Figure 9: Proto-Sino-Tibetan (pre-Chinese)


    All of the above, linguistically and racially, appears to have played a formative role in shaping Vietnamese identity, with residual traces of ancient Chinese influence as expected, given that Annam was once part of imperial China. Politically, successive Chinese rulers long regarded Vietnam as a breakaway vassal state or even a renegade prefecture, analogous in some respects to the contemporary case of Taiwan, or even Hong Kong. Although Annam ceased to be under direct Chinese rule after 907 A.D., the region continued to be associated with the "Great Han" (大漢 Dàhàn) through the Nam Han (南漢) regime after 918, which had previously been known as "Great Yue" (大越 DàYuè) until 917. The name change was prompted by the ruling Liu (劉) family’s claim of descent from Liu Bang, founder of the Western Han, and Liu Bei, founder of the Shu Han. Recognizing these geopolitical shifts is essential to understanding the enduring affiliations between Vietnamese and Chinese civilizations, and the anthropological continuities that link modern Vietnamese society to its Chinese antecedents.

    It is accurate to assert that the proto-Chinese had little direct connection to the proto-Yue or proto-Vietic peoples. The distinction lies in nomenclature: "Chinese" refers to a civilization, not a race. In prehistoric times, proto-Tibetan groups are believed to have been the ancestors of proto-Chinese populations, who conquered and intermingled with Taic-speaking natives. This fusion gave rise to pre-Chinese communities that later interacted with the Yue peoples of the south, known in Chinese records as Namman (南蠻), around 3000 B.C. The racially mixed inhabitants of ancient Central and Southern China eventually became subjects of the states that emerged during the Eastern Zhou period, scattered across pre-Han territories. Ancient China might well have been called the Chu Empire of the Taic people, rather than the Han Empire, given that many of its subjects descended from the Taic-Yue, who also populated the NanYue Kingdom. Reframing the terminology reveals parallels with Vietnam’s Kinh majority (symbolized as {4Y6Z8H}) and its long-standing coexistence with minority groups such as the Cham and Khmer ({+CMK}), forming a composite identity {4Y6Z8H+CMK} that has persisted for over 1,500 years.

    For better or worse, the Vietnamese inherited Chinese cultural traditions after a millennium of Chinese rule, passing them down through generations well into the 20th century. Among the most enduring, and arguably problematic, legacies are Confucian values, particularly the hierarchical principle of obedience: first to the ruler (君 jūn, 'quân'), then to the teacher (師 shī, 'sư'), and finally to the father (父 fù, 'phụ'). This framework has contributed to a national culture of deference to authority. Despite the potentially demeaning aspects of this legacy, Vietnamese nationalists have continued to embrace its neo-monarchical underpinnings. Confucian ideology reinforces the power of the ruling class and its supporting structures, conditioning individuals to obey and conform from birth.

    Anthropological evidence of this mindset is reflected in the cultural preference for male descendants, ensuring the continuation of the family surname. This phenomenon may be explained by inherited cognitive patterns shaped over generations. Vietnamese families rarely question the spiritual significance of genealogy, a trait deeply rooted in Chinese cultural tradition but largely absent in Khmer society. This suggests that the ancestral origins of the Vietnamese lie not in the broader Indochinese peninsula, where surname inheritance was uncommon, but in regions historically inhabited by the BáchViệt (百越 BǎiYuè) in southern China. Thus, when discussing historical linguistics, one must also consider anthropology. The collective unconscious of the Vietnamese people points to ancestral ties not only to northern Vietnam, where the ancient Vănlang polity was founded, but also to China South, the homeland of the original Yue.

    The transformation of cultural traditions, including the adoption of Chinese surnames, has deep roots in Chinese civilization and is evident throughout Vietnamese history. Genealogically, descendants of northern settlers from China who resettled in what is now northern Vietnam continued to pass down Chinese surnames across generations. These surnames, as seen in prominent historical figures, represent only a fraction of a much larger ancestral pool comprising hundreds of family names that formed the Yue-Han (楚漢) melting pot in southern China.

    Regarding northern genetic affiliations, the racial groups that constituted ancient China were evenly distributed across early Annamese territory until the fall of the Tang Dynasty in 907 A.D. After gaining independence, Annam expanded southward, absorbing additional racial elements into its evolving demographic landscape. This included lighter-skinned settlers from the north and darker, mixed populations from the south. These latecomers merged with earlier resettlers, contributing to the composite identity of the Vietnamese people. The historical periods of the NanYue, Chu, and Qin states are reflected not only in the diversity of Vietnamese surnames but also in the tonal qualities of Vietnamese personal names, except in cases where names were changed to conceal identity or avoid taboo, as noted by Nguyễn Thị Chân Quỳnh (1993).

    Phonetically, the "textures" of certain given names evoke a sense of lightness, closely resembling the elegance of Chinese names from the Tang Dynasty era, such as Lý Thế-Dân (李世民 Lǐ Shìmín) or Dương Ngọc-Hoàn (楊玉環 Yáng Yùhuán). In contrast, names like Hồ Cẩm-Đào (胡錦濤 Hú Jǐntāo), Giang Trạch-Dân (江澤民 Jiāng Zémín), Tập Cận-Bình (習近平 Xí Jìnpíng), or even Hồ Chí Minh carry a rougher phonological texture that tends to sound foreign to Vietnamese ears.

    From a geopolitical standpoint, many of the populations in question were included in the census records of the Great Tang Empire, which reported a total population of nearly 42 million by the year 726 A.D. (Bo Yang, 1983–1993, Vol. 51, 1991, p. 86). Remarkably, by 763 A.D., less than four decades later, the recorded population had plummeted to approximately 17 million, following the devastating An Lushan (安祿山) Rebellion of 755–763 (Bo Yang, 1983–1993, Vol. 53, 1991, p. 214). While the scale of this decline may seem staggering, it is not implausible in the context of Chinese history, where mass casualties were common in major military conflicts.

    One illustrative example occurred in 878, when Tang forces reportedly annihilated 50,000 rebels at the Battle of Huangmei (黃梅), during the uprising associated with Huang Chao (黃巢) (Xu Liting, 1981, p. 217). Such figures underscore the brutal nature of warfare in imperial China and help contextualize the demographic shifts recorded in historical annals. (一)

    By contrast, there were no recorded changes in the population of the Tang Dynasty’s Annam Protectorate, suggesting that the region may have narrowly escaped the mass killings that devastated northern China during periods of rebellion and warfare. Historical records from the Western Han era indicate that Jiaozhou Prefecture (交州, Giaochâu) had an Annamese population of approximately 900,000. A significant portion of this population descended from more than 30,000 local women who were compelled to marry Qin soldiers during earlier conquests. Assuming an average of three children per couple, this initial generation could have produced up to 90,000 racially mixed offspring, numbers that would have multiplied exponentially well before the start of the first century.

    Centuries later, under Tang rule, Annam remained part of the empire for nearly 300 years. During this time, the demographic landscape likely expanded further, with additional Annamese descended from children fathered by thousands of Chinese infantry stationed in the region. These soldiers, along with waves of civilian immigrants from the mainland, often chose to settle permanently and marry local women. This pattern of intermarriage and integration persisted throughout the thousand-year colonial period, continuing until Vietnam’s independence in 939 A.D. Although the influx of Chinese immigrants slowed thereafter, it never ceased entirely and continues in smaller waves to the present day.

    A similar southward migration occurred again during the expansion of Annamese settlers into newly annexed territories formerly belonging to the Champa and Khmer kingdoms, beginning in the 13th century. This movement mirrored earlier demographic shifts and contributed to the complex ethnocultural fabric of modern Vietnam. (E)

    Analogously, when comparing the demographic composition of Singapore and Taiwan, particularly the ratio of late Chinese immigrants to indigenous populations, their current status mirrors the position ancient Vietnam occupied over a millennium ago. A similar process of linguistic and cultural integration is unfolding in these regions today. However, in contrast to the slow evolution of language in antiquity, modern communication technologies, such as the internet and mobile phones, have stabilized linguistic development. Mandarin Chinese, as spoken today, is unlikely to undergo significant transformation, largely because learners across the Chinese diaspora now adhere to standardized Putonghua, already adopted in Malaysia and Singapore. Taiwan has begun transitioning from its traditional Zhuyin phonetic notation to the Pinyin system used in mainland China, while Hong Kong is increasingly embracing Putonghua and Simplified Chinese characters in place of Cantonese and Traditional script.

    In contrast, the linguistic evolution of Chinese in ancient Annam was far more complex. Over 2,200 years ago, during the Han occupation beginning in 111 B.C., the southward spread of Chinese language and administration progressed at an average rate of roughly one kilometer per year. This slow diffusion was shaped by limited transportation and communication infrastructure, resulting in a fragmented and regionally adapted linguistic landscape.

    Beyond the historical evidence cited throughout this study, and the clear presence of Chinese linguistic features in Vietnamese, historical linguists must grapple with semantic complexities that extend beyond phonetic shifts. Variations in cultural and linguistic elements pose significant challenges when attempting to trace words that appear to share Chinese ancestry but are absent from commonly spoken Chinese dialects. This requires sinologists to delve deeply into the peculiarities of over 900 Chinese subdialects to uncover potential cognates. For example, kinship terms such as ôngnội (possibly 內公 nèigōng, "paternal grandfather", Hokkienese 內公 nèigōng /lǎikong/) versus ôngngoại (外公 wàigōng, "maternal grandfather"), and bànội (possibly Hakka 婆奶 po2nai1 vs. 內婆 nèipó [?], "paternal grandmother") versus bàngoại (外婆 wàipó, "maternal grandmother"), suggest semantic parallels, though the paternal terms ôngnội and bànội have no counterparts in standard modern Chinese usage. Nevertheless, the existence of terms like 天公 Tiāngōng (Vietnamese: ÔngTrời, "Supreme Creator") and 地公 Dìgōng (ÔngĐịa, "Earthly God") supports the plausibility of such kinship structures.

    Kinship vocabulary further illustrates the genealogical depth of Chinese influence in Vietnamese. Examples include tía (爹 diē, "daddy") versus cha, ba (爸 bā, "papa") versus bố (父 fù, "father"), nạ (娘 niáng, "mommy") versus mẹ, and mợ or u for 母 mǔ ("mother"). These terms coexist in both languages, reflecting shared cultural and familial structures. While some lexical correspondences are straightforward, such as 首 shǒu and Vietnamese sọ ("cranium"), or 足 zú and đủ ("enough"), others are more complex. For instance, the archaic Viet-Muong form /dak7/ (water) is cognate with modern Vietnamese /nɨək7/ and variant /nak7/, which may correspond to Chinese 水 shuǐ (SV thuỷ). Similarly, 踏 tà ("trample") aligns with Vietnamese đạp /dap8/. These examples highlight the intricate interplay between Austroasiatic and Sinitic elements, extending beyond well-known cases like mắt ("eye") or bươmbướm ("butterfly").

    Turning to anthropological considerations, it is widely accepted that "Chinese" denotes a civilization rather than a race. Conceptually, there was no "Chinese" identity prior to unification under the Qin Dynasty (秦朝) in 221 B.C. The Vietnamese name for the Chinese, Tàu, is often said to derive from the Sino-Vietnamese Tần (秦) and is frequently interpreted as pejorative, possibly reflecting resentment dating from the Warring States period (475–221 B.C.), when Qin eradicated rival states. However, Tàu may instead have originated from Tiều, itself derived from Triều, a short form of Triềuchâu (潮州 Cháozhōu, 'Teochow'). If so, the name Tàu carries no inherently derogatory meaning.

    Like any population across the globe, the Vietnamese people are not racially homogeneous in terms of Yue indigeneity. Instead, they represent a complex amalgamation of various ethnic tribes, shaped by successive waves of immigration from southern China. This diverse composition, symbolized as {4Y6Z8H+CMK}, includes early settlers who themselves were descendants of Taic-Yue lineages, the ancestral stock that gave rise to both the Chu and Han populations prior to the Han conquest of the NamViệt Kingdom in 111 B.C.

    Following this pivotal moment in history, the forebears of the Vietnamese people began to evolve as a hybrid community, already bearing Chinese family surnames. These surnames reflected centuries of intermarriage and integration between Chinese colonists, military personnel, and civilian immigrants with the indigenous populations of the south. This process of admixture continued steadily for generations, laying the foundation for the ethnocultural identity of modern Vietnam.

    In racial terms, modern Vietnamese society is a composite, with the Kinh majority analogous to the Han 'race' in China: a melting pot rather than a segmented 'salad bowl'. As discussed earlier, descendants of the Yue peoples contributed to both Han and Vietnamese identities. Vietnam also officially recognizes 53 ethnic minorities, including the Tày (Daic 傣族), Nùng (Zhuang 壯族), Hmong or Mèo (Miao 苗族), and Thuỷ or Thái (Shui 水族); many of these groups inhabit remote mountainous regions along the northern and western borders of Vietnam. Following the Han conquest of the NamViệt Kingdom, many Yue emigrated from China South and eventually formed the Kinh majority in Vietnam’s south. This migration facilitated the final split of the ancient Viet-Muong people into the Muong and Vietic branches. Today, Muong descendants reside primarily in Hòabình Province.

    As the Kinh continued their southward expansion, they intermingled with Chamic populations along the central coast and with Mon-Khmer communities in the western highlands and southernmost regions. This long history of migration, intermarriage, and cultural exchange has shaped the ethnolinguistic landscape of Vietnam, producing a richly layered national identity that reflects both indigenous and Chinese influences.

    Attempts to draw direct correlations between biological lineage and linguistic identity among Vietnamese and Chinese populations in Vietnam have often led to oversimplifications and misinterpretations. This conflation is notably less prevalent in countries such as Japan and Korea, where ethnic Chinese communities have historically remained socially and culturally distinct from the majority populations.

    Likewise, in Southeast Asian nations such as Indonesia and Malaysia, Chinese minorities, despite having resided there for multiple generations, continue to be recognized as separate ethnic groups. In Indonesia, for instance, descendants of Chinese immigrants were legally mandated to adopt Indonesian surnames and have historically faced restrictions in accessing certain governmental roles. In Malaysia, whose population was approximately 34.1 million as of 2024, ethnic Chinese account for 22.4% of the total. Yet they are still officially classified as Chinese in origin, with limited representation in key state institutions.

    Taiwan’s demographic evolution presents a compelling parallel to Vietnam’s historical experience. The population of Taiwan, especially in the southern regions, reflects a complex blend of Chinese and indigenous ancestry. In the 17th century, large waves of laborers from Fujian Province crossed the Taiwan Strait to work on Dutch plantations, laying the foundation for Chinese settlement on the island. A second major influx occurred in 1949, when the Kuomintang, defeated by communist forces, retreated to Taiwan. Accompanying them were large numbers of Chinese refugees, including soldiers and government officials, many of whom intermarried with the island’s indigenous communities, further expanding the Chinese ethnic base.

    In contemporary Vietnam, third- and fourth-generation descendants of the Hoa (華) ethnic community commonly identify as "Vietnamese" when filling out census forms or official documents concerning national origin. Despite their Chinese ancestry, cultural assimilation over generations has led many to adopt a Vietnamese identity in both public and private life. Estimates covering the period from 1990 to 2024 put the Chinese diaspora in Vietnam at approximately one million individuals, reflecting both historical migration patterns and continued demographic integration.

    This pattern of integration has extended into the modern era. Between 1990 and 2022, an estimated 133,000 Vietnamese women married Taiwanese husbands, according to kyotoreview.org, giving rise to a new generation of Taiwanese-Vietnamese children. Their genetic composition may be represented as {4Y6Z8H+CMK+T}, where T denotes the Taiwanese demographic, itself comprising {4Y6Z8H+I}, with I signifying indigenous ancestry. This layered identity illustrates the continuing fusion of regional ethnicities and migration histories that together shape the evolving cultural landscape of Taiwan.

    Although Taiwan possesses a rich and layered anthropological history, its experience with Chinese integration is comparatively modest when measured against Vietnam’s centuries-long absorption of Chinese immigrants. From the time Vietnam functioned as a Chinese prefecture, it welcomed hundreds of thousands of settlers from the mainland. Following the fall of the Ming Dynasty in 1644, approximately 50,000 Ming loyalists fled the Manchu conquest and resettled in southern territories under Annamese governance. These refugees, known as Minhhương (subjects of the Ming), were predominantly Teochew speakers and rapidly assimilated into Vietnamese society, adopting the Vietnamese language and customs with fluency and ease. Their surnames, though rooted in Chinese tradition, underwent phonetic adaptation within the Vietnamese linguistic environment. For instance, Huỳnh and Hoàng both derive from 黃 (Huáng), while Võ and Vũ both correspond to 武 (Wǔ), illustrating regional shifts in pronunciation and integration over time.

    Vietnamese surnames, in general, closely reflect Chinese naming conventions in both structural format and semantic connotation. Many follow the classical Han pattern of a monosyllabic surname followed by a given name. This resemblance goes beyond surface formality; the phonological and tonal characteristics of Vietnamese names often align with Middle Chinese pronunciation patterns. Today, only recent Chinese immigrants, those who arrived within the past century, are officially classified as Hoa (華) in Vietnam’s national census. According to the 2019 census, this group numbered approximately 750,000 and includes speakers of Cantonese, Hainanese, Hokkien, and Hakka.

    The Vietnamese Kinh population, which includes individuals of mixed Chinese ancestry, has been shaped over centuries by deep cultural and ethnic integration. This blending is also evident within Chinese-Vietnamese communities, particularly in the diaspora. In contemporary Vietnamese music concerts, especially those produced in the United States, audiences may observe that many performers, often recent immigrants, exhibit physical traits commonly associated with East Asian heritage. Such visual similarities can make distinguishing between Vietnamese and Chinese appearances challenging.
    This phenomenon is especially noticeable in high-definition recordings of popular Vietnamese entertainment programs, including Asia Entertainment (episodes 74 through 79), Thuý Nga’s Paris by Night series (notably episodes 21, 50, 109, 110, 119, 130, and 138), and Vân Sơn productions (episodes 50 and 51). Filmed both in Vietnam and among the overseas Vietnamese community in the U.S., these shows showcase not only musical artistry but also the rich and multifaceted cultural identity of modern Vietnamese society.

    Ethnicity, however, is far more complex than a mark on a census form, linguistic markers, or genealogical records alone can capture. The phonetic forms embedded in Chinese-origin surnames, such as Huỳnh or Hoàng for 黃 (Huáng) and Võ or Vũ for 武 (Wǔ), can offer clues about ancestral origins, including when and where families first settled in Vietnam. Visually, a Vietnamese national may easily be mistaken for a southern Chinese individual, and vice versa, particularly in provinces like Guangxi, Hunan, Jiangxi, and Guangdong. This phenomenon of mistaken identity between Chinese and Vietnamese individuals is observable even outside of Asia.

    In multicultural regions such as Southern California, where both communities have long coexisted, it is not uncommon for Vietnamese youth to be misidentified as Chinese, especially in group settings like school photographs or public gatherings. Unless one is comparing recently arrived northern Chinese students from Beijing with Vietnamese youth side by side, the distinction is often imperceptible. The author himself has frequently made such misidentifications in Chinatowns across North American cities. In essence, Vietnamese and Chinese individuals born and raised in Western countries like the United States are often indistinguishable by appearance alone.

    Moreover, unlike American Caucasians in Europe, who are generally distinguishable from local white Europeans, Vietnamese travelers are nearly impossible to pick out at a glance among Chinese locals in markets or restaurants across Chinese cities. Many Vietnamese visitors report being mistaken for Chinese nationals, often assumed to be from another province. This perception stems from the shared physical traits between Vietnamese and Southern Chinese populations, particularly those of Taic-Yue origin, as opposed to Northern Chinese groups of Altaic descent. The author, who speaks Mandarin with a heavy non-native accent and has a darker complexion, was frequently mistaken for a Guangdong native while in Beijing. Regardless of political distinctions, Vietnamese-born travelers holding U.S. passports are often addressed in Chinese at border checkpoints. The author personally experienced such misidentification nine times out of ten at Chinese border gates, despite his passport clearly listing Vietnam as his birthplace.

    Anthropologically, two observable patterns emerge among overseas Chinese from Vietnam, particularly in their resettlement behavior in North American cities. In major urban centers such as San Francisco, Oakland, Los Angeles, and New York, Chinese-Vietnamese businesses are commonly found in either traditional Chinatowns or Vietnamese enclaves known as "Little Saigon". This occurs despite the historical resentment many Vietnamese hold toward Chinese imperialism. A similar sentiment exists among Koreans regarding Japanese colonial rule, yet Korean immigrants also tend to cluster their businesses in designated ethnic zones such as Japantown or Koreatown. These patterns suggest that, at a subconscious level, immigrant communities gravitate toward familiar anthropological environments.

    Interestingly, Chinese-Vietnamese individuals often prefer socializing with fellow Vietnamese, whether recent arrivals or long-established overseas Vietnamese, rather than with Chinese expatriates from Hong Kong, Taiwan, or mainland China. This preference reflects historical migration trends: most early Chinese immigrants to the U.S. prior to the 1980s were Cantonese speakers. Among the four Confucian-influenced societies, China, Vietnam, Japan, and Korea, people tend to seek out others with whom they share cultural and anthropological affinities, even in diaspora settings.

    Linguistically, Japanese and Korean have historically borrowed extensively from Chinese, incorporating Chinese characters into their writing systems. Yet their spoken languages remain toneless and phonetically distinct, easily recognizable even to untrained Chinese or Vietnamese ears. In contrast, when a Chinese dialect is spoken in, say, a ballroom setting, a Vietnamese listener may need to concentrate closely to be sure it is not simply another Vietnamese subdialect. This is due to the tonal similarities and pitch contours the languages share, and it is also why Westerners often mistake Cantonese for Vietnamese. The author's wife, a Hainanese Han speaker familiar with Cantonese, remarked on first hearing Vietnamese that it sounded strikingly similar to Cantonese.

    To illustrate this linguistic proximity, one might analogize Mandarin to English, Cantonese to German, and Vietnamese to Dutch, each sharing structural and phonetic features that make them perceptually adjacent in a comparative guessing game.

    Table 7: Races and languages

    It is important to recognize that race and language are not always intrinsically linked. In many cases, a population’s linguistic identity may diverge significantly from its ethnic origins. For example, several Asian countries, such as India, the Philippines, and Singapore, have adopted English as an official language even though it is not native to them. Similarly, Latin American nations predominantly use Spanish or Portuguese, languages introduced through colonization, as tools for national communication and unity.

    This phenomenon parallels the linguistic evolution of China, the "Middle Kingdom", where Mandarin, now known as Putonghua (普通話), meaning "common speech", emerged as a standardized national language. Historically referred to as Guoyu (國語), or "national language", it continues to serve as a unifying medium across diverse ethnic and regional groups.

    In these contexts, linguistic proficiency does not always align with ethnic heritage. For instance, reports indicated that an early version of Apple's Siri voice recognition system understood English spoken by individuals of Indian descent more accurately than English spoken by native-born Americans. This is unsurprising, given the widespread use of English across India, where regional accents have evolved into distinct English dialects, some of which may be challenging for second-language learners to comprehend.

    This broader observation is relevant to the historical development of the Vietnamese language. It is plausible that an early form of Vietic speech functioned as a kind of lingua franca among indigenous populations and Han Chinese colonists following the Han conquest of Jiaozhi in 111 B.C. Over time, this hybridized mode of communication may have gradually evolved into what we now recognize as modern Vietnamese, a language shaped by centuries of contact, adaptation, and integration.

    In the case of Vietnam, ethnic identity, particularly among the Kinh majority, remains entangled with unresolved questions surrounding historical genetic affiliations with Han Chinese ancestry. While one might hope that advances in DNA mapping could settle the debate, the reality is more complex. As seen in studies conducted on the Taiwanese population, genetic data often yields mixed results, complicated by layers of human emotion, cultural identity, and historical memory.

    It is reasonable to assume that the genetic composition of many Vietnamese individuals is compatible with that of Han Chinese populations in southern provinces such as Fujian, Guangdong, Hunan, and Guangxi. These regions are home to Sinicized descendants of the ancient "Hundred Yue" (百越 民族), which included diverse subgroups such as YuYue (于越), GanYue (干越), MinYue (閩越), DongOu (東甌), DongYue (東越), NanYue (南越), XiOu (西甌), LuoYue (駱越), OuYue (歐越), YangYue (揚越), DianYue (滇越), TengYue (騰越), and YueXi (越雟). These indigenous groups inhabited the southern periphery of China long before the Qin-Han era (先秦漢).

    The southern Han Chinese retained a genetic blend that included early Taic peoples of the Chu State (Liu Bang, the founding emperor of the Han Dynasty, was himself a Chu native) as well as Yue populations from the NamViệt Kingdom (南越 王國), which once spanned from present-day Guangdong to northeastern Vietnam. In contrast, in Huabei (North China), following the permanent occupation of northern China by Altaic nomadic groups, including Turkic, Tartar, and Mongol peoples, the genetic makeup of Northern Chinese populations became increasingly distinct from that of their southern counterparts.

    As a result, the Sinicized populations of China South continue to exhibit physical traits that distinguish them from Northern Chinese groups in regions such as Shaanxi, Shanxi, Shandong, and Beijing. These differences underscore the complex interplay of genetics, migration, and cultural assimilation that has shaped the ethnographic landscape of both China and Vietnam.

    Table 8: The Chinese mentality is the emigration mindset

    The Chinese have long demonstrated a strong inclination toward emigration in pursuit of better opportunities in a new land, a pattern that has significantly influenced the racial and cultural makeup of Vietnam, particularly after its independence in the 10th century and continuing into the present day. Notably, while ancient Annam successfully repelled Mongol invasions three times during the 13th century, the fall of the Song Dynasty triggered a wave of refugees from mainland China that spilled across Vietnam's northern border.

    Historically, Vietnam endured centuries under Chinese rule, governed by successive dynasties that facilitated a continuous influx of Han migrants. Despite this prolonged exposure to Sinicization, Vietnam has maintained a distinct national identity and sovereignty, with its people consistently resisting cultural assimilation imposed by China over the past two millennia.

    In comparison, Taiwan has experienced approximately 355 years of sustained connection with mainland China, particularly through waves of immigration from Fujian Province. These settlers eventually outnumbered the indigenous Austronesian population, accelerating the island's Sinicization, a process that had begun on the mainland over 2,200 years earlier. In this regard, Taiwan's evolution into a distinct sovereign entity in the 21st century mirrors Vietnam's earlier emergence from its long period as a Chinese colony, from 111 B.C. to 939 A.D.

    Meanwhile, Chinese refugees and immigrants have continued to arrive in what is now Vietnam, further shaping its demographic landscape. Taiwan, too, reflects this emigration mindset, sharing Vietnam’s legacy as a destination for Chinese settlers. The racial composition of both countries can therefore be seen as a product of China's long-standing emigration ethos.

    For those unable to emigrate during their lifetimes, the traditional saying 離鄉背井 (to leave one's homeland and well) serves as a cultural rationale, preserving dignity in the face of displacement. Yet, in reality, Chinese emigration has profoundly transformed the social and cultural fabric of many host nations around the world.


    The inclusion of Taiwan in the broader discussion serves to highlight a series of parallel historical developments that have shaped the identity of Vietnam. Both nations have navigated complex relationships with China, marked by waves of migration, cultural influence, and political tension. In each case, segments of the population—particularly those with Chinese heritage—have coexisted alongside staunch nationalists who remain wary of Chinese ideological expansion and resist further integration.

    Taiwan’s recorded history is relatively brief compared to Vietnam’s, and its experience with China has been correspondingly less extensive. Vietnam endured over a millennium of direct Chinese rule, beginning with the Han conquest in 111 B.C., followed by successive dynastic occupations and sustained cultural imposition. In contrast, Taiwan’s connection to the mainland, though significant, has been more episodic and less deeply entrenched, especially when considering prehistoric interactions and the full scope of historical contact that shaped Vietnam’s national trajectory.

    To illustrate the enduring legacy of migration and cultural blending, consider the following episode involving Chinese-Vietnamese refugees who resettled in the United States between 1975 and 1995. This story is chosen for its symbolic resonance with the ancient saga of the Yue people's displacement: over two millennia ago, they fled southward across the rugged mountains of the Lingnan region, from their native southern China into Vietnam’s Tonkin region. That historic movement laid the foundation for the emergence of the Kinh majority in ancient Annam and offers insight into how the Vietnamese language evolved over time, regardless of which modern variant one chooses to examine.

    Now, let me take you to a corner of America where mistaken identities often arise, specifically, in distinguishing Vietnamese individuals of Chinese descent. Though the example is modest, its implications echo across broader cultural and historical contexts.

    As a regular patron of a vibrant Vietnamese café in Oakland’s Chinatown in California, the author has come to appreciate not only the cuisine but also the people behind it. The cooks serve up authentic Vietnamese dishes with care, and over time, the author has come to know several of the staff through casual conversation and shared memories. Like himself, many were boat people, refugees who fled Vietnam after the fall of Saigon in 1975.

    Genealogically, it is likely that some of their ancestors were also refugees from the collapse of the Ming Dynasty in 17th-century China, fleeing the Manchu conquest and resettling in Vietnam. Many of these early migrants were Teochew (Chaozhou) speakers. In daily life, beyond their fluency in Vietnamese, the author has overheard them conversing in various Chinese dialects with local Chinese customers. Their ability to switch seamlessly between Vietnamese and Chinese dialects, fluidly and unconsciously, is genuinely admirable.

    The owner and most of the staff, like many Vietnamese nationals, are ethnically mixed, Vietnamese with Chinese ancestry. The cooks, however, speak only Vietnamese. Whether they too have Chinese roots is uncertain—perhaps only Heaven knows. Yet their presence adds another layer to the rich tapestry of cultural and ethnic blending that defines the Vietnamese diaspora in America.

    I have never questioned the authenticity of the tasty food they cook, which is presumably Vietnamese cuisine. A few items are obviously of Chinese origin, but one eats them like any other Vietnamese dish, because the Vietnamese versions differ from their Chinese counterparts in using less oil. To a Chinese palate, most of them taste of the common ingredients of Chinese cooking, such as star anise and cinnamon. The main difference is that the Vietnamese dishes are usually seasoned with fish sauce and bits of lemongrass, which make the flavor stand out; compare, for example, the subtle flavor of Vietnamese pulled pork with Chinese Dongpo braised pork. All in all, I enjoy the delicious dishes at my favorite café. The key word in Vietnamese cuisine is balance: its seasoning always pairs two opposite tastes, such as salty with sweet, or sour with bitter.

    You may also come to love Vietnamese-seasoned Chinese dishes, or Chinese food with Vietnamese flavors for that matter, just as you might enjoy southern-style Khmer food, which adds a little more sweet-and-sour balance; yet these are not the same as Khmer or Thai dishes. Metaphorically, these culinary details are offered as an analogy for the racial and linguistic admixtures that streamed southward from the north throughout Vietnam's history. How thoroughly the Chinese-Vietnamese servers at the café identify as Vietnamese today shows in the way they interact with their fellow Vietnamese overseas: they talk and behave like any native of Vietnam, idolizing Vietnamese pop singers or gossiping about Vietnamese showbiz scandals, for example. Such 'Vietnamized' individuals present a fair picture of the Chinese minority in southern Vietnam, especially those of Teochew ethnicity, who have fully merged into the Vietnamese melting pot, as opposed to more recent Chinese newcomers of the contemporary period. Members of the former group usually identify themselves in the official census as "Kinh", whereas the latter group registers as "Hoa".

    Sinitic-Vietnamese words have become organic parts of the language, like the air Vietnamese speakers breathe and the food they eat, used without anyone questioning their 'foreign Sinitic' elements. By analogy, some of us may remember noticing, and quietly admiring, how a young German salesperson in a store somewhere in Germany spoke English so fluently as to sound little different from a native Briton. Similarly, have we not paid far more attention to the rare American comedian or pop singer who can speak and sing in Vietnamese at Paris by Night concerts? In contrast, we as Vietnamese historical linguists have overlooked the same notable phenomenon among Vietnamese of Chinese descent (CV), such as the multilingual servers at the café mentioned above, who speak Vietnamese and several Chinese 'languages'. One reason we take this for granted is that it is considered 'no big deal' for a Vietnamese of Chinese descent to acquire Vietnamese with native fluency: we simply expect them to be 'part' of the Vietnamese nation, just like any Kinh individual. Ironically, the Chinese heritage of all these people is stripped away in plain view.

    If you are a Vietnamese national, take a moment to look closely at your social circle. You may gradually come to realize that many of your acquaintances such as friends, colleagues, even extended family, trace their ancestry to Chinese immigrants, a detail that may have gone unnoticed in everyday interactions. For most, this heritage has never been a point of contention or prejudice. It simply blends into the fabric of Vietnamese society, where ethnic lines have long been blurred through centuries of migration and assimilation.

    Readers may find it worthwhile to explore their own family genealogy. Your ancestors may well have been among those who fled China generations ago and eventually became part of the Vietnamese Kinh majority. This quiet transformation, shaped by waves of Chinese immigration, has contributed to the diverse yet cohesive identity of modern Vietnam. And within this society, there is little discrimination toward those of Chinese descent, perhaps because many Vietnamese themselves share that lineage.

    Consider the odds of becoming a celebrity in Vietnam, perhaps one in tens of thousands. The author has observed that a surprising number of well-known Vietnamese pop stars appear to have recent Chinese ancestry, as their names subtly suggest. Artists such as Lam Trường, Quách Thành Danh, Huỳnh Trấn Thành, and Đàm Vĩnh Hưng, along with others like Lâm Ngọc Thoa and Lều Phương Anh, exemplify this phenomenon. Their names often carry phonetic or structural traces of Chinese origin, and when asked, their responses tend to confirm it.

    This cultural blending reflects a deeper linguistic and historical reality. Chinese, as a language, emerged from the admixture of Yue and proto-Tibetan elements within the broader Sino-Tibetan family. Vietnamese, in turn, can be viewed as a linguistic sub-branch of this Sinitic lineage, one that has inherited and localized Chinese linguistic features over generations. In essence, Vietnamese language and identity have evolved within the same cultural pond, shaped by shared ancestry and historical convergence.

    Those with Chinese heritage, whether recent or distant, are not outliers but integral threads in the tapestry of Vietnam’s racial and cultural composition. They represent a significant portion of the national demographic, quietly contributing to the richness and complexity of Vietnamese identity (see Appendix L).

    In the modern era, the formation of Vietnamese national identity, both within the country and across its diaspora, is deeply rooted in Confucian values, which are, in essence, Chinese cultural constructs. These values have permeated Vietnamese society for centuries, shaping its institutions, customs, and intellectual traditions. To understand this entanglement, one must imagine a time between 1,000 and 2,200 years ago, when segments of the ancestral Viet population began migrating southward from China’s southern regions.

    By the 13th century, these early emigrants had crossed the 16th parallel into the newly acquired territories of the Kingdom of Champa. Later waves continued their southward journey, reaching the southernmost tip of present-day Càmau Cape by the 18th century. This expansion occurred independently of earlier resettlements in the Indochinese peninsula, where Mon-Khmer populations had long been established.

    Along their migratory path, these settlers encountered a diverse array of individuals—exiled officials, land speculators, vagabonds, fugitives, and refugees—many of whom shared linguistic and cultural affinities. Vietnamese served as their common tongue, a medium of communication that masked the subtle Sinitic elements embedded in everyday interaction. Over time, through sustained contact with Chamic and Khmer communities, Vietnamese absorbed foreign lexicon seamlessly, blending with Sinitic roots as naturally as air mixes with water.

    Historically, Vietnam has remained in a quiet but constant state of preparedness for conflict, even during moments of apparent détente with its northern neighbor. A familiar pattern recurs: whenever a Chinese dynasty consolidates its power, its ambitions inevitably extend southward toward the so-called "renegade Annam". From the Han Empire to the modern People's Republic of China, successive Chinese regimes have viewed Vietnam not as a peer, but as a territory to reclaim, pressuring the southern frontier both inland and across the sea.

    Conventional wisdom might suggest that Vietnam, given its smaller size and more limited resources, would be overmatched in a modern confrontation with China. Yet history offers a different verdict. Despite internal problems such as corruption, complacency, and political instability, Vietnamese patriotism has repeatedly proven resilient. When cornered, the Vietnamese have consistently resisted foreign domination and, time and again, prevailed against Chinese invasions. Their continued existence as a sovereign nation stands as enduring proof of that defiant spirit.

    This assertion is not hyperbole. A close examination of China’s geopolitical history reveals that Vietnam’s emergence as an independent state in 939 A.D. was never formally acknowledged in Chinese records. Instead, Annam was dismissed as a rebellious prefecture that vanished from imperial chronicles after its break from the collapsing Nan Han regime. Yet, against all odds, Vietnam reappeared in modern history as a sovereign entity, lacking continuity in Chinese historiography but asserting its own national narrative.

    The belief in inevitable victory over Vietnam has persisted across Chinese dynasties, from the Han, Tang, and Song to the Yuan, Ming, and Qing, and on to communist Red China. This mindset continues to shape China's foreign policy, as evidenced by its provocations: the maritime clashes of 1974 and 1988, the land invasion of 1979, and the deployment of an oil rig into disputed waters in May 2014. That incident sparked violent riots in Vietnam, in which over 100 Chinese-owned factories were vandalized and migrant workers were evacuated. Since 2015, China has escalated its presence in the South China Sea, which Vietnam calls the "Eastern Sea", constructing naval bases and asserting unilateral territorial claims.

    What does this geopolitical tension have to do with linguistics? Everything. In both China and Vietnam, history has often been curated to serve political agendas. Western scholars, wary of controversy or unaware of the nuances, tend to avoid the political implications embedded in linguistic studies. This leaves them puzzled by the reluctance of Vietnamese scholars to acknowledge the profound Chinese cultural imprints on Vietnamese life.

    On the sidelines, it is no longer a question of whether Vietnamese nationalist hardliners—who are, after all, part of the broader narrative—can come to terms with the historical facts presented here. These insights are not merely academic; they are pivotal enough to challenge long-held assumptions and provoke deeper reflection. Some readers within the nationalist camp may already find themselves reconsidering their stance, perhaps even entertaining the idea of exploring the long and rugged Sino-Tibetan path of linguistic and cultural affiliation. In doing so, they may begin to see beyond isolated fragments and start recognizing a more expansive and interconnected historical landscape.

    National identity is much like the circumstances of one’s birth: individuals have no control over the historical trajectory of their country. Yet that history can deeply shape collective perception, often clouding objective thought with inherited prejudice. Vietnamese citizens of Chinese descent who do not speak Vietnamese with native fluency still tend to be indiscriminately categorized as part of the ethnic Chinese minority, regardless of their actual cultural integration or generational ties to Vietnam.

    Many among this group had long been part of Vietnam’s population prior to 1979. Some joined the mass exodus as boat people refugees, while others remained and gradually assimilated into the Kinh majority. In cities across Vietnam, the physical boundaries of Chinatowns faded as Chinese emigrants departed and Chinese-language schools were shuttered. Yet despite this integration, remnants of bias persist.

    Even today, Vietnamese television sitcoms occasionally feature comedians mimicking the speech of Chinese-Vietnamese individuals with exaggerated accents. Though often presented as innocent humor, such performances are shameful by modern standards. In Western societies, particularly in the United States (the Trump era notwithstanding), this kind of behavior is considered politically incorrect, even in private settings, and a public figure engaging in such mockery would risk lasting disgrace.

    For those unfamiliar with the nuances of Vietnamese society, especially those living abroad, it is important to recognize that such portrayals reflect deeper issues of cultural sensitivity and historical tension. What may pass as casual entertainment in one context would be deemed unacceptable in another, underscoring the need for greater awareness and respect across cultural boundaries.

    Certain truths are often overlooked in discussions of Vietnamese identity and linguistic heritage. First, Chinese is a culture, not a race. Second, China has functioned as a multiethnic union since the Qin-Han era. Third, Vietnam, once part of that union, broke away in 939 A.D. and has maintained its sovereignty ever since. Despite the enduring influence of Confucian values—or more precisely, the legacy of a socialist authoritarian regime—many Vietnamese scholars resist acknowledging the depth of Chinese cultural and linguistic impact, fearing it may undermine nationalist ideals. Ironically, this resistance compromises academic neutrality, making it difficult to objectively trace the origins of the Vietnamese language.

    Shaped by centuries of mistrust toward China, Vietnamese scholars often respond emotionally to Sino-centric interpretations of history and linguistics. Their scholarship tends to be instrumental, designed to safeguard national identity rather than to pursue historical truth. As a result, academic objectivity remains elusive. While some non-Vietnamese researchers have made meaningful contributions by adopting a more neutral stance, their work is rarely recognized within Vietnam’s intellectual circles.

    The author has chosen to write this paper in English as both a strategic and philosophical decision. It is intended for readers who may be more open to a candid and critical exploration of the Vietnamese language’s origins, particularly its complex relationship with Chinese linguistic traditions. This is a bold undertaking in a field fraught with ideological sensitivities, one the author approaches with cautious optimism and unwavering resolve.

    He is fully aware of the risks involved. Others who have pursued the Sino-Tibetan path have often faced rejection or silence, and the Vietnamese linguistic establishment continues to resist any suggestion of hereditary affiliation between Vietnamese and Chinese. Yet such resistance, rooted in nationalist bias, will not deter the author’s commitment to advancing this inquiry with sincerity. If recognition ever comes, he suspects it may only be granted posthumously, as has been the case for many who challenged prevailing narratives.

    Writing in English also serves a practical purpose: it creates a buffer between the work and certain audiences who might otherwise respond with hostility. Until someone takes the initiative to translate it into Vietnamese, the author prefers to avoid direct confrontation with nationalist zealots—particularly those lacking academic training. Their reactions, often fueled by ideological conditioning under modern Vietnamese socialism, reflect a mindset unlikely to shift within our lifetime.

    As long as the specter of northern aggression looms, each generation of Vietnamese tends to harbor latent antagonism toward Sino-centric interpretations. This simmering sentiment, often amplified by nationalism, can erupt into fervent anti-Sinicism, sometimes hysterical, sometimes overwhelming. Such pent-up emotion risks distorting academic discourse and may condition entire schools of thought to reject any serious engagement with Sinitic theorization. The author's efforts to address Chinese-Vietnamese etymological connections may thus be met with blunt dismissal.

    Adding to the complexity of this inquiry is the resistance from another front: the Mon-Khmer traditionalists within Western linguistics, who remain steadfast in defending the Austroasiatic paradigm. Paradoxically, however, some of the author’s most unexpected allies have emerged from within that very camp: Western scholars who, despite their affiliations, have shown a willingness to listen and engage with alternative perspectives. Yet skepticism persists, and understandably so.

    Their reservations may stem from several factors. First, the author’s reconstruction of ancient phonology may appear unconventional or lack sufficient empirical grounding. Second, the methodologies favored by Western linguists often struggle to accommodate the intricacies of tonal languages, which complicates comparative analysis. Third, it may simply be a matter of presentation—the author may not yet possess the rhetorical polish needed to effectively communicate and advocate for his ideas.

    To the reader: as you engage with this work, try to set aside personal biases and let intellectual curiosity lead the way. The discovery of new Sinitic-Vietnamese etyma offers a rare opportunity to deepen your understanding and may prompt those in the field to reconsider their academic direction. Whether you choose to follow this rugged path or not, know that it carries the risk of isolation—but also the promise of profound insight into a neglected yet vital area of linguistic study.

    Vietnamese scholars, often functioning as extensions of the state apparatus, tend to operate within a rigid framework shaped by political expectations. Unlike their counterparts in the West, they struggle to embrace the principle that academic inquiry should remain independent of political influence. When politics infiltrates scholarship, it inevitably compromises the authenticity of academic achievement.

    To readers who reject the notion of apolitical history, the author respectfully seeks understanding for the unpopular viewpoint presented in this research. The goal is not to provoke but to allow the work to stand on its own merit without having to battle for recognition. In the past, the author refrained from engaging with online critics, choosing instead to focus on refining his research. But there is little value in continuing to tiptoe around nationalist fervor.

    Let us invoke the spirit of a Vietnamese proverb: "Mất lòng trước, được lòng sau." (Better to offend first and earn respect later.)  With that in mind, the author lays all political cards on the table. If politics must influence academia, let it do so in the realm of history, where Vietnam and China have been entangled since antiquity. Linguistics, by contrast, should remain a space for objective analysis, free from ideological distortion.


    IV) Prelude on the Sinitic etyma

    As the term suggests, a prelude (Chin. 序言, VS 'lờitựa') offers no definitive conclusions; rather, it sets the stage for what follows. This section introduces the underlying Sinitic elements embedded within the Vietnamese etymological layer, elements that have long lain dormant yet may hold the key to reinforcing the fragile foundations of Sino-Tibetan linguistic theory. Some well-informed Vietnamese readers may find the presence of multiple Chinese substrates unsettling, especially when these are layered atop what are traditionally considered native residues or mere loanwords. The purpose here, however, is not to provoke, but to identify compelling etymological candidates through a series of intriguing observations.

    To be clear, the postulation of Yue versus Chinese substrata in Vietnamese is not meant to assert a direct genetic lineage between Vietnamese and Chinese dialects, nor between Vietnamese and Tibetan languages. Instead, the undeniable lexical and phonological similarities suggest a possible kinship—metaphorically speaking, “long-lost relatives” within the broader Sino-Tibetan family. When constructing a linguistic family tree that links Vietnamese to Sino-Tibetan roots, it is the shared features of tonality, phonological structure, and semantic intimacy that point toward a deeper affiliation.

    Take, for example, the Vietnamese terms for “mother”: 'mẹ', '(cậu)mợ', 'má', 'u', 'nạ', 'mẹ đẻ', 'mẹ ruột', 'mẹ ghẻ'. These correspond closely to Chinese equivalents such as '母 mǔ', '(舅)母 (jiù)mǔ', '媽 mā', '姆 mǔ', '娘兒 niár', '母親 mǔqīn', '親母 qīnmǔ', and '繼母 jìmǔ'. Likewise, Vietnamese terms for “father” such as 'bố' and 'tía' align with '父 fù' and '爹 diē', respectively. These are not obscure or specialized terms; they belong to the core vocabulary of both languages and reflect deep-rooted parallels that merit further linguistic investigation.

    The lexical items examined here, presented in their native Vietnamese form, are the foundation of the author's approach to a controversial hypothesis regarding Sinitic influence on Vietnamese. The objective is to isolate linguistic features that may suggest shared etymology, particularly where plausible cognates emerge through patterns of sound correspondence. One illustrative case is the Chinese expression 抵賴 (dǐlài), meaning “to deny” or “to shift blame,” which the author proposes as the source of the Vietnamese compound 'đỗlỗi'. In parallel, 賴 (lài) is posited as the origin of 'tại', meaning “because of.” This form coincidentally matches 'tại', the Sino-Vietnamese reading of the locative 在 (zài, “at”), although the causal sense has little to do with the locative one. The proposed link between 賴 and 'tại' rests on an alternation between the initial consonants /l-/ and /t-/. Both terms convey notions of causality and blame. Meanwhile, in Sino-Vietnamese, 賴 is rendered as 'lại', meaning “to depend on.”

    This analytical framework diverges from conventional assumptions about Chinese loanwords, which typically rely on rigid, word-for-word phonetic equivalence. Here, the author argues that 'đỗlỗi' reflects a more nuanced structure. While it may be classified as a loanword, its internal composition reveals layered semantic associations: 'đỗ' is linked to 倒 (dào, SV 'đảo'), meaning "to pour over," and 'lỗi' to 罪 (zuì, SV 'tội', "guilt"), meaning "wrongdoing."

    Further supporting this associative model is the Vietnamese expression 'đỗthừa', meaning "to shift blame," which aligns with the Chinese compound 推卸 (tuīxiè). These examples illustrate a core principle of the author's methodology: polysyllabic compounds are broken down into morphemic syllables, each carrying distinct semantic value and traceable to individual Chinese roots. In this framework, 'đỗlỗi' is parsed into 'đỗ' (< 倒 dào) + 'lỗi' (< 罪 zuì), with each syllable functioning as an independent lexical entity capable of etymological linkage.

    This same principle applies to the analysis of 在意 (zàiyì), meaning "to care" or "to pay attention", which the author equates with the Vietnamese 'đểý'. Such comparisons support the broader hypothesis that Vietnamese-Sinitic etyma can be identified not merely as loanwords, but as plausible cognates when examined through the lens of sound change and semantic convergence.

    While these items may still be classified as loanwords in a strict linguistic sense, the author's proposed framework invites a reevaluation of their origins. By focusing on phonological transformation and morphemic decomposition, this approach offers a novel pathway for tracing Vietnamese-Sinitic cognacy beyond conventional borrowing models.

    The objective of this section is to familiarize readers with such postulations and to address questions surrounding the presence of Sinitic elements in Vietnamese. The author will examine linguistic traits in contemporary Vietnamese that mirror peculiarities found in various Chinese dialects. Readers will come to understand how and why certain colloquial expressions are interchangeable across Vietnamese and Chinese, often without formal acknowledgment. Examples include: 'lâylất' 賴活 (làihuó, "hand-to-mouth"), 'bànchân' 腳板 (jiǎobǎn, "sole of the foot"), 'ănmày' 要飯 (yàofàn, "beggar"), 'đitiền' 隨錢 (suíqián, "give a monetary gift"), and scholarly Sino-Vietnamese idioms like 'sưtửHàđông' 河東獅子 (Hédōngshīzǐ, "a shrewish wife," literally "the lion of Hedong") or 'máuđàonướclã' 血濃於水 (xuènóngyúshuǐ, "blood is thicker than water").

    These examples, among others, underscore the intricate and often overlooked interplay between Vietnamese and Chinese linguistic traditions. The author’s approach invites readers to reconsider long-held assumptions and explore the possibility of deeper, historically grounded connections.

    In the early stage of this survey, the author has compiled more than 420 essential monosyllabic lexical items sourced from a broad array of Sino-Tibetan etymologies, as originally documented by Shafer (1972). These entries were chosen for close examination and serve as conceptual anchors for exploring deeper linguistic affiliations (see Chapter 10 on Sino-Tibetan etyma.) The remarkable similarity many of these items bear to Vietnamese vocabulary raises a compelling question: how has such linguistic proximity gone largely unnoticed?

    For seasoned researchers in the field of Sinitic-Vietnamese historical linguistics, the presence of cognate relationships is difficult to ignore. Yet, the broader endeavor of establishing genetic affiliations across language families remains a formidable challenge—one that demands renewed scholarly engagement. It may ultimately fall to a future generation of Sino-Tibetan specialists to revisit and refine Shafer’s foundational work with updated methodologies and a more rigorous comparative framework.

    The central thesis of this study is anchored in preliminary etymological evidence drawn from a range of Sino-Tibetan languages, which will be explored in greater detail in the following chapter. While the paper offers original insights, it does not explicitly seek to reignite the contentious debate over whether Vietnamese should be reclassified within the Sino-Tibetan family. Nonetheless, readers sensitive to the notion of Chinese influence should be forewarned: the hypotheses advanced here involve identifying Vietnamese etyma with potential Chinese origins—an endeavor that may challenge entrenched assumptions and nationalist interpretations.

    For instance, consider the term 'Tiều' as a variant of 'Tàu' (meaning “Chinese”), which may trace back to Middle Chinese 朝 (cháo, zhāo, zhū), reconstructed as MC ɖiaw and Old Chinese *r’ew. Similarly, 水 (shuǐ) appears to correspond with Vietnamese 'nước' or 'nák' (possibly derived from 'đák'), meaning "water", and with 'sông' or 'kông' (from 'krong'), meaning "river", the latter showing phonetic parallels with Cantonese 'kong5' and 工 /kong1/ ('work').

    Other examples include:

    • 川 chuān 'dòng', 'con' ('stream', 'current') [ M 川 chuān < MC tɕʰʷiɛn < OC *kʰjon ],
    • 井 jǐng 'giếng' ('the well'),
    • 艘 sōu 'tàu' ('ship'),
    • 江 jiāng 'sông' ('river'),
    • 泉 quán 'suối' ('spring').

    These correspondences suggest a network of doublets and cognates embedded in Vietnamese that reflect deep Sinitic roots.

    Additional examples reinforce this pattern:

    • 日 rì for 'giời' ('sun') vs. 天 tiān 'trời' (sky) vs. 太陽 tàiyáng 'trờinắng' ('sunshine'),
    • 月 yuè 'giăng' ('moon') vs. 個月 gèyuè 'contrăng' ('monthly moon') vs. 年月 niányuè 'nămtháng' ('months and years'),
    • 石 shí 'đá' ('stone') vs. 石 dàn for 'tạ' ('a weight unit'),
    • 土 tǔ 'đất' ('soil') vs. 地 dì 'địa' ('earth'),
    • 鼠 shǔ 'chuột' (rat) vs. 子 zǐ 'chuột' ('as in the zodiac'),
    • 羊 yáng 'dê' (goat) vs. 未 wèi 'dê' ('zodiac'),
    • 貓 māo 'mèo' (cat) vs. 卯 mǎo 'mèo' ('zodiac'), and so forth.

    Some of these lexical parallels have been noted by earlier Sinitic scholars, while others remain provocative and open to further inquiry. Together, they form a compelling body of evidence that invites a reexamination of Vietnamese linguistic origins through a Sinitic lens.

    Until now, mainstream linguistic discourse has largely positioned Vietnamese basic vocabulary within the Austroasiatic Mon-Khmer framework. For example, the Khmer numerals from one to five — "muəj", "piː (pɨl)", "ɓəj", "ɓuən", and "pram" — are routinely cited as cognates of Vietnamese "một", "hai", "ba", "bốn", and "năm". Because these are basic words, the assumption follows that they must share a common linguistic ancestry. Yet this approach, repeated across countless studies, leaves us in a defensive posture, asking the same unresolved question: "What about numbers six through ten?"

    The narrative has become a recursive echo of early Austroasiatic Mon-Khmer theorists, perpetuated like a chain of reposts from the same uncritical source. As a result, search engine returns for queries like "Vietnamese basic words" overwhelmingly reinforce this view, effectively sidelining alternative perspectives, particularly those rooted in Sino-Tibetan analysis. This repetition shapes public and academic perception, often before readers have had the opportunity to explore competing theories.

    The author is concerned that his newly proposed Sinitic-Vietnamese etymological framework may be prematurely dismissed, not only due to entrenched academic bias, but also because of broader political sensitivities surrounding Chinese influence. To counter this resistance, he begins with a gentle approach. Metaphorically, the task resembles restoring a faded painting: carefully tracing and retouching obscured details until the original image reemerges with clarity.

    In the next chapter, the author will present a body of over 420 Vietnamese lexical items that show compelling associations with a wide range of Sino-Tibetan etyma. It is his hope that this work will prompt the linguistic community to reconsider long-held assumptions and begin a serious investigation into the proposed affiliations outlined in Chapter 10 on Sino-Tibetan etyma.

    The presence of Sinitic-Vietnamese etyma presented in this study plays a crucial role in establishing a visible Sino-Tibetan perspective within the digital linguistic landscape. As long as researchers continue to stake out intellectual space online, literally and figuratively, they offer readers alternative viewpoints beyond the dominant Austroasiatic narrative that saturates search results whenever queries on Vietnamese etymology arise. This is not a game of catch-up; it is a deliberate effort to build a network of hyperlinked indices, whether modest or expansive, that guide readers toward the Sinitic framework. The author’s strategy involves disseminating hundreds of Sinitic-Vietnamese etyma across cyberspace, laying the groundwork for future scholars to build upon.

    Beyond the commonly cited pairs, in which 'sông' ("river") aligns with 江 jiāng (SV giang) and 'suối' ("creek") corresponds to 泉 quán (SV tuyền), drawing attention away from 'dòng' 川 chuān (SV xuyên), other foundational Vietnamese words and their Sinitic counterparts are worth mentioning:

    • 'cửa' ("door") reflects 戶 hù (SV hộ), subtly masking 口 kǒu (SV khẩu),
    • 'hiểu' ("understand") parallels 會 huì (SV hội), replacing 曉 xiǎo (SV hiểu),
    • 'hiền' ("good-natured") aligns with 善 shàn (SV thiện), supplanting 賢 xián (SV hiền),
    • 'ông' ("elder") connects to 公 gōng (SV công) and also to 翁 wēng (SV ông),
    • 'ong' ("bee") corresponds to 蜂 fēng (SV phong), and possibly 螉 wēng (SV ông),
    • 'lợn' ("pig") relates to 豚 tún (SV thốn), coexisting with 亥 hài (SV hợi),
    • 'chó' ("dog") matches 狗 gǒu (SV cẩu), while 'cầy' and 'cún' reflect 犬 quǎn (SV khuyển, "canine", "puppy").

    These examples illustrate how one etymon may overlay another, and how disyllabic Vietnamese forms can yield doublets and homophones through associative sound change patterns. For instance:

    • 太陽 tàiyáng ("sun") → 'trời nắng'; 太 tài → 'trời'; 陽 yáng → 'nắng',
    • 天井 tiānjǐng ("sky well") → 'giếng trời'; 井 jǐng → 'giếng'; 天 tiān → 'trời',
    • 毫無 háowú ("not at all") → 'khônghề'; 無 wú → 'hổng' or 'không'; 空 kōng → 'không',
    • 拉活 lāhuó ("to seek work") → 'làm việc'; 拉 lā → 'làm' ~ 幹 gàn → 'làm' ("work"); 幹活 gànhuó ("to work") → 'làmviệc'; 活 huó → 'việc' ("work"),
    • 安樂 ānlè ("peaceful and happy") → 'an lành'; 樂 lè → 'lành'; 良 liáng ("benign") → 'lành'.

    These patterns suggest a new set of rules for sound change and etymological association that have yet to be formally codified in the field. Identifying doublets in varied forms may also reveal hidden substrates. Importantly, historical linguistics does not operate under absolute formulas, despite what some conventional Vietnamese etymological studies may imply.

    While this research may not offer universally accepted conclusions, it has garnered attention and feedback since early drafts appeared online over a decade ago. The author is gratified to see that both Austroasiatic and Sino-Tibetan scholars have begun to acknowledge the significance of these findings. This survey, though open to refinement, contributes to unraveling the complex web of genetic affiliation between Chinese and Vietnamese etyma.

    Academically, this Sino-Tibetan etymology project is the product of painstaking effort, work that has indeed required burning the midnight oil. Methodologically, it adheres to sound change principles rooted in Middle Chinese and Sino-Vietnamese phonology, while also incorporating Western analytical frameworks to support a cognitive approach to Sino-Tibetan etyma. Though it may not yet dismantle the prevailing Austroasiatic hypothesis, it offers a complementary classification of basic words into a core linguistic base.

    Historically, paradigm shifts in linguistics often require generational turnover. It may take another 60-year cycle for entrenched consensus to yield to new perspectives. By then, veteran theorists from both camps may have exited the stage, and a new cohort, unburdened by legacy biases, could revive the Sino-Tibetan theory with fresh energy and expanded evidence. As Austroasiatic resources become increasingly repetitive and depleted, the renewed Sino-Tibetan framework may gain traction.

    Historical context, as emphasized in earlier chapters, remains essential to understanding the development of Vietnamese and its speakers. This project uses strategic dissemination, akin to cultural restoration, to repair long-standing misconceptions. The dominance of Austroasiatic lexicons has misled many scholars in Sinitic fields. Yet their presence in Vietnamese is explainable: ancient migrations led to contact and word exchange, with some groups losing their native tongues and assimilating with local populations (Phan Hữu Dật, ibid.).

    Many Austroasiatic remnants in Vietnamese also carry Sinitic features, suggesting origins in southern China. If these traces stem from Yue sources, then the term "Austroasiatic" may itself be a misnomer, just as "Sinitic" is a linguistic designation rather than an ethnic one.

    As previously discussed, the term "Sinitic" is often understood today as referring to something affiliated with "Chinese" civilization. However, the linguistic heritage it designates predates the rise of the unified Qin Empire, whose name, "Qin," ultimately gave rise to the term "Sinitic." Ironically, this terminology has inadvertently strengthened Austroasiatic claims that seek to discredit Chinese linguistic influence. Their argument hinges on the assertion that since the Qin state had not yet emerged, the linguistic family labeled "Sinitic" could not have existed either; therefore, Vietnamese etyma could not plausibly originate from it.

    But this reasoning overlooks a critical point: the overwhelming presence of Sinitic elements in Vietnamese, over 99 percent of the Sinitic-Vietnamese etyma, including basic vocabulary, must have predated the Qin dynasty by centuries. If we cannot call it "Sinitic," then what should we call it? Most Western scholars are unfamiliar with terms such as 'Taic-Yue' and 'Yue'. And what of the term "Vietnamese," which itself did not exist in antiquity? In this paper, we use the term "Sinitic-Vietnamese" to encompass the shared linguistic heritage between Chinese and Vietnamese, regardless of whether specific etyma originated in Chinese or were shaped by Yue substrata and other linguistic forces moving in either direction.

    Vietnam’s historical trajectory reflects a gradual emergence from a Sinicized feudal colony. Ancient Annam remained a prefecture under Chinese imperial rule from 111 B.C. until 939 A.D., a fact emphasized repeatedly to underscore its significance. This pattern of influence continued into the era of Middle Chinese, particularly through northern Mandarin, which left a lasting imprint on Vietnamese phonology and vocabulary. The impact was especially pronounced during the fourth period of Chinese domination under the Ming Dynasty (1407–1427).

    Beyond official rule, waves of Chinese immigration, driven by famine, war, and displacement, led to deep integration of Chinese settlers into Vietnamese society. These migrants brought with them not only cultural practices but also linguistic contributions, including basic words that had long been wrongly attributed to Mon-Khmer origins. Examples such as "chồmhỗm" (犬坐 quǎnzuò, "squat like a dog") and "hủtiếu" (果條 guǒtiáo, "rice pasta") reveal unsuspected Sinitic roots embedded in everyday Vietnamese speech.

    All things considered, the Vietnamese language may have evolved from an ancestral Yue form, initially resembling certain Taic variants likely spoken by the subjects of the ancient Chu State ("楚民") (see Appendix K - 越人歌 'Song of the Yue'; see also Bình Nguyên-Lộc, 1972). These forms existed long before the rise of Sinitic entities such as the Zhou, Qin, and Han dynasties. The Taic linguistic family also gave rise to Yue-related speech among southern Chinese ethnic groups like the Zhuang and Dai. Vietnamese likely underwent a developmental trajectory similar to that of Hokkien (Minnan) and Cantonese (Jyut), both of which, by 939 A.D., had absorbed dominant Han and Tang linguistic elements that largely replaced their original aboriginal Yue forms spoken some 3,000 years ago (see Drake, F.S., ed., Symposium on Historical Archaeological and Linguistic Studies on Southern China, South-East Asia and the Hong Kong Region, 1967).

    Had Vietnamese been classified under the Sino-Tibetan family prior to the emergence of the Austroasiatic Mon-Khmer theory in the early 20th century, its Sinitic-centric features would have been more widely acknowledged. Loosely, Vietnamese might be described as a "Sino-Xenic topolect", or metaphorically, a Sinitic hybrid or "graft" language. This stands in contrast to an "adoptive language", a purely hybrid form (as Bloomfield described Albanian in 1933), or a creole such as Tok Pisin in New Guinea or French-based Haitian Creole. More precisely, Vietnamese emerged from a scholarly and systematic transformation rooted in Mandarin, the official language of Chinese imperial courts.

    This research presents findings of Vietnamese words cognate with Chinese etyma, potentially supporting the placement of Vietnamese within a Sinitic-Yue branch of the Sino-Tibetan family. While it stops short of classifying Vietnamese as a fully Sinitic language, the concept of "Sinitic-Vietnamese" (VS) refers to Sinitic elements layered atop ancient Yue substrata. One may visualize Vietnamese as a linguistic tree: its roots are aboriginal Yue, while its trunk, branches, and leaves are grafted with Sinitic tissues. Think of nursery apple trees bearing multiple varieties, each grafted onto a common rootstock.

    In contrast, the Sino-Vietnamese (SV) lexicon more closely resembles Middle Chinese, much as the Chinese vocabulary preserved in Cantonese does. Interestingly, many Sino-Vietnamese words that overlap with Sinitic-Vietnamese forms resemble northern Mandarin vernacular, especially the colloquial language of imperial courts. Examples include:

    • 'đừng' 甭 béng ("don't")
    • 'xong' 成 chéng ("done")
    • 'được' 得 dé ("okay")
    • 'vâng' 行 xíng ("yes")
    • 'dạ' 喳 zhā ("yes, sir")
    • 'mainày' 明兒 míngr ("tomorrow")
    • 'luônluôn' 牢牢 láoláo ("always")
    • 'được rồi' 得了 déle ("fine"), etc.

    Mandarin, a northern Chinese dialect, evolved from Middle Chinese under heavy influence from Altaic-speaking conquerors who ruled China for nearly a millennium, roughly the same duration as Vietnam's colonization under Chinese rule. These northern dynasties, including the Xianbei (鮮卑) of the Northern Wei State (北魏, BěiWèi), the Khitan of the Liao dynasty (遼朝, Liáocháo), the Jurchen of the Jin dynasty, the Mongols of the Yuan dynasty, and the Manchus of the Qing dynasty, shaped the linguistic landscape of northern China and, by extension, influenced Vietnamese speech.

    Historical evidence supports the influence of colloquial Mandarin on Vietnamese. Many examples cited in this paper are original contributions that complement earlier etymological work by scholars such as Sergei Anatolyevich Starostin and Lê Ngọc-Trụ. For instance:

    • VS 'màu' (color) ← 貌 mào (SV mạo)
    • VS 'khói' (smoke) ← 氣 / 汽 qì (cf. SV 'khí' for "air", VS 'hơi' for "vapor")
    • VS 'việc' (work) ← 役 yì (SV dịch), 務 wù (SV vụ)
    • VS 'buồn' (sad) ← 煩 fán (SV phiền), 悶 mèn (SV muộn)
    • VS 'việc' (work) ← 活 huó (SV hoạt), etc.

    The degree to which one accepts these etymological connections depends on their background in historical linguistics and openness to the Sinitic framework. Many readers, especially novices, may struggle to appreciate the breadth of postulated cognates due to preconceived beliefs. Some may even overlook self-evident etyma such as:

    • 早 zǎo → 'chào' ("hello")
    • 腚 dìng, 臀 diàn → 'đít' ("buttocks")
    • 屁 pì (SV tí, "hip") → 'phaocâu' ("chicken butt"), possibly linked to 'cáiđít' via 股 gǔ ~> 'cái' 個 gè (SV cá)

    These examples may be treated as doublets to explain sound change patterns such as /-ng/, /-n/ ~ /-t/ and /p-/ ~ /d-/.

    Whether all postulations are accepted or not, earlier findings of Chinese-origin etyma retain their scholarly value. Newly proposed etyma in this study will be elaborated with detailed analysis of sound change mechanisms, both common and specific. This reconstruction process is akin to restoring a faded painting, carefully revealing subtle details lost over time.

    Readers are assumed to be familiar with basic sound change patterns, such as:

    • /j-/ → /g-/ (e.g., 雞 jī → 'gà', "chicken")
    • /zh-/ → /gi-/ (e.g., 紙 zhǐ → 'giấy', "paper")

    These patterns are discussed further in Appendix B - Sound change patterns and will be referenced only minimally for simplicity.

    You will encounter further discussions throughout this paper on the author's newly developed etymological methods, which have enabled the identification of numerous camouflaged Sinitic-Vietnamese etyma. These findings help illuminate the missing links in the linguistic and cultural affiliation between Vietnamese and Chinese, connections rooted in a 'linked kinship' and shaped by over a thousand years of Han Chinese domination in ancient Vietnam.

    Conclusion

    The author's newly developed etymological approach, centered on uncovering camouflaged Sinitic-Vietnamese etyma, offers a fresh lens through which to examine the linguistic evolution of Vietnamese. These findings help bridge longstanding gaps in our understanding of Vietnamese-Chinese affiliation, not merely through lexical resemblance but through historical and cultural entanglement. The concept of "linked kinship" is not speculative; it is grounded in over a thousand years of Han Chinese domination, migration, and integration in ancient Vietnam.

    Throughout this chapter, we have seen how geopolitical forces, colonial dynamics, and sustained cultural exchange have shaped the Vietnamese lexicon in ways that defy simplistic classification. The presence of Sinitic elements, often embedded beneath layers of Yue substrata, demands a reevaluation of Vietnamese linguistic identity. These etyma are not isolated anomalies; they are part of a broader pattern that reflects deep-rooted historical contact and shared linguistic ancestry.

    The author’s methodology, which combines historical documentation with phonological analysis, opens new pathways for identifying overlooked cognates and reinterpreting Vietnamese vocabulary through a Sinitic lens. This work does not aim to erase Austroasiatic contributions, but rather to restore balance to a discourse long dominated by one-sided narratives. By recognizing the complexity of Vietnamese linguistic heritage, we move closer to a more nuanced and accurate understanding of its origins.

    As the paper continues, readers will encounter hundreds of Sinitic-Vietnamese etyma, each a thread in the intricate tapestry of Vietnam’s linguistic history. These examples are not just academic curiosities; they are linguistic artifacts that speak to centuries of cultural convergence. In reclaiming these etyma, the author invites scholars to reconsider the foundations of Vietnamese and to explore the possibility of a Sinitic-Yue linguistic branch within the broader Sino-Tibetan family.

    Chapter 6 thus closes not with finality, but with an invitation: to look deeper, question inherited assumptions, and engage with the Vietnamese language as a living record of historical transformation.

    x X x


    ENDNOTES


    (K)^ See Kelley, Liam C. (2012). "The Biography of the Hồng Bàng Clan as a Medieval Vietnamese Invented Tradition." Journal of Vietnamese Studies, Vol. 7, No. 2: 87-122. University of California Press.

    "This paper critically examines an account called the 'Biography of the Hồng Bàng Clan' in a fifteenth-century text, the Arrayed Tales of Selected Oddities from South of the Passes (LĩnhNam Chíchquái Liệttruyện). This account is the source for the 'historical' information about the Hùng kings. Scholars have long argued that this information was transmitted orally from the first millennium B.C. until it was finally written down at some point after Vietnam became autonomous in the tenth century. In contrast, this paper argues that this information about the Hùng kings was created after Vietnam became autonomous and constitutes an invented tradition."

    (W)^ Journeymen in the field will understand why hardcore Vietnamese nationalists shun a Sino-Tibetan hypothesis framed in terms of the linguistic wave theory, let alone one framed in terms of the traditional family-tree model (Bloomfield, 1933, pp. 317, 18).

    (周)^ "The findings in the journal Science may help rewrite history because they not only show that a massive flood did occur, but that it was in 1920 BC, several centuries later than traditionally thought.

    This would mean the Xia dynasty, led by Emperor Yu, may also have started later than the period that Chinese historians have thought." (Source: "First evidence of legendary China flood may rewrite history.")

    More information: "Outburst flood at 1920 BCE supports historicity of China's Great Flood and the Xia dynasty," Science.

    (董)^ Thánh Gióng, also known as Phù Đổng Thiên Vương (扶董天王), Ông Dóng, and Xung Thiên Thần Vương (冲天神王).
    Source: https://en.wikipedia.org/wiki/Th%C3%A1nh_Gi%C3%B3ng

    (L)^ The same phenomenon can also be observed in languages of different roots that are nonetheless lumped together under the umbrella of Indo-European, such as English and French (the latter not of "Gaulish" origin anyway): 'one' ~ 'un' or 'une', 'two' ~ 'deux', 'three' ~ 'trois', 'eye' ~ 'oeil', 'nose' ~ 'nez', 'tongue' ~ 'langue', 'sun' ~ 'soleil', 'moon' ~ 'lune', 'fire' ~ 'feu', 'time' ~ 'temps', 'mountain' ~ 'montagne', 'wind' ~ 'vent', 'water' ~ 'eau', 'wine' ~ 'vin', etc.

    (X)^ Only the Mon-Khmer numerals 1 to 5, namely "muəj", "piː (pɨl)", "ɓəj", "ɓuən", and "pram", are plausibly cognate with the Vietnamese "một", "hai", "ba", "bốn", and "năm", respectively, and even then some matches rest on eclectic assumptions, such as taking piː for "hai". Beyond five, the Mon-Khmer system diverges from the 10-based numeral system of Vietnamese, in which only the first five numbers correspond.

    As a matter of fact, Vietnamese speakers are at ease with numerals of Chinese origin in common usage and expressions such as "hạngnhất" (一等), "thứnhì" (第二), "bấtquátam" (不過三), "tứquái" (四怪), "mâmngũquả" (五果盤), "ănchia tứlục" (分利四六), "thấttuần" (七旬), "bátquái" (八卦), "bảngcửuchương" (九章版), "chục quảtrứng" (十個蛋), "mộttá" (一打), "nhịthậptứ hiếu" (二十四孝), "báchniên" (百年), "thiênthu" (千秋), "ngànvàng" (千金), "vạntuế" (萬歲), "muônthuở" (萬世), "tỷphú" (億富), etc. The Chinese-derived numerical expressions in Vietnamese are innumerable, so to speak.

    (A)^ An Dương Vương is the title of Thục Phán, who ruled over the kingdom of Âulạc (now Vietnam) from 257 to 207 B.C. As leader of the ÂuViệt tribes, he defeated the last Hùng king of the State of Vănlang, seized the throne, and united its people, known as the LạcViệt, with the ÂuViệt. In 208 B.C., the capital Cổ Loa was attacked and the imperial citadel ransacked. An Dương Vương fled and committed suicide.
    Source: https://en.wikipedia.org/wiki/An_D%C6%B0%C6%A1ng_V%C6%B0%C6%A1ng

    (I)^ Namquốc Sơnhà (Territory of the Southern Nation) was written in 1077 by Lý Thường Kiệt and recited along the defense line on the Nhưnguyệt River (Cầu River), originally to raise the spirits of the soldiers fighting the Chinese invaders. Bình Ngô Đạicáo (Great Proclamation upon the Pacification of the Wu) was composed by Minister Nguyễn Trãi in the name of Bìnhđịnhvương Lê Lợi in the ĐinhMùi year (1427) to announce the pacification of the invading Chinese Ming troops, the recovery of national independence, and the establishment of the Later Lê Dynasty.

    (V)^ 1) Dương Đình (Diên) Nghệ 楊廷藝 or 楊延藝 (931–937)
    2) Kiều Công Tiễn 矯公羨 or 皎公羨 (937–938)
    3) Ngô Vương (reign: 939–944)
    4) Dương Tam Kha (reign: 944–950)
    5) Hậu Ngô Vương: Nam Tấn Vương & Thiên Sách Vương (co-reign: 950–954)
    6) Nam Tấn Vương (reign: 954–965)
    7) Ngô Sứquân (吳使君) (reign: 965–968)
    8) "The Anarchy of the 12 Warlords" or "Thập Nhị Sứquân Rebellion" (966–968)
    (Source: https://en.wikipedia.org/wiki/Ng%C3%B4_dynasty)

    (H)^ See: https://en.wikipedia.org/wiki/Hoa_people

    (T)^ For the pronoun "they" instead of "she", "he", or "s/he", the author finds the singular "they", now in current usage, suitable in many circumstances. It was adopted by the Washington Post in its stylebook in December 2015 and by local US Examiner newspapers on September 22, 2016. It was also the American Dialect Society's word of the year for 2015.

    (M)^ The Mongol invasions of Vietnam, or the Mongol-Vietnamese War, refer to the three invasions that the Mongol Empire and its chief khanate, the Yuan Dynasty, launched against ĐạiViệt (now northern Vietnam), then ruled by the Trần Dynasty, and against the Kingdom of Champa: in 1257–1258, 1284–1285, and 1287–1288. (Source: https://en.wikipedia.org/wiki/Mongol_invasions_of_Vietnam)

    (英)^ This is metaphorically comparable to discussing Simplified Chinese in mainland China vs. Traditional Chinese in Hong Kong and Taiwan, along with the Pinyin vs. Zhuyin transcription systems, or, analogously, the use of 面 miàn (face, noodle, flour) for 麵 miàn (noodle, flour) vs. the Vietnamese 'mặt' (SV 'diện') and 'mì' (SV 'miến'), respectively, so to speak.

    (Y)^ "It is so said, their ancestors were descendants (of...)", but in relative terms, the forefathers of a nation who lived in a region centuries ago are not necessarily the direct biological ancestors of the people residing there today. In the specific case of Cantonese speakers mentioned earlier, we must consider the demographic shifts over the past 2,000 years, waves of immigration and emigration, the movement of locals and resettlers, and the blending of Sino-Tibetan subjects with Han-Yue populations. Many of the migrants who passed through or settled in the historical Canton region were not of Yue ancestry, even though they referred to their language as 'Jyut8waa2' rather than 'ʃieŋ21Jyut8' (the Yue language) or 'tiếng Việt' (the Vietnamese language).

    Likewise, Vietnam shares a similar historical trajectory. A significant portion of today’s Vietnamese population may not be direct descendants of the native inhabitants who, according to legend, helped the '18 Hùng Kings' establish the ancient nation of Vănlang. Whether those early founders were of Yue or Mon-Khmer origin, the modern Kinh people living in present-day Vietnam are not necessarily biologically linked to the original builders of the nation over two millennia ago.

    (交)^ Revisiting the XYZ Racial Formulary: To symbolically represent the ethnic composition of the Vietnamese people, we can assign weighted variables to reflect historical demographic shifts. Using the formula {4Y6Z8HCMK}, we approximate the modern Vietnamese racial makeup based on historical records, including Han-era census data. For example, the population of the three prefectures under Han administration, Jiaozhi (交趾, Giaochỉ), Jiuzhen (九真, Cửuchân), and Rinan (日南, Nhậtnam), grew from approximately 400,000 to 980,000 between 111 BC and 11 BC. These figures correspond to the Annamese composition {2Y3Z4H}, reflecting a blend of proto-Yue and Han elements.

    Historical accounts from the Qin Dynasty also note that between 15,000 and 30,000 unmarried Yue women were forcibly married to Qin foot soldiers (Lu Shih-Peng, 1964, Eng. p. 11; Chin. p. 47). Given China's longstanding tradition of meticulous household registration, these records are likely reliable.

    The ethnic makeup of ancient Annam closely mirrored that of the Han Chinese. This resulted from the intermingling of early proto-Chinese {X} with proto-Yue aboriginals {YY}, typically in a ratio of two parts proto-Yue to one part proto-Chinese across southern China. These interactions produced the indigenous Yue population {ZZZ}, found in larger ancient states such as Wu, Yue, and Chu. Over time, these groups were absorbed into the Han identity, symbolized as {HHHH}, so that three parts Z combined with four parts H in the unified Han Dynasty, a consolidation analogous to the Qin Empire's unification into a centralized Chinese state.

    Thus, the racial composition of later Han Chinese can be expressed as {X2Y3Z4H}, a product of the fusion between {X}, {YY}, {ZZZ}, and {HHHH}. Meanwhile, the Vietic lineage emerged from proto-Yue {YY} and later Yue {ZZZ}, forming the proto-Vietic population {YYZZZ}. These became the early Annamese {2Y3Z4H}, who evolved into modern Vietnamese {4Y6Z8HCMK}, where CMK represents Cham and Mon-Khmer influences.

    This formulation reflects a dual-layered structure: the base {2Y3Z4H} enriched by {CMK}, mirroring similar demographic transformations seen in southern Chinese populations such as the Fukienese and Cantonese. These groups underwent comparable racial blending during the Han Dynasty, suggesting a parallel trajectory with the Vietic population.

    If this model holds, then the symbolic formula for Austroasiatic populations may be represented as {6YCMK}, in contrast to the Vietnamese composition of {4Y6Z8HCMK}, highlighting the deeper Sinitic-Yue integration in Vietnamese ethnogenesis. (See Chapter 2: Rainwash from the Austroasiatic Sky).

    (一)^ In the Chinese language, there is an old saying, "一將功成萬骨枯" Yī jiàng gōng chéng wàn gǔ kū ('Nhất tướng côngthành vạn cốt khô', "one general's success is built on ten thousand withered bones"), which conveys a dreadful fact: thousands of innocent residents inside a citadel could easily lose their lives at the hands of the victorious troops. That was the customary norm in Chinese culture, so to speak. As we can now see, the population of the faraway, southernmost Annam prefecture could by then have already exceeded a tenth of the Tang population of 17 million.

    (E)^ The broader picture becomes clearer when we consider the demographic impact of foreign presence. For example, during the Vietnam War (1965–1975), more than 50,000 'Eurasian Vietnamese' children were born to American soldiers stationed in South Vietnam, a country with a population of roughly 22 million at the time. These births occurred within a relatively short span of just ten years, underscoring how quickly external forces can leave lasting imprints on a nation’s demographic and cultural landscape.

    Now, imagine how California might look 2,000 years from now if it were to become an independent country 1,000 years from today. Or consider Taiwan: how would its identity evolve over millennia under similar conditions? These hypothetical scenarios invite us to reflect on how sustained foreign influence, migration, and cultural exchange shape national identity over time.

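    (S)^ Fanqie is a traditional method of indicating the pronunciation of a Chinese character by using two other Chinese characters, the first having the same initial consonant as the given character and the second having the same final and tone. (Handian: A method of annotating pronunciation in Classical Chinese, using two characters to indicate the reading of a third, e.g., '塑, 桑故切 (or 桑故反)'. The initial of the spelled character is the same as that of the upper fanqie character (the initials of '塑' and '桑' are the same, both s); the final and tone of the spelled character are the same as those of the lower fanqie character ('塑' and '故' share the same final and tone: both have the final u and both are in the departing tone, 去聲). Source: 汉典 https://zdic.net.)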
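    The fanqie rule above can be illustrated with a minimal Python sketch. It assumes modern Mandarin pinyin readings (sāng for 桑, gù for 故) stored in a hypothetical mini-lexicon named READINGS; fanqie spellings were actually devised for Middle Chinese, so applying them to modern readings often fails, and this toy only reproduces the single example quoted from Handian.

    ```python
    from typing import Mapping, NamedTuple


    class Syllable(NamedTuple):
        """A syllable split into initial (聲母), final (韻母), and tone (聲調)."""
        initial: str
        final: str
        tone: int  # modern Mandarin tone 1-4 (an assumption; not Middle Chinese tone classes)


    # Hypothetical mini-lexicon using modern Mandarin pinyin readings.
    READINGS: Mapping[str, Syllable] = {
        "桑": Syllable("s", "ang", 1),  # sāng
        "故": Syllable("g", "u", 4),    # gù
    }


    def fanqie(upper: str, lower: str, lexicon: Mapping[str, Syllable] = READINGS) -> Syllable:
        """Take the initial from the upper speller, the final and tone from the lower speller."""
        up = lexicon[upper]
        low = lexicon[lower]
        return Syllable(up.initial, low.final, low.tone)


    def to_numbered_pinyin(s: Syllable) -> str:
        """Render a syllable as numbered pinyin, e.g. ('s', 'u', 4) -> 'su4'."""
        return f"{s.initial}{s.final}{s.tone}"


    # 塑, 桑故切: the spelled reading should be sù (departing tone, modern tone 4).
    print(to_numbered_pinyin(fanqie("桑", "故")))  # su4
    ```

    Splitting each syllable into an (initial, final, tone) record keeps the rule to a single line, which mirrors the Handian description: the upper character contributes only its initial, the lower one its final and tone.
    
    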

    (文)^ A good example is "Bình Ngô Đạicáo Tânthời", written by the author in 'classical language' with a modern context. It is a cynical version of the 'Vietnamese proclamation of independence from China' of 1428, under Vietnam's Lê Dynasty. You may want to read the full version in Appendix L, or do a Google search, to see how "nationalism" and "politics" can obscure good judgment:

    "凭吾丑告: 女丑讨华, 占有千秋, 婆权成性, 历载叶千, 巨大无双, 蝴蝶婆脷, 汉和岭蛮, 缩头乌龟, 中擦外伤, 坏而恋战, 南越百族, 湖广七雒, 独吾健在, 雄居南方, 旗花移到, 吾邦挚友, 好客有方, 来者良家, 流氓勿忘, 白藤江待, 南杀西杀, 旗中无敌, 维我独尊, 骑越虎也, 上之毋下, 入生出死, 大鱼气小, 急吃豆腐, 九死一生, 贪食疾身, 女等欺人, 甚不可忍, 君子报仇, 十年不晚, 咱走着瞧, 霸权破脷, 惹火焚身, 九泉归依!"

    (Trâu Ơi Bố Bảo: Trâu số đạo hoa, ngàn lẻ thu qua, hay thói quyền bà, sửxanh ghichép, cụ đại vôsong, baybướm lưỡibò, hánhởmulạnh, đầurùa lấpló, trong sứt ngoài thoa, lâm chiến bại hoài. HồQuảng dù mất, NamViệt vẫncòn, Hùng cứ phươngnam, kỳhoa dịthảo, hữuhảo chi bang, chuộngchìu hiếukhách, nhàlành kếtmối, lưumanh chớhòng, Bạchđằng BểĐông, Trườngsa Hoàngsa, duyngãđộctôn, kỳ trung vô địch, cởi cọp Việtnam, lênvoixuốngchó, vàosinhratử, ỷlớnhiếpbé, nuốtxương mắccổ, dỡsốngdỡchết, thamthựccựcthân, lũbay bốláo, đắcchí tiểunhân, nhịn cũng vừa thôi, quântử ratay, bàihọc ngànnăm, tổcha tụibay, báquyền bảláp, rướchoạvàothân, ngậmngùi chínsuối!)