Reframing the Origins of Chinese Lects and Vietnamese
by dchph
This article revisits the origins of Chinese civilization through the lens of linguistic intrusion. Drawing on Terrien de Lacouperie’s thesis and subsequent comparative scholarship, it argues that the so‑called "Chinese" language emerged from a mosaic of pre‑Chinese lects – Yue, Mon‑Taic, and proto‑Altaic substrata – long before dynastic consolidation. Early glossaries such as Erya 爾雅 (SV Nhĩnhã) and Fangyan 方言 (VS phươngngữ, 'regional speech') preserve traces of these substrata, revealing how syntax, classifiers, and tonal systems were shaped by southern influences.
The study further demonstrates that Vietnamese must be understood within this continuum of hybridity. Borrowings from Chinese were not drawn from a monolithic source but from a language already layered with non‑Chinese elements. Vietnamese thus embodies a hybrid legacy: Yue roots, Mon‑Taic substrata, and Sinitic overlays coexisting within a single linguistic identity.
By reframing the origins of Chinese lects and Vietnamese, this article challenges nationalist narratives of purity and isolation. It contends that intrusion and adaptation are the historical norm, and that only by acknowledging hybridity can linguistic scholarship achieve clarity and rigor.
I) The languages of China before the Chinese
Lacouperie advanced the provocative notion of "pre‑Chinese languages of China", a vast and complex substratum preserved only in fragmentary Chinese records. He argued that southern Mon and Taic languages profoundly influenced early Chinese syntax, phonology, and semantics, introducing features such as SVO word order, tonal development, and classifier usage. Works like the Erya and Yang Xiong’s Fangyan preserve thousands of regional terms, many non‑Chinese in origin, reflecting centuries of contact and migration. Sino‑Vietnamese further illustrates this legacy, preserving archaic Chinese sounds while coexisting with vernacular Vietnamese.
Long before the emergence of Chinese proper, a constellation of indigenous languages thrived across the regions south of the Yellow River and extending into the Red River Basin. Historical linguists identify these as branches of the broader Taic family – Taic‑Shan, Taic‑Dai, Mon‑Tai, Mon‑Paluang, and others – languages that ultimately gave rise to Dai, Yue, Austroasiatic, Mon‑Khmer, Viet‑Muong, and early Vietnamese. This section examines those ancient Taic‑Yue languages that preceded the rise of Sinitic, their speakers, and the hybridized descendants that were later classified as "Chinese dialects."
For languages of uncertain affiliation, it is unsurprising that Southeast Asian linguists have sometimes described them as "mixed," "hybrid," or even "generic." Yet in reality no language is truly "generic" in the sense of an artificial "Esperanto." Afrikaans, Albanian, Haitian French, and Vietnamese alike are natural languages with deep historical roots. Vietnamese, in particular, has long been classified by the Mon‑Khmer school as Austroasiatic, largely on the basis of its core vocabulary cognate with Mon‑Khmer forms.
Genetic affiliation, however, is rarely straightforward. Typologically, language A may share a portion of its lexicon with neighbor B, which in turn overlaps with C, and C with D, and so forth. At a distance, language Z may display scattered cognates across A, B, and C, though without necessarily being genetically related. Such patterns recall the intriguing resemblances sometimes noted between distant Asian and American Indian languages – for example, California’s Lake Tahoe and China’s "Tàihú" (太湖, SV Tháihồ), both denoting a "large body of water."
An anthropological‑linguistic scenario may help to frame the Vietnamese case. Let us posit Vietnamese (the ancient Annamese or Vietic tongue) as a descendant of an ancestral Y (Yue), itself a branch of T (Taic). This same T also gave rise to X (Zhuang), making ancient Vietnamese and Zhuang linguistic cousins, both distantly related to Z (Zhou). Z was later subsumed by Q (Qin), and together these lineages evolved into a composite XYZ (see Chapters Two and Six on the genetic components of Chinese and Vietnamese). From this amalgam emerged H (the Han peoples) and S (the Sinitic languages). Surrounding them were numerous now‑extinct languages – A, B, C, D – whose traces survive only in scattered vestiges.
Figure 1 - Linguistic ancestry diagram
                  T (Taic)
                     │
       ┌─────────────┴─────────────┐
       │                           │
    Y (Yue)                   X (Zhuang)
       │
       └───> V (Vietnamese / Vietic, ancient Annamese)
                     ↓
Z (Zhou) ─────────┐
                  │
                  └──> absorbed by Q (Qin)
                            │
                            └──> composite XYZ
                                       ↓
                         ┌─────────────┴─────────────┐
                         │                           │
                 C (Chinese lects) ─────────> S (Sinitic languages)

Other extinct neighbors: A, B, C, D … (scattered vestiges)
Through centuries of intermingling, conquest, migration, and integration:
Y + T + Z + S + intermediaries (P, R, Q)
↓
K (Kinh, "mutated" Vietic‑Yue)
↓
V (Modern Vietnamese)
It is hypothesized that the proto‑Taic speakers gave rise to the Yue aboriginals, who once occupied vast stretches of pre‑Chinese territory, ranging from the northern Yangtze basin to the coastal regions of present‑day Zhejiang and Jiangsu. From this substratum, Vietnamese basic vocabulary may have drawn directly on elements of T, Z, and S, while at the same time exerting influence on its southern neighbors, including the Austroasiatic Mon‑Khmer languages. Such diffusion unfolded over centuries of sustained contact – marked by submission, migration, trade, warfare, annexation, and integration – as populations gradually moved southward.
Although Vietnamese and the Sinitic languages may not be genetically affiliated in the strictest sense, they nonetheless share a kinship through common ancestral cousins and intermediate carriers (P, R, Q), forged in the crucible of conquest and domination. Across long centuries and vast spaces, this process culminated in the emergence of the Kinh (K), a transformed lineage of earlier Vietic‑Yue peoples, who ultimately became the modern Vietnamese (V) we recognize today.
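The lettered scenario can also be read as a small directed graph, which makes the claim of "kinship through common ancestral cousins" easy to check. The sketch below is illustrative only: the dictionary name `DESCENDANTS`, the helper functions, and the reading of "Z was later subsumed by Q" as a path from Z through Q into the composite XYZ are assumptions made for this example, not part of the original scenario.

```python
from collections import deque

# Hypothetical encoding of the scenario: each key is a source lineage and
# each value lists the lineages it gives rise to or is absorbed into.
# The Z -> Q -> XYZ path is one possible reading of the text.
DESCENDANTS = {
    "T": ["Y", "X"],      # Taic gives rise to Yue and Zhuang
    "Y": ["V", "XYZ"],    # Yue gives rise to Vietic and feeds the composite
    "X": ["XYZ"],         # Zhuang feeds the composite
    "Z": ["Q"],           # Zhou is subsumed by Qin
    "Q": ["XYZ"],         # Qin feeds the composite
    "XYZ": ["H", "S"],    # the composite yields the Han peoples and Sinitic
}


def ancestors(node):
    """Return every lineage from which `node` descends, directly or indirectly."""
    parents = {}
    for parent, children in DESCENDANTS.items():
        for child in children:
            parents.setdefault(child, []).append(parent)
    seen, queue = set(), deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def common_ancestors(a, b):
    """Return the lineages shared by the ancestries of `a` and `b`."""
    return ancestors(a) & ancestors(b)


print(sorted(common_ancestors("V", "X")))  # Vietic and Zhuang
print(sorted(common_ancestors("V", "S")))  # Vietic and Sinitic
print("S" in ancestors("V"))               # is Sinitic an ancestor of Vietic?
```

Under these assumptions, Vietic and Zhuang share only T, while Vietic and Sinitic share both T and Y even though S is not itself an ancestor of V. That is the sense in which the two can be cousins without direct descent.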
Table 1 - On the Pre-Chinese aboriginal Taic linguistic family
The Taic linguistic family examined in this study corresponds to what Terrien de Lacouperie, in The Languages of China Before the Chinese (London, 1887; Taiwan reprint, 1966), described as the Mon‑Taic dialects. According to Lacouperie, these were the pre‑Chinese aboriginal dialects spoken across ancient China. Building on both historical traditions and legendary accounts, he sought to establish the affiliations between Taic, Chinese, and Yue, which may be summarized as follows.
The Pong (彭), also known as the Pan‑hu (盤瓠) race, held a predominant position in Central China, south of the Yellow River, at the time when the early Chinese, or Bak tribes, migrated into the region. Their leader, remembered as Pong, became the subject of numerous legends. He was said to have settled in northeastern Sichuan and western Henan, where he maintained friendly relations with the Chinese from the outset. Indeed, he reportedly aided them in resisting incursions from the Jung and Naga peoples advancing from the northwest. Many tribes later claimed descent from him, and some continued to venerate his memory. Their collective name, Ngao meaning "powerful," eventually evolved into the ethnonym Yao.
The Pan‑hu race was considered a branch of the Mon peoples from the southwest, who had occupied large parts of China prior to the arrival of the Chinese, that is, before the twenty‑third century B.C. From this branch, and through intermingling with northern Kuenlunic (崑崙 Kūnlún) tribes, the Taic or Shan‑Siamese populations are thought to have emerged. Over time, some of these groups migrated southward under pressure from Chinese expansion, spreading into Indo‑China and forming several distinct states.
The Pan‑hu language itself is not directly attested but is inferred from the dialects of the tribes descended from it. Its most notable feature was its ideological orientation, described as nearly opposite to that of the Kuenlunic languages. The oldest remnants of this speech were preserved by Chinese writers of the Han Dynasty, particularly in the Annals of the Eastern Han. Earlier traces appear in still older works, though there they are cited only with geographical markers, leaving scholars to infer the identity of the speakers. By contrast, in the Han sources the words are explicitly attributed to the Yao of the Pan‑hu race, a precision that, as Lacouperie emphasized, makes all the difference.
Ethnologically, in addition to what the same author discussed in Chapter 8 regarding the pre-Chinese and the Chinese, Lacouperie (ibid., pp. 116-119) argued the following about the ancestral Bak of the early Chinese, as opposed to the pre-Chinese:
"[...] the chief characteristic of these affinities between the early civilization of the Chinese 4000 years ago and the much older focus of culture of South-West Asia is that they are obvious imitations and borrowings. They have nothing original in themselves, and bear in the face that they do not come from common descent. They present the usual imperfectness unequally combined with a complete identity on some points and others which are always the accompaniment of acquisitions obtained through a social intercourse of protracted length, and not from a casual teaching and learning from books and scholars.
The name Bak [百] (now Peh), of the original Chinese immigrants, meant 'flourishing, many, all,' and also 'hundred.' But it has not the last meaning in such expressions as Peh sing 'all the surnames,' Peh kuan 'all the officials,' Peh Liao, same meaning, Peh Yueh [百越 BáchViệt] 'all the outside-borders,' etc., where no possible reference can be made to any precise number, since these various items comprise several hundreds, as in the case of the first three, or only a few, as in the last case. All through the Shu-King [書經] or Canon Book of History, it is employed as a whole though undetermined number. And as a matter of fact, the well-known expression Peh sing, above quoted, which appears from the beginning of Chinese history, and about which so many baseless speculations have been set forth, has never meant the hundred surnames, as was wrongly presumed, and this for several reasons. The supposition that Peh sing meant 'the hundred surnames' (or families) was based on the fact that the Peh Jia sing or 'the hundred (?) family names,' which includes some 460 names, was only compiled under the Sung dynasty, i.e. after A.D. 960, when the number had increased largely and much beyond its original figure. But this admitted, the regular use of the family names does not go back much beyond the time of Confucius (B.C. 551-479), and when this list of surnames is carefully sifted, we do not find more than about sixteen surnames dating as far back as the beginnings of the Chinese in China; this small number, however, being only reached if we include a few family names quoted in the early traditions, and disappearing afterwards. Therefore, as the term Peh sing, i.e. the 'Bak Surnames,' existed among the Chinese from the outset as an appellative for themselves, the word Peh, old Bak, could have, not the meaning of 'hundred,' but perhaps that of 'all, numerous, flourishing,' as stated above, should it have been still understood.
And the meaning 'hundred,' which originally was apparently said bar, was only a homonymous sound in the limited phonetic orthoepy of the Chinese, expressed by the same symbol because of the similarity of sound, real only for them.
Bak was an ethnic and nothing else. We may refer as a proof to the similar name, rendered however by different symbols, which they gave to several of their early capitals, PUK, POK, PAK, all names known to us after ages, and of which the similarity with Pak, Bak, cannot be denied. In the region from where they had come, Bak was a well-known ethnic, for instance, Bakh in Bakhdhi (Bactra), Bagistan, Bagdada, etc. etc., and is explained as meaning 'fortunate, flourishing.'
Another ethnical name no less important is that which is now read 夏 Hia, also sha, in several ideo-phonetic compounds, and which was the proper appellative of one of the leading tribes of the immigrants when settled in 'a little bit of territory in the N.W.' It became the name of the Chinese people. The Ku-wen spellings tell us that its original full form was something like Ketchi, Ketsu, Ketsi, Kiitche, Kotchi, etc., which are all graphical attempts at rendering the exact name with the clumsy acrologic and syllabic system of the time being. We may take Kütche as an average of all these variants. Now this name is so much like that of the Kashshi on the north-east of Mesopotamia that, without suggesting in any way a relationship of some kind between the two peoples, there may have been an affinity of names from a common meaning suitable to both.
An analysis of the aforesaid book of the family surnames, the Peh kia sing, shows their number to be made up, besides the original names, of native appellatives brought in sometimes by the entrance of native tribes into the Chinese community, but principally from the native names of regions bestowed upon Chinese subjects as fiefs and territorial grants. Even the princely names taken by the early Chinese leaders in the Flowery Land were borrowed from those of native regions, as they conquered them. But an examination of all these proper names, tribal and geographical, would carry us much beyond the limits of the present work.
We have little to say here of the early language of the Chinese Bak tribes, and its subsequent evolution and development into several important dialects, as the matter is somewhat precluded by the object of the present work. We allude elsewhere to some of its characteristics and to the formation of its ideology (§§ 20-26) and tones (§§ 117, 230). The explanation of the gap now existing between the book-language and the vernaculars requires some long explanations and demonstration much beyond our scope here. The following scheme, however, gives the list of the most important languages, dialects, and subdialects, with an indication of the probable dates of their branching off. It is the first attempt which has hitherto been made at classifying them, and thus far must be looked upon with regard to the relative position of several dialects and subdialects as provisional. A great deal of work and investigation remains to be done before such a classification can be completed. The total number of dialects and subdialects, hiang fan or local patois, etc., has been roughly estimated to be somewhat similar to that of the days of the year (360), and though they are not likely to affect the general lines of the classification below, it may be useful not to forget that the total figure of the names entered therein is only one-ninth of the general number."
Figure 2 - General Historical Scheme of the Chinese Family of Languages
Table 2 - Classical Chinese Lexicons Preserving Pre‑Chinese Layers
| Lexicon | Date / Dynasty | Compiler | Structure | Function | Relevance to Vietnamese Studies |
|---|---|---|---|---|---|
| Erya 爾雅 (SV Nhĩnhã) | Qin / early Han (3rd–2nd c. BCE) | Anonymous, later attributed to scholars of the Qin court | Organized by semantic categories (plants, animals, kinship, etc.) | Glossary of classical texts; earliest surviving Chinese lexicon | Preserves non‑Chinese etyma embedded in Zhou texts; shows how Yue and Mon‑Taic words were naturalized into “Chinese” vocabulary |
| Fangyan 方言 (VS Phươngngữ, 'regional speech') | Han Dynasty (1st c. CE) | Yang Xiong (53 BCE–18 CE) | Dialectal entries grouped by region | Records local vernaculars across China | Documents Yue, Tai, and other substratal forms; demonstrates diversity of speech communities that later influenced Vietnamese borrowings |
| Later glossaries (e.g., Shuowen Jiezi 說文解字, SV Thuyếtvăn Giảitự) | Han Dynasty (2nd c. CE) | Xu Shen | Organized by radicals and phonetic series | Standardizes script and phonology | Shows how hybrid vocabulary was codified into “official” Chinese, the source of many Sino‑Vietnamese forms |
Notes:
- Erya is crucial for showing semantic borrowing: words for plants, animals, and kinship often reveal southern origins.
- Fangyan is invaluable for dialectal diversity: it records regional speech that later fed into both Chinese lects and Vietnamese.
- Together, they demonstrate that Chinese was already hybrid before Vietnam borrowed from it.
Linguistically, in terms of data availability as of the late 19th century, Lacouperie (ibid., pp. 3-5) noted that
"the languages mentioned in these pages are not all of those, or the representatives of those, which were spoken in the Flowery Land when the Chinese made their appearance in that fertile country some four thousand years ago. The Chinese have only occupied it, slowly and gradually, and their progressive occupation was only achieved nominally during the last century [i.e., the 18th century]. Some portions of the S. and S.W. provinces of Kueitchon [sic], Szetchuen, Yunnan, Kuangsi and Kuangtung are still inhabited by broken and non-broken tribes, representatives, generally cross-bred, mixed and degenerated, of some former races who were once in possession of the country. Therefore the expression pre-Chinese languages of China implies an enormous length of time, which still continues, and which would require an immense study should the materials be available.
Unhappily the data are of the most scanty description. They consist of occasional references given reluctantly and contemptuously during their history by the Chinese themselves, who were little disposed to acknowledge the existence of independent and non-Chinese populations in the very midst of their dominion. Though they cannot conceal the fact that they are themselves intruders in China proper, they have always tried the use of big words and large geographical denominations, which blind the unwary readers, to shield their comparatively small beginnings. Such indications can be obtained only by a close examination of their ancient documents, such as their histories, annals, and the local topographies, where, in the case of the annals, they have to be sought for in the sections concerning foreign countries; an arrangement somewhat startling, though not unnatural when we consider the real state of the case from a standpoint other than the views entertained by the ancient sinologists on the permanence and the ever-great importance of the Chinese nation.
But the Chinese, though careful to inscribe in one or another part of their records all that occurred between themselves and the aboriginal tribes, and all that they could learn about them, were not enabled to know anything as to the events, linguistical and ethnological, which took place beyond their reach. So that displacements of the old races, as well as the arrival of new ones, have taken place in the regions non-Chinese, now part of China proper. Foreign linguistic influences have also been at work, and of these we have no other knowledge than that deduced from the traces they have left behind them which enable us to disentangle their peculiar characteristics."
Syntactically, regarding the southern linguistic influence on the Chinese language, Lacouperie (ibid., pp. 16-17) wrote:
"the postposition of the genitive to its noun, which occurs not unfrequently in the popular songs of the Book of Poetry, where it cannot possibly be looked upon as poetic licence, belongs to an influence of different origin, and is common to the Mon and Taic languages." [...] "And for the position of the object to the verb, and the syntactical order of [Subject+Verb+Object] standard, in contradistinction with the unadulterated indices of the Ural-Altaic, which it formerly possessed, there is no doubt that the Chinese language was indebted to the native languages of the Mon, and subsequently to the Taic-Shan formation." [...] "The phonesis, morphology, and sematology of the language bear, also, their testimony to the great influence of the native languages. The phonetic impoverishment and the introduction and growth of the tones as an equilibrium to make up deficiencies from wear and tear, are results of the same influence. In the process of word-making, the usual system of the postplacing particles for specifying conditions in space and time common to the Ugro-Altaic linguistic alliance has been disturbed in Chinese, and most frequently a system of preplacing has been substituted for the older one. And finally, in the department of sematology, we have to indicate, also, as a native influence on the language of the Chinese, the habit of using numeral auxiliaries, or segregative particles, otherwise classifiers, which, if it has not been altogether foreign to the older state of the language, would not have taken the important place it occupies in the modern dialects."
"The vocabularies which, contrary to the usual habit, have not been the first considered have come at one pace with the preceding alternations. The loan of words have been intensive on both sides, native and Chinese, and reached to a considerable amount."
The linguistic characteristics described above reflect the Chinese standpoint and need to be placed in historical perspective. During the Zhou Dynasty (1050-255 B.C.), the State of Chu, a non-Chinese civilization, was one of the great powers; its territory stretched across Anhui, Hubei, and Henan provinces, with a shifting and ill-defined periphery all around. To the east of Chu lay the non-Chinese states of Wu and Yue, which covered the modern provinces of Jiangsu and Zhejiang around 584 B.C.; Wu was later conquered by Yue in 473 B.C. Towards the end of the 4th century B.C., the philosopher Mengzi (Mencius) noted that the Chu 'barbarians' spoke a shrieking language different from that of the people of the Qi State in today's Shandong Province. The names of the kings of Wu and Yue likewise have a decidedly non-Chinese appearance. These and other such states therefore required interpreters in their dealings with the machinery of the Chinese government (Lacouperie, ibid., pp. 20-21).
Today we still have at hand the Erya (爾雅), which contains hundreds of local words. On the one hand, it served as a common tool for communication among the states of ancient China; on the other, it was believed to represent an interstate diplomatic language. The Erya was in fact a dictionary issued under the Zhou Dynasty that collected common words, including words from non-Chinese languages, with explanations. Many of its entries are double words arranged in pairs, a characteristic feature of the Taic-Shan languages that is also commonly found in the Shijing (詩經), or Classic of Poetry. Indeed, "it contains many words which do not seem to have ever been used in any Chinese text properly so called. They are regional words borrowed from other stocks on vocables, and they could be expressed in Chinese writing only by the use of homonyms as phonetic exponents. [...] There are no less than 928 words or about one-fifth of general stock, which do not appear anywhere else than in the Erh-ya." (Lacouperie, ibid., p. 23)
Lacouperie nevertheless considered the most important work to be the Fangyan (方言 'Dialects') by Yang Xiong (楊雄, 53 B.C.-18 A.D.), which paid particular attention to the local words current around the author's time. Before Yang Xiong, other scholars had labored on the subject, compiling collections of thousands of local words that were drawn on and adapted in his work. The Fangyan contains up to 9,000 words arranged by subject and drawn from 40 regions, many of which were Chinese only in name, while others were not Chinese at all; these regions lie in Hebei, Anhui, Hubei, Hunan, Jiangsu, Zhejiang, Guangdong, Guangxi, Sichuan, etc., all within modern China proper. Later generations added more items, bringing the total to 12,000 words. Note that the words in this remarkable work represent the accumulation of several centuries, since many of the states and regions it names did not exist in Yang Xiong's own time, e.g., 南越 NanYue, 貴州 Guizhou, 湘 Xiang, and even the State of Jin 晉, which was destroyed and partitioned in 436 B.C. by the states of Han 韓, Wei 魏, Zhao 趙, etc. (Lacouperie, ibid., pp. 25, 29).
That being the case, the Chinese characters used to record these words were pronounced differently in each era, which is a serious matter to consider.
"This is made apparent by this fact, that differences of pronunciation are often indicated by symbols whose sounds have for long been homonymous. However, the best means to start with, and subjected to the least proportion of ulterior modifications, are the sounds preserved in the Sinico-Annamite, the most archaic of the Chinese dialects. The only preservation to be made, is that the hardening and strengthening which this dialectal pronunciation indication goes perhaps beyond the mark, and that half of its strength might be due to local peculiarity of the dialect." (Lacouperie, ibid., p. 29)
By "Sinico-Annamite", termed "Sino-Vietnamese" (SV) in this paper, Lacouperie meant not only the Sinico-Annamite vocabulary but also a scholarly language that he regarded as a dialect on a par with Cantonese or Fukienese.
"Two languages are used in Annam. One employed by the literati only is pure literary Chinese, with the old sounds of the Ts'in [ Lacouperie: 'or /Tan/' 秦 Qín, SV Tần, 'Chine' ] period attached to the written characters. It is the Sinico-Annamite, this very dialect, which, with necessary allowance for decay and self divergence, rightly deserves the qualifications of the most archaic of the Chinese dialects. It is the curious fact that its existence was not, in the minds of many scholars, separated from that of the other language, the vernacular Annamese or Cochin-Chinese, which belongs, as recognized by John Logan, and though full of Chinese idioms, to the same family, as the Mon or Peguan [(1)]." (Lacouperie, ibid., p. 54)
By the time his book was published in 1887, Lacouperie (ibid., p. 55) noted that three writing systems were in use in Annam: (1) chữNho (字儒), (2) chữNôm (字喃), and (3) chữQuốcngữ (字國語). Their characteristics have been discussed earlier in this paper and described in similar terms elsewhere by Sinologists and Vietnamese specialists.
All told, the present author would like to point out that many of the words listed as "Mon-Taic" by Lacouperie barely find plausible cognates in modern Vietnamese. Lacouperie, for his part, dealt only with the non-Chinese ethnology of the country, treating the Fairy Dragon's descendants (龍種) as belonging to the Mon-Taic races. His account starts with King Kinhdương (京陽王 Jingyang Wang, SV 'Kinhdương Vương'), whose title derives from Jingyang, a place name near the capital of Qin in Shaanxi. King Kinhdương was the son of a prince by a girl of the race of the immortals (the race of Peng 彭 or Panhu 盤瓠, as previously mentioned, who were ancestors of the Taic race); hence the phrase 'conrồngcháutiên' ('children of the Dragon and the Immortal race') in the Vietnamese legends. King Kinhdương married a wife from Độngđình Lake (洞庭湖 Dongtinghu, in Hunan Province), who also belonged to the Dragon race.
"King Lak-Long [Lạclong Quân (雒龍君)], the issue of this union, was the first of a series of eighteen rulers, the last of whom ended in 207 B.C. At the rate of twenty-five years a reign, the highest average possible, these speculative data lead to circa 800 B.C. as the probable date of these beginnings, which therefore would have taken place when the state of Ts'u [楚 Chu (Sở)] in Hupeh and Hunan S. was in full prosperity."
"The boundaries of the kingdom of these early Annamese rulers were, according to the tradition, on the east the sea, on the north, Tung ting lake, on the west Pa and Shuh, both names for Szetchuen, with one ruler whose reign of fifty years that ended in 202 B.C. when the third dynasty begins. The latter is no less than that founded by the successor of Jen Hiao [任囂 (Nhâm Ngao)], Tchao T'o [趙佗 (Triệu Đà)], a rebel Chinese [秦 Qin (Tần)] general who established his sway all over the maritime provinces of the south, extending from Fuhkien to Tungking [東京 (Đôngkinh) or 'Tonkin', North Vietnam]; which lasted with 5 rulers until 112 B.C., when it submitted to the Chinese dominion, which, however, was merely nominal in some parts, and not at all established on the east. It was recognized from that date, with the exceptions of three years (39 - 42 A.D.), until 186 A.D., when a native king, Si-nhip [士攝 (Sĩ Nhiếp), known, though, as the Han viceroy in Vietnam's early history], ruled for 40 years. It was this king who introduced the Chinese literature, and prohibited the use of phonetic writing [?] hitherto employed by the Annamite." (Lacouperie, ibid., pp. 53-54)
We have treated the Chu populations as descendants of the Taic aboriginal peoples who gave rise to the Dai-Kadai languages (Lacouperie's Taic-Shan, Mon-Shan, and Mon-Taic) and to the pre-Chinese aboriginal Mon-Khmer languages (variously termed in this paper Taic-Yue, Yue, Daic, Tai-Kadai, Austroasiatic, Mon-Khmer, etc.). Regarding the latter tribes, Lacouperie states that "the ancestors of the language and civilization of the Annamites, and partially also of their race, must be sought for in Central and Eastern China. We hear from history that the former population of the south, between the Kwangtung [Canton] and Tungking [Tonkin], both, inclusive, were generally displaced by, or intermingled with, half a million of colonists drawn chiefly from the region of modern Tchetkiang [Zhejiang] and its west, by Jen Hiao [Nhâm Ngao] in 218 B.C." (Lacouperie, ibid., p. 52)
With regard to the Mon-Shan affiliation, Lacouperie cited a number of aboriginal languages, notably that of the "Paloungs" (勃弄 'Po-lung', 'Palaung'), a language of the Mon-Talaing family whose speakers were settled in northwest Yunnan, a region later conquered by the Nanzhao (南詔) Kingdom of the Shan tribes in the 7th century.
"We have two vocabularies of their speech; one of 200 words collected in 1858 by Bishop P. A. Bigandet, which examined by John Logan, permitted this great scholar to recognize the Mōn-Annam relationship of the language. Another vocabulary was collected by Dr. John Anderson at the time of his expedition in S.W. Yunnan. The latter list of words is less saturated with Shan words than the preceding. The indices of its ideology are 2 4 6 8 VI [i.e., its grammatical word order, e.g., adjectives and genitives follow nouns, as in French (Lacouperie, ibid., p. 66)], which confirm the glossarial evidence."
"As we have seen in our foregoing §§ 31-33 the language spoken in Ts'u was not a Chinese dialect. And the statement of Hung k'iü, ruler in Ts'u from 887-867 B.C., saying, 'We are Man-y (i.e., aliens from the Chinese), and we do not bear Chinese names,' is an unnecessary confirmation. The words quoted from the Ts'u Fang yen are easily identified with the Mōn and Taic-Shan vocabularies in equal shares, when they are not simply altered Chinese. And the most frequent phonetic equivalent is that of k or h for a Chinese l, still existing in the modern language." (Lacouperie, ibid., pp. 55-56)
To sum up, here are the key findings of the foregoing:
- Fragmentary evidence: Knowledge of pre‑Chinese languages is limited, preserved mainly in reluctant Chinese records and scattered references.
- Southern influence: Mon and Taic languages shaped early Chinese syntax (genitive placement, SVO order), phonology (tonogenesis, phonetic reduction), and semantics (classifier system).
- Lexical borrowing: Intensive two‑way borrowing occurred between Chinese and indigenous languages, leaving a substantial shared vocabulary.
- Regional records: Works like the Erya and Yang Xiong's Fangyan preserve thousands of local and non‑Chinese terms, reflecting centuries of contact and migration.
- Chu, Wu, Yue States: Powerful non‑Chinese polities in the Zhou era maintained distinct languages, requiring interpreters in Chinese administration.
- Sino‑Vietnamese legacy: The so‑called "Sinico‑Annamite" preserved archaic Chinese sounds, functioning as both a scholarly register and a bridge to vernacular Vietnamese.
Words from the "Paloungs" (勃弄 Po-lung) language, which is related to Mon, are cited in this paper as "Palaung", following the 249-word table published by G. H. Luce (1965). (See What Makes Chinese So Vietnamese, Chapter 8.)
The author identifies several intriguing parallels in the wordlists cited by Lacouperie, though most of the cognates he proposed were treated as loanwords from "Tai‑Shan" and "Mon‑Taic" aboriginal languages. The essential point is that the relationship between Vietnamese and these pre‑Chinese Mon‑Taic dialects, including Mon‑Khmer, is relatively loose. Their cognateness is not as firmly established as that of Chinese dialects such as Cantonese or Fukienese, for reasons discussed in earlier chapters. Lacouperie did note some striking correspondences: for instance, in the Tai‑Shan dialects of the Zhongjiazi (also "Tchung Miao"), the reduplicated form 田丁田丁 tien‑ting tien‑ting aligns with Vietnamese thằng ('servant'), while 媚娘 méiniáng parallels vợlớn ('first wife'). Yet claims by early Mon‑Annamese researchers that one‑third of 28 basic "Tchung Miao" words were cognate with Vietnamese appear overstated. Many forms, such as 阿妹 ami → em (SV amuội), 家奴 jianu → ngườinhà (SV gianô), 家公 ch’ia kung → ôngchủ (SV giacông), and 家婆 ch’ia pu → bàchủ (SV giabà), are more convincingly explained as Chinese cognates. Recent Vietnamese scholarship likewise rejects a purely Mon‑Khmer origin, situating many of these items instead within the Dai‑Kadai (Tày‑Thái) sphere (Nguyen Ngoc San 1993).
For non‑specialists, these lists illustrate how sound changes diverged morphologically as words spread across languages, sometimes reduced to vocables regardless of meaning. Over centuries, particularly during Vietnam’s millennium under Chinese rule, vocabularies shifted both diachronically and synchronically, with loanwords deeply embedded in the lexicon. This complicates efforts to distinguish true indigenous strata from borrowed layers when classifying genetic origins, especially between Chinese and Vietnamese, where forms are so closely aligned.
II) Hybridity as historical norm
Acknowledging the existence of "languages of China before the Chinese", the Sinitic‑Vietnamese hypothesis gains support from Luce’s 245‑item wordlist and similar compilations. By filtering out indigenous elements, Vietnamese basic vocabulary can be grouped into categories: (1) words with no Chinese connection, (2) cognates shared with Chinese and Mon‑Daic/Mon‑Khmer, (3) forms more closely aligned with Chinese than Austroasiatic or Daic‑Kadai, (4) items plausibly cognate only with Chinese and Vietnamese, and (5) fundamental lexemes absent from Mon‑Khmer lists but essential to any language.
Sample items include:
- Indigenous: tai (ear), mũi (nose), miệng (mouth), bốn (four), bảy (seven).
- Shared with Chinese: mắt 目 mù (eye), tay 手 shǒu (hand), gạo 稻 dào (rice), sắt 鐵 tiě (iron).
- Closer to Chinese: tiếng 聲 shēng (sound), lửa 火 huǒ (fire), nhà 家 jiā (home).
- Exclusive parallels: goá 寡 guǎ (widowed), liềm 鐮 lián (sickle), sông 江 jiāng (river).
- Core lexemes absent in Mon‑Khmer lists: uống 飲 yǐn (drink), khóc 哭 kū (weep), cười 笑 xiào (laugh), chuối 蕉 jiāo (banana).
Although Vietnamese basic vocabulary aligns dominantly with Chinese, many forms also appear across neighboring languages. This suggests that shared elements with Mon‑Khmer are better explained as outcomes of prolonged contact, resettlement, and typological convergence, rather than as evidence of a single Mon‑Khmer origin. (2)
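For readers who want to work with the five groups directly, the sample items above can be held in a small lookup table. The following is only a minimal sketch in Python: the category labels and the function names category_of and with_chinese_parallel are illustrative assumptions, not the author's terminology, and the table contains nothing beyond the sample items listed in this section.

```python
# Sample items from the five groups above.
# Keys are illustrative labels (an assumption, not the author's terms).
# Each entry is (Vietnamese, Chinese character or None, pinyin or None, gloss).
SAMPLES = {
    "indigenous": [
        ("tai", None, None, "ear"), ("mũi", None, None, "nose"),
        ("miệng", None, None, "mouth"), ("bốn", None, None, "four"),
        ("bảy", None, None, "seven"),
    ],
    "shared_with_chinese": [
        ("mắt", "目", "mù", "eye"), ("tay", "手", "shǒu", "hand"),
        ("gạo", "稻", "dào", "rice"), ("sắt", "鐵", "tiě", "iron"),
    ],
    "closer_to_chinese": [
        ("tiếng", "聲", "shēng", "sound"), ("lửa", "火", "huǒ", "fire"),
        ("nhà", "家", "jiā", "home"),
    ],
    "exclusive_parallel": [
        ("goá", "寡", "guǎ", "widowed"), ("liềm", "鐮", "lián", "sickle"),
        ("sông", "江", "jiāng", "river"),
    ],
    "absent_from_mon_khmer_lists": [
        ("uống", "飲", "yǐn", "drink"), ("khóc", "哭", "kū", "weep"),
        ("cười", "笑", "xiào", "laugh"), ("chuối", "蕉", "jiāo", "banana"),
    ],
}


def category_of(word):
    """Return the label of the group that lists this Vietnamese word, or None."""
    for label, items in SAMPLES.items():
        if any(vn == word for vn, *_ in items):
            return label
    return None


def with_chinese_parallel():
    """List every sample word for which the text proposes a Chinese counterpart."""
    return [vn for items in SAMPLES.values()
            for vn, han, _, _ in items if han is not None]
```

Calling category_of("tay") returns the label of the group in which that word is listed, and with_chinese_parallel() collects every sample item paired with a Chinese form. A table of this kind only records the groupings proposed in the text; it does not test them.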
Conclusion
The evidence from early glossaries, dynastic chronicles, and comparative linguistics makes clear that “Chinese” was never born as a singular, self‑contained language. It emerged from a plurality of lects – Yue, Mon‑Taic, proto‑Altaic – layered through centuries of intrusion and adaptation. The Bak tribes, Shang, Zhou, and Han each absorbed outsiders, weaving their speech and culture into what later became recognized as "Chinese".
For Vietnamese, this hybridity is equally defining. Borrowings from Chinese were not drawn from a monolithic source but from a language already hybridized. Vietnamese thus embodies a continuum of intrusion: Yue roots anchoring vernacular identity, Mon‑Taic substrata shaping phonology and syntax, and Sinitic overlays providing administrative and ritual vocabulary.
By reframing the origins of Chinese lects and Vietnamese in this way, the article challenges nationalist narratives of purity and isolation. It insists that intrusion is not a deviation but the historical norm. Languages evolve through contact, cultures adapt through exchange, and identities are forged through hybridity.
Recognizing this reality allows scholarship to move beyond ideological distortions and toward a more accurate, humanized understanding of linguistic history. Both Chinese and Vietnamese stand as living archives of intrusion – testaments to the creative power of cultural fusion.
References
Primary Chinese Historiography
Sima Qian. Shiji 史記 (Records of the Grand Historian).
Sima Guang. Zizhi Tongjian 資治通鑑 (Comprehensive Mirror to Aid in Government).
Han Dynasty records on Jiaozhi 交阯 / 交趾 and Jiaozhou 交州.
Tang Dynasty records on Annam Đôhộphủ 安南都護府 ('Protectorate General to the Pacified South').
Foundational Linguistic Studies
Haudricourt, A. G. (1954). De l’origine des tons en vietnamien. Journal Asiatique.
Karlgren, B. (1957). Grammata Serica Recensa. Stockholm: Museum of Far Eastern Antiquities.
Ferlus, M. (2012). Trade routes and sound change patterns in Vietnamese cognates across Southeast Asia.
Comparative and Historical Scholarship
Terrien de Lacouperie. (1887; reprinted 1966). The Languages of China Before the Chinese. London.
Boodberg, Peter A. (1979). Philological Notes on Turko‑Mongolic Influences in Early China.
Ruhlen, Merritt. (1994). The Origin of Language: Tracing the Evolution of the Mother Tongue.
Darwin, Charles. (1871). The Descent of Man. London.
Vietnamese Scholarship
Nguyễn Ngọc San. (1993). Tìm hiểu về Tiếng Việt Lịch sử. Hanoi: Nhà xuất bản Giáo dục.
Bình Nguyên Lộc. (1972). Nguồn gốc Mã Lai của dân tộc Việt Nam. Saigon.
Phan Hữu Dật. (1998). Nhân học Việt Nam. Hanoi.
Modern Historiography
Kiernan, Ben. (2017). Việt Nam: A History from Earliest Times to the Present. Cambridge University Press.
Bo Yang. (1993). The Ugly Chinaman and the Crisis of Chinese Culture. Vols. 69–71.
Lexical and Glossarial Sources
Erya 爾雅 (SV Nhĩ Nhã). Qin/Han‑era lexicon preserving non‑Chinese terms.
Fangyan 方言 ('regional speech'). Yang Xiong's Han‑era compilation of dialectal vocabulary.
FOOTNOTES
(1)^ Peguan is an older term for the Mon language and culture, historically centered in Lower Myanmar (Burma). It refers to the Mon people, their Austroasiatic language, and their traditions, especially around the old capital of Pegu (modern Bago).