Polysyllabicity And Dual Inheritance
by dchph
This article redefines Sinitic‑Vietnamese (VS) as a comprehensive framework for analyzing the Chinese‑derived stratum of Vietnamese. Unlike the narrower Sino‑Vietnamese (SV), VS encompasses vernacular adaptations, pre‑Han loans, and parallel doublets. Drawing on comparative evidence from Yue substratum studies, Middle Chinese reconstructions, and Vietnamese etymology, the paper argues that Sinitic‑Vietnamese is not a residue of borrowing but a structural system that shaped Vietnamese identity.
The methodology of polysyllabicity and nucleus‑based grouping is introduced as a tool for tracing layered etymologies. Case studies demonstrate how Sinitic‑Vietnamese reframes the Austroasiatic vs. Sino‑Tibetan debate and highlights Vietnam's dual inheritance.
I) Introduction
Sinitic‑Vietnamese (VS) designates the full body of Chinese‑derived vocabulary that has been localized within the Vietnamese linguistic environment. This category is broader than the conventional Sino‑Vietnamese (SV) subset and includes several distinct strata:
- Sino‑Vietnamese (SV): A codified subset rooted in Middle Chinese phonology, functioning in Vietnamese much like Greco‑Latin loanwords in English.
- Pre‑Sino‑Vietnamese forms: Older loans from pre‑Qin and Han eras, many with Old Chinese (OC) or Taic‑Yue origins.
- Parallel forms: Doublets where one is formal‑literary and the other colloquial‑vernacular, sometimes diverging in meaning.
The scope of Sinitic‑Vietnamese thus encompasses all mono‑ and polysyllabic words of Chinese origin that have been naturalized in Vietnamese. This includes forms that resemble Sino‑Vietnamese pronunciations but fall outside the codified entries of a Hán‑Việt từđiển such as the Từđiển Việt-Tàu.
Table 1 - Comparative Chinese → SV → VS → semantic innovation
| Chinese (Pinyin) | Sino‑Vietnamese (SV) | Sinitic‑Vietnamese (VS) | Phono-semantic innovation / Notes |
|---|---|---|---|
| 房 fáng | phòng | buồng | 'room, chamber' — regular shift b → f |
| 岸 àn | ngạn | cạn, bến | 'riverbank' — SV → VS multi-forms |
| 罷 bà | bãi | bỏ, mà | 'strike, stop, cessation' – semantic expansion |
| 畢 bì | tốt | rốt | 'graduation, completion' – regular shift b → t |
| 必 bì | tất | phải | 'inevitable, must' — regular shift b → t |
| 季 jì | quý | mùa | 'season, quarter' — semantic specialization |
| 節 jié | tiết | khớp | 'festival, node' — semantic extension |
| 偏 piān | thiên | nghiêng | 'bias, leaning' — irregular p → th |
| 匹 pǐ | thất | bốn, xấp | 'pair, match, lone' – irregular p → th; (Viet.) number 4 |
| 起 qǐ | khởi | dậy | 'rise, begin' — stable SV form |
| 兄 xiōng | huynh | anh | 'elder brother' — clipping hw → Øʔ |
| 煽 shān | phiến | phực | 'incite, fan' – regular sh → ph |
| 攝 shè | nhiếp | nhặt | 'act for, take' – regular sh → nh |
| 濕 shī | thấp | sấp | 'damp, humid' – regular sh → th |
| 灣 wān | loan | vịnh | 'bay' — stable SV form |
| 熄 xí | tức | tắt | 'extinguish' — regular x → t |
| 學 xué | học | nhái | 'study, learning' — irregular h → nh |
| 左 zuǒ | tả | trái | 'left' — possible interchange t → tr |
| 郵 yóu | bưu | vưu, 'bót' | 'postal' — modern semantic expanding |
| 貓 māo | miêu | mèo | Everyday vernacular for 'cat'; SV miêu in literary registers |
| 卯 mǎo | mão | mẹo, mèo | Zodiac vs. colloquial divergence; doublet layering |
| 佛 Fó | Phật | bụt | Formal religion vs. folk tales; early Indic loan adapted via Chinese |
| 婦 fù | phụ | bụa, vợ | SV and VS everyday term; phono-semantic expanding |
| 車 chē | xa | xe, cộ | Metathesis and doublet coexistence in vernacular |
| 公 gōng | công | ông; trống; cồ; cụ | Kinship and animal terms localized; semantic spread |
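The layering shown in Table 1 can be pictured as a small data structure. The following is a minimal sketch, not the author's tooling: the names `Entry`, `TABLE_1_SAMPLE`, and `reverse_index` are hypothetical, and the rows are a handful copied from Table 1. It indexes every Vietnamese form by its proposed Chinese source and stratum, so that a form claimed for more than one source (such as mèo) surfaces as a doublet candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    hanzi: str    # Chinese character
    pinyin: str   # Mandarin reading as given in Table 1
    sv: str       # codified Sino-Vietnamese reading
    vs: tuple     # vernacular Sinitic-Vietnamese forms proposed in Table 1


# A few rows copied from Table 1 (illustrative subset only).
TABLE_1_SAMPLE = [
    Entry("房", "fáng", "phòng", ("buồng",)),
    Entry("貓", "māo", "miêu", ("mèo",)),
    Entry("卯", "mǎo", "mão", ("mẹo", "mèo")),
    Entry("車", "chē", "xa", ("xe", "cộ")),
    Entry("公", "gōng", "công", ("ông", "trống", "cồ", "cụ")),
]


def reverse_index(entries):
    """Map each Vietnamese form to a list of (hanzi, stratum) pairs."""
    index = {}
    for e in entries:
        index.setdefault(e.sv, []).append((e.hanzi, "SV"))
        for form in e.vs:
            index.setdefault(form, []).append((e.hanzi, "VS"))
    return index


index = reverse_index(TABLE_1_SAMPLE)
# Forms attached to more than one Chinese source are doublet candidates.
shared = {form: hits for form, hits in index.items() if len(hits) > 1}
```

With this sample, `shared` contains only mèo, which Table 1 lists under both 貓 and 卯.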
In this framework, Sinitic‑Vietnamese is not merely a collection of borrowings but a layered system. It blends foundational items rooted in the Yue substratum, incorporates Old Chinese elements, and is further enriched by the Sino‑Vietnamese layer of Middle Chinese loans. It also includes lexical items introduced through contact with northern Mandarin varieties during the more than ten centuries of Chinese colonial administration in Annam (111 B.C.-939 A.D.). Over time, these strata developed under the influence of successive dynasties, producing a system comparable to the Sinitic layers found in Cantonese and Hokkien.
By convention, the term Sino‑Vietnamese (漢越 Hán‑Việt) refers specifically to the systematic Vietnamese pronunciations of Chinese vocabulary, much like Latin‑ or Greek‑derived terms in English. These forms reflect Middle Chinese phonology filtered through Vietnamese sound patterns, paralleling the way Cantonese developed from the same courtly pronunciations.
Each lexical stratum carries its own history. The term Yue (越, 粵, 戉, 鉞, etc.), as used in Chinese classics, denotes the indigenous southern substratum upon which Sinitic‑Vietnamese was imposed. Archaeological and textual evidence suggests Yue communities pre‑dated the ethnolinguistic entity now called "Chinese" by millennia. Labels such as Sinitic or Sino‑ are therefore best understood as scholarly shorthand: imperfect but useful for comparative purposes. While "Chinese" is anachronistic for pre‑Qin contexts, it remains the most accessible term for a broad scholarly audience.
Etymologically, many foundational Vietnamese words currently classified as Austroasiatic (Mon‑Khmer) may instead descend from a shared Yue root. This hypothesized Taic‑Yue substratum predates proto‑Vietic and contributed to both the Việt‑Mường group and other Daic languages. Elements of this layer are also discernible in southern Chinese lects, suggesting a deeper regional interconnection.
In practice, Sinitic‑Vietnamese and Sino‑Vietnamese function in tandem. They complement one another across registers, from classical texts to modern colloquial speech, underscoring the extent to which Vietnamese identity is interwoven with Chinese linguistic material, for example:
- mẹo, mèo: 卯 mǎo (SV mão) vs. 貓 māo (SV miêu, VS mèo) for 'cat'
- mẹo 謀 móu (SV mưu) vs. (1) mưulược, (2) mánhlới, (3) mưuchước 謀略 móulüè (SV mưulược)
The classification of Vietnamese has long been contested. Since Haudricourt's (1954) demonstration of tonogenesis, the dominant view has placed Vietnamese within Austroasiatic. Yet this affiliation has never been beyond dispute. The unusually rich presence of Chinese, Tai‑Kadai, and Austronesian elements complicates the picture. While recent surveys emphasize Austroasiatic roots, the sheer scale of the Sinitic component, constituting 80-95% of the modern lexicon, demands a broader framework.
This article therefore proposes Sinitic‑Vietnamese as a more accurate category for analysis. Unlike the narrower Sino‑Vietnamese subset, Sinitic‑Vietnamese encompasses vernacular adaptations, pre‑Han loans, and parallel doublets. By foregrounding polysyllabicity, we can trace the semantic and phonological layering that defines Vietnamese as a language of dual inheritance, bridging Austroasiatic, Yue, and Sino‑Tibetan traditions.
II) Beyond borrowing
The case studies illustrate that Sinitic‑Vietnamese is not a passive borrowing layer but a structural system. The coexistence of vernacular and learned forms parallels Cantonese doublets (colloquial vs. literary readings), suggesting Vietnamese should be analyzed alongside southern Sinitic lects rather than apart from them.
The Yue substratum further complicates classification. If Yue languages were themselves Kra‑Dai or Sino‑Tibetan, then Vietnamese inherits not only Austroasiatic features but also deep Sino‑Tibetan strata. This supports a dual inheritance model, reopening the debate on whether Vietnamese should remain classified as Austroasiatic or be reconsidered as part of a broader Sino‑Tibetan family.
A) Austroasiatic hypothesis: Haudricourt (1954) and subsequent Vietic reconstructions emphasize Austroasiatic phonological correspondences.
In 1954, André‑Georges Haudricourt published his landmark study La place du Vietnamien dans les Langues Austroasiatiques, which permanently altered the classification of Vietnamese. He argued that Vietnamese belongs to the Austroasiatic family, specifically the Vietic branch, positioned between the Palaung‑Wa group to the northwest and the Mon‑Khmer group to the southwest.
Haudricourt's most influential contribution was his theory of tonogenesis. He demonstrated that Vietnamese tones were not inherited from Chinese but developed secondarily from the loss of final consonants and glottal stops, a process also attested in other Austroasiatic languages. This discovery showed that tones could emerge naturally from phonetic erosion, without requiring Sinitic ancestry.
The implication was decisive: Vietnamese is not genetically Sinitic. Its tonal system and much of its phonology can be explained through Austroasiatic historical processes, while the massive Chinese vocabulary is a later overlay. Haudricourt thus reframed Vietnamese as an Austroasiatic language with a heavy Sinitic superstratum, rather than a "mixed" or "Sinitic offshoot."
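The mechanism behind this argument can be made concrete as a lookup table. The sketch below follows the usual textbook summary of Haudricourt-style tonogenesis, which is an assumption here since the article does not spell out the table: the lost final fixes the contour class, and the voicing of the initial later splits each class into an upper and a lower register. The names `TONE_BY_FINAL_AND_REGISTER` and `predicted_tone` are hypothetical.

```python
# Textbook summary of two-stage tonogenesis (assumed, not quoted from the article).
# Stage 1: the original final determines the contour class.
# Stage 2: a voiceless initial gives the upper register, a voiced one the lower.
TONE_BY_FINAL_AND_REGISTER = {
    ("open", "upper"): "ngang",
    ("open", "lower"): "huyền",
    ("glottal_stop", "upper"): "sắc",
    ("glottal_stop", "lower"): "nặng",
    ("fricative", "upper"): "hỏi",
    ("fricative", "lower"): "ngã",
}


def predicted_tone(final: str, initial_voiced: bool) -> str:
    """Return the modern tone predicted for an earlier final and initial type."""
    register = "lower" if initial_voiced else "upper"
    return TONE_BY_FINAL_AND_REGISTER[(final, register)]


# An earlier open syllable with a voiceless initial is predicted to carry ngang.
example = predicted_tone("open", initial_voiced=False)
```

The table covers six tones; the traditional count of eight additionally separates stop-final syllables carrying sắc and nặng.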
This hypothesis became the baseline classification for Vietnamese in modern linguistics. Later scholars such as Michel Ferlus, Mark Alves, and Laurent Sagart have refined or challenged aspects of Haudricourt's model, but his Austroasiatic hypothesis remains the foundation against which alternative views, including Sino‑Tibetan or Yue‑substratum theories, must be measured.
The author rejects Haudricourt's claim that Vietnamese tones had already stabilized by the end of the sixth century. The reason is straightforward: evidence from the Pre‑Han Vietnamese lexicon shows that most items already exhibited at least four tonal distinctions in the lower registers, rather than the full set of eight tones divided across both upper and lower registers. Moreover, the loanword models discussed next further undermine that theory.
B) Loanword models: Traditional accounts treat Chinese elements as "borrowings", but rote borrowing models fail to explain the breadth of Sinitic integration in Vietnamese.
Traditional accounts of Sino‑Vietnamese vocabulary have often relied on what might be called a "rote borrowing" model, in which Chinese words are assumed to have entered Vietnamese in much the same way that Greco‑Latin terms entered English, or as Sino‑Korean and Sino‑Japanese readings were transmitted: through the memorization of written glosses, largely detached from spoken interaction. Yet this model has proven inadequate for the Vietnamese case. As John Duong Phan's Cornell dissertation Lacquered Words: The Evolution of Vietnamese Under Sinitic Influences (2013; circulated widely by 2018) demonstrates, the sheer scale and depth of the Sinitic layer in Vietnamese cannot be explained by rote scholastic borrowing alone. Instead, Vietnamese absorbed Chinese vocabulary through centuries of sustained bilingual contact, vernacular adaptation, and layered phonological shifts. The result is not a residue of learned glosses but a living system of semantic and phonological integration, one that shaped the very structure of the language.
In his Cornell dissertation, John Phan challenged the long‑standing assumption that Sino‑Vietnamese vocabulary entered the language through rote memorization of written glosses, as in the case of Sino‑Korean or Sino‑Japanese. Phan argued that this "rote borrowing" model cannot account for the sheer scale and depth of Sinitic integration in Vietnamese, where as much as three‑quarters of the lexicon is of Chinese origin.
Instead, he demonstrated that Vietnamese absorbed Chinese vocabulary through centuries of sustained bilingual contact, vernacular adaptation, and layered phonological shifts. The result was not a detached body of learned readings but a living system that permeated everyday speech, literature, and cultural identity.
Phan's work reframes the Sinitic layer as a structural component of Vietnamese, not a residue of external borrowing. This insight aligns with the broader argument advanced here: that Sinitic‑Vietnamese should be treated as a comprehensive system, comparable to southern Sinitic lects such as Cantonese, rather than as a peripheral borrowing stratum.
C) Sino‑Tibetan hypothesis: Sagart and others have argued for deeper Sino‑Tibetan connections, citing parallels in basic vocabulary and morphology.
While the Austroasiatic classification of Vietnamese has long been the mainstream view, some scholars, most notably Laurent Sagart, have argued for a deeper connection between Vietnamese and the Sino‑Tibetan family. This alternative hypothesis suggests that Vietnamese is not simply an Austroasiatic language with a heavy Chinese overlay, but may share ancestral roots with Old Chinese and Tibeto‑Burman languages.
Sagart and others point to parallels in basic vocabulary – words for everyday concepts such as 'eat', 'drink', 'water', 'cow', or 'sea' – that are difficult to explain as later borrowings. These are the kinds of words that usually resist borrowing, which makes their presence across Vietnamese and Sino‑Tibetan languages especially striking. In addition, scholars have noted morphological similarities, such as affixal patterns and word‑formation strategies, that align Vietnamese more closely with Tibeto‑Burman than with Mon‑Khmer.
Phonological evidence also plays a role. The development of tones in Vietnamese, long explained by Haudricourt as a secondary innovation within Austroasiatic, can also be compared to similar processes in Tibeto‑Burman languages. This raises the possibility that Vietnamese and Sino‑Tibetan languages share not only contact phenomena but also common historical mechanisms.
The implications are significant. If Vietnamese does indeed share a genetic relationship with Sino‑Tibetan, then it should be seen as part of a larger continuum that includes Chinese, Tibeto‑Burman, and, of course, Yue substratal languages. This would challenge the long‑standing Austroasiatic consensus and reposition Vietnamese as a bridge language, reflecting its geographic and cultural role at the crossroads of southern China and mainland Southeast Asia.
Not all scholars are convinced. Many argue that the similarities Sagart highlights can be explained by intense contact and borrowing, rather than shared ancestry. For now, the Austroasiatic classification remains the standard in reference works. Yet the Sino‑Tibetan hypothesis continues to attract attention, especially as new comparative data emerges. It keeps alive the possibility that Vietnamese may one day be reclassified, not as a peripheral Austroasiatic language, but as part of the broader Sino‑Tibetan family.
D) Yue substratum: Studies of Old Yue languages suggest a southern base that predates "Chinese" proper, aligning with Vietnamese substratal evidence.
Long before the Qin and Han dynasties consolidated what we now call "Chinese", the Red River Delta and Lingnan Corridor were home to the Yue peoples. Their languages, sometimes linked to Kra‑Dai, Austroasiatic, Austronesian, or even Hmong‑Mien families, formed a southern linguistic base that predates "Chinese proper". Archaeological finds and early texts, including the Song of the Yue Boatman (越人歌 – 528 BCE), attest to a distinct speech tradition in the south, separate from northern Sinitic varieties. (2)
For Vietnamese, this Yue substratum is visible in core vocabulary and cultural domains that cannot be explained as later Chinese loans. Words like cá ('fish'), ông ('elder'), and cộ ('cart') show parallels with Yue‑derived forms in Cantonese, Hokkien, and other southern lects. Even the ethnonym Việt (越) encodes this heritage: "Việtnam" literally means "the Yue of the South".
The idea of a Yue substratum refers to the linguistic and cultural layer contributed by the ancient Yue peoples of southern China and northern Vietnam, who inhabited the region long before the consolidation of a "Chinese" identity under the Qin and Han dynasties. Studies of Old Yue languages – sometimes linked to Kra‑Dai, Austroasiatic, Austronesian, or even Hmong‑Mien families – suggest that these communities formed a southern linguistic base that predates what we now call "Chinese proper." Archaeological and textual evidence, including the famous Song of the Yue Boatman (528 BCE), indicates that Yue speech was distinct from northern Sinitic varieties and left enduring traces in the lexicons of both Vietnamese and southern Chinese lects such as Cantonese and Hokkien.
By aligning Vietnamese substratal evidence with what is known of Old Yue, supposedly represented by the lect spoken by subjects of the Chu State as a diplomatic language, as recorded in the Erya (爾雅) dictionary, scholars argue that Vietnamese did not simply absorb Chinese vocabulary from above, but grew out of a Yue‑based linguistic ecology that was later overlaid with Han and Tang Sinitic layers. This perspective reframes Vietnamese as a Yue‑descended, Sinitic‑integrated language, rather than a peripheral Austroasiatic offshoot. It also helps explain why Vietnamese shares so many structural and lexical features with southern Chinese lects: both are heirs to the same Yue foundation, subsequently reshaped by waves of Sinicization.
III) Methodology
This study applies the principle of polysyllabicity, grouping forms by their shared nucleus rather than isolating monosyllables. This approach makes it possible to:
A) Identify layered etymologies: tracing developments from monosyllabic cores, binomial coinage, disyllabic synonyms, and doublets that emerge through polysyllabic adaptation.
B) Track semantic divergence across registers: showing how the lexicon evolves independently of tonal constraints or the assumptions of traditional tonogenesis.
C) Situate Vietnamese within a Yue‑Sinitic continuum: demonstrating that parallels with Cantonese, Hokkien, and other southern lects reflect shared Yue ancestry as much as later Sinitic influence.
The data for this analysis are drawn from comparative Sino‑Tibetan materials, historical dictionaries, reconstructions of Old and Middle Chinese phonology, and annotated corpora of Vietnamese vernacular usage. Together, these sources provide the evidentiary base for a systematic re‑evaluation of Vietnamese etymology.
This paper therefore lays the conceptual and methodological foundation for analyzing Sinitic‑Vietnamese, a core stratum of Vietnamese linguistic identity. Sinitic‑Vietnamese designates the deeply naturalized layer of Chinese‑derived vocabulary, shaped through centuries of sustained contact with northern Sinitic lects and the wider Sino‑Tibetan world. Viewed through an interdisciplinary lens, Sinitic‑Vietnamese embodies the cumulative integration of Chinese elements into Vietnamese, forged through dynastic administration, cultural transmission, and vernacular adaptation across the long arc of Annamese history.
In scope, Sinitic-Vietnamese encompasses all Chinese‑derived vocabulary localized in Vietnamese. Within this domain, Sino‑Vietnamese forms a codified subset rooted in Middle Chinese phonology. Emerging during the Han and Tang periods, Sino‑Vietnamese provided the backbone of administrative, literary, and colloquial registers. Crucially, Sino‑Vietnamese is not a static residue of borrowing but a living system of semantic and phonological adaptation, continuously reshaped by the interplay of learned and vernacular usage.
The next section will also introduce illustrative examples of Sinitic‑Vietnamese vocabulary whose etymologies, despite clear Sinitic or Sino‑Tibetan origins, have often been misclassified as Mon‑Khmer. These cases highlight the methodological challenges and historical misattributions that have shaped the field.
The discussion extends into a new frontier of Vietnamese historical linguistics: the identification of prominent Sino‑Tibetan (漢藏 Hàn‑Zàng) etymological evidence, to be elaborated in the author's Parallels with the Sino‑Tibetan Languages. One of the primary objectives here is to establish a structured methodology for investigating this evidence, thereby reopening the long‑standing debate over whether Vietnamese should be reclassified as a member of the Sino‑Tibetan family.
By framing Sinitic‑Vietnamese as a comprehensive category, broader than the narrower Sino‑Vietnamese subset, this chapter situates Vietnamese within a Yue substratum and proposes a Sino‑Tibetan affiliation based on phonological and semantic evidence. This challenges the conventional Austroasiatic classification and underscores the need for methodological renewal.
Cultural domains such as the zodiac, agriculture, and literary traditions further demonstrate the enduring influence of Yue‑Taic heritage. Lexical and idiomatic examples – mẹo (卯), ngọ (午), gà (雞), trống (雄), cồ (公), mái (母), and the colloquial phrase "Bấtkể ai nóigànóivịt, mình chỉ nóingang" (不管 講雞講鴨, 我 只 講鵝) – illustrate bidirectional transfer and deep‑rooted cognates.
By integrating historical periodization, comparative linguistics, and typographic precision, this chapter lays the groundwork for a polysyllabic annotated lexicon and a revised linguistic historiography. It advocates for a reclassification of Vietnamese and a more nuanced understanding of its Sinitic layers, with the goal of advancing both methodological clarity and scholarly accessibility.
IV) Case studies
A) Polysyllabic approach to Sinitic‑Vietnamese etymology
Polysyllabicity, as the central principle, enables the identification of layered etymologies and semantic shifts across registers.
This study adopts a polysyllabic or, more precisely, disyllabic‑centered approach to the etymology of Vietnamese words of Chinese origin. By recognizing the fundamentally disyllabic character of Vietnamese, we move away from treating sound change as a series of isolated phonemic substitutions. Instead, we analyze it as the dynamic transformation of entire syllabic clusters. This perspective parallels the evolution of Latin polysyllabic roots, which generated diverse lexical forms across the Indo‑European languages.
Accordingly, Vietnamese disyllabic words are here transcribed in combined formation, mirroring Mandarin pinyin conventions, to reflect their structural integrity, for example,
- 廢話 fèihuà "nonsense" → bahoa ~ baphải
- 溫馨 wēnxīn "warm" → ấmcúng
- 開心 kāixīn ~ 高興 gāoxìng "pleased" → vuilòng
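The combined-formation convention used above can be sketched mechanically. This is only an illustration under stated assumptions: the helper names `combine` and `recombine` are hypothetical, and `LEXICON` holds just the three Vietnamese forms listed above, split back into syllables. A real implementation would need a full disyllabic lexicon and would probably want to normalize Unicode first.

```python
def combine(syllables):
    """Join the syllables of one lexical unit without spaces."""
    return "".join(syllables)


def recombine(text, lexicon):
    """Rewrite spaced orthography, joining adjacent pairs listed in `lexicon`."""
    words = text.split()
    out, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in lexicon:
            out.append(combine(pair))  # write the disyllabic unit as one word
            i += 2
        else:
            out.append(words[i])       # leave unlisted syllables untouched
            i += 1
    return " ".join(out)


# Disyllabic units taken from the three examples above.
LEXICON = {("ba", "hoa"), ("ấm", "cúng"), ("vui", "lòng")}

joined = recombine("ấm cúng", LEXICON)
```

The greedy left-to-right pairing is a design simplification; it treats each listed pair as one semantic unit, in the spirit of the convention, but does not handle overlapping candidates.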
What is striking about such formations is that the sound changes between syllables often diverge dynamically and dramatically from their original phonological forms. These shifts are not random; they reflect systematic phonological processes that reshape both sound and meaning. This paper examines these processes in depth to explain why many Vietnamese words of Chinese origin appear so distinct from their sources.
Multiple sound changes within a single syllable of a disyllabic compound may reveal broader patterns, but they can also mislead readers into thinking the results are irregular or ad hoc. The aim here is to demonstrate that disyllabic sound change is systematic, historically grounded, and central to identifying the vast corpus of Chinese‑derived vocabulary in Vietnamese.
Take the example of bahoa. The phonological shift from 廢 fèi to ba can be reconciled, but the semantic connection is less straightforward. In Vietnamese, ba does not relate to meanings such as 'three', 'father', or 'tortoise'. Instead, it reflects a sound change pattern /f-/ ~ /hw-/, comparable to the interchange observed between MinNan dialects and Mandarin /f-/. Conceptually, however, /fèi/ aligns more closely with phế 'waste' and bỏ (廢 fèi, 'abandon'), both carrying connotations of rejection or uselessness.
Crucially, the syllables ba- and -hoa in bahoa do not function independently in Vietnamese. As bound morphemes, they combine into a single disyllabic semantic unit: bahoa 'nonsense'. In this case, one plus one yields one unified meaning, not two separate definitions. The same principle applies to baphải.
By contrast, the semantic evolution of fèi into bỏ- is more transparent. Consider:
- bỏphế 廢除 fèichú "eradicate"
- bỏđi 廢棄 fèiqì "abandon"
- đồbỏ 廢物 fèiwù "the unwanted" (metathesis)
- bỏhoang 荒廢 huāngfèi "deserted" (metathesis)
Yet even here, bỏ is not exclusively tied to 廢 fèi. Sound changes from Chinese to Vietnamese, especially in disyllabic compounds, are manifold and context‑dependent. To illustrate, additional Vietnamese expressions derived from Chinese disyllables yield homophones with bỏ:
- bãibỏ 排除 páichú "abolish"
- bỏphiếu 投票 tóupiào "cast a ballot"
- vứtbỏ 抛棄 pāoqì "discard" (metathesis)
- bỏđi 放棄 fàngqì "let go"
- bỏqua 放過 fàngguò "let go"
- bỏmặc 不理 bùlǐ "abandon"
- bỏlỡ dịpmay 放過機會 fàngguò jīhuì "miss an opportunity" (~ bỏqua dịpmay)
- bỏtiền (vô túi) 放錢 (進入 口袋 里) fàngqián (jìnrù kǒudài lǐ) "put money into the pocket"
- bỏtiền ra mua 花錢來買 huāqián lái mǎi "spend money to buy"
- bỏphí 白費 báifèi "to waste"
- bỏrơi 摽落 piāoluò "abandon"
- rờibỏ 拋離 pāolí "desert" (metathesis)
These examples show that Vietnamese disyllabic formations often diverge both phonetically and semantically from their Chinese origins. The transformations reflect adaptive processes of reordering, semantic realignment, and phonological reshaping.
The emergence of bỏ in these compounds, along with other lexical innovations, highlights the interplay of phonological assimilation, semantic extension, and syntactic reordering, particularly through reversed word structure. Compounds such as đồbỏ and bỏhoang likely arose as local adaptations to fit Vietnamese syntactic habits.
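The nucleus-based grouping applied to the bỏ compounds can be sketched as follows. This is a minimal illustration, not the author's procedure: the names `ENTRIES` and `group_by_nucleus` are hypothetical, the entries are a subset of the second list above, and each compound is stored pre-split into syllables because the combined spelling alone does not mark the boundary.

```python
from collections import defaultdict

# (Vietnamese syllables, proposed Chinese source), copied from the list above.
ENTRIES = [
    (("bãi", "bỏ"), "排除"),
    (("bỏ", "phiếu"), "投票"),
    (("vứt", "bỏ"), "抛棄"),
    (("bỏ", "đi"), "放棄"),
    (("bỏ", "qua"), "放過"),
]


def group_by_nucleus(entries):
    """Index every compound under each of its syllables."""
    groups = defaultdict(list)
    for syllables, source in entries:
        combined = "".join(syllables)
        for syllable in syllables:
            groups[syllable].append((combined, source))
    return groups


groups = group_by_nucleus(ENTRIES)
# All compounds sharing the nucleus bỏ, and the distinct sources behind them.
bo_group = groups["bỏ"]
bo_sources = sorted({source for _, source in bo_group})
```

In this sample the single nucleus bỏ is paired with five different Chinese disyllables, which is the one-to-many pattern the discussion above describes.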
B) Yue-layered stratum and commonly-shared cultural etyma
The scope of Sinitic‑Vietnamese sometimes extends loosely to include other strata: forms traceable to Old Chinese (OC), also referred to as Archaic Chinese (ArC) or Ancient Chinese (AC), and occasionally to Early Middle Chinese (EMC) as well. It may also encompass the class of "Tiền‑Hán‑Việt", or pre‑Sino‑Vietnamese loanwords from the pre‑Qin and Han eras, along with their Vietnamese variants, some of which may date back to proto‑Chinese origins.
Such archaic forms belong to various pre‑Han linguistic stages, representing ancestral precursors to OC in the pre‑Qin era, centuries before present (B.P.). Over time, Sino‑Tibetan and Sinitic etyma circulated bidirectionally between Chinese and ancient Vietnamese lexicons, undergoing changes in both form and meaning, for example,
- bụt, Phật, vãi: 佛 Fó (SV Phật) [ M 佛 Fó, fú, bó, bì (Phật, bột, phất, bất) < MC but, phut < OC *bɯd || Note: Derived from 'Buddha' in Sanskrit, cf. VS 'bụt' > SV 'Phật'. Cantonese: fat42, Wenzhou 溫州: vai42. In Vietnamese, 'Bụt' preceded the later equivalent of Buddha, which gives rise to variants Buddha, Buddhist, Buddhist monk. ]
- bụa, phụ, vợ: 婦 fù (SV phụ) [ M 婦 (媍) fù < MC buw < OC *bɯʔ || cf. 'goábụa' 寡婦 guǎfù (widow), 'vợchồng' 公母 gōngmǔ (wife and husband), meaning wife, lady, woman. ]
- chài, lưới, chàilưới, là: 羅 luó (SV la) [ M 羅 luó < MC la < OC *ra:l || cf. 羅 luó (SV la) + 羅 luó (VS lưới): net-fishing, bird net, net. ]
- cộ, xe, xecộ, cỗ, cỗxe: 車 chē (SV xa) [ M 車 chē, jū, jù < MC cʰia, kɨə̆ < OC *kʰlja, *kla || cf. 'xe' 車 chē, 'cộ' 檋 jù (SV cục), and possible cognate Cantonese 架車 /kache/: carriage, car, modern automobile. ] (1)
- ông, trống, cồ: 公 gōng (SV công) [ M 公 gōng < MC kəwŋ < OC *klo:ŋ || cf. 雞公 jīgōng 'gàcồ' ~ 'gàtrống' (rooster), 主公 zhǔgōng 'ôngchủ' (master), 公母 gōngmǔ (trốngmái, vợchồng): duke, public, senior male figure, man of authority, grandfather, husband's father, rooster. ]
These examples illustrate that Sinitic‑Vietnamese is not a passive borrowing layer but a structural system. The coexistence of vernacular and learned forms parallels Cantonese doublets (colloquial vs. literary readings), suggesting Vietnamese should be analyzed alongside southern Sinitic lects rather than apart from them.
The Yue substratum further complicates classification. If Yue languages were themselves Kra‑Dai or Sino‑Tibetan, then Vietnamese inherits not only Austroasiatic features but also deep Sino‑Tibetan strata. This reopens the debate on whether Vietnamese should remain classified as Austroasiatic or be reconsidered as part of a broader Sino‑Tibetan family.
The divergence between these linguistic classifications stems largely from their synchronic mode of analysis. For example, the term 'Sinitic', though historically tied to the Qin State of the 3rd century B.C., is retroactively applied to proto-Chinese formations that predate the Qin Dynasty by millennia, reaching back beyond the Shang and Xia dynasties to encompass over five thousand years of linguistic development.
Modern Vietnamese began to take shape in the 12th century, and the majority of its Sinitic-Vietnamese vocabulary can be traced across the past three millennia through Chinese historical records (Nguyễn Tài Cẩn, 1978). For the prehistoric period, however, research on the Yue origin of Vietnamese requires engagement with alternative hypotheses, such as those proposed by De Lacouperie (1887) and even by scholars of the Austroasiatic Mon-Khmer school, which offer provisional frameworks for understanding deeper linguistic relationships.
In the early 20th century, Vietnamese was classified as a Sino-Tibetan language. Nevertheless, no notable research was carried out to substantiate that supposition.
To fill that gap, this research, drawing on extensive comparative analysis, isolates newly identified Vietnamese terms attested within Sino-Tibetan languages. The following cases illustrate how close their etyma are:
- "bồng" ~ "bế" 抱 bào (SV bão): 'carry' [ N. Ass. Midźu ba (N), Taying ba (N) (p. 186), E. Nyising bü (p. 194) | (Haudricourt) Daic Siamese peek, Lao ɓɛk, Shan mɛk, Tay Noir, Tay Blanc ɓɛʔ, Tho bɛk || cf. Hainanese /boŋ2/ ]
- "biển" ~ "bể" 海 hǎi (SV hải ~ VS "khơi"): 'the sea' [ Sino-Tibetan: M. Bur. pań-lay, Karenic *pań, Pwo pə9-lai28, Sgaw pä7-lâ7, p@7-lâ7 || cf. Cantonese /hoi2/ for VS "khơi" as in "rakhơi" @ 出海 chūhǎi (SV xuấthải, 'set sail'), "ngoàikhơi" @ 海外 hǎiwài (SV hảingoại, 'be out at sea') ]
- "bò" 牝 bì (SV bí): 'cow' [ OB ba, OB E. *bik || A W. Bod. Burig bā (p. 83), Groma, Śarpa bo (calf), Dangdźongskad, Lhoskad ba (p. 93), Central Bodish Lagate pa-, Spiti, Gtsang, Dbus, Ãba bʿa, Mnyamslad, Dźad pa (p. 98), other Bod. languages Rgyarong (ki)-bri, -bru (p. 120), modern Bod. dialects New Mantśati (bullock), Tśamba Lahuli (ox) bań, Rangloi bań-ƫa (bullock) (p. 130) || also Chin. 牝 byi/ (Chin. cow, female of animal), OB ãbri-mo (tame female yak) (p. 59), Minor group Toţo pik-(a), Dimal pi-(a) (p. 187), Southern Branch Kukish *b@ń, Luśei b@ń, Thado boń, Vuite -b@ń- (p. 250), E. Himalayish bʿi, Khambu pi', Lohorong, Yakhha pik (p. 330) | for 'buffalo': Luśei pă-na, Khami *mă-na, Karenic *-na-, Karenni pæ2-nä2, Pwo pə1-na6, Sgaw pə2-nə8, Bwe pa-nä2 (p. 414) | (Haudricourt) Chin. ńǔ- 牛 (M níu), Siamese ŋwă, Lao, Tay Noir ńuo, Shan, Tay Blanc ńo, Tho, Nung mɔ, Sui mo, Mak pho (p. 501) ]
- "ăn" (唵 ǎn, SV àm): 'eat' [ Also VS "ngậm" (hold in the mouth) || M àn 唵 ʿām-, Luśei *um, Siamese ʿ@m (p. 71) || Note: 唵 àn is plausibly cognate to VS 'ăn' or eat. As Sino-Tibetan scholars, Shafer or Haudricourt should switch this word with their M hán 含 ɣām-. The Kangxi Dictionary defines this entry as 'eat with the hand.' ]
- "nước" (淂 dé, SV đắc): 'water' [ In semantic alignment with 'water' as defined in the Kangxi Dictionary as 'Guangyun - Entering Tone - 德·德': 淂 'appearance of water'. Also read with the fanqie 丁力切. 'Kangxi Dictionary - Water Section - Eight': 淂 in Guangyun, read 都則切; in Jiyun, read 的則切. Both pronounced 德. 'Yupian': means "water." Also glossed as "appearance of water." Additionally, Guangyun records 丁力切, pronounced 滴. The meaning is the same. || cf. Proto-Vietic *ɗaːk, Cantonese /dak1/ || cf. (Haudricourt) Daic Siamese ʾnām, Shan, Sui, Mak nam, Lao, Tho, Ahom, Tay Noir, Tay Blanc, Dioi, Mak năm, Nung ram, Bê nɔm, Li nom, nəm (p. 482) ]
As a result, the scope of inquiry expands beyond Vietnamese-Chinese (越漢 YuèHàn, or 'Sinitic-Vietnamese') cognates to encompass etymologies distributed across the broader Yue and Sino-Tibetan spectra. This expanded scope includes reflexes traceable to proto-Chinese (上古 漢語 Shànggǔ Hànyǔ) and pre‑Qin-Han strata, with evidence of bidirectional lexical transfer between ancestral Yue (越) and Sinitic (漢 Hàn) domains. In doing so, the analysis directly challenges established Austroasiatic theories that assert a Mon‑Khmer (MK) origin for Vietnamese. It does so on the strength of Sino-Tibetan, or 'Bod' (3), etyma, which offer substantial support to the Sino-Tibetan hypothesis. This re‑evaluation is grounded in shared phonological innovations, semantic correspondences, and structural patterns documented across the Sino-Tibetan continuum, all framed within the polysyllabicity principle for rigorous cross‑linguistic comparison.
Defining Sinitic‑Vietnamese as a category moves us beyond the narrow 'loanword' model. It is not simply a residue of borrowed terms but a structural system that organizes entire semantic domains — religion, governance, kinship, and more. Its close parallels with southern Sinitic lects indicate that Vietnamese should be studied in tandem with Cantonese and Hokkien, rather than in isolation.
While this layered inheritance supports a dual model in which the Yue substrate combines with the Han superstrate, the Yue substratum itself complicates classification. If Yue belonged to Kra‑Dai or to Sino‑Tibetan, then Vietnamese necessarily inherits multiple deep strata.
Conclusion
This article reframes Sinitic‑Vietnamese as a comprehensive system: a framework encompassing all Chinese‑derived vocabulary localized in Vietnamese. It integrates pre‑Sino‑Vietnamese loans, codified Sino‑Vietnamese, and vernacular doublets into a dynamic system. By defining Sinitic‑Vietnamese in this way, we gain a clearer picture of Vietnamese as a language of layered inheritance, bridging Austroasiatic, Yue, and Sino‑Tibetan traditions.
This article challenges the reduction of Chinese elements to the notion of "loanwords". Vietnamese emerges as a language of dual inheritance, where Yue substratum and Han superstratum intertwine. Polysyllabicity reveals layered etymologies that support a reclassification debate with implications for both Vietnamese and Sino‑Tibetan studies.
Future research should expand the corpus of doublets, integrate phonological reconstructions, and map semantic domains across East and Southeast Asia. Such work will not only refine Vietnamese classification but also contribute to a more nuanced understanding of language contact in the region.
References
- Aitchison, Jean. Language Change: Progress or Decay? Cambridge University Press, 1994.
- Alves, Mark J. "Categories of Grammatical Sino‑Vietnamese Vocabulary." Mon‑Khmer Studies 37 (2007): 217–229.
- Alves, Mark J. "What’s So Chinese About Vietnamese?" In Papers from the Ninth Annual Meeting of the Southeast Asian Linguistics Society, 2001.
- Anttila, Raimo (ed.). Historical and Comparative Linguistics. Amsterdam/Philadelphia: John Benjamins, 1989.
- Baxter, William H. & Sagart, Laurent. Old Chinese: A New Reconstruction. Oxford University Press, 2014.
- Bloomfield, Leonard. Language. New York: Henry Holt, 1933.
- Bynon, Theodora. Historical Linguistics. Cambridge: Cambridge University Press, 1977.
- Edmondson, Jerold A. "Tibeto‑Burman Languages of Việtnam." In Linguistics of the Tibeto‑Burman Area, 2002.
- Ferlus, Michel. Linguistic Evidence of the Trans‑Peninsular Trade Route from North Việtnam. Mahidol University / SIL International, 2012.
- Haudricourt, André G. "Comment reconstruire le chinois archaïque." Word 10 (1954): 351–364.
- Haudricourt, André G. "The Limits and Connections of Austroasiatic in the Northeast." In Studies in Comparative Austroasiatic Linguistics, ed. Norman Zide. The Hague: Mouton, 1961.
- Karlgren, Bernhard. Grammata Serica Recensa. Stockholm: Museum of Far Eastern Antiquities, 1957.
- Kelley, Liam C. "The Biography of the Hồng Bàng Clan as a Medieval Vietnamese Invented Tradition." Journal of Vietnamese Studies 7, no. 2 (2012): 87–122.
- Lü, Shih‑P’eng. Việtnam During the Period of Chinese Rule. Hong Kong: University of Hong Kong, 1964.
- Matisoff, James A. Sino‑Tibetan Etymological Dictionary and Thesaurus (STEDT). University of California, Berkeley, ongoing project.
- Nguyễn, Tài Cẩn. Giáo Trình Ngữ âm Lịch sử TiếngViệtnam. TP HCM: NXB Giáo dục, 2000.
- Nguyễn, Tài Cẩn. Nguồn gốc và Quá trình Hình thành Cách đọc Âm HánViệt. TP HCM: NXB Khoa học Xã hội, 1979.
- Peiros, Ilia & Starostin, Sergei. Comparative Vocabulary of Sino‑Tibetan Languages. Moscow: Nauka, 1996.
- Pulleyblank, Edwin G. Middle Chinese: A Study in Historical Phonology. Vancouver: University of British Columbia Press, 1984.
- Sagart, Laurent & Baxter, William. Old Chinese Reconstruction Project. 2011.
- Schuessler, Axel. ABC Etymological Dictionary of Old Chinese. University of Hawai‘i Press, 2007.
- Sidwell, Paul. "The Austroasiatic Central Riverine Hypothesis." Journal of Language Relationship 4 (2010): 117–134.
- Sun, Tianxin. Yuenan Han Ziyin de Lishi Cengci Yanjiu 越南漢字音的歷史層次研究. Taiwan Pedagogy College, 2011.
- Taylor, Keith Weller. The Birth of Việtnam. Berkeley: University of California Press, 1983.
- Wiens, Herold J. Han Chinese Expansion in South China. USA: Shoe String Press, 1967.
FOOTNOTES
(1)^ According to Starostin, in Middle Chinese 車 is also read /tʂa/, FQ 尺遮 (whence Mand. chē, Viet. xa), but this reading is rather recent (judging from rhymes in Guangyun 廣韻, not earlier than Eastern Han) and must have stemmed from some Old Chinese (OC) dialect. Vietnamese also has a colloquial loan from the same source, namely "xe" /sɛ/. If the reconstruction is indeed *kla, one could think of an early borrowing from OC, hence "cộ". Interestingly, there also exists 檋 jù (SV cục) as a cognate of "cộ", with the variants (檋, 輂, 輁, 梮) jù. The former characters have the phonetic M 車 chē, jū, jù [ M 車 chē, jū, jù < MC cʰia, kɨə̆ < OC *kʰlja, *kla ], while the latter are, as usual, late developments of the word-formation module {ideographic radical + signific stem}.
(2)^
VIỆTNHÂN CA
(Đỗ N. Thành dịch)
Năm nầy bảo với năm xưa
Thương chàng hoàng tử thương chiều chiều xưa
Sớm chiều em hận tương tư
Mà ai hiểu đặng tình yêu sâu đầy.
濫兮抃草濫予
Lạm hề biện thảo lạm dư
昌枑澤予昌州州
Xương hằng trạch dư xương châu châu
飠甚州焉乎秦胥胥
Thực thầm châu yên hồ tần tư tư
縵予乎昭
Mạn dư hồ chiêu
澶秦逾渗惿随河湖
Thìn tần du sâm, đề tuỳ hà hồ
(See: Sinitic-Vietnamese : APPENDICES)
(3)^ "An exonym for Tibet that appeared in Tang Dynasty. Some scholars argue the second syllable, 蕃, was originally read with the -n coda in Middle Chinese (i.e. pʉɐn or bʉɐn, the former of which regularly gives rise to modern Mandarin fān). They argue that the modern Tǔbō reading is recent, possibly originating from French sinologist Jean-Pierre Abel-Rémusat's (1788-1832) argument that the second syllable should be pronounced this way to match Old Tibetan བོད་ (bod, "Tibet") (Pelliot, 1915). Rhymes in poetry from Tang and Yuan dynasties also suggest that the second syllable 蕃 was read with the -n coda during those times (Yao, 2014). " (See 吐蕃 - Wiktionary)