dchph in collaboration with Copilot
Introduction
The foundational Sinitic-Vietnamese (VS) lexical stratum of the Vietnamese language, as examined in Chapter 1 and outlined in its Executive Summary, is not merely a passive accumulation of Chinese loanwords. Rather, it constitutes a dynamic, internally layered, and etymologically rich stratum shaped by sustained and multifaceted contact between northern Chinese lects and a Yue-derived proto-Vietic substrate.
This report presents a comprehensive analytical review of the Chapter 1 corpus, encompassing all cited etymons, semantic chains, and polysyllabic annotations. Its purpose is to demonstrate how the corpus substantiates the thesis that Sinitic-Vietnamese forms a core indigenous layer of the language, not a superficial literary veneer.
The structure of the report follows the established research directives: corpus architecture, etymological analysis, register stratification, comparative linguistic features, and semantic chain mapping. Each section unpacks the methodologies and findings of Chapter 1 with precision and clarity.
Corpus structure
Chapter 1’s investigation is anchored in a rigorously curated corpus designed to illuminate the stratified evolution and contact-induced dynamics of Vietnamese vocabulary. Far more than a static collection of lexical entries, the corpus functions as a multidimensional analytical framework—archiving, contextualizing, and stratifying vocabulary by origin, phonological development, and sociolinguistic integration.
Comprising roughly 800 to 900 lexical items, the corpus is intentionally selective rather than exhaustive. Its entries are drawn from a diverse array of sources: early written records such as ChữNôm texts, comparative data from both modern and archaic Vietic lects, contemporary Vietnamese usage, and systematic cross-referencing with Old and Middle Chinese reconstructions. Its internal architecture reflects both genetic lineage and areal convergence, with structural segmentation guided by principles of historical linguistics and contact typology.
The analytic scaffolding of the corpus includes:
Tripartite Sinitic stratification: Early Sino-Vietnamese (ESV), Late Sino-Vietnamese (LSV), and Recent Sino-Vietnamese (RSV) are aligned with recognizable sociohistorical periods: Han/Jin, Tang/Song, and Ming/post-Ming eras, respectively1 2.
Etymon-centered annotation: Each entry is subjected to rigorous multi-level annotation, specifying VS/SV forms, Middle Chinese (MC) and Old Chinese (OC) reconstructions, meanings, and critical comparative notes.
Polysyllabic annotation protocols: Compound and reduplicative structures are recorded with morpheme-by-morpheme glossing, accommodating both native analytic patterns and contact-induced forms3.
Layer-tagged lexical indexing: Items are explicitly tagged for stratum, register (literary, colloquial, vernacular), and, when sufficient data allow, socio-regional provenance and alternative readings.
Phonological and semantic extension: Sinitic-Vietnamese words derived from Sinitic compounds often carry more meanings than their source forms, because successive sound changes yield variant forms that each convey a sense associated with the same semantic root.
This corpus serves as both a philological instrument and a digital-ready platform for algorithmic annotation, enabling granular etymological tracing and macro-level register mapping.
The corpus is supported by a series of structured tables, each serving a critical analytical function:
- Cross-referenced comparative reconstructions of Old Chinese (OC)16 set alongside Mark J. Alves’s proto-Sino-Vietnamese onset inventory9,
- Core indices for identifying lexical strata based on rime, onset, and tonal behavior,
- Cross-referenced tables linking Early Sino-Vietnamese (ESV) and Late Sino-Vietnamese (LSV) candidates, phonetic indices, and register-based doublets4 5.
Additionally, selected entries are annotated with comparative data from conservative Vietic languages such as "Rục" and "Mường", allowing for stratified anchoring across historical layers. This layered design results in a digitizable corpus well-suited for both philological analysis and algorithmic annotation workflows.
In summary, the corpus is both structurally rigorous and analytically versatile, designed to support micro-level etymological tracing as well as macro-level register mapping. It provides the evidentiary foundation necessary to distinguish the internally stratified, contact-generated Sinitic-Vietnamese stratum from later, more superficial borrowings4 6.
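The layer-tagged indexing described in this section can be pictured as one record per attested form. The following is a minimal sketch, not the chapter's actual schema: the class name `LexicalItem` and all field names are assumptions made for illustration, while the stratum and register labels, and the 劍 example values, are taken from the report itself.

```python
from dataclasses import dataclass
from typing import Optional

# Labels taken from the text; the field names below are hypothetical.
STRATA = ("ESV", "LSV", "RSV")
REGISTERS = ("literary", "colloquial", "vernacular")


@dataclass(frozen=True)
class LexicalItem:
    """One layer-tagged form of a Chinese etymon (illustrative schema only)."""
    form: str                  # Vietnamese form, e.g. "gươm"
    etymon: str                # Chinese source character(s), e.g. "劍"
    gloss: str
    stratum: str               # one of STRATA
    register: str              # one of REGISTERS
    mc: Optional[str] = None   # Middle Chinese reconstruction, if given
    oc: Optional[str] = None   # Old Chinese reconstruction, if given
    notes: str = ""

    def __post_init__(self):
        # Reject tags outside the inventories named in the report.
        if self.stratum not in STRATA:
            raise ValueError(f"unknown stratum: {self.stratum!r}")
        if self.register not in REGISTERS:
            raise ValueError(f"unknown register: {self.register!r}")


# The 劍 doublet as tagged in the report: gươm (ESV, vernacular) vs. kiếm (LSV, literary).
guom = LexicalItem("gươm", "劍", "sword", "ESV", "vernacular", mc="kjəm", oc="*kams")
kiem = LexicalItem("kiếm", "劍", "sword", "LSV", "literary", mc="kjəm", oc="*kams")
```

Keeping one record per form, rather than one per etymon, lets a doublet carry different stratum and register tags while still sharing the same source character.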
Etymological analysis
At the heart of Chapter 1 lies the etymological mapping of lexical entries, which underpins the broader argument concerning the depth, modality, and register of Sinitic influence in Vietnamese. The analysis organizes the corpus into reliable strata by applying phonological, semantic, and historical filters, referencing both Middle Chinese and Old Chinese reconstructions alongside comparative Vietic data.
Key criteria and techniques:
- Phonological correspondences: The diagnostic use of segmental correspondences, especially in initial (onset) and final (rime and coda) positions, allows differentiation between early and late borrowings. For example, words with Vietnamese voiced fricative onsets (v-, d-, gi-, g-) correlating with complex OC clusters are proven markers of deeper, older integration, frequently paralleling conservative forms in non-Vietnamese Vietic lects (Rục, Mường)2.
- Tonal evolution: Leveraging Haudricourt’s model, the chapter scrutinizes how OC final *-s and *-ʔ gave rise to the three major Vietnamese tonal sets (ngang/huyền, hỏi/ngã, sắc/nặng) and matches tone shifts with historical borrowing periods2.
- Compound and reduplication annotation: Etymons that recur as components of compounds or reduplicants are tagged for polysyllabic annotation, revealing not only contact-induced formations but also morphosemantic layering3.
- Comparative etymology: Each etymon is cross-referenced for parallel forms in MC, OC15, Sino-Tibetan18, and other Vietic or Austroasiatic languages, ensuring that supposed borrowings are not, in fact, retentions or autochthonous innovations16.
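The tonal criterion above can be illustrated with a small lookup. This is a sketch of the commonly cited textbook summary of Haudricourt's model (earlier final type selects the tone pair, earlier onset voicing selects the member of the pair); it is not the chapter's own procedure, and the category labels `"open"`, `"fricative"`, `"glottal"` and the function name `predict_tone` are assumptions introduced here.

```python
# Simplified Haudricourt-style tonogenesis table (textbook summary, not the
# chapter's own data). Keys: (earlier final type, earlier onset voicing).
#   "open"      = no final laryngeal        -> ngang / huyền
#   "fricative" = final *-s (later *-h)     -> hỏi / ngã
#   "glottal"   = final *-ʔ                 -> sắc / nặng
TONE_TABLE = {
    ("open", "voiceless"): "ngang",
    ("open", "voiced"): "huyền",
    ("fricative", "voiceless"): "hỏi",
    ("fricative", "voiced"): "ngã",
    ("glottal", "voiceless"): "sắc",
    ("glottal", "voiced"): "nặng",
}


def predict_tone(final_type: str, onset_voicing: str) -> str:
    """Return the tone expected under the simplified model."""
    try:
        return TONE_TABLE[(final_type, onset_voicing)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {final_type!r}, {onset_voicing!r}"
        ) from None


def tone_pair(final_type: str) -> tuple:
    """Return the (high-series, low-series) tone pair for a final type."""
    return (predict_tone(final_type, "voiceless"), predict_tone(final_type, "voiced"))
```

A form whose attested tone departs from the prediction is a candidate for closer inspection, for instance as a loan from a different period; the table itself makes no claim about any individual etymon.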
Below is a representative extract from the compiled etymon table for selected cornerstone entries.
| Etymon | VS/SV | MC Reconstruction | OC Reconstruction | Gloss | Comparative Notes |
|---|---|---|---|---|---|
| 劍 jiàn | (thanh) | kjəm | **kə.ms > *kams | sword | Lenition [ɣ-], Rục təkɨəm → sesquisyllabic Vietic parallel |
| 鏡 jìng | s-kương / gương / kiếng / | kiajŋ | **sk’raŋs > *kraŋs | mirror | Prefix **s- reflects Sino-Tibetan instrumentality; doublet preserved |
| 唱 chàng | ʔ-ɕướng / xướng / khoan | tɕʰiɐŋ | **d̥raps > | to chant | Dummy prefix **ʔ- with Vietic comparanda |
| 公 gōng | cồ / | kəwŋ | ***qˤoŋ > *klo:ŋ | public; | Initial k- robust in SV; traceable in Mường, etc. Cf. Old Khmer khloñ, Proto-Tai *luŋᴬ |
| 魚 yú | ngá / | ŋɨə̆ | *ŋa | fish | Nasal onset across SV and MC > glottal ʔ- > k-; Vietic “ngá”; cf. MinNan 魚汁 yúzhī 'catsup' |
| 海 hǎi | khơi / | həj | **hmɯːs > | sea | Doublets mapped with LMC onset traits; Cf. 'mệ', 'mẹ' 母 mǔ (SV mẫu), 每 (OC *mɯːʔ), 晦 (OC *hmɯːs, “dark”); in numerous Zhou texts 海 = 晦 huì (OC *hmɯːs) |
| 龍 lóng, lǒng, máng | (thuồng-) | luawŋ | **r-loŋ > | dragon; aquatic monster serpent; | Rime /loŋ/ shares features with ESV and LSV; Cf. VS 'thuồngluồng', Khmer រោង (roong, “year of the dragon”), Thai มะโรง (má-roong, “dragon; year of the dragon”) |
| 大 dà | đại / | daj, da | **lats > *da:ds | big; | Appears in all strata; LMC tonal distinctions observed and doublets preserved; Wang (1982) also lists 誕 OC *l'aːnʔ as cognate, Cf. VS 'lớn' (big) |
| 心 xīn | tâm / | sim | **slɯm > *sə.m | heart, | Partial ESV preservation, comparable with early loans/transfers |
| 酒 jǐu | tửu / | tsɨu | **ʔsluʔ > | wine | Texture identifies as RSV candidate (late borrowing) |
These and other entries are annotated with detailed Middle Chinese and Old Chinese phonological data drawn from the work of Bernhard Karlgren, Wang Li 王力, Li Rong 李榮, Shao Rongfen 邵榮芬, Zhengzhang Shangfang 鄭張尚芳, Pan Wuyun 潘悟雲, Edwin G. Pulleyblank16, and the Baxter-Sagart reconstruction system7 5, together with Shafer's extensive Sino-Tibetan comparative work17, a rich source of comparative etymologies. Each etymon is cross-verified against reflexes found in conservative Vietic languages such as Rục and Mường8, as well as in a number of Khmer lects, ensuring stratified accuracy and comparative depth19 20.
Selected standout examples follow in which Vietnamese words show strong cognate relationships with Sino-Tibetan roots, especially those from the Chinese, Bodic, Burmic, and Daic branches17. The correspondences run across virtually all basic semantic categories, including Body Parts & Physicality, Food & Agriculture, Kinship & Social Roles, and Verbs & Actions.
Body Parts & Physicality
| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| lưỡi | tongue | Kukish lei, Bodish ltśe | 舌 shé, 脷 lěi | 脷 more plausible for VS; Cf. Cantonese 'lei6' |
| chân | leg/foot | Bodish rkań, Kukish kʿoń | 腳 jiǎo, 足 zú | Also linked to 脛 jìng (VS cẳng) |
| bụng | belly | Kukish puk, Burmish puik | 腹 fù | Rated ***** |
| mắt | eye | Bodish mig, Kukish mik | 目 mù | Rated ***** |
| mũi | nose | Bodish mtśʿul, Kukish tśʿul | 鼻 bí | Rated ** |
Food & Agriculture
| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| bánh | cake | Daic pɛń, Kukish piŋ | 餅 bǐng | Rated ****** |
| muối | salt | Kukish tśi, Burmish śo-ra | 鹽 yán, 硝 xiāo | Complex etymology |
| gừng | ginger | Daic khiŋ, Kukish khiń | 薑 jiāng | Rated ****** |
| cá | fish | Burmish ńa, Kukish kʿai | 魚 yú | Rated **** |
| cơm | rice | Burmish tśa, Lolo tsa- | 膳 shàn | Also linked to 飯 fàn |
Kinship & Social Roles
| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| bà | grandma | Kukish pi, Bodish pʿyi-mo | 婆 pó | 妣 bǐ cited but less plausible |
| bố | father | Kukish pu, Burmish bʿui | 父 fù | Strong phonetic match |
| chị | elder sister | Kukish tśar, Bodish ʾa-tśʿe | 姊 zǐ | Also compared with 姐 jiě |
| cháu | nephew; niece | Kukish tʿu, Burmish tu | 姪 zhí | Semantic alignment with SV 'tỉ' |
| cậu | maternal uncle | Kukish kʿu, Bodish kʿu-bo | 舅 jìu | Rated ****** |
Verbs & Actions
| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| cắt | cut | Bodish tśa, Daic kăt | 割 gé | Rated ****** |
| dẫn | lead | Daic tśuń, Burmish tsiń | 引 yǐn | Rated ***** |
| sỏ | play | Kukish tśai, Luśei tśai | 耍 shuǎ | Rated **** |
| chọn | choose | Daic khɔń, Kukish dzək | 選 xuǎn | Rated *** |
| lấy | take | Kukish laʾ, Burmish lu | 拿 ná | Rated **** |
Observations
- The author uses a star rating system (from * to ******) to indicate degrees of cognateness.
- Cognates often span multiple Sino-Tibetan branches (only two instances are cited for each item here), reinforcing the hypothesis of deeper substratal connections.
- Many Vietnamese words show stronger alignment with Chinese etyma than with Mon-Khmer roots, for example "đầu; trốc, troốc" [ M 頭 tóu < MC dəw < OC *do: ], "thân, mình" [ M 身 shēn < MC ɕin < OC *qʰjin ], "chân, cẳng" [ M 脛 (踁) jìng, héng, xìng < MC ɦɛjŋ < OC *ɡeːŋʔ, *ɡeːŋs ]21, etc.
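For anyone tabulating the cognate tables above, the star ratings in the Notes column can be turned into numbers. The snippet below is a small sketch under that assumption; the function name `parse_rating` is hypothetical, and the only convention taken from the report is that ratings run from one to six stars after the word "Rated".

```python
import re
from typing import Optional

# Matches "Rated" (or the variant "Rate") followed by one to six asterisks.
# The negative lookahead rejects runs longer than six, which fall outside
# the rating scale described in the Observations.
_RATING = re.compile(r"\bRated?\s+(\*{1,6})(?!\*)")


def parse_rating(note: str) -> Optional[int]:
    """Return the star count in a Notes cell, or None if the cell has no rating."""
    match = _RATING.search(note)
    return len(match.group(1)) if match else None


# Notes cells copied from the tables above.
sample_notes = {
    "bụng": "Rated *****",
    "bánh": "Rated ******",
    "mũi": "Rated **",
    "muối": "Complex etymology",
}
ratings = {word: parse_rating(note) for word, note in sample_notes.items()}
```

Entries without a rating come back as `None` rather than zero, so unrated items stay distinct from items judged to be weak matches.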
Through this precise etymological annotation, the corpus substantiates several key claims:
- that Sinitic influence on Vietnamese was driven by widespread oral contact rather than limited to literary borrowing,
- that phonological features such as preinitials, rime alternations, and tonal reversals are best understood as outcomes of sustained bilingualism within a multi-register society, and
- that many so-called Sinitic etyma in Vietnamese are in fact deeply indigenized, functioning as core components within native semantic chains and compounding structures9.
The following table of randomly selected Sinitic disyllabic lexemes serves to substantiate the preceding claims.
| Etymon | VS/SV | MC | OC | Gloss | Etymological Notes |
|---|---|---|---|---|---|
| 贏錢 | ăntiền | jiajŋdziɛn | *leŋʔslenʔ | win money | 賭輸贏 dǔshūyíng (ănthuađủ) 'put a bet on' |
| 彼時 | bấygiờ | bǐdʑɨ | *pralʔdjɯ | by then | Cf. 彼一時此一時. Bǐyīshícǐyīshí. (Bấygiờ khác, bâygiờ khác.) 'It's different now.' |
| 白鴿 | bồcâu | baɨjkkəp | *bra:gkuːb | white pigeon | 白鴿 成為 和平的 象徵. Báigē chéngwéi hépíngde xiàngzhēng. (Bồcâu tượngtrưng cho hòabình.) 'The white dove became a symbol of peace.' |
| 邊界 | bờcõi | penkəɨj | *mpeːnkre:ds | frontier | 民族 為了 保衛 國家 的 邊界 而 戰鬥. Mínzú wèile bǎowèi guójiā de biānjiè ér zhàndòu. (Dântộc vìlà bảovệ bờcõi nướcnhà mà chiếnđấu.) 'The people fought to defend the nation's borders.' |
| 阻隔 | cáchtrở | tʂə̆kɯæk | *ʔsraʔkreːɡ | separate | Note the reversed order of the disyllabic word. Ex.: 《詩經》: 邊關 阻隔 千里, 情懷 相連. 'Shījīng': Biānguān zǔgé qiānlǐ, qínghuái xiānglián. ('Thikinh': Quansan cáchtrở muônngàn, tuyxamàgần.) 'Book of Odes: Frontiers may stretch across vast distance, yet sentiment flows unbroken.' |
| 殘羹剩飯 | cơmcặn-canhthừa | dzankaiŋ-ʑiŋbwan | *za:nskraŋ-ɦljɯŋsbonʔ | leftovers | The idiomatic expression preserves both the meaning and the sound contour of the source. |
| 休想 | chớhòng | hɨusɨaŋ | *qʰuslaŋʔ | don't you ever think of | 你 想 騙 我? 休想! Nǐ xiǎng piàn wǒ? Xīuxiǎng! (Anh muốn bịp tôi hả? Đừnghòng!) 'You want to fool me? Don't even think about it!' |
| 露底 | đểlộ / | luotei | *ɡraːɡstiːlʔ | let out a secret, unveil a secret; also (informal) expose one's underwear | All doublets are retained here to show how well Vietnamese adapts every semantic variant of the disyllabic word. |
| 甭想 | đừnghòng | bjuawŋsɨaŋ | [ non-existent ] | don't even think about | Semantically equivalent to 休想 xīuxiǎng. This alignment underscores how Vietnamese expressions, at their core, resonate more closely with northern Sinitic colloquialism. Ex. 他國 要是 趁亂 佔領 邊界? 甭想! Tāguó yàoshi chènluàn zhànlǐng biānjiè? Béngxiǎng! (Nướclạ muốn nhânlúc hỗnloạn chiếm biêngiới hả? Đừnghòng!) 'A foreign country wants to encroach on our border during chaos? Not a chance!' |
| 孝道 | hiếuthảo | haɨwdaw | *qʰruːsl'uːʔ | filial piety | Cf. 孝順 xiàoshùn (hiếuthảo). Both are disyllabic derivatives expressing the same semantic core, filial piety, yet the lexical preference remains at the discretion of Vietnamese speakers. |
The robustness and granularity of such etymological annotation, especially in cross-referencing with polysyllabic compounds and derived forms, are critical for demonstrating the depth, not superficiality, of Sinitic integration in Vietnamese5 6.
Register layering
One of the report’s core findings, anchored in corpus evidence, is the persistent and nuanced layering of Sinitic-Vietnamese vocabulary across formal, colloquial, and vernacular registers. This register stratification is visible in both the phonological and sociolinguistic domains, as revealed through doublets, tone reversals, and regional variants.
Principal findings include:
- Grassroots bilingualism: Early Sinitic borrowings, particularly at the ESV level, entered Vietnamese chiefly via grassroots oral bilingualism rather than via an elite “reading pronunciation” tradition. These words were nativized both phonetically and pragmatically and permeated the vernacular register, functioning as high-frequency, “native-feeling” items, e.g., “mũ” (帽 mào, ‘hat’), “giày” (鞋 xié, ‘shoe’), “vợ” (婦 fù, ‘wife’)10.
- Literary-literacy layer: With the institutionalization of Literary Sinitic in administration and scholarship (especially from the Tang period on), a formally codified register emerged. This LSV register preserved systematic MC-derived readings, crystallized through rhyme dictionaries like Qieyun, and maintained consistent phonological and tonal patterns. These items circulated essentially within educated and written registers, e.g., "phụ" (婦 fù, ‘woman’), "pháp" (法, ‘law’), "ký" (記 jì, ‘to record’)5 10.
- Colloquialization and doublets: A key side effect of this history is the proliferation of doublets: pairs of lexical items traceable to a single Chinese source but split across temporal and register boundaries. For instance, ‘gươm’ (劍 jiàn, 'sword', ESV: grassroots, vernacular) stands against ‘kiếm’ (sword, LSV: literary), ‘vợ’ (婦 fù, 'wife', OSV/vernacular) against ‘phụ’ (LSV/literary), and ‘mùi’ (smell, OSV/native) against ‘vị’ (味 wèi, 'taste', LSV/formal)10.
- Regional and social layering: The corpus also tracks how socio-regional dialects and strata align with different Sinitic layers; for instance, items retained in conservative Mường or North-Central Vietnamese suggest pre-literary embedding. Some words retained in these regions show older, non-tonal, or weakly tonal forms, in contrast to the standardized MC-based forms prominent in Hanoi or written Vietnamese8 5, e.g., Mường/North-Central form: ‘chài’ (net) vs. ‘võng’ (網 wǎng, 'net', 'hammock'), ‘chàm’ (藍 lán, 'indigo', also in vernacular Vietnamese) vs. 'lam' (藍 lán, 'blue'), 'chài' vs. 'lưới' (羅 luó, 'net'), etc.
- Stratification in word formation: Compounds like ‘giáosư’ (professor, 教師), ‘thưviện’ (library, 書院), and ‘bácsĩ’ (doctor, 博士) show not only register stratification but also semantic specialization within the high-register layer, often diverging semantically from their Chinese or Japanese analogues10.

This evidence directly supports the chapter’s thesis that VS is not a single, uniform layer, but a stratified system reflecting centuries of bilingualism, diglossia, and social differentiation. It also illustrates the unique capacity of the Vietnamese lexicon to synthesize and innovate, even as it inherits imported morphemes6 10.
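The doublets discussed in this section lend themselves to a mechanical check: group forms by their Chinese source and keep the sources attested in more than one register. The sketch below assumes a flat list of (form, etymon, register) triples; that layout and the name `find_doublets` are illustrative assumptions, and the register labels are simplified to a two-way vernacular/literary split.

```python
from collections import defaultdict

# (form, etymon, register) triples taken from the examples in this section.
# The two-way register split is a simplification made for this sketch.
ITEMS = [
    ("gươm", "劍", "vernacular"),
    ("kiếm", "劍", "literary"),
    ("vợ", "婦", "vernacular"),
    ("phụ", "婦", "literary"),
    ("mùi", "味", "vernacular"),
    ("vị", "味", "literary"),
    ("ký", "記", "literary"),  # no vernacular counterpart listed here
]


def find_doublets(items):
    """Group forms by etymon and keep etyma attested in more than one register."""
    by_etymon = defaultdict(dict)
    for form, etymon, register in items:
        by_etymon[etymon].setdefault(register, []).append(form)
    return {
        etymon: registers
        for etymon, registers in by_etymon.items()
        if len(registers) > 1
    }


doublets = find_doublets(ITEMS)
```

Here `doublets` holds three etyma (劍, 婦, 味), each mapped to its vernacular and literary forms, while 記 is left out because only one register is listed for it.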
Comparative features of Sino-Vietic contact
The corpus methodology and comparative approach undertaken in Chapter 1 provide robust evidence for contact-induced convergence, divergence, and substratum influence between Chinese, Vietic, and other East/Southeast Asian languages. This analysis relies on meticulous cross-referencing and reconstruction, leveraging in particular the AMC (Annamese Middle Chinese) hypothesis: that a southern Chinese lect became nativized and absorbed into proto-Vietic during the first millennium CE.
Comparative findings across domains
- Phonological systems: Vietnamese is one of very few non-Sinitic languages to preserve the palatal/retroflex sibilant distinction of Early Middle Chinese (a distinction lost in most modern Chinese dialects). In LSV, reflexes of MC labiodentalization (e.g., SV v- < MC *v-) and grade II palatalization (e.g., -y- medial) appear more regularly than in corresponding Cantonese or Mandarin forms; LSV forms also preserve conservative features not shared with these counterparts, owing to their origins in a southern, “Annamese” lect5 10.
- Lexical inheritance & innovation: While hundreds of core words in Vietnamese are securely assigned to Sinitic origin (either as ancient loans or as systematic SV readings), a small proportion of the basic lexicon remains demonstrably Austroasiatic, especially in numerals and agricultural terms11.
- Semantic drift and chain innovation: Vietnamese, far more than Japanese or Korean, systematically coins new compounds out of SV morphemes, e.g., 'linhmục' (靈牧 língmù, ‘priest’), 'giảkimthuật' (冶金術 yějīnshù, ‘alchemy’), establishing semantic innovations not paralleled in Chinese itself. These reflect not just borrowing but creative recombination in a diglossic area10.
- Proof from conservative dialects: Numerous items regarded as “Sinitic” in Vietnamese find their closest parallels in conservative Vietic languages, especially Rục, Thavung, and Mường, which preserve presyllabic structure (e.g., Rục 'təkɨəm', Rục prefixal formations) and pre-tonal syllabification, thus serving as a living laboratory for contact phonology12 1.
- Typological convergence: Vietnamese displays the analytic, morphemic-syllabic, non-inflectional profile typical of the Mainland Southeast Asia Sprachbund (areal grouping). The persistence of a small number of polysyllabic and sesquisyllabic forms in non-standard dialects, however, points to contact-induced convergence and layered histories, not simplistic replacement11 2.
| Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Comparative Notes |
|---|---|---|---|---|---|
| 榮光 róngguāng | quangvinh, vangbóng, vẻvang / vinhquang | ɦwiajŋkwaŋ | *ɢʷreŋkʷaːŋs | glorious | This is a compelling case study that intersects multiple phonological trajectories from Early Middle Chinese (EMC) into Literary Sino-Vietnamese (LSV) and vernacular Vietnamese (VS). |
| 望文生義 wàngwén-shēngyì | trôngvăng-đặtnghĩa / vọngvăn-sanhnghĩa | maŋsmiun-ʂaɨjŋŋjiə̆ | *maŋmɯn-shleŋŋrals | folk etymology | This idiom is a strong case of Vietnamese preservation of the palatal-retroflex sibilant distinction of Early Middle Chinese (EMC), of reflexes of labiodentalization (e.g., SV v- < MC v-), and of grade II palatalization (e.g., medial -y-). |
| 木偶戲 mù'ǒuxì | kịchmúarối / mộcngẫuhí | məwkŋəwhjiə̆ | *moːɡŋoːsqʰrals | puppetry | This is a compelling example of lexical inheritance and innovation, especially when viewed through the lens of Sinitic-Vietnamese (SV) transmission and vernacular adaptation. The phonological, semantic, and structural shift exemplifies lexical innovation: the SV compound is semantically reinterpreted and replaced by a native phrase that better fits vernacular usage and cognitive framing. |
| 四姊 sìjiě | chịtư, chếtư vs. chị 'bốn' / tứtỷ | sɨtsiɪ | *hljidsʔsiʔ | sister four | 'chế', 'chị', and 'tư' are Sinitic-Vietnamese words, but 'bốn' is cognate with Mường 'pổn' and Khmer 'buən'. Cf. 'emba' (三妹 sānmèi, 'sister three'); note that the following words preserve the full cultural context: 'chịcả' (大姐 dàjiě, 'eldest sister'), 'anhcả' (大兄 dàxiōng, 'eldest brother'), 'anhhai' (二兄 èrxiōng, 'second older brother'). |
| 仔細 zǐxì | tỉmỉ / tửtế | tsɨsɛj | *tsɨse:s | kind; meticulous | This is a case of innovation. Cf. 慈濟 cíjì (SV 'từtế', VS 'tửtế') = 'kindhearted'. |
| 名聲 míngshēng | thanhdanh, tiếngtăm, danhtiếng, vangtiếng, tiếngvang / danhthanh | miajŋɕiajŋ | *meŋqʰjeŋ | fame, renown | The case of 名聲 → thanhdanh in Literary Sino-Vietnamese (LSV) does not directly exemplify the preservation of the palatal-retroflex sibilant distinction of Early Middle Chinese (EMC), but it does intersect with other phonological conservatisms that LSV retains, particularly in labiodentalization and medial palatalization. Cf. 望文生義 wàngwénshēngyì (SV vọngvănsinhnghĩa, 'folk etymology'). |
| 善良 shànliáng | hiềnlương / lươngthiện | dʑianlɨaŋ | *ɡjenʔraŋ | morally good and kind | Multiple Vietnamese reflexes across Sino-Vietnamese, vernacular, and semantic analogs, each reflecting distinct etymological strata via Vietnamization @ 善 shàn (SV thiện) ~ 'hiền' 賢 xián (hiền), @ 良 liáng ~ 'lành'. |
Semantic chains and polysyllabic annotation
A critical dimension underpinning the chapter’s thesis is the presence of semantic chains, both diachronic (layered developments across time) and synchronic (coexisting derivatives and compounds), most vividly observed in how Sinitic and vernacular roots combine, diverge, and radiate across registers.
Characteristic examples:
- Direct semantic chaining: Lexical roots such as 劍 (gươm/kiếm, ‘sword’) recur across derived expressions and technical compounds, for example gươmđao (‘swords and sabers’), highlighting indigenous compounding practices distinct from donor languages and evidencing deep assimilation of Sinitic material9.
- Compounding of Sino-Vietnamese roots: The construction of polysyllabic compounds such as giáosư (professor, 教師), thuỷngư (‘aquatic animal’, combining ‘water’ and ‘fish’), and nhạcsĩ (‘musician’, 樂士) exemplifies both the generative capacity and inventive reconfiguration of Sinitic morphemes within Vietnamese morphosyntactic and semantic frameworks10.
- Semantic divergence within lexical chains: Certain chains reveal functional differentiation across registers and diachronic layers. For example, vị (味, ‘taste’, formal) contrasts with mùi (vernacular, ‘smell’); lạy (Old SV, ‘kowtow, bow’) diverges from lễ (SV, ‘ceremony’); and việc (Old SV, ‘work, event’) stands apart from dịch (SV, ‘service, corvée’). These pairings illustrate how semantic domains were restructured over time, with shifts in usage, tone, and sociolinguistic context9.
- Macro-domain chaining: The corpus documents extensive “macro-chains”, that is, entire semantic fields such as metallurgy (vàng, bạc, sắt, đồng, gang, thép), agronomy, and kinship, that can be reconstructed diachronically. These domains were progressively enriched or supplanted by Sinitic vocabulary as waves of social, technological, and administrative innovation permeated proto-Vietic speech communities, reshaping lexical landscapes at the systemic level9.
- Polysyllabic annotation: Morpheme-level annotation practices, exemplified by compounds such as giáosư (giáo ‘teach’ + sư ‘master’) and nhiệtkế (nhiệt ‘heat’ + kế ‘device’, i.e., thermometer), not only illuminate the internal compositional logic of Sinitic formations but also trace the pathways through which Chinese-derived roots were refunctionalized within indigenous Vietnamese semantic and syntactic frameworks9.
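The morpheme-by-morpheme glossing described in the last item can be sketched as a lookup over pre-segmented compounds. The glosses below are the ones given in the text; the dictionary `MORPHEME_GLOSSES`, the function names, and the assumption that compounds arrive already split into syllables are all illustrative choices, not the chapter's documented tooling.

```python
# Glosses taken from the examples in the text; the data layout is hypothetical.
MORPHEME_GLOSSES = {
    "giáo": "teach",
    "sư": "master",
    "nhiệt": "heat",
    "kế": "device",
}


def gloss_compound(morphemes, glosses=MORPHEME_GLOSSES):
    """Pair each morpheme of a pre-segmented compound with its gloss.

    Morphemes missing from the gloss table are marked with "?" so that
    gaps in the annotation stay visible instead of being silently dropped.
    """
    return [(m, glosses.get(m, "?")) for m in morphemes]


def format_gloss(morphemes, glosses=MORPHEME_GLOSSES):
    """Render a compound in the 'giáo ‘teach’ + sư ‘master’' style used above."""
    return " + ".join(f"{m} ‘{g}’" for m, g in gloss_compound(morphemes, glosses))


giaosu = format_gloss(["giáo", "sư"])
nhietke = format_gloss(["nhiệt", "kế"])
```

Segmentation is deliberately left to the annotator here, since the joined spelling used in this report (e.g. giáosư) does not by itself mark syllable boundaries.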
| Etymon | VS/SV | MC | OC | Gloss | Comparative Notes |
|---|---|---|---|---|---|
| 刀劍 dāojiàn | đaogươm, | tawkjəm | *ta:wkams | bladed weapons | Cf. |
| 眼鏡 yǎnjìng vs. 目鏡 mùjìng | mắtkính, mắtkiếng, kiếngmắt / | | | eye glasses | (Hakka, Southern and Puxian Min, Hainanese) Similar to Hainanese /matkɛng/ in morphemic order and phonotactics, as opposed to VS "kiếngmắt", which reflects reordering and substrate phonology. "kiếng" is a Southern variant of "kính". Rục /matkɛng/ and Hainanese forms preserve older phonological features, offering “living laboratory” evidence for contact phonology. |
| 唱和 chànghé | xướnghoạ, xướnghò, khoanhò, hòkhoan / | tɕhaŋɦwa | *tʰjaŋsɡoːls | chant | Cf. 你們 倆 一唱一和. Nǐmen liǎ yīchàngyīhè. (Haiđứa mày kẻxướngngườihò.) 'The two of you chant in chorus, in collusion.' |
| 微算機 wēisuànjī | máyvitính / vitoáncơ | mjɪswankɨj | *mɯjsloːnsʔkɨjkɯl | micro computer | 微算機 is a Vietnamese compound of [微 (vi) 'micro'] + [算 (toán) 'compute'] + [機 (cơ) 'machine'] = 'micro computer', a polysyllabic compound with SV and VS roots. The OC and MC sounds underlie its pronunciation. |
| 魚汁 yúzhī | sốtcá, nướccá, mắmcá, nướcmắm / ngưtrấp | ŋjotɕip | *ŋakjub | fish sauce | There is a noteworthy etymological case in English: the word ketchup (or catsup; note the syllable cat), which has a Sinitic origin meaning “fish sauce” (魚汁 yúzhī). The British originally borrowed it from the Fujian region in earlier times, where locals used fermented fish sauce for seasoning. When the item was brought back to England, however, English repurposed the word for “tomato sauce.” |
| 公雞 gōngjī vs. 雄雞 xióngjī | gàcồ / gàtrống côngkê / hùngkê | kəwŋkiej | *klo:ŋke: | rooster | These are strong cases of semantic divergence within lexical chains, with variants across numerous Sinitic lects. Cf. 母雞 mǔjī (gàmái, gàmẹ, 'hen'), not to mention ancient local forms such as 雞公 jīgōng and 雞母 jīmǔ (gàmẹ). |
| 龍飛鳳舞 | rồngbay-phượngmúa / longphi-phụngvũ | luawŋpwyj- | *b·roŋpɯl- | grand and flamboyant style | Another case of direct semantic chaining. |
| 大志 | chícả, chílớn / đạichí | dajtɕɨ | *da:dstjɯs | high aims | Another case of semantic divergence within lexical chains; cf. the variant 胸無大志 xiōngwúdàzhì (ngườikhôngcóchí). |
| 顆心 kēxīn | tráitim, contim, quảtim / khoảtâm | kʰwasim | *kʰloːlʔslɯm | heart | This should be a case of macro-domain chaining within the semantic divergence within lexical chains. Cf. 果 guǒ (SV quả) 'fruit', hence 'trái', a clipping of VS 'quảtrái' (果實 guǒshí, SV quảthực), which gives rise to the classifier 顆 kē for 'con', 'quả', 'trái' (small round objects), Vietnamized as a morphemic-modifier syllable in 顆心 kēxīn. |
| 酒席 | rượutiệc, tiệcrượu / tửutịch | tsɨuziajk | *ʔsluʔljaːɡ | banquet | Example: 富家 一 席酒 窮漢 半年 糧. Fùjiā yì xíjǐu, qiónghàn bànnián liáng. (Một bữa tiệcrượu của ngườigiàu bằng lương nửanăm kẻnghèo.) |
| 力氣 lìqì | sứclực, hơisức / lựckhí | lɨkhɨj | *rɯɡkʰɯds | strength | There are native parallels in Vietic: VS 'sức' is a native Vietnamese word (Proto-Vietic k-rək) cognate with Chinese 力, not merely a translation or borrowing. It reflects a shared etymological ancestry and semantic continuity, while 'lực' is the Sino-Vietnamese literary reflex borrowed directly from Middle Chinese. In this case, 氣 qì (SV khí, VS hơi, 'steam') is associated with 'sức'. Cf. 氣力 qìlì (SV khílực, VS hơisức) 'power'. |
| 本錢 běnqián | tiềnvốn vốnliếng / bảntiền bổntiền | pwəndziɛn | *pɯːnʔslenʔ | root/funds | This is a case of semantic divergence within lexical chains with phonological innovation on the second syllable, which is commonplace in Sinitic-Vietnamese. Ex. 她的 本錢 是 青春 美麗. Tāde běnqián shì qīngchūn měilì. (Vốnliếng của nàng là thanhxuân trẻđẹp.) 'Her capital is youth and beauty.' |
| 工役 gōngyì | côngviệc / côngdịch | ywek | *wjek | service / work | A compound that reflects both Sino-Vietnamese inheritance and vernacular semantic fusion, a hybrid compound typical of Vietnamese lexical layering. 'việc' (役 yì, SV dịch) shows semantic layering with 為 (OC *ɢʷal, 'to do') with k-extension, though its etymology can be traced directly to the Classical Chinese compound 工役 (gōngyì). Cf. 公務 gōngwù (SV côngvụ, VS côngviệc) 'business' |
| 幸福 xìngfú | phướclành, phúclành / hạnhphúc hạnhphước | ɦəɨjŋpʰuw | *ɡreːŋʔpɯɡ | bliss | 幸福 (xìngfú) and Vietnamese expressions like hạnhphúc, phướclành, and phúclành are historically and semantically related, though they represent distinct etymological layers. In fact, they are formed by compounding Sinitic-Vietnamese elements on Sino-Vietnamese roots. Cf. 良 (liáng, SV lương, VS lành) |
| 妙法 miàofǎ | pháp / phépmàu, phépmầu / diệupháp | miawfǎ | *mewspqab | miracle | phép and pháp are reflexes of 法, stratified by register: phép for colloquial/magical senses, pháp for formal/legal/religious ones. SV diệupháp preserves the Buddhist doctrinal nuance, while Sinitic-Vietnamese phépmàu reflects emotional and magical connotations. |
| 寡婦 guǎfù | goáphụ, goábụa, bàgoá, bàgiá, ởgoá, ởvậy / quảphụ | kʷɯabuw | *kwra:ʔbɯʔ | widowed woman | A rich array of Vietnamese reflexes across Sino-Vietnamese, vernacular, and idiomatic registers, each encoding different layers of phonological inheritance, semantic drift, and cultural framing via Vietnamization @ 寡 guǎ ~ 'ở' 於 yú (vu), 'giá', @ 婦 fù ~ 'vợ', 'bụa', 'bà' 婆 pó (bà) |
Digital tools and annotation practices
The methodological innovations in Chapter 1 extend beyond theoretical linguistics, grounded in explicit annotation protocols and strategic digital tool integration. Key components include:
- Corpus annotation standards: Utilization of structured templates enables hierarchical, coded annotation of etymological trees, register transitions, and semantic chain pathways. These formats are fully compatible with automated parsing and scalable lexicon development.
- Spreadsheet and case-by-variable formatting: Items are compiled in tabular form with columns for VS/SV representation, stratum/register classification, Middle and Old Chinese reconstructions, glosses, and comparative notes, facilitating both statistical modeling and graphical visualization.
- Comparative database linkage: The corpus integrates with external linguistic databases, including classical Chinese rime books, Kangxi Dictionary, Shafer's Sino-Tibetan research, numerous authors' works on Middle Chinese and Old Chinese reconstruction, Austroasiatic etymological inventories, and compiled Vietnamese word lists by international linguists, which enables robust synchronic and diachronic querying across language families.
- Register and tonogenesis tagging: Annotation protocols explicitly mark register (colloquial vs. literary), lexical stratum (ESV/LSV/RSV), and tonal origin (capturing features such as coda type, pitch contour, and onset class), in alignment with contemporary corpus linguistic standards.
These practices yield a replicable and extensible annotation framework, essential for long-term comparative research, digital humanities applications, and empirical testing of the AMC and substratum hypotheses advanced in the chapter.
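The case-by-variable layout described above maps directly onto a CSV file with one row per form. The following sketch uses only Python's standard `csv` module; the column names in `FIELDS` are assumptions modeled on the columns the report lists, and the two sample rows reuse the 劍 values given in the report's tables.

```python
import csv
import io

# Hypothetical column set modeled on the columns named in the report.
FIELDS = ["form", "etymon", "stratum", "register", "mc", "oc", "gloss", "notes"]

ROWS = [
    {"form": "gươm", "etymon": "劍", "stratum": "ESV", "register": "vernacular",
     "mc": "kjəm", "oc": "*kams", "gloss": "sword"},
    {"form": "kiếm", "etymon": "劍", "stratum": "LSV", "register": "literary",
     "mc": "kjəm", "oc": "*kams", "gloss": "sword"},
]


def to_csv(rows):
    """Serialize rows to CSV text; columns missing from a row are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(ROWS)
round_tripped = from_csv(csv_text)
```

A flat table of this kind can be loaded into a spreadsheet or a statistics package without further conversion, which is the main reason to prefer it over a nested format for the modeling and visualization uses mentioned above.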
Linking corpus findings to the chapter’s thesis
The corpus analysis in Chapter 1 provides compelling, multi-axis support for its central thesis: that the Sinitic-Vietnamese lexical stratum is not a superficial overlay derived from distant literary Chinese, but a deeply embedded, generative system. This system emerged through sustained bilingual contact with regional Sinitic lects and a Yue-derived proto-Vietic substrate—forming a foundational layer in the Vietnamese lexicon that reflects centuries of linguistic convergence, adaptation, and innovation.
Key points of thesis substantiation:
- Stratification over replacement: The coexistence of overlapping registers, recurrent doublets, and regionally differentiated variants reflects a layered lexical system in which native Vietic forms, Sinitic borrowings, and bilingual innovations actively interact. Rather than a linear process of substitution or overwriting, the Vietnamese lexicon reveals a dynamic stratification shaped by sustained contact, functional differentiation, and contextual adaptation [10].
- Creative adaptation and innovation: The use of compounding, semantic chaining, and native coinage with Sinitic material, especially within technical, administrative, and scholarly registers, attests to the active linguistic agency of Vietnamese speakers. Rather than passively transmitting reading glosses from Chinese texts, they domesticated and recontextualized imported morphemes, embedding them within indigenous syntactic and semantic frameworks to generate novel, functional expressions [10].
- Comparative validation of the AMC model: Cross-linguistic data from Southwest Chinese lects, conservative Vietic varieties, and other Austroasiatic branches affirm the Red River Delta as a sustained contact zone. These comparisons substantiate the AMC model’s premise: that a localized Sinitic variety exerted deep phonological, lexical, and semantic influence on emerging Vietnamese, shaping its structure through prolonged and multidirectional interaction [12].
Synthesis and conclusion
At its core, the corpus and its stratified analysis offer extensive, multi-dimensional, and empirically grounded validation of the argument that the Sinitic layer in Vietnamese is not a passive residue of literary Chinese, but a dynamic, creative, and foundational substrate shaped by localized, heterogeneous contact.
The Chapter 1 corpus—meticulously curated and annotated—serves as a benchmark for analytic and comparative corpus linguistics in the context of East and Southeast Asian language contact. Through structural precision, multi-tiered etymological mapping, detailed register and phonological annotation, and the explicit tracing of semantic chains and polysyllabic innovation, it reveals the embeddedness and generative capacity of the Sinitic-Vietnamese stratum within the broader Vietnamese lexicon.
Most importantly, the corpus affirms the central thesis: that Sinitic-Vietnamese (VS) is not merely a superimposed layer of foreign vocabulary but a deeply indigenized stratum, lacquered into the linguistic fabric of Vietnamese through centuries of adaptation, semantic reconfiguration, and creative agency. This insight not only reframes the historical trajectory of Vietnamese but also establishes a new paradigm for approaching language contact and lexical stratification in global linguistic research.
References
- Vietic languages. Wikipedia
- Mark Alves. Early Sino-Vietnamese Lexical Data and the Relative Chronology of Tonogenesis in Chinese and Vietnamese
- Mark Alves. 2009. Vietnamese Vocabulary. WOLD
- Mark Alves. Identifying Early Sino-Vietnamese Vocabulary via Linguistic, Historical Archaeological and Ethnological Data
- Mark Alves. Notes on Sino-Vietnamese Historical Phonology
- John Phan. Lacquered Words: The Evolution of Vietnamese under Sinitic Influences from the 1st Century BCE through the 17th Century CE
- John Phan. Lacquered Words: The Evolution Of Vietnamese Under Sinitic...
- The Baxter-Sagart reconstruction of Old Chinese
- Mark Alves. From Vietic Presyllables to Vietnamese Simplex Onsets
- Sino-Vietnamese vocabulary. Wikipedia
- The Etymologies of Vietnamese Numeral Terms and Implications of ....
- Historical Ethnolinguistic Notes on Proto-Austroasiatic and Proto ....
- Template: etymon. Wikipedia
- Stefan Th. Gries and Magali Paquot. Chapter 26 Writing up a Corpus-Linguistic Paper
- 漢典 zdic.net
- Nguyễn, Ngọc San. 1993. Tìm hiểu về Tiếng Việt Lịch sử. TP HCM: NXB Giáo dục.
- Shafer, Robert. 1966-1974. Introduction to Sino-Tibetan (4 volumes). Wiesbaden: Otto Harrassowitz.
- Thomas, David D. 1966. “Mon-Khmer Subgroupings in Vietnam,” in Norman Zide (ed.) Studies in Comparative Austroasiatic Linguistics. The Hague: Mouton.
- Luce, Gordon Hannington. 1965. “Danaw, a Dying Austroasiatic Language” in “Historical Linguistics” Indo-Pacific Linguistic Studies