Sunday, August 24, 2025

Analytical Introduction To Introduction To Sinitic-Vietnamese

On the Foundational Sinitic Lexical Stratum
In Vietnamese

 

dchph in collaboration with Copilot


Introduction

The foundational Sinitic-Vietnamese (VS) lexical stratum of the Vietnamese language, as examined in Chapter 1 and outlined in its Executive Summary, is not merely a passive accumulation of Chinese loanwords. Rather, it constitutes a dynamic, internally layered, and etymologically rich stratum shaped by sustained and multifaceted contact between northern Chinese lects and a Yue-derived proto-Vietic substrate.

This report presents a comprehensive analytical review of the Chapter 1 corpus, encompassing all cited etymons, semantic chains, and polysyllabic annotations. Its purpose is to demonstrate how the corpus substantiates the thesis that Sinitic-Vietnamese forms a core indigenous layer of the language, not a superficial literary veneer. 

The structure of the report follows the established research directives: corpus architecture, etymological analysis, register stratification, comparative linguistic features, and semantic chain mapping. Each section unpacks the methodologies and findings of Chapter 1 with precision and clarity.

Corpus structure

Chapter 1’s investigation is anchored in a rigorously curated corpus designed to illuminate the stratified evolution and contact-induced dynamics of Vietnamese vocabulary. Far more than a static collection of lexical entries, the corpus functions as a multidimensional analytical framework—archiving, contextualizing, and stratifying vocabulary by origin, phonological development, and sociolinguistic integration.

Comprising roughly 800 to 900 lexical items, the corpus is intentionally selective rather than exhaustive. Its entries are drawn from a diverse array of sources: early written records such as ChữNôm texts, comparative data from both modern and archaic Vietic lects, contemporary Vietnamese usage, and systematic cross-referencing with Old and Middle Chinese reconstructions. Its internal architecture reflects both genetic lineage and areal convergence, with structural segmentation guided by principles of historical linguistics and contact typology.

The analytic scaffolding of the corpus includes:

  • Tripartite Sinitic stratification: Early Sino-Vietnamese (ESV), Late Sino-Vietnamese (LSV), and Recent Sino-Vietnamese (RSV) are aligned with recognizable sociohistorical periods: Han/Jin, Tang/Song, and Ming/post-Ming eras, respectively1 2. 

  • Etymon-centered annotation: Each entry is subjected to rigorous multi-level annotation, specifying VS/SV forms, Middle Chinese (MC) and Old Chinese (OC) reconstructions, meanings, and critical comparative notes. 

  • Polysyllabic annotation protocols: Compound and reduplicative structures are recorded with morpheme-by-morpheme glossing, accommodating both native analytic patterns and contact-induced forms3. 

  • Layer-tagged lexical indexing: Items are explicitly tagged for stratum, register (literary, colloquial, vernacular), and, when sufficient data allow, socio-regional provenance and alternative readings. 

  • Phonological and semantic extension: Sinitic-Vietnamese words derived from Sinitic compounds often develop meanings beyond those of the source forms, as sound changes produce variants that carry their associated semantic roots.

This corpus serves as both a philological instrument and a digital-ready platform for algorithmic annotation, enabling granular etymological tracing and macro-level register mapping.

The corpus is supported by a series of structured tables, each serving a critical analytical function:

  • Cross-referenced comparative reconstructions of Old Chinese (OC)16 alongside Mark J. Alves’s proto-Sino-Vietnamese onset inventory9, 
  • Core indices for identifying lexical strata based on rime, onset, and tonal behavior,
  • Cross-referenced tables linking Early Sino-Vietnamese (ESV) and Late Sino-Vietnamese (LSV) candidates, phonetic indices, and register-based doublets4 5. 

Additionally, selected entries are annotated with comparative data from conservative Vietic languages such as "Rục" and "Mường", allowing for stratified anchoring across historical layers. This layered design results in a digitizable corpus well-suited for both philological analysis and algorithmic annotation workflows.

In summary, the corpus is both structurally rigorous and analytically versatile, designed to support micro-level etymological tracing as well as macro-level register mapping. It provides the evidentiary foundation necessary to distinguish the internally stratified, contact-generated Sinitic-Vietnamese stratum from later, more superficial borrowings4 6.
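The annotation scheme described in this section can be pictured as a small record type. What follows is a minimal Python sketch, not the corpus's actual storage format: the class name `EtymonEntry`, its field names, and the closed tag sets are assumptions made for illustration. The sample values for 劍 are the ones cited in this report.

```python
from dataclasses import dataclass, field

# Stratum and register tags named in the text; treating them as closed
# sets is an assumption of this sketch.
STRATA = {"ESV", "LSV", "RSV"}
REGISTERS = {"literary", "colloquial", "vernacular"}


@dataclass
class EtymonEntry:
    """One annotated corpus entry (hypothetical schema)."""
    hanzi: str            # source etymon in Chinese characters
    gloss: str
    mc: str = ""          # Middle Chinese reconstruction
    oc: str = ""          # Old Chinese reconstruction
    # Vietnamese form -> (stratum tag, register tag)
    forms: dict = field(default_factory=dict)

    def add_form(self, form, stratum, register):
        """Attach a Vietnamese reflex, rejecting tags outside the scheme."""
        if stratum not in STRATA or register not in REGISTERS:
            raise ValueError(f"unknown tag: {stratum!r}/{register!r}")
        self.forms[form] = (stratum, register)

    def by_stratum(self, stratum):
        """List the forms assigned to one stratum, in insertion order."""
        return [f for f, (s, _) in self.forms.items() if s == stratum]


sword = EtymonEntry(hanzi="劍", gloss="sword", mc="kjəm", oc="*kams")
sword.add_form("gươm", "ESV", "vernacular")
sword.add_form("kiếm", "LSV", "literary")
print(sword.by_stratum("ESV"))
```

Keeping the stratum and register on each form, rather than on the etymon, is what lets a single entry carry a doublet such as gươm and kiếm.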

Etymological analysis

At the heart of Chapter 1 lies the etymological mapping of lexical entries, which underpins the broader argument concerning the depth, modality, and register of Sinitic influence in Vietnamese. The analysis organizes the corpus into reliable strata by applying phonological, semantic, and historical filters, referencing both Middle Chinese and Old Chinese reconstructions alongside comparative Vietic data.


Key criteria and techniques:

  • Phonological correspondences: The diagnostic use of segmental correspondences, especially in initial (onset) and final (rime and coda) positions, allows differentiation between early and late borrowings. For example, words with Vietnamese voiced fricative onsets (v-, d-, gi-, g-) correlating with complex OC clusters are proven markers of deeper, older integration, frequently paralleling conservative forms in non-Vietnamese Vietic lects (Rục, Mường)2.

  • Tonal evolution: Leveraging Haudricourt’s model, the chapter scrutinizes how OC final *-s and *-ʔ gave rise to the three major Vietnamese tonal sets (ngang/huyền, hỏi/ngã, sắc/nặng) and matches tone shifts with historical borrowing periods2.

  • Compound and reduplication annotation: Etymons that recur as components of compounds or reduplicants are tagged for polysyllabic annotation, revealing not only contact-induced formations but also morphosemantic layering3.

  • Comparative etymology: Each etymon is cross-referenced for parallel forms in MC, OC15, Sino-Tibetan18, and other Vietic or Austroasiatic languages, ensuring that supposed borrowings are not, in fact, retentions or autochthonous innovations16.
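As an illustration of the tonal criterion, the usual textbook summary of Haudricourt's model can be written as a lookup table: the final type selects the tone pair and the voicing of the onset selects the member of the pair. This is a simplified sketch under stated assumptions. The labels "open", "glottal", and "fricative" and the function name `expected_tone` are invented here, stop-final syllables are left out, and the table describes native tonogenesis only; it does not by itself assign a loanword to a stratum, since loan layers may show different or reversed correspondences.

```python
# Simplified Haudricourt-style tonogenesis table (assumption: a two-way
# onset voicing contrast and three final types; real data are messier).
TONE_TABLE = {
    ("open", "voiceless"): "ngang",      # no final laryngeal
    ("open", "voiced"): "huyền",
    ("glottal", "voiceless"): "sắc",     # final *-ʔ
    ("glottal", "voiced"): "nặng",
    ("fricative", "voiceless"): "hỏi",   # final *-h from earlier *-s
    ("fricative", "voiced"): "ngã",
}


def expected_tone(final_type, onset_voicing):
    """Return the tone predicted by the simplified model."""
    return TONE_TABLE[(final_type, onset_voicing)]


for (final_type, voicing), tone in TONE_TABLE.items():
    print(f"{final_type:9} + {voicing:9} -> {tone}")
```

A form whose attested tone disagrees with this prediction is not an error in itself; such mismatches are the kind of signal the chapter uses to separate borrowing periods.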

Below is a representative extract from the compiled etymon table for selected cornerstone entries.


Table 1: Select register-layered monosyllabic Sinitic entries

| Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Comparative Notes |
|---|---|---|---|---|---|
| 劍 jiàn | (thanh) gươm / gươm / kiếm | kjəm | **kə.ms > *kams | sword | Lenition [ɣ-], Rục təkɨəm → sesquisyllabic Vietic parallel |
| 鏡 jìng | s-kương / gương / kiếng / kính | kiajŋ | **sk’raŋs > *kraŋs | mirror | Prefix **s- reflects Sino-Tibetan instrumentality; doublet preserved |
| 唱 chàng | ʔ-ɕướng / xướng / khoan | tɕʰiɐŋ | **d̥raps > *tʰjaŋs | to chant | Dummy prefix **ʔ- with Vietic comparanda |
| 公 gōng | cồ / ông / trống / công | kəwŋ | ***qˤoŋ > *klo:ŋ | public; grandpa; male; duke | Initial k- robust in SV; traceable in Mường, etc. Cf. Old Khmer khloñ, Proto-Tai *luŋᴬ |
| 魚 yú | ngá / ngư | ŋɨə̆ | *ŋa | fish | Nasal onset across SV and MC > glottal ʔ- > k-; Vietic “ngá”; cf. MinNan 魚汁 yúzhī 'catsup' |
| 海 hǎi | khơi / bể / biển / hải | həj | **hmɯːs > *hmlɯːʔ | sea | Doublets mapped with LMC onset traits; cf. 'mệ', 'mẹ' 母 mǔ (SV mẫu), 每 (OC *mɯːʔ), 晦 (OC *hmɯːs, “dark”); in numerous Zhou texts 海 = 晦 huì (OC *hmɯːs) |
| 龍 lóng, lǒng, máng | (thuồng-)luồng / long / rồng | luawŋ | **r-loŋ > *b-loŋ | dragon; aquatic monster serpent | Rime /loŋ/ shares features with ESV and LSV; cf. VS 'thuồngluồng', Khmer រោង (roong, “year of the dragon”), Thai มะโรง (má-roong, “dragon; year of the dragon”) |
| 大 dà | đại / thái / to / cả | daj, da | **lats > *da:ds | big; full; eldest | Appears in all strata; LMC tonal distinctions observed and doublets preserved; Wang (1982) also lists 誕 OC *l'aːnʔ as cognate. Cf. VS 'lớn' (big) |
| 心 xīn | tâm / tim / lòng / õi | sim | **slɯm > *sə.m | heart, soul; mind; core | Partial ESV preservation, comparable with early loans/transfers |
| 酒 jǐu | tửu / rượu | tsɨu | **ʔsluʔ > *tsuʔ | wine | Texture identifies as RSV candidate (late borrowing) |


These and other entries are annotated with detailed Middle Chinese and Old Chinese phonological data drawn from the work of renowned scholars, including Bernhard Karlgren, Wang Li 王力, Li Rong 李榮, Shao Rong-Fen 邵榮芬, Zhengzhang Shangfang 鄭張尚芳, Pan Wuyun 潘悟雲, and Edwin G. Pulleyblank16, as well as the Baxter-Sagart reconstruction system7 5 and, importantly, Shafer's monumental work on Sino-Tibetan etymology17, a goldmine of comparative etymology. Each etymon is cross-verified with reflexes found in conservative Vietic dialects such as "Rục" and "Mường"8, as well as in many Khmer lects, ensuring stratified accuracy and comparative depth19 20.


The following are selected standout examples in which Vietnamese words show strong cognate relationships with Sino-Tibetan roots, especially those from the Chinese, Bodic, Burmic, and Daic branches17. The correspondences run across virtually all semantic categories, such as Body Parts & Physicality, Verbs & Actions, Kinship & Social Roles, and Food & Agriculture.


Table 2: Select extract of full ST etymons with Kukish and Bodish as representatives 

Body Parts & Physicality

| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| lưỡi | tongue | Kukish lei, Bodish ltśe | shé, lěi | 脷 more plausible for VS; cf. Cantonese 'lei6' |
| chân | leg/foot | Bodish rkań, Kukish kʿoń | jiǎo | Also linked to 脛 jìng (VS cẳng) |
| bụng | belly | Kukish puk, Burmish puik | | Rated ***** |
| mắt | eye | Bodish mig, Kukish mik | | Rated ***** |
| mũi | nose | Bodish mtśʿul, Kukish tśʿul | | Rated ** |

Food & Agriculture

| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| bánh | cake | Daic pɛń, Kukish piŋ | bǐng | Rated ****** |
| muối | salt | Kukish tśi, Burmish śo-ra | yán, xiāo | Complex etymology |
| gừng | ginger | Daic khiŋ, Kukish khiń | jiāng | Rated ****** |
| | fish | Burmish ńa, Kukish kʿai | | Rated **** |
| cơm | rice | Burmish tśa, Lolo tsa- | shàn | Also linked to 飯 fàn |

Kinship & Social Roles

| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| | grandma | Kukish pi, Bodish pʿyi-mo | | Cited but less plausible |
| bố | father | Kukish pu, Burmish bʿui | | Strong phonetic match |
| chị | elder sister | Kukish tśar, Bodish ʾa-tśʿe | | Also compared with 姐 jiě |
| cháu | nephew; niece | Kukish tʿu, Burmish tu | zhí | Semantic alignment with SV 'tỉ' |
| cậu | maternal uncle | Kukish kʿu, Bodish kʿu-bo | jìu | Rated ****** |

Verbs & Actions

| Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes |
|---|---|---|---|---|
| cắt | cut | Bodish tśa, Daic kăt | | Rated ****** |
| dẫn | lead | Daic tśuń, Burmish tsiń | yǐn | Rated ***** |
| sỏ | play | Kukish tśai, Luśei tśai | shuǎ | Rated **** |
| chọn | choose | Daic khɔń, Kukish dzək | xuǎn | Rated *** |
| lấy | take | Kukish laʾ, Burmish lu | | Rated **** |

Observations

  • The author uses a star rating system (from * to ******) to indicate degrees of cognateness.

  • Cognates often span multiple Sino-Tibetan branches (only two are cited per entry here), reinforcing the hypothesis of deeper substratal connections.

  • Many Vietnamese words show stronger alignment with Chinese etyma than with Mon-Khmer roots, for example "đầu; trốc, troốc" [ M 頭 tóu < MC dəw < OC *do: ], "thân, mình" [ M 身 shēn < MC ɕin < OC *qʰjin ], and "chân, cẳng" [ M 脛 (踁) jìng, héng, xìng < MC ɦɛjŋ < OC *ɡeːŋʔ, *ɡeːŋs ]21.
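The star ratings in the Notes column of Table 2 can be turned into numbers for sorting or filtering. The snippet below is a minimal sketch: the function name `star_score`, the threshold of five stars, and the choice of sample rows are assumptions made for illustration, and the ratings themselves are copied from the table.

```python
import re


def star_score(note):
    """Count the asterisks in a 'Rated ***' note; None if there is no rating."""
    match = re.search(r"Rated?\s+(\*+)", note)
    return len(match.group(1)) if match else None


# (Vietnamese word, Notes cell) pairs copied from Table 2.
rows = [
    ("bụng", "Rated *****"),
    ("mũi", "Rated **"),
    ("bánh", "Rated ******"),
    ("muối", "Complex etymology"),
]

# Keep only entries rated at five stars or more (threshold is arbitrary).
strong = [word for word, note in rows if (star_score(note) or 0) >= 5]
print(strong)
```

Entries without a rating, such as muối, are kept apart rather than scored as zero, so that "unrated" is not confused with "weak".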

Through this precise etymological annotation, the corpus substantiates several key claims: 

  1. that Sinitic influence on Vietnamese was driven by widespread oral contact rather than limited to literary borrowing, 
  2. that phonological features such as preinitials, rime alternations, and tonal reversals are best understood as outcomes of sustained bilingualism within a multi-register society, and 
  3. that many so-called Sinitic etyma in Vietnamese are in fact deeply indigenized, functioning as core components within native semantic chains and compounding structures9.

The following table of randomly selected Sinitic disyllabic lexemes serves to substantiate the preceding claims.


Table 3: Folk lexical extensions by core components


| Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Etymological Notes |
|---|---|---|---|---|---|
| 贏錢 yíngqián | ăntiền | jiajŋdziɛn | *leŋʔslenʔ | win money | 賭輸贏 dǔshūyíng (ănthuađủ) 'put a bet on' |
| 彼時 bǐshí | bấygiờ | bǐdʑɨ | *pralʔdjɯ | by then | Cf. 彼一時此一時. Bǐyīshícǐyīshí. (Bấygiờ khác, bâygiờ khác.) 'It's different now.' |
| 白鴿 báigē | bồcâu | baɨjkkəp | *bra:gkuːb | white pigeon | 白鴿 成為 和平的 象徵. Báigē chéngwéi hépíngde xiàngzhēng. (Bồcâu tượngtrưng cho hòabình.) 'White dove became a symbol of peace.' |
| 邊界 biānjiè | bờcõi | penkəɨj | *mpeːnkre:ds | frontier | 民族 為了 保衛 國家 的 邊界 而 戰鬥. Mínzú wèile bǎowèi guójiā de biānjiè ér zhàndòu. (Dântộc vìlà bảovệ bờcõi nướcnhà mà chiếnđấu.) 'The people fought to defend the nation's borders.' |
| 阻隔 zǔgé | cáchtrở | tʂə̆kɯæk | *ʔsraʔkreːɡ | separate | Note the reverse order of the disyllabic word. Ex.: 《詩經》: 邊關 阻隔 千里, 情懷 相連. 'Shījīng': Biānguān zǔgé qiānlǐ, qínghuái xiānglián. ('Thikinh': Quansan cáchtrở muônngàn, tuyxamàgần.) 'Book of Odes: Frontiers may stretch across vast distance, yet sentiment flows unbroken.' |
| 殘羹剩飯 cángēng-shèngfàn | cơmcặn-canhthừa | dzankaiŋ-ʑiŋbwan | *za:nskraŋ-ɦljɯŋsbonʔ | leftovers; hand-me-downs | The idiomatic expression preserves both the semantics and the sound contour of the original. |
| 休想 xīuxiǎng | chớhòng | hɨusɨaŋ | *qʰuslaŋʔ | don't you ever think of | 你 想 騙 我? 休想! Nǐ xiǎng piàn wǒ? Xīuxiǎng! (Anh muốn bịp tôi hả? Đừnghòng!) 'You want to fool me? Don't even think about it!' |
| 露底 lòudǐ | đểlộ / lộtẩy / lộxì | luotei | *ɡraːɡstiːlʔ | let out a secret, unveil a secret; also (informal) expose one's underwear | All doublets are retained here to show how well Vietnamese adapts every semantic variant of the disyllabic word. |
| 甭想 béngxiǎng | đừnghòng | bjuawŋsɨaŋ | [ non-existent ] | don't even think about | Semantically "休想 xīuxiǎng". This alignment underscores how Vietnamese expressions, at their core, resonate more closely with northern Sinitic colloquialism. Ex. 他國 要是 趁亂 佔領 邊界? 甭想! Tāguó yào chènluàn zhànlǐng biānjiè? Béngxiǎng! (Nướclạ muốn nhânlúc hỗnloạn chiếm biêngiới hả? Đừnghòng!) 'Foreign country wants to encroach our border during chaos? Not a chance!' |
| 孝道 xiàodào | hiếuthảo | haɨwdaw | *qʰruːsl'uːʔ | filial piety | Cf. 孝順 xiàoshùn (hiếuthảo). Both are disyllabic derivatives expressing the same semantic core, filial piety, yet the lexical preference remains at the discretion of Vietnamese speakers. |


The robustness and granularity of such etymological annotation, especially in cross-referencing with polysyllabic compounds and derived forms, are critical for demonstrating the depth, not superficiality, of Sinitic integration in Vietnamese5 6.

Register layering

One of the report’s core findings, anchored in corpus evidence, is the persistent and nuanced layering of Sinitic-Vietnamese vocabulary across formal, colloquial, and vernacular registers. This register stratification is visible in both the phonological and sociolinguistic domains, as revealed through doublets, tone reversals, and regional variants.


Principal findings include:

  • Grassroots bilingualism: Early Sinitic borrowings, particularly at the ESV level, entered Vietnamese chiefly via grassroots oral bilingualism, rather than via an elite “reading pronunciation” tradition. These words were nativized both phonetically and pragmatically and permeated the vernacular register, functioning as high-frequency, “native-feeling” items, e.g., “mũ” (帽 mào, ‘hat’), “giày” (鞋 xié, ‘shoe’), “vợ” (婦 fù, ‘wife’)10. 

  • Literary-literacy layer: With the institutionalization of Literary Sinitic in administration and scholarship (especially from the Tang period on), a formally codified register emerged. This LSV register preserved systematic MC-derived readings, crystallized through rhyme dictionaries like Qieyun, and maintained consistent phonological and tonal patterns. Essentially, these items circulated within educated and written registers, e.g., "phụ" (婦 fù, ‘woman’), "pháp" (法 fǎ, ‘law’), "ký" (記 jì, ‘to record’)5 10. 

  • Colloquialization and doublets: A key side effect of this history is the proliferation of doublets: pairs of lexical items traceable to a single Chinese source but split across temporal and register boundaries. For instance, ‘gươm’ (劍 jiàn, 'sword', ESV: grassroots, vernacular) stands against ‘kiếm’ (sword, LSV: literary), ‘vợ’ (婦 fù, 'wife', OSV/vernacular) against ‘phụ’ (LSV/literary), and ‘mùi’ (smell, OSV/native) against ‘vị’ (味 wèi, 'taste', LSV/formal)10. 

  • Regional and social layering: The corpus also tracks how socio-regional dialects and strata align with different Sinitic layers; for instance, items retained in conservative Mường or North-Central Vietnamese suggest pre-literary embedding. Some words retained in these regions show older, non-tonal, or weakly tonal forms, in contrast to standardized MC-based forms prominent in Hanoi or written Vietnamese8 5, e.g., Mường/North-Central form: ‘chài’ (net) vs. ‘võng’ (網 wǎng, 'net', 'hammock'), ‘chàm’ (藍 lán, 'indigo', also in vernacular Vietnamese) vs. 'lam' (藍 lán, 'blue'), 'chài' vs. 'lưới' (羅 luó, 'net'), etc.

  • Stratification in word formation: Compounds like ‘giáosư’ (professor, 教師), ‘thưviện’ (library, 書院), and ‘bácsĩ’ (doctor, 博士) show not only register stratification but also semantic specialization within the high-register layer, often diverging semantically from their Chinese or Japanese analogues10.

  • This evidence directly supports the chapter’s thesis that VS is not a single, uniform layer, but a stratified system reflecting centuries of bilingualism, diglossia, and social differentiation. It also illustrates the unique capacity of the Vietnamese lexicon to synthesize and innovate, even as it inherits imported morphemes6 10.
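The doublet pattern described in this section lends itself to a simple grouping step: collect every Vietnamese form under its source character and keep the characters whose forms fall in more than one stratum. The sketch below assumes a flat row layout of (character, form, stratum, register); that layout and the function name `find_doublets` are illustrative choices, while the tags are the ones used in the examples in this section.

```python
from collections import defaultdict

# (source character, Vietnamese form, stratum tag, register tag)
rows = [
    ("劍", "gươm", "ESV", "vernacular"),
    ("劍", "kiếm", "LSV", "literary"),
    ("婦", "vợ", "OSV", "vernacular"),
    ("婦", "phụ", "LSV", "literary"),
    ("法", "pháp", "LSV", "literary"),
]


def find_doublets(rows):
    """Return characters whose reflexes span more than one stratum."""
    grouped = defaultdict(list)
    for hanzi, form, stratum, register in rows:
        grouped[hanzi].append((form, stratum, register))
    return {
        hanzi: forms
        for hanzi, forms in grouped.items()
        if len({stratum for _, stratum, _ in forms}) > 1
    }


for hanzi, forms in find_doublets(rows).items():
    print(hanzi, [form for form, _, _ in forms])
```

Here 法 drops out because only one reflex is listed for it in the sample, which is a property of the sample rather than a claim about the language.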

Comparative features of Sino-Vietic contact

The corpus methodology and comparative approach undertaken in Chapter 1 provide robust evidence for contact-induced convergence, divergence, and substratum influence between Chinese, Vietic, and other East/Southeast Asian languages. This analysis relies on meticulous cross-referencing and reconstruction, leveraging in particular the AMC (Annamese Middle Chinese) hypothesis, namely that a southern Chinese lect became nativized and absorbed into proto-Vietic during the first millennium CE.

Comparative findings across domains

  • Phonological systems: Vietnamese is one of very few non-Sinitic languages to preserve the palatal/retroflex sibilant distinction of Early Middle Chinese (a distinction lost in most modern Chinese dialects). In LSV, reflexes of MC labiodentalization (e.g., SV v- < MC *v-) and grade II palatalization (e.g., -y- medial) appear more regularly than in corresponding Cantonese or Mandarin forms, while LSV also preserves conservative features not shared with these counterparts, owing to its southern, “Annamese” lect origins5 10. 

  • Lexical inheritance & innovation: While hundreds of core words in Vietnamese are securely assigned to Sinitic origin (either as ancient loans or as systematic SV readings), a small proportion of the basic lexicon remains demonstrably Austroasiatic, especially in numerals and agricultural terms11.

  • Semantic drift and chain innovation: Vietnamese, far more than Japanese or Korean, systematically coins new compounds out of SV morphemes, e.g., 'linhmục' (靈牧 língmù, ‘priest’), 'giảkimthuật' (冶金術 yějīnshù, ‘alchemy’), establishing semantic innovations not paralleled in Chinese itself. These reflect not just borrowing but creative recombination in a diglossic area10. 

  • Proof from conservative dialects: Numerous items regarded as “Sinitic” in Vietnamese find their closest parallels in conservative Vietic languages, especially Rục, Thavung, and Mường, which preserve presyllabic structure (e.g., Rục 'təkɨəm', Rục prefixal formations) and pre-tonal syllabification, thus serving as a living laboratory for contact phonology12 1. 

  • Typological convergence: Vietnamese displays the analytic, morphemic-syllabic, non-inflectional profile typical of the Mainland Southeast Asia Sprachbund (areal grouping). Yet the persistence of a small number of polysyllabic and sesquisyllabic forms in non-standard dialects points to contact-induced convergence and layered histories, not simplistic replacement11 2. 

Table 4: Comparative etymon examples

| Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Comparative Notes |
|---|---|---|---|---|---|
| 榮光 róngguāng | quangvinh, vangbóng, vẻvang / vinhquang | ɦwiajŋkwaŋ | *ɢʷreŋkʷaːŋs | glorious | This is a compelling case study that intersects multiple phonological trajectories from Early Middle Chinese (EMC) into Literary Sino-Vietnamese (LSV) and vernacular Vietnamese (VS). |
| 望文生義 wàngwén-shēngyì | trôngvăng-đặtnghĩa / vọngvăn-sanhnghĩa | maŋsmiun-ʂaɨjŋŋjiə̆ | *maŋmɯn-shleŋŋrals | folk etymology | This idiom is a strong case of Vietnamese preservation of the palatal-retroflex sibilant distinction of Early Middle Chinese (EMC), of reflexes of labiodentalization (e.g., SV v- < MC v-), and of grade II palatalization (e.g., medial -y-). |
| 木偶戲 mù'ǒuxì | kịchmúarối / mộcngẫuhí | məwkŋəwhjiə̆ | *moːɡŋoːsqʰrals | puppetry | This is a compelling example of lexical inheritance and innovation, especially when viewed through the lens of Sino-Vietnamese (SV) transmission and vernacular adaptation. The shift across phonological, semantic, and structural dimensions exemplifies lexical innovation: the SV compound is semantically reinterpreted and replaced by a native phrase that better fits vernacular usage and cognitive framing. |
| 四姊 sìjiě | chịtư, chếtư vs. chị 'bốn' / tứtỷ | tsiɪ | *hljidsʔsiʔ | sister four | 'chế', 'chị', and 'tư' are Sinitic-Vietnamese words, but 'bốn' is cognate with Mường 'pổn', Khmer 'buən'. Cf. 'emba' (三妹 sānmèi, 'sister three'); note that the following words preserve all cultural context: 'chịcả' (大姐 dàjiě, 'eldest sister'), 'anhcả' (大兄 dàxiōng, 'eldest brother'), 'anhhai' (二兄 èrxiōng, 'second older brother'). |
| 仔細 zǐxì | tỉmỉ / tửtế | tsɨsɛj | *tsɨse:s | kind; meticulous | This is a case of innovation. Cf. 慈濟 cíjì (SV 'từtế', VS 'tửtế') = 'kindhearted'. |
| 名聲 míngshēng | thanhdanh, tiếngtăm, danhtiếng, vangtiếng, tiếngvang / danhthanh | miajŋɕiajŋ | *meŋqʰjeŋ | fame, renown | The case of 名聲 → thanhdanh in Literary Sino-Vietnamese (LSV) does not directly exemplify the preservation of the palatal-retroflex sibilant distinction of Early Middle Chinese (EMC), but it does intersect with other phonological conservatisms that LSV retains, particularly in labiodentalization and medial palatalization. Cf. 望文生義 wàngwénshēngyì (SV vọngvănsinhnghĩa, 'folk etymology'). |
| 善良 shànliáng | hiềnlương / lươngthiện | dʑianlɨaŋ | *ɡjenʔraŋ | morally good and kind | Multiple Vietnamese reflexes across Sino-Vietnamese, vernacular, and semantic analogs, each reflecting distinct etymological strata via Vietnamization @ 善 shàn (SV thiện) ~ 'hiền' 賢 xián (hiền), @ 良 liáng ~ 'lành'. |


These comparative patterns both validate and complicate the notion of Sinitic-Vietnamese as a distinct stratum, showing not only what was borrowed or nativized, but how Sinitic features were remixed with enduring Yue/Vietic structures and semantic fields.

Semantic chains and polysyllabic annotation

A critical dimension underpinning the chapter’s thesis is the presence of semantic chains, both diachronic (layered developments across time) and synchronic (coexisting derivatives and compounds), most vividly observed in how Sinitic and vernacular roots combine, diverge, and radiate across registers.


Characteristic examples:

  • Direct semantic chaining: Lexical roots such as 劍 (gươm/kiếm, ‘sword’) recur across derived expressions and technical compounds, for example gươmđao (‘swords and sabers’), highlighting indigenous compounding practices distinct from donor languages and evidencing deep assimilation of Sinitic material9. 

  • Compounding of Sino-Vietnamese roots: The construction of polysyllabic compounds—such as giáosư (professor, 教師), thuỷngư (‘aquatic animal’, combining ‘water’ and ‘fish’), and nhạcsĩ (‘musician’, 樂士)—exemplifies both the generative capacity and inventive reconfiguration of Sinitic morphemes within Vietnamese morphosyntactic and semantic frameworks10. 

  • Semantic divergence within lexical chains: Certain chains reveal functional differentiation across registers and diachronic layers. For example, vị (味 wèi, ‘taste’, formal) contrasts with mùi (vernacular, ‘smell’); lạy (Old SV, ‘kowtow, bow’) diverges from lễ (SV, ‘ceremony’); and việc (Old SV, ‘work, event’) stands apart from dịch (SV, ‘service, corvée’). These pairings illustrate how semantic domains were restructured over time, with shifts in usage, tone, and sociolinguistic context9. 

  • Macro-domain chaining: The corpus documents extensive “macro-chains”—entire semantic fields such as metallurgy (vàng, bạc, sắt, đồng, gang, thép), agronomy, and kinship—that can be reconstructed diachronically. These domains were progressively enriched or supplanted by Sinitic vocabulary as waves of social, technological, and administrative innovation permeated proto-Vietic speech communities, reshaping lexical landscapes at the systemic level9. 

  • Polysyllabic annotation: Morpheme-level annotation practices, exemplified by compounds such as giáosư (giáo ‘teach’ + sư ‘master’) and nhiệtkế (nhiệt ‘heat’ + kế ‘device’, i.e., thermometer), not only illuminate the internal compositional logic of Sinitic formations but also trace the pathways through which Chinese-derived roots were refunctionalized within indigenous Vietnamese semantic and syntactic frameworks9. 

The ability to identify, annotate, and analyze these chains, with their branching, looping, and sometimes discontinuous nature, demonstrates that the Sinitic-Vietnamese layer is not just an inert deposit but a creative substratum that Vietnamese speakers exploited for both lexical innovation and semantic extension.
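One way to picture a synchronic chain is as an inverted index from each morpheme to the compounds that contain it. The following is a small sketch under stated assumptions: the segmentations are entered by hand rather than computed, the function name `chains_by_root` is invented for this example, and the four compounds are ones already cited in this report.

```python
import unicodedata
from collections import defaultdict

# Hand-segmented compounds (the segmentation is supplied manually).
COMPOUNDS = {
    "giáosư": ["giáo", "sư"],
    "nhiệtkế": ["nhiệt", "kế"],
    "nhạcsĩ": ["nhạc", "sĩ"],
    "bácsĩ": ["bác", "sĩ"],
}


def nfc(text):
    """Normalize to NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)


def chains_by_root(compounds):
    """Map each morpheme to the sorted list of compounds that contain it."""
    index = defaultdict(list)
    for word, parts in compounds.items():
        if nfc("".join(parts)) != nfc(word):
            raise ValueError(f"segmentation does not rebuild {word!r}")
        for part in parts:
            index[nfc(part)].append(nfc(word))
    return {root: sorted(words) for root, words in index.items()}


chains = chains_by_root(COMPOUNDS)
print(chains[nfc("sĩ")])
```

The NFC step matters for Vietnamese text in particular, because the same diacritic-bearing letter can be encoded in precomposed or decomposed form.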

The table below synthesizes the chapter’s approach and data. Each entry represents compounds formed from either a major Sinitic root or a core comparative form in Table 1 above, analyzed in depth in Chapter 1.

Table 5: Etyma by core corpus extraction

    Etymon
    (漢字)

    VS/SV
    Form

    MC
    Reconstruction

    OC
    Reconstruction

    Gloss

    Comparative Notes

    刀劍 dāojiàn

    đaogươm,
    gươmđao,
    gươmdao, daokiếm / kiếmđao, đaokiếm

    tawkjəm 

    *ta:wkams

    bladed weapons

    Cf. 
    劍刀 jiàndāo as attested in Chinese classics, e.g., 《三國演義》第九一回:「或為 刀劍 所 傷,魄 歸 長夜,生 則 有 勇,死 則 成名。」 'Sānguó Yǎnyì-- Dìjǐushíyì Huí': 'Huòwéi dāojiàn suǒ shāng, pò guī chángyè, shēng zé yǒuyǒng, sǐ zé chéngmíng. ('Tamquốc Diễnnghĩa' -- Hồi Thứ Chínmốt: Hoặcbị gươmđao chém thương, làm ma vấtvưởng; sống thì anhdũng, chết được thànhdanh.)

    眼鏡  yǎnjìng vs. 目鏡 mùjìng 

    mắtkính, mắtkiếng,
    mắtkính,
    kínhmắt,

    kiếngmắt /
    mụckính, nhãnkính


    eye glasses

    (Hakka, Southern and Puxian Min, Hainanese) Reflects reordering and substrate phonology; similar to Hainanese /matkɛng/ in morphemic order and phonotactics as opposed to VS "kiếngmắt" -- which reflects reordering and substrate phonology. "kiếng" is a Southern variant of "kính"Rục /matkɛng/ and Hainanese forms preserve older phonological features, offering “living laboratory” evidence for contact phonology.

    唱和 chànghé

    xướnghoạ,

    xướnghò,

    khoanhò, hòkhoan/
    xướnghoà

    tɕhaŋɦwa

    *tʰjaŋsɡoːls

    chant

    Cf. 你們 倆 一唱一和. Nǐmen liǎ yīchàngyīhè. (Haiđứa mày kẻxướngngườihò.) 'The two of you chant in the chorus collusively.'

    微算機

    wēisuànjī

    máyvitính / vitoáncơ

    mjɪswankɨj

    *mɯjsloːnsʔkɨjkɯl

    micro computer

    微算機 is a Vietnamese compound of [ (vi) – micro] + [  (toán) – compute ] +  () – machine] = 'micro computer', a polysyllabic compound with SV and VS roots.  OC and MC sounds are what to make up the pronunciation.

    魚汁 yúzhī

    sốtcá, nướccá, mắmcá, nướcmắm /  ngưtrấp

    ŋjotɕip

    *ŋakjub

    fish sauce

    There is a noteworthy etymological case in English—the word ketchup (or catsup) (note the syllable cat)—which has a Sino-origin meaning of “fish sauce” (魚汁 yúzhī). The British originally borrowed it from the Fujian region in earlier times, where locals used fermented fish sauce for seasoning. However, when this item was brought back to England, the English language transformed it into “tomato sauce.”

    公雞

    gōngjī vs

    雄雞.

    xióngjī 

    gàcồ / gàtrống côngkê / hùngkê

    kəwŋkiej

    *klo:ŋke: 

    rooster

    They are strong cases of Semantic divergence within lexical chains with variants across numerous Sinitic lects. Cf. 母雞 mǔjī (gàmái, gàmẹ, 'hen'), not to mention ancient local forms such as  gōng雞母 jīmǔ (gàmẹ).

    龍飛鳳舞
    lóngfēi-fèngwǔ

     rồngbay-phượngmúa /

    longphi-phụngvũ

    luawŋpwyj-
    buwŋwǔ 

    *b·roŋpɯl-
    bumsmaʔ

    grand and flamboyant style

    Another case of direct semantic chaining.

    大志
    dà​zhì​ (đạichí)

     chícả,

    chílớn /

    đạichí

    dajtɕɨ

    *da:dstjɯs

    high aims

    Another case of semantic divergence within lexical chains, cf, variant:  胸無大志. xiōngwúdàzhì. (ngườikhôngcóchí.)

    顆心

    kēxīn

    tráitim,

    contim,

    quảtim / 

    khoảtâm

    kʰwasim

    *kʰloːlʔslɯm

    heart

    This should be a case of macro-domain chaining within the semantic divergence within lexical chains. Cf. 果 guǒ (SV quả) 'fruit', hence 'trái' (clipping of VS 'quảtrái' 果實  guǒshí (SV quảthực) that gives rise to a classifier 顆 kē for 'con', 'quả', 'trái' (small round objects), Vietnamized as a morphemic-modifier syllable in 顆心 kēxīn.

    酒席
    jǐuxí

    rượutiệc,

    tiệcrượu /

    tửutịch

    tsɨuziajk

    *ʔsluʔljaːɡ

    banquet

    Example: 富家 一 席酒 窮漢 半年 糧. Fùjiā yì xíjǐu, qiónghàn bànnián liáng. (Một bữa tiệcrượu của ngườigiàu bằng lương nửanăm kẻnghèo.)

    力氣

    lìqì

    sứclực,

    hơisức /

    lựckhí

    lɨkhɨj

    *rɯɡkʰɯds

    strength

    Native parallels in Vietic, while VS 'sức' is, a native Vietnamese Proto-Vietic k-rək, cognate of Chinese 力, not merely a translation or borrowing. It reflects a shared etymological ancestry and semantic continuity, while lực is the Sino-Vietnamese literary reflex directly borrowed from Middle Chinese. In this case, 氣 (SV khí, VS hơi, 'steam') is associated with 'sức'. Cf. 氣力 qìlì (SV khílực, VS hơisức) 'power'.

    本錢

    běnqián

    tiềnvốn

    vốnliếng /

    bảntiền bổntiền


    pwəndziɛn

    *pɯːnʔslenʔ

    root/funds

    This is a case of semantic divergence within lexical chains with phonological innovation on the second syllable, which is commonplace in Sinitic-Vietnamese. Ex.  她的 本錢 是 青春 美麗. Tāde běnqián shì qīngchūn měilì. (Vốnliếng của nàng là thanhxuân trẻđẹp.) 'Her capital is youth and beauty.'

    工役  gōngyì

    côngviệc /

    côngdịch

    ywek

    *wjek

    service / work

    a compound that reflects both Sino-Vietnamese inheritance and vernacular semantic fusion, ahybrid compound typical of Vietnamese lexical layering. For 'việc' (役 yì, SV dịch) semantic layering of 為 (OC *ɢʷal, 'to do') with k-extension, its etymology can be traced directly to the Classical Chinese compound 工役 (gōngyì), though. Cf. 公務 gōngwù (SV côngvụ, VS côngviệc) 'business'

    幸福  xìngfú

    phướclành,

    phúclành /

    hạnhphúc

    hạnhphước

     ɦəɨjŋpʰuw 

    *ɡreːŋʔpɯɡ

    bliss

    幸福 (xìngfú) and Vietnamese expressions like hạnhphúc, phướclành, and phúclành are historically and semantically related, though they represent distinct etymological layers. The vernacular forms are in fact compounds that build Sinitic-Vietnamese elements onto Sino-Vietnamese roots. Cf. 良 (liáng, SV lương, VS lành).

    妙法 miàofǎ 

    pháp / phépmàu,

    phépmầu /

    diệupháp

    miawfǎ

    *mewspqab

    miracle

    Phép and pháp are both reflexes of 法, stratified by register: phép for colloquial and magical senses, pháp for formal, legal, and religious ones. SV diệupháp preserves the Buddhist doctrinal nuance, while Sinitic-Vietnamese phépmàu reflects emotional and magical connotations.

    寡婦

    guǎfù


    goáphụ, 

    goábụa,

    bàgoá,

    bàgiá,

    ởgoá,

    ởvậy /

    quảphụ

    kʷɯabuw

    *kwra:ʔbɯʔ

    widowed woman

    This entry shows a rich array of Vietnamese reflexes across Sino-Vietnamese, vernacular, and idiomatic registers, each encoding different layers of phonological inheritance, semantic drift, and cultural framing via Vietnamization: @ 寡 guǎ ~ 'ở' 於 yú (vu), 'giá'; @ 婦 fù ~ 'vợ', 'bụa', 'bà' 婆 pó (bà).


    Each row is immediately explicated and cross-referenced in the chapter's analysis, not only in terms of phonology but also of register, semantic chain (noting compounds or domain expansions), and, when relevant, polysyllabic annotation (e.g., máyvitính - ‘computer,’ nướcmắm - ‘fish sauce’).
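
    The row layout used above can also be read mechanically. What follows is only a minimal sketch, not the chapter's own tooling: it assumes that vernacular (VS) forms are listed first, separated by commas, and that a slash introduces the Sino-Vietnamese (SV) reading, as in 'rượutiệc, tiệcrượu / tửutịch'. The function name split_forms is hypothetical, and a cell without a slash is treated as VS-only by assumption.

```python
import re


def split_forms(cell: str) -> dict[str, list[str]]:
    """Split a 'VS, VS / SV' cell into vernacular and Sino-Vietnamese forms.

    Assumption (not stated in the source): everything before the first
    slash is VS, everything after it is SV; forms are separated by commas
    or whitespace.
    """
    vs_part, _, sv_part = cell.partition("/")

    def tokenize(text: str) -> list[str]:
        # Split on commas and whitespace, dropping empty fragments.
        return [t for t in re.split(r"[,\s]+", text.strip()) if t]

    return {"vs": tokenize(vs_part), "sv": tokenize(sv_part)}


# Forms taken from the 酒席 row above.
print(split_forms("rượutiệc, tiệcrượu / tửutịch"))
# → {'vs': ['rượutiệc', 'tiệcrượu'], 'sv': ['tửutịch']}
```

    Rows that deviate from this layout would need manual review rather than automatic splitting.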

    Digital tools and annotation practices

    The methodological innovations in Chapter 1 extend beyond theoretical linguistics: they are grounded in explicit annotation protocols and the strategic integration of digital tools. Key components include:

    • Corpus annotation standards: Utilization of structured templates enables hierarchical, coded annotation of etymological trees, register transitions, and semantic chain pathways. These formats are fully compatible with automated parsing and scalable lexicon development.

    • Spreadsheet and case-by-variable formatting: Items are compiled in tabular form with columns for VS/SV representation, stratum/register classification, Middle and Old Chinese reconstructions, glosses, and comparative notes—facilitating both statistical modeling and graphical visualization.

    • Comparative database linkage: The corpus is linked to external linguistic resources, including classical Chinese rime books, the Kangxi Dictionary, Shafer's Sino-Tibetan research, published reconstructions of Middle Chinese and Old Chinese, Austroasiatic etymological inventories, and Vietnamese word lists compiled by international linguists. This linkage enables both synchronic and diachronic querying across language families.

    • Register and tonogenesis tagging: Annotation protocols explicitly mark register (colloquial vs. literary), lexical stratum (ESV/LSV/RSV), and tonal origin—capturing features such as coda type, pitch contour, and onset class—in alignment with contemporary corpus linguistic standards.

    These practices yield a replicable and extensible annotation framework, essential for long-term comparative research, digital humanities applications, and empirical testing of the AMC and substratum hypotheses advanced in the chapter.
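
    The case-by-variable format described above can be sketched in a few lines. This is an illustration under stated assumptions, not the chapter's actual template: the class name Entry, its field names, and the helper to_csv are all hypothetical, and the stratum labels are simply the ESV/LSV/RSV tags named in the text. The sample values come from the 酒席 row; its stratum is left untagged because the row itself does not state one.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Optional

# Stratum labels named in the text; None marks an entry not yet tagged.
STRATA = ("ESV", "LSV", "RSV")


@dataclass
class Entry:
    """One corpus row; field names are illustrative, not the author's."""
    hanzi: str
    pinyin: str
    vs_forms: list[str]
    sv_forms: list[str]
    middle_chinese: str
    old_chinese: str
    gloss: str
    stratum: Optional[str] = None
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject stratum tags outside the controlled vocabulary.
        if self.stratum is not None and self.stratum not in STRATA:
            raise ValueError(f"unknown stratum: {self.stratum!r}")


def to_csv(entries: list[Entry]) -> str:
    """Serialize entries to CSV text, one row per entry."""
    columns = [f.name for f in fields(Entry)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        row = {name: getattr(entry, name) for name in columns}
        # Flatten list-valued columns so each cell holds plain text.
        row["vs_forms"] = "; ".join(entry.vs_forms)
        row["sv_forms"] = "; ".join(entry.sv_forms)
        row["stratum"] = entry.stratum or ""
        writer.writerow(row)
    return buffer.getvalue()


banquet = Entry(
    hanzi="酒席", pinyin="jǐuxí",
    vs_forms=["rượutiệc", "tiệcrượu"], sv_forms=["tửutịch"],
    middle_chinese="tsɨuziajk", old_chinese="*ʔsluʔljaːɡ",
    gloss="banquet",
)
print(to_csv([banquet]))
```

    Keeping the stratum as a small controlled vocabulary makes tagging errors surface at entry time, and the CSV output can be loaded into any spreadsheet or statistics package for the modeling mentioned above.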

    Linking corpus findings to the chapter’s thesis

    The corpus analysis in Chapter 1 provides compelling, multi-axis support for its central thesis: that the Sinitic-Vietnamese lexical stratum is not a superficial overlay derived from distant literary Chinese, but a deeply embedded, generative system. This system emerged through sustained bilingual contact with regional Sinitic lects and a Yue-derived proto-Vietic substrate—forming a foundational layer in the Vietnamese lexicon that reflects centuries of linguistic convergence, adaptation, and innovation.

    Key points of thesis substantiation:


    Depth and breadth of Sinitic integration: The presence of Early Sino-Vietnamese (ESV) and even pre-ESV elements, particularly within foundational lexical domains, demonstrates that the Sinitic layer extends far beyond scholarly or technical vocabulary. It constitutes a structural substratum of the Vietnamese lexicon. The diachronic continuity from pre-tonal, presyllabic forms preserved in conservative Vietic dialects to modern standardized VS/SV readings reflects not episodic borrowing, but a sustained and dynamic process of linguistic accretion: a gradual “lacquering” of the Vietnamese language over centuries of contact and adaptation [4, 8].

    Stratification over replacement: The coexistence of overlapping registers, recurrent doublets, and regionally differentiated variants reflects a layered lexical system in which native Vietic forms, Sinitic borrowings, and bilingual innovations actively interact. Rather than a linear process of substitution or overwriting, the Vietnamese lexicon reveals a dynamic stratification shaped by sustained contact, functional differentiation, and contextual adaptation [10].

    Creative adaptation and innovation: The use of compounding, semantic chaining, and native coinage with Sinitic material, especially within technical, administrative, and scholarly registers, attests to the active linguistic agency of Vietnamese speakers. Rather than passively transmitting reading glosses from Chinese texts, they domesticated and recontextualized imported morphemes, embedding them within indigenous syntactic and semantic frameworks to generate novel, functional expressions [10].

    Comparative validation of the AMC model: Cross-linguistic data from Southwest Chinese lects, conservative Vietic varieties, and other Austroasiatic branches affirm the Red River Delta as a sustained contact zone. These comparisons substantiate the AMC model’s premise: that a localized Sinitic variety exerted deep phonological, lexical, and semantic influence on emerging Vietnamese, shaping its structure through prolonged and multidirectional interaction [12].

    Synthesis and Conclusion

    At its core, the corpus and its stratified analysis offer extensive, multi-dimensional, and empirically grounded validation of the argument that the Sinitic layer in Vietnamese is not a passive residue of literary Chinese, but a dynamic, creative, and foundational substrate shaped by localized, heterogeneous contact.

    The Chapter 1 corpus—meticulously curated and annotated—serves as a benchmark for analytic and comparative corpus linguistics in the context of East and Southeast Asian language contact. Through structural precision, multi-tiered etymological mapping, detailed register and phonological annotation, and the explicit tracing of semantic chains and polysyllabic innovation, it reveals the embeddedness and generative capacity of the Sinitic-Vietnamese stratum within the broader Vietnamese lexicon.

    Most importantly, the corpus affirms the central thesis: that Sinitic-Vietnamese (VS) is not merely a superimposed layer of foreign vocabulary, but a deeply indigenized stratum, lacquered into the linguistic fabric of Vietnamese through centuries of adaptation, semantic reconfiguration, and creative agency. This insight not only reframes the historical trajectory of Vietnamese but also establishes a new paradigm for approaching language contact and lexical stratification in global linguistic research.

     


    References 


    1. Vietic languages. Wikipedia
    2. Mark Alves. Early Sino-Vietnamese Lexical Data and the Relative Chronology of Tonogenesis In Chinese And Vietnamese
    3. Mark Alves. 2009. Vietnamese Vocabulary. WOLD
    4. Mark Alves. Identifying Early Sino-Vietnamese Vocabulary via Linguistic, Historical Archaeological and Ethnological Data
    5. Mark Alves. Notes on Sino-Vietnamese Historical Phonology
    6. John Phan. Lacquered Words: The Evolution Of Vietnamese Under Sinitic Influences From The 1st Century BCE Through The 17th Century CE
    7. John Phan. Lacquered Words: The Evolution Of Vietnamese Under Sinitic...
    8. The Baxter-Sagart reconstruction of Old Chinese
    9. Mark Alves. From Vietic Presyllables To Vietnamese Simplex Onsets
    10. Sino-Vietnamese vocabulary. Wikipedia
    11. The Etymologies of Vietnamese Numeral Terms and Implications of ...
    12. Historical Ethnolinguistic Notes on Proto-Austroasiatic and Proto ....
    13. Template: etymon. Wikipedia
    14. Baxter, William H. and Sagart, Laurent 2011. (STEDT)
    15. Stefan Th. Gries and Magali Paquot. Chapter 26 Writing up a Corpus-Linguistic Paper
    16. 漢典 zdic.net
    17. Nguyễn, Ngọc San. 1993. Tìm hiểu về Tiếng Việt Lịch sử. TP HCM: NXB Giáo dục.
    18. Shafer, Robert. 1966-1974. Introduction to Sino-Tibetan (4 volumes). Wiesbaden: Otto Harrassowitz.
    19. Thomas, David D. 1966. “Mon-Khmer Subgroupings in Vietnam,” in Norman Zide (ed.) Studies in Comparative Austroasiatic Linguistics. The Hague: Mouton.
    20. Luce, Gordon Hannington. 1965. "Danaw, a Dying Austroasiatic Language" in “Historical Linguistics” Indo-Pacific Linguistic Studies
    21. Han-Viet.com