Sunday, October 12, 2025

The Comparative Wordlists — Method and Cautions

Framing Vietnamese within Yue‑Taic strata



Comparative wordlists are the indispensable tools of historical linguistics. They allow us to align forms across languages, test hypotheses of cognacy, and reconstruct proto‑forms. Yet they are also treacherous: superficial resemblance can seduce us into false conclusions if we do not apply rigorous method. In the case of Vietnamese, where Sinitic, Austroasiatic, and Yue‑Taic strata overlap, the danger of misclassification is especially acute.

1. The promise of wordlists

From the 19th century onward, scholars compiled parallel lists of Vietnamese, Chinese, and Mon‑Khmer words. These lists revealed striking resemblances: Vietnamese mẹ “mother” with Khmer mday; Vietnamese đầu “head” with Chinese 頭 tóu; Vietnamese nước “water” with Proto‑Tai *nam. Such comparisons suggested that Vietnamese was not an isolate but a product of convergence. Wordlists thus provided the first evidence for the layered nature of Vietnamese.

2. Methodological principles

  • Sound correspondences: True cognates show regular phonological patterns, not random similarity.
  • Semantic stability: Core vocabulary (body parts, kinship, natural elements) is more reliable than cultural terms.
  • Register awareness: Vietnamese often preserves both a vernacular form and a Sino‑Vietnamese doublet; both must be tracked.
  • Areal diffusion: Some similarities reflect borrowing across neighbors, not shared ancestry.
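The first principle can be made concrete with a small tally. The sketch below is illustrative only: it pairs Sino‑Vietnamese readings with Mandarin readings that appear in this article's tables, extracts their initial consonants, and counts how often each pairing recurs. The onset inventories, the function names, and the threshold of two occurrences are assumptions chosen for this toy sample, and five pairs are far too few to establish a real correspondence.

```python
from collections import Counter

# Onsets needed for this toy sample only; a real inventory would be much larger.
VI_ONSETS = ("th", "ph", "m", "l")
ZH_ONSETS = ("t", "f", "m", "l")

# (Sino-Vietnamese reading, Mandarin pinyin) pairs quoted in this article.
PAIRS = [
    ("thiên", "tiān"),  # 天
    ("phụ", "fù"),      # 父
    ("phụ", "fù"),      # 婦
    ("mẫu", "mǔ"),      # 母
    ("linh", "líng"),   # 齡
]


def onset(word, inventory):
    """Return the longest onset from `inventory` that begins `word`, else ''."""
    for candidate in sorted(inventory, key=len, reverse=True):
        if word.startswith(candidate):
            return candidate
    return ""


def tally_onsets(pairs):
    """Count how often each (Vietnamese onset, Mandarin onset) pairing occurs."""
    return Counter((onset(a, VI_ONSETS), onset(b, ZH_ONSETS)) for a, b in pairs)


def recurrent(tally, min_count=2):
    """Keep only pairings seen at least `min_count` times (hypothetical cutoff)."""
    return {pair: n for pair, n in tally.items() if n >= min_count}


tally = tally_onsets(PAIRS)
print(recurrent(tally))
```

In this sample only the ph : f pairing recurs. A pairing that appears once is not evidence against regularity; it simply has not yet been tested against more data.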

3. Comparative tables

The following table illustrates how wordlists must be read critically:

Gloss | Vietnamese | Sino‑Vietnamese | Chin. (OC/MC) | Mon‑Khmer | Proto‑Tai | Notes
head | đầu / tróc | thủ | 頭 tóu < OC *duʔ | Khmer tpoal | *thaw | đầu resembles 頭, but phonology suggests borrowing; tróc may preserve older layer.
tooth | răng | linh | 齡 líng < MC lɛjŋ < OC *reːŋ | Khmer t’mieng, Mon rang | *hnɯŋ | Austroasiatic alignment is stronger; Sino‑Vietnamese nha is literary.
sky | trời | thiên | 天 tiān < OC *l̥ˤin | - | *hlɯi | trời may reflect Yue‑Taic mediation; thiên is a learned borrowing.

4. False cognates

Wordlists can mislead when superficial similarity masks different origins. For example, Vietnamese sóc “squirrel” resembles Chinese 松鼠 sōngshǔ, but the resemblance is coincidental: sóc is native, while 松鼠 is a descriptive compound (“pine‑rat”). [1]
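A rough way to see why surface resemblance is unreliable is to score it mechanically. The sketch below is only an illustration: it strips diacritics and compares romanized spellings with Python's difflib. The function names are hypothetical, and spelling overlap is a crude stand‑in for phonetic similarity, not a cognacy test.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_marks(text):
    """Remove combining diacritics after NFD decomposition.

    Letters such as 'đ' do not decompose and are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def surface_similarity(a, b):
    """Spelling overlap between two romanized forms, from 0.0 to 1.0."""
    return SequenceMatcher(None, strip_marks(a), strip_marks(b)).ratio()


# The coincidental lookalike discussed above.
lookalike = surface_similarity("sóc", "sōngshǔ")

# A pair this article links through borrowing (see the comparative table).
linked = surface_similarity("đầu", "tóu")

print(f"sóc / sōngshǔ: {lookalike:.2f}")
print(f"đầu / tóu:     {linked:.2f}")
```

On these spellings the coincidental pair scores higher than the historically linked one, which is the point: a similarity score by itself cannot separate chance resemblance from shared history. Only regular correspondences across many items can.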

5. Semantic grids

To avoid misclassification, we must map semantic domains systematically. The following grid shows how kinship terms stratify:

Gloss | Sinitic‑Vietnamese | Sino‑Vietnamese | Notes
mother | mẹ < mợ < vú < u | mẫu, mô | 母 mǔ, mú, wǔ, wú < MC məw < OC *mɯʔ
father | bố | phụ, phủ | 父 fù, fǔ < MC pio < OC *paʔ, *baʔ
wife | vợ < bụa | phụ | 婦 fù < MC buw < OC *bɯʔ
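A grid like this can also be kept as structured data, so that the vernacular and Sino‑Vietnamese layers stay separate and can be queried. The sketch below is one possible layout, not an established format: the class, field, and function names are assumptions, with `vernacular` standing for the grid's "Sinitic‑Vietnamese" column, and the entries are copied from the grid above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridRow:
    gloss: str
    vernacular: tuple       # forms from the grid's "Sinitic-Vietnamese" column
    sino_vietnamese: tuple  # literary readings of the character
    character: str


GRID = [
    GridRow("mother", ("mẹ", "mợ", "vú", "u"), ("mẫu", "mô"), "母"),
    GridRow("father", ("bố",), ("phụ", "phủ"), "父"),
    GridRow("wife", ("vợ", "bụa"), ("phụ",), "婦"),
]


def shared_readings(grid):
    """Map each Sino-Vietnamese reading used by more than one gloss to those glosses."""
    seen = {}
    for row in grid:
        for reading in row.sino_vietnamese:
            seen.setdefault(reading, []).append(row.gloss)
    return {reading: glosses for reading, glosses in seen.items() if len(glosses) > 1}


print(shared_readings(GRID))
```

Here the reading phụ is shared by "father" and "wife". A comparison keyed on the reading alone would merge two different characters, so each match has to be checked against the character and the gloss.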

6. Conclusion

Comparative wordlists are indispensable, but they must be read with caution. Vietnamese demonstrates how easily false cognates can mislead, and how register stratification complicates classification. Only by combining phonological correspondences, semantic stability, and historical context can we use wordlists responsibly.

Key takeaways:
  • Wordlists are powerful but dangerous if read superficially.
  • Sound correspondences and semantic stability are the gold standard for identifying cognates.
  • Vietnamese often preserves both vernacular and Sino‑Vietnamese forms, which must be tracked separately.
  • False cognates are common; caution and rigor are essential.

Footnotes

  1. Handel (1998), for the example of sóc vs. 松鼠.

References

Baxter, William H.; Sagart, Laurent (2014). Old Chinese: A New Reconstruction. Oxford University Press.
Handel, Zev (1998). “False Cognates in Sino‑Vietnamese Studies.” Journal of Chinese Linguistics.
Luce, Gordon H. (1959). Comparative wordlists of Vietnamese, Mon, and Chinese. Rangoon University Press.