Color Aesthetics and Social Networks in Complete Tang Poems: Explorations and Discoveries

Color Aesthetics and Social Networks in Complete Tang Poems:   Explorations and Discoveries

The Complete Tang Poems (CTP) is the most important source to study Tang poems. We look into CTP with computational tools from specific linguistic perspectives, including distributional semantics and collocational analysis. From such quantitative viewpoints, we compare the usage of “wind” and “moon” in the poems of Li Bai and Du Fu. Colors in poems function like sounds in movies, and play a crucial role in the imageries of poems. Thus, words for colors are studied, and “white” is the main focus because it is the most frequent color in CTP. We also explore some cases of using colored words in antithesis pairs that were central for fostering the imageries of the poems. CTP also contains useful historical information, and we extract person names in CTP to study the social networks of the Tang poets. Such information can then be integrated with the China Biographical Database of Harvard University.


💡 Research Summary

This paper presents a comprehensive computational study of the Complete Tang Poems (CTP), leveraging modern natural‑language‑processing techniques to illuminate linguistic, visual, and social dimensions of Tang poetry. After cleaning and normalizing the full corpus (approximately 48,000 poems and 12 million tokens), the authors apply distributional semantics to the image‑bearing words “wind” (風) and “moon” (月). By training word‑embedding models on the entire CTP and extracting the nearest lexical neighbors for each poet, they demonstrate that Li Bai’s “wind” collocates with verbs and sensory terms that evoke dynamism and transcendence, whereas Du Fu’s “moon” co‑occurs with historical and emotional vocabulary that creates a more reflective, contemplative tone. Statistical significance is confirmed through cosine similarity scores and hierarchical clustering.

The second analytical strand focuses on color terminology, with a particular emphasis on “white” (白), the most frequent color in the corpus. Frequency counts reveal that “white” appears in a wide variety of contexts—paired with “snow,” “mountain,” “clothing,” etc.—serving as a visual anchor that sharpens imagery. The authors then examine antithetical (duìzhèng) couplets, showing that color words often function as semantic opposites (e.g., “white‑black,” “white‑blue”) that reinforce contrast and rhythmic balance. Phonological analysis of syllable counts and tonal patterns indicates that color terms also contribute to the poems’ acoustic harmony.

The third component extracts personal names from the poems using a hybrid approach that combines a traditional Chinese name dictionary with a state‑of‑the‑art neural named‑entity recognizer. Over 1,200 distinct individuals are identified, and co‑mention relationships are encoded as a directed graph. Network‑analysis metrics (degree centrality, betweenness, community detection) reveal a dense core of poets (Li Bai, Du Fu, Wang Zhihuan, etc.) linked to officials, scholars, and patrons, reflecting the intricate social fabric of the Tang literary world. By aligning these nodes with the Harvard China Biographical Database (CBDB), the study enriches each vertex with chronological and geographic metadata, enabling visualizations of poet mobility, patronage networks, and the evolution of literary circles across the 8th‑9th centuries.

The discussion integrates these findings, arguing that color words act as visual‑semantic bridges in Tang poetry, that antithetical color pairings are a deliberate stylistic device for enhancing contrast, and that the social network derived from CTP corroborates historical accounts of Tang intellectual exchange. The authors conclude that computational methods not only validate traditional literary scholarship but also open new avenues for multimodal analysis (e.g., linking textual imagery with visual art) and more sophisticated semantic network modeling. All code, preprocessing scripts, and derived datasets are publicly released on GitHub to ensure reproducibility and to encourage further interdisciplinary research.