
Co-word Analysis using the Chinese Character Set

February 23, 2026


📝 Original Info

Title: Co-word Analysis using the Chinese Character Set
ArXiv ID: 0911.1451
Date: 2009-11-10
Authors: Loet Leydesdorff, Ping Zhou

📝 Abstract

Until recently, Chinese texts could not be studied using co-word analysis because the words are not separated by spaces in Chinese (and Japanese). A word can be composed of one or more characters. The online availability of programs that separate Chinese texts makes it possible to analyze them using semantic maps. Chinese characters contain not only information, but also meaning. This may enhance the readability of semantic maps. In this study, we analyze 58 words which occur ten or more times in the 1652 journal titles of the China Scientific and Technical Papers and Citations Database. The word occurrence matrix is visualized and factor-analyzed.


📄 Full Content

Co-word Analysis using the Chinese Character Set

Loet Leydesdorff a & Ping Zhou b,c

Journal of the American Society for Information Science & Technology (forthcoming)


Keywords: co-occurrence, co-word, visualization, Chinese, separation, semantic map

a Amsterdam School of Communications Research (ASCoR), University of Amsterdam, Kloveniersburgwal 48, 1012 CX Amsterdam, The Netherlands; loet@leydesdorff.net; http://www.leydesdorff.net
b K.U. Leuven, Steunpunt O&O Indicatoren, Dekenstraat 2, B-3000 Belgium; Ping.Zhou@econ.kuleuven.be
c Institute of Scientific and Technical Information of China, 15 Fuxing Road, Beijing, 100038, P. R. China.

1 Introduction

Unlike most languages, Chinese does not use spaces to separate characters into words. Co-word analysis (Van Rijsbergen, 1977; Salton & McGill, 1983; Callon et al., 1982 and 1986; Leydesdorff, 1989 and 1997) could therefore not be applied to Chinese texts. Given the increased importance of the Chinese contribution to science (Zhou & Leydesdorff, 2006), the mapping of co-words in this language has been desirable for some time. (In a previous study, Park & Leydesdorff (2004) solved the problem of mapping texts using the Korean character set; a set of freeware programs was brought online for this purpose at http://www.leydesdorff.net/krkwic.)

Recently, Chen (2007) reported a co-word map using the “traditional Chinese” character set based on software developed by the Academia Sinica in Taiwan (at http://ckipsvr.iis.sinica.edu.tw/). Similar software is available in mainland China for “simplified Chinese.” In an analysis of Japanese policy documents based on parsing of the various character sets in Japanese with dedicated software, Fujigaki & Nagata (1998, at p. 394, note 6) suggested that word and co-word analysis may be more meaningful when using the Chinese character set because these characters (unlike the phonetic ones)¹ would contain not only information, but also meaning.

¹ In addition to the Chinese (“Kanji”) characters, Japanese uses two other sets (Hiragana and Katakana) containing phonetic characters.

In this brief communication, we explore co-word analysis using Chinese characters and compare the results against our background knowledge of using co-word analysis in English (Leydesdorff, 1989 and 1997). For that purpose, we apply one of these tools, 中文智能分词 (available at http://www.hylanda.com/product/fenci/tiyan/index.html), to the list of journal titles contained in the China Scientific and Technical Papers and Citations Database (CSTPCD).
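To make the word-separation step concrete, the following minimal sketch splits an unspaced title into words using greedy forward maximum matching against a dictionary. This is a simple textbook technique chosen for illustration; it is not claimed to be the method used by the tool named above. The function name `segment`, the toy `LEXICON`, and the sample title are assumptions introduced here.

```python
def segment(text, lexicon, max_len=4):
    """Split text into words using greedy forward maximum matching."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon; a production segmenter relies on a large
# dictionary and statistical disambiguation.
LEXICON = {"中国", "科学", "技术", "大学", "学报"}

# Illustrative title, not taken from the data set described in the text.
print(segment("中国科学技术大学学报", LEXICON))
```

Greedy matching can mis-segment ambiguous strings, which is one reason dedicated segmentation software is preferable for real data.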

2 Data and methods

The data consist of the 1652 journal titles contained in the China Scientific and Technical Papers and Citations Database (CSTPCD) of the Institute of Scientific and Technical Information in Beijing in 2005. The CSTPCD is a database of Chinese journals organized in a manner comparable to the Science Citation Index of Thomson Scientific/ISI (Zhou & Leydesdorff, 2007). Of the 1652 journals, only 36 were published under titles in English. Ren (2005) estimated that approximately 5,000 scientific journals are published regularly in the People’s Republic of China.

A total of 1157 distinct words occur 4509 times in these 1652 titles. Of these words, 697 occur only once; the 58 words that occur ten or more times were used in this analysis. Only two of them (“Chinese” and “Journal”) are in English. These 58 words were cross-tabled against the 1652 titles. The resulting asymmetrical occurrence matrix was normalized using the cosine between the word vectors (Ahlgren et al., 2003; Leydesdorff & Vaughan, 2006). The cosine matrix was visualized using Pajek, and the data matrix was also factor-analyzed using SPSS.
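The counting, cross-tabulation, and cosine normalization described above might be sketched as follows. The study itself used Pajek and SPSS; this plain-Python version only illustrates the arithmetic. The function names, the toy titles, and the frequency cutoff of 2 (instead of ten) are assumptions made to keep the example small.

```python
import math
from collections import Counter


def occurrence_matrix(titles, min_count):
    """Cross-tabulate frequent words (rows) against titles (columns)."""
    counts = Counter(word for title in titles for word in title)
    words = sorted(w for w, c in counts.items() if c >= min_count)
    matrix = [[title.count(w) for title in titles] for w in words]
    return words, matrix


def cosine(u, v):
    """Cosine between two word vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_matrix(matrix):
    """Symmetrical word-by-word matrix of cosine values."""
    return [[cosine(u, v) for v in matrix] for u in matrix]


# Hypothetical, already segmented titles (not from the CSTPCD data).
titles = [
    ["中国", "科学"],
    ["中国", "医学", "杂志"],
    ["科学", "技术"],
    ["中国", "科学", "技术"],
]
words, matrix = occurrence_matrix(titles, min_count=2)
sims = cosine_matrix(matrix)
for word, row in zip(words, sims):
    print(word, [round(value, 3) for value in row])
```

The occurrence matrix is asymmetrical (words by titles), while the cosine matrix derived from it is symmetrical (words by words), which is the form a network visualization needs.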

3 Results

Figure 1 provides the resulting co-word map. The threshold is set at cosine > 0.07, which is the average value of the cosine in the matrix when the cells with zeros are not included.² The size of the nodes is proportional to the logarithm of the number of occurrences of the word, and the thickness of the lines is proportional to the strength of the relationship (as measured by the cosine).
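The map construction can be illustrated with a short sketch: the threshold is the mean of the non-zero cosine values, nodes are sized by the logarithm of word frequency, and only links above the threshold are kept. Several details are assumptions, since the text does not specify them: the mean is taken over off-diagonal cells only, the natural logarithm is used, and the words, counts, and cosine values below are invented toy numbers.

```python
import math


def mean_nonzero_offdiagonal(sims):
    """Average of the non-zero cosines above the diagonal (assumed convention)."""
    size = len(sims)
    values = [sims[i][j] for i in range(size) for j in range(i + 1, size)
              if sims[i][j] > 0]
    return sum(values) / len(values) if values else 0.0


def build_map(words, counts, sims):
    """Return the threshold, node sizes, and weighted links of the map."""
    threshold = mean_nonzero_offdiagonal(sims)
    # Node size grows with the logarithm of the word frequency.
    nodes = {w: math.log(counts[w]) for w in words}
    # Keep only links stronger than the threshold; the weight sets line thickness.
    edges = [(words[i], words[j], sims[i][j])
             for i in range(len(words)) for j in range(i + 1, len(words))
             if sims[i][j] > threshold]
    return threshold, nodes, edges


# Hypothetical toy input, not values from the study.
words = ["中国", "科学", "技术"]
counts = {"中国": 40, "科学": 25, "技术": 10}
sims = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
threshold, nodes, edges = build_map(words, counts, sims)
print(threshold, nodes, edges)
```

Excluding zero cells from the average matters because a sparse co-word matrix is dominated by zeros, which would otherwise pull the threshold close to zero and leave almost every link in the map.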

Figure 1: Cosine-normalized map of

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
