Title: Co-word Analysis using the Chinese Character Set
ArXiv ID: 0911.1451
Date: 2009-11-10
Authors: Loet Leydesdorff, Ping Zhou
📝 Abstract
Until recently, Chinese texts could not be studied using co-word analysis because the words are not separated by spaces in Chinese (and Japanese). A word can be composed of one or more characters. The online availability of programs that separate Chinese texts makes it possible to analyze them using semantic maps. Chinese characters contain not only information, but also meaning. This may enhance the readability of semantic maps. In this study, we analyze 58 words which occur ten or more times in the 1652 journal titles of the China Scientific and Technical Papers and Citations Database. The word occurrence matrix is visualized and factor-analyzed.
📄 Full Content
Co-word Analysis using the Chinese Character Set
Loet Leydesdorff a & Ping Zhou b,c
Journal of the American Society for Information Science & Technology (forthcoming)
Abstract
Until recently, Chinese texts could not be studied using co-word analysis because the
words are not separated by spaces in Chinese (and Japanese). A word can be composed of
one or more characters. The online availability of programs that separate Chinese texts
makes it possible to analyze them using semantic maps. Chinese characters contain not
only information, but also meaning. This may enhance the readability of semantic maps.
In this study, we analyze 58 words which occur ten or more times in the 1652 journal
titles of the China Scientific and Technical Papers and Citations Database. The word
occurrence matrix is visualized and factor-analyzed.
a Amsterdam School of Communications Research (ASCoR), University of Amsterdam,
Kloveniersburgwal 48, 1012 CX Amsterdam, The Netherlands; loet@leydesdorff.net;
http://www.leydesdorff.net
b K.U. Leuven, Steunpunt O&O Indicatoren, Dekenstraat 2, B-3000 Belgium;
Ping.Zhou@econ.kuleuven.be
c Institute of Scientific and Technical Information of China, 15 Fuxing Road, Beijing, 100038, P. R. China.
Introduction
Unlike most languages, Chinese does not use spaces to separate characters into words.
Co-word analysis (Van Rijsbergen, 1977; Salton & McGill, 1983; Callon et al., 1982 and
1986; Leydesdorff, 1989 and 1997) has therefore been unable to use Chinese texts. Given
the increased importance of the Chinese contribution to science (Zhou & Leydesdorff,
2006), the mapping of co-words in this language has been desirable for some time. (In a
previous study, Park & Leydesdorff (2004) solved the problem of mapping texts using
the Korean character set; a set of freeware programs was brought online for this purpose
at http://www.leydesdorff.net/krkwic.)
Recently, Chen (2007) reported a co-word map using the “traditional Chinese” character
set based on software developed by the Academia Sinica in Taiwan (at
http://ckipsvr.iis.sinica.edu.tw/). Similar software is available in mainland China for
“simplified Chinese.” In an analysis of Japanese policy documents based on parsing of
the various character sets in Japanese with dedicated software, Fujigaki & Nagata (1998,
at p. 394, note 6) suggested that word and co-word analysis may be more meaningful
when using the Chinese character set because these characters (unlike the phonetic ones; see note 1)
would contain not only information, but also meaning.
1 In addition to the Chinese (“Kanji”) characters, Japanese uses two other sets (Hiragana and Katakana)
containing phonetic characters.
In this brief communication, we explore co-word analysis using Chinese characters and
compare the results against our background knowledge of using co-word analysis in
English (Leydesdorff, 1989 and 1997). For that purpose, we apply one of these tools, 中文智能分词
(available at http://www.hylanda.com/product/fenci/tiyan/index.html), to the
list of journal titles contained in the China Scientific and Technical Papers and Citations
Database (CSTPCD).
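To make the segmentation step concrete, the following is a minimal sketch of dictionary-based word separation using greedy forward maximum matching. It is not the algorithm of the tool named above, whose internals are not described here; the function name `segment`, the toy dictionary `VOCAB`, and the example title are all assumptions introduced for illustration only.

```python
def segment(text, vocabulary, max_len=4):
    """Split unspaced text into words by greedy forward maximum matching.

    At each position the longest dictionary entry (up to max_len characters)
    is taken; if nothing matches, a single character is emitted as a word.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real segmenter relies on a large lexicon
# and usually on statistical disambiguation as well.
VOCAB = {"中国", "科学", "技术", "大学", "学报"}

# Illustrative journal-style title (an assumption, not taken from the CSTPCD data).
print(segment("中国科学技术大学学报", VOCAB))
```

The output of such a step is a list of words per title, which is the input that a word count and a word-by-title occurrence matrix require.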
Data and methods
The data consists of the 1652 journal titles contained in the China Scientific and
Technical Papers and Citations Database (CSTPCD) of the Institute of Scientific and
Technical Information in Beijing in 2005. The CSTPCD is a database of Chinese journals
organized in a manner comparable to the Science Citation Index of Thomson
Scientific/ISI (Zhou & Leydesdorff, 2007). Of the 1652 journals only 36 were published
under titles in English. Ren (2005) estimated that approximately 5,000 scientific journals
are published regularly in the People’s Republic of China.
In total, 1157 distinct words occur 4509 times in these 1652 titles. Among these words, 697 occur only
once; the 58 words that occur ten or more times were used in this analysis. Only two
words among them (“Chinese” and “Journal”) are in English. These 58 words were
cross-tabulated against the 1652 titles. The word vectors of the resulting asymmetrical occurrence
matrix were normalized using the cosine (Ahlgren et al., 2003; Leydesdorff &
Vaughan, 2006). The cosine matrix was used for the visualization in Pajek, and the
data matrix was also factor-analyzed using SPSS.
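The pipeline described in this section (count words, keep the frequent ones, cross-tabulate words against titles, and compute cosines between word vectors) can be sketched as follows. This is an illustrative reconstruction in plain Python, not the software used in the study; the function names, the toy titles, and the threshold of 2 (instead of ten) are assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def frequent_words(segmented_titles, min_count):
    """Return the words that occur at least min_count times across all titles."""
    counts = Counter(word for title in segmented_titles for word in title)
    return sorted(word for word, n in counts.items() if n >= min_count)


def occurrence_matrix(segmented_titles, words):
    """Rows are words, columns are titles; each cell counts the word in the title."""
    return [[title.count(word) for title in segmented_titles] for word in words]


def cosine(u, v):
    """Cosine between two occurrence vectors (0.0 if either vector is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_matrix(matrix):
    """Symmetrical word-by-word matrix of cosines between the row vectors."""
    return [[cosine(u, v) for v in matrix] for u in matrix]


# Hypothetical segmented titles (assumption, for illustration only).
titles = [
    ["中国", "科学"],
    ["科学", "通报"],
    ["中国", "医学", "杂志"],
    ["医学", "杂志"],
]
words = frequent_words(titles, min_count=2)
matrix = occurrence_matrix(titles, words)
for word, row in zip(words, cosine_matrix(matrix)):
    print(word, [round(value, 2) for value in row])
```

In this toy example, two words that always appear in the same titles obtain a cosine of 1, while words that never share a title obtain 0. The asymmetrical matrix (`matrix`) corresponds to the input for the factor analysis, and the symmetrical cosine matrix to the input for the map.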
Results
Figure 1 provides the resulting co-word map. The threshold is set at cosine > 0.07, which
is the average value for the cosine in the matrix when the cells with zeros are not
included (see note 2). The size of the nodes is proportional to the logarithm of the number of
occurrences of the word, and the thickness of the lines is drawn proportionally to the
strength of the relationship (as measured by the cosine).
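The mapping choices described here (a threshold equal to the mean of the non-zero cosines, node sizes that follow the logarithm of word occurrences, and line weights equal to the cosine) can be sketched as below. The sketch assumes that each word pair is counted once and that the diagonal is excluded from the average; the text does not specify either point. All names and values are hypothetical and are not taken from Figure 1.

```python
import math


def upper_triangle(cos):
    """Yield (i, j, value) once for each pair of words, skipping the diagonal."""
    n = len(cos)
    for i in range(n):
        for j in range(i + 1, n):
            yield i, j, cos[i][j]


def mean_nonzero_cosine(cos):
    """Average cosine over the word pairs whose cosine is larger than zero."""
    values = [value for _, _, value in upper_triangle(cos) if value > 0]
    return sum(values) / len(values) if values else 0.0


def edges_above(cos, labels, threshold):
    """Keep the links whose cosine exceeds the threshold; the cosine is the line weight."""
    return [(labels[i], labels[j], value)
            for i, j, value in upper_triangle(cos) if value > threshold]


def node_sizes(occurrences):
    """Node size taken as the natural logarithm of the number of occurrences."""
    return {word: math.log(count) for word, count in occurrences.items()}


# Hypothetical cosine matrix and occurrence counts (assumptions for illustration).
labels = ["w1", "w2", "w3"]
cos = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
threshold = mean_nonzero_cosine(cos)
edges = edges_above(cos, labels, threshold)
sizes = node_sizes({"w1": 10, "w2": 100, "w3": 25})
print(threshold, edges, sizes)
```

Using the mean of the non-zero cells as a cut-off removes the weakest links without requiring a hand-picked value, and the logarithmic scaling prevents very frequent words from dominating the map visually.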