Title: Knowledge Organization Research in the last two decades: 1988-2008
ArXiv ID: 1110.5419
Date: 2010-06-01
Authors: SanJuan, M. & Ibekwe-SanJuan, F.
📝 Abstract
We apply an automatic topic mapping system to records of publications in knowledge organization (KO) published between 1988 and 2008. The data were collected from the Web of Science (WoS) database, from journals publishing articles in the KO field. The results showed that while topics in the first decade (1988-1997) were more traditional, the second decade (1998-2008) was marked by a more technological orientation and by the appearance of more specialized topics driven by the pervasiveness of the Web environment.
📄 Full Content
of the methodology are not made explicit. In particular, some studies did not make explicit the criteria for data selection, the analysis method used to select important facts, or how the synthesis of published works was arrived at. With the notable exceptions of Lopez-Huertas (2008) and Saumure & Shiri (2008), the other authors did not furnish details on the dataset used or on how it was gathered.
Automatic techniques for data analysis and representation have been around for a long time but have rarely, if ever, been used by the KO community. Smiraglia's (2009) study represents an attempt to apply such techniques to the KO field. The author applied ACA (Author Co-citation Analysis) to records of papers published between 1993 and 2009 in the Knowledge Organization (KO) journal. He sought to determine a possible North American (NA) influence on KO research by contrasting the ACA map obtained from NA authors with the one obtained from non-NA authors. ACA sheds light primarily on the intellectual base of a field, i.e., the past authors whose works are cited by currently publishing authors, rather than on the publishing authors themselves.
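As a rough illustration of what ACA counts, the sketch below tallies how often each pair of cited authors appears together in the reference list of the same citing paper; such pair counts are the raw material from which an author co-citation map is built. The input format (one set of cited author names per citing paper) and the placeholder names "A", "B", "C" are hypothetical, and the later ACA steps (normalization, similarity measures, mapping) are not shown.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of cited authors is cited together.

    reference_lists: one collection of cited author names per citing
    paper (hypothetical input format). Each unordered pair is counted
    at most once per citing paper.
    """
    counts = Counter()
    for cited in reference_lists:
        # Deduplicate and sort so that (A, B) and (B, A) share one key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Toy data: three citing papers and the (placeholder) authors they cite.
papers = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
]
counts = cocitation_counts(papers)
print(counts[("A", "B")])  # 2
```

Only cited authors enter the pairs, which is why the resulting map describes a field's intellectual base rather than the authors currently publishing in it.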
Given the ever-growing volume of published works, a manual synthesis of trends in any scientific field requires a superhuman effort. Data analysis and bibliometrics offer an acceptable alternative by providing methods and tools to automatically map out the key topics, authors, journals or documents in a given field. We applied our text analysis system to identify key research topics in KO based on a much wider selection of journals (31) and a worldwide geographic coverage. We studied the period 1988-2008 and focused on the content of the published articles, as reflected in their titles and abstracts. To the best of our knowledge, this study represents the first attempt to apply text data analysis methods, in particular natural language processing (NLP), clustering and information visualization techniques, to automatically map trends in KO research.
Data collection turned out to be a bottleneck for KO publications. While collecting records of publications in the KO journal and other journals publishing KO-related studies was relatively straightforward, collecting the same records for the ISKO conferences was a different kettle of fish. Records of the ISKO conference proceedings are neither available in raw text format nor indexed in a systematic way. We therefore had to limit our sources to journals only. We collected bibliographic records of publications from Web of Science (WoS). As previous authors have observed (Saumure & Shiri 2008), identifying publications in KO raises the problem of delimiting the sense of knowledge organization. We manually examined the list of journals obtained from our initial query and selected 31 that published papers on KO in the KO-LIS sense between 1988 and 2008. This list included the predecessor of the KO journal, formerly called “International Classification”. A total of 931 records were obtained, of which 838 came from the KO journal and its predecessor.
The list of journals used can be found at http://fidelia1.free.fr/isko2010/data/list-journals.pdf.
We split the corpus into two periods, 1988-1997 and 1998-2008, which we will call the 1st and 2nd decades respectively (even though the 2nd period covers 11 years). We then fed the titles and abstracts of each period into our text mining platform, TermWatch. This platform includes several text processing components, of which we essentially used three in this analysis: term extraction and variant identification, term clustering, and information visualization. The whole process is automated. We refer the interested reader to SanJuan & Ibekwe-SanJuan (2006) for a detailed and formal presentation of TermWatch.
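The period split can be sketched as follows, assuming each WoS record has already been parsed into a dictionary with hypothetical `year`, `title` and `abstract` keys. Only titles and abstracts are kept, since those are the fields fed to TermWatch; this illustrates the partitioning step only and is not TermWatch's own input code.

```python
def split_by_period(records):
    """Partition records into the two study periods.

    records: iterable of dicts with hypothetical keys "year", "title"
    and "abstract". Records outside 1988-2008 are ignored.
    """
    periods = {"1988-1997": [], "1998-2008": []}
    for rec in records:
        year = rec["year"]
        if 1988 <= year <= 1997:
            key = "1988-1997"
        elif 1998 <= year <= 2008:
            key = "1998-2008"  # the 2nd "decade" spans 11 years
        else:
            continue  # outside the study window
        # Keep only the text fields used for term extraction.
        periods[key].append(rec["title"] + ". " + rec["abstract"])
    return periods


sample = [
    {"year": 1990, "title": "T1", "abstract": "A1"},
    {"year": 2005, "title": "T2", "abstract": "A2"},
    {"year": 2009, "title": "T3", "abstract": "A3"},
]
parts = split_by_period(sample)
print({k: len(v) for k, v in parts.items()})  # {'1988-1997': 1, '1998-2008': 1}
```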
First, domain terms were extracted based on morpho-syntactic rules. Second, a term variant identifier searches for relations among the terms. We defined three families of terminological operations that engender semantic relations between terms: orthographic, lexico-syntactic (inclusion, substitution) and semantic (synonymy). Spelling variants and synonyms are acquired by consulting WordNet. Lexico-syntactic variations refer mainly to two linguistic operations: lexical inclusion (aka expansion) and lexical substitution. Lexical inclusion concerns the insertion or addition of modifier words in a term, as in “classification scheme / universal classification scheme”, or of head words, as in “knowledge organization / knowledge organization system”. Lexical inclusion reflects hierarchical relations between a generic term (hypernym) and its more specific variant (hyponym). Substitution relates terms of the same length that differ by only one word in the same position: head substitution (“knowledge organization system / knowledge organization tool”) and modifier substitution (“generic classification scheme / universal classification scheme”).
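The inclusion and substitution operations described above can be sketched as simple comparisons between word sequences. This is a simplified illustration, not TermWatch's implementation: it assumes terms are already extracted and lowercased into words, treats the last word as the head (the usual case for English noun phrases), and ignores orthographic variants and WordNet synonymy. The function names `is_subsequence` and `variant_relation` are hypothetical.

```python
def is_subsequence(sub, seq):
    """True if the words of `sub` appear in `seq` in the same order."""
    it = iter(seq)
    # `word in it` advances the iterator, so order is enforced.
    return all(word in it for word in sub)


def variant_relation(t1, t2):
    """Classify the lexico-syntactic relation between two terms, or None.

    Assumes the head is the last word of each term.
    """
    a, b = sorted((t1.lower().split(), t2.lower().split()), key=len)
    if len(a) < len(b):
        # Same head, extra modifiers inserted or added: hypernym/hyponym.
        if a[-1] == b[-1] and is_subsequence(a, b):
            return "modifier expansion"
        # Shorter term followed by a new head word.
        if b[:len(a)] == a:
            return "head expansion"
        return None
    # Same length: exactly one word differs, in the same position.
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) == 1:
        if diffs[0] == len(a) - 1:
            return "head substitution"
        return "modifier substitution"
    return None


pairs = [
    ("classification scheme", "universal classification scheme"),
    ("knowledge organization", "knowledge organization system"),
    ("knowledge organization system", "knowledge organization tool"),
    ("generic classification scheme", "universal classification scheme"),
]
for t1, t2 in pairs:
    print(f"{t1} / {t2}: {variant_relation(t1, t2)}")
```

In an expansion, the shorter term plays the role of the generic term (hypernym) and the longer one its more specific variant (hyponym).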