We are facing a real challenge when coping with the continuous acceleration of scientific production and the increasingly changing nature of science. In this article, we extend the classical framework of co-word analysis to the study of scientific landscape evolution. Capitalizing on previously introduced science mapping methods with overlapping clustering, we propose methods to reconstruct phylogenetic networks from successive science maps, and give insight into the various dynamics of scientific domains. Two indexes, the pseudo-inclusion and the empirical quality, are introduced to qualify scientific fields and are used to validate the reconstruction. Phylogenetic dynamics appear to be strongly correlated with these two indexes and, to a lesser extent, with a third one previously introduced (the density index). These results suggest that there exist regular patterns in the "life cycle" of scientific fields. The reconstruction of science phylogeny should improve our global understanding of science evolution and pave the way toward the development of innovative tools for our daily interactions with its productions. Over the long run, these methods should bring quantitative epistemology to the point where it can corroborate or falsify theoretical models of science evolution based on large-scale phylogeny reconstruction from databases of scientific literature.
Deep Dive into The Reconstruction of Science Phylogeny.
We are facing a real challenge when coping with the increasingly changing nature of science. First, the millions of papers published every year make it clearly impossible for anybody to be aware of all the important breakthroughs and developments in science. This issue is made even more critical by the continuous acceleration of scientific production, which threatens every scholar with information overload (the volume of publications per year has doubled over the last 12 years). Second, although science is not carved in marble and would better be defined as an ever-changing enterprise [12], a lively debate has been taking place for more than 10 years around the shift toward a new regime of knowledge production following the transformation of the nature of the research process.
According to [16], science has recently entered a new mode, where knowledge is generated within a wider context of application, making room for trans-disciplinarity, defined as the circulation of tools, theoretical perspectives, and people. Whatever the causes of such transformations, the frontiers of science indeed appear to be changing ever faster and becoming blurred as fields and sub-fields cross-fertilize, grow or die. There is an urgent need to map these fluctuating landscapes.
Science mapping is one of the aims of scientometrics, a young science that took off in the late seventies, fostered by the development of electronic scientific databases and the increasing power of computers. Data-mining methods (in the broad sense) have been developed to identify meaningful patterns, or meso-structures, in scientific corpora (e.g. scientific fields or epistemic fields). The articulations between these scientific fields are then displayed on science maps to give overviews of scientific domains.
Part of the utility of science maps, whether for theorists (science studies, history and philosophy of science), users (scientists) or policy makers, comes from their capacity to give meaning to the evolution of science: which fields are emerging, what the continuities and main paradigmatic shifts are, and from which scientific fields a new field inherits its intellectual background. There is thus an important concern about reconstructing these dynamics in such a way that fields of knowledge can be tracked through time. From the theoretical point of view, this entails that the core object in the representation of the evolution of science is a phylogenetic network, whereas most scientometrics studies focus on science snapshots. In this article, we will show that co-word analysis is a suitable approach from this perspective and propose methods for an automated reconstruction of science phylogenies. The core question is: How can we reconstruct science dynamics through automated bottom-up analysis of scientific publications?
1 Science mapping

A large proportion of science maps are built upon co-occurrence data, with the assumption that the more often two elements co-occur in the same article, the more they are related, and the closer they should appear on the map. These co-occurrence data can be of different natures: co-authorship networks [15], co-citation networks [23], or co-word networks ([3], [4]). In what follows, we will focus on the latter, in the framework of co-word analysis. In this approach, co-occurrences of terms are indexed in large corpora. A graph structure is then generated, where nodes represent the terms and the strength of a link represents their alleged similarity. This similarity measure is computed from the co-occurrence data. Higher-level structures reflecting domains of science are then derived by analyzing patterns in this graph with clustering methods.
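The indexing and graph-building steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `min_count` threshold and the toy corpus are assumptions, and the raw co-occurrence count stands in for the similarity measure that would normally weight each link.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count, over a corpus, how many articles mention each term and each pair of terms.

    `articles` is an iterable of term collections (one collection per article).
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in articles:
        # Each term counts at most once per article; sorting gives a canonical pair order.
        unique_terms = sorted(set(terms))
        term_counts.update(unique_terms)
        pair_counts.update(combinations(unique_terms, 2))
    return term_counts, pair_counts


def build_graph(pair_counts, min_count=1):
    """Build an undirected weighted graph as a dict of dicts.

    Link weights are raw co-occurrence counts here (an assumption for illustration);
    a similarity measure computed from the counts would replace them in practice.
    """
    graph = {}
    for (i, j), n_ij in pair_counts.items():
        if n_ij >= min_count:
            graph.setdefault(i, {})[j] = n_ij
            graph.setdefault(j, {})[i] = n_ij
    return graph


# Hypothetical toy corpus: each article is reduced to its list of indexed terms.
corpus = [
    ["network", "clustering", "phylogeny"],
    ["network", "clustering"],
    ["network", "phylogeny"],
    ["clustering", "co-word"],
]
term_counts, pair_counts = cooccurrence_counts(corpus)
graph = build_graph(pair_counts)
```

Clustering methods would then operate on `graph`; that step is left out of this sketch.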
Scientometrics has defined a great number of measures based on co-occurrence data that capture the degree of similarity or proximity between two terms (cf. [9] for a good review). Among others, we can mention two indexes that were introduced early in scientometrics: the inclusion index $\frac{n_{ij}}{\min(n_i, n_j)}$ and the proximity index $\frac{n_{ij}^2}{n_i \, n_j}$ [6]. Here, $n_i$ (respectively $n_j$ and $n_{ij}$) is the number of articles mentioning the term $i$ (respectively $j$, and both $i$ and $j$).
Further measures were later introduced. However, most of them, by synthesizing the relation between two terms with a single number, fail to convey important information about their use: given two terms $i$ and $j$, is one more specific or more generic than the other? Is $i$ more specific in the sense that it tends to be used by a sub-community of the community using $j$?
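As a worked illustration of these two indexes, the sketch below computes them directly from the three article counts. The function names and the example counts are assumptions chosen for illustration.

```python
def inclusion_index(n_i, n_j, n_ij):
    """Inclusion index: n_ij / min(n_i, n_j)."""
    if n_i <= 0 or n_j <= 0:
        raise ValueError("each term must appear in at least one article")
    return n_ij / min(n_i, n_j)


def proximity_index(n_i, n_j, n_ij):
    """Proximity index: n_ij^2 / (n_i * n_j)."""
    if n_i <= 0 or n_j <= 0:
        raise ValueError("each term must appear in at least one article")
    return n_ij ** 2 / (n_i * n_j)


# Hypothetical counts: term i appears in 40 articles, term j in 10, both together in 8.
inclusion = inclusion_index(40, 10, 8)   # 8 / min(40, 10) = 0.8
proximity = proximity_index(40, 10, 8)   # 64 / 400 = 0.16
```

With these counts the inclusion index is high because the rarer term almost always appears alongside the more frequent one, while the proximity index stays low because the more frequent term is mostly used without the rarer one.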
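To make the question concrete, the sketch below contrasts the two directional shares $n_{ij}/n_i$ and $n_{ij}/n_j$. These ratios are an assumption used only to illustrate the asymmetry; they are not a measure defined in this article, and the function name and counts are hypothetical.

```python
def directional_shares(n_i, n_j, n_ij):
    """Return (share of i's articles that also mention j, share of j's articles that also mention i).

    Illustrative ratios only: they expose the asymmetry that a single symmetric
    number cannot show.
    """
    if n_i <= 0 or n_j <= 0:
        raise ValueError("each term must appear in at least one article")
    return n_ij / n_i, n_ij / n_j


# Hypothetical counts: i appears in 10 articles, j in 40, both together in 8.
share_i_with_j, share_j_with_i = directional_shares(10, 40, 8)
# Most articles using i also use j (8 of 10), but few articles using j use i (8 of 40),
# which is the pattern expected when i is used by a sub-community of the community using j.
```

Swapping the roles of the two terms swaps the two shares, whereas a symmetric index returns the same value in both directions.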
We assume that the asymmetrical relation between terms is essential information for gaining insight into the overall structure of science (fields and subfields). It can be captured by an appropriate choice of proximity measure, such as the pseudo-inclusion measure defined over a period $T$ by: $P^T_\alpha(i, j) = \left(\left(\,\ldots\,\right)^{1/\alpha}\right)^{\min(\alpha, 1/\alpha)}$. This measure has the advantage of conveying information about the relative position of two terms
…(Full text truncated)…