Identification and Characterisation of Technological Topics in the Field of Molecular Biology

Ivana Roche*
INIST-CNRS, 2 allée du Parc de Brabois, CS 10310, 54519 Vandoeuvre-les-Nancy, France, Tel.: +33 (0) 383504600, Fax: +33 (0) 383504650, ivana.roche@inist.fr

Dominique Besagni, Claire François
INIST-CNRS, 2 allée du Parc de Brabois, CS 10310, 54519 Vandoeuvre-les-Nancy, France, Tel.: +33 (0) 383504600, Fax: +33 (0) 383504650, dominique.besagni@inist.fr, claire.francois@inist.fr

Marianne Hörlesberger, Edgar Schiebel
Austrian Research Centers GmbH, Tech Gate Vienna, Donau-City-Straße 1, 1220 Wien, Austria, Tel.: +43 (0) 50 550-4524, Fax: +43 (0) 50 550-4500, marianne.hoerlesberger@arcs.ac.at, edgar.schiebel@arcs.ac.at

* Corresponding author

Theme: Primary: S&T indicators for the identification of emerging fields (Theme 2); Secondary: Visualisation and Science Mapping: tools, methods and applications (Theme 5).

Keywords: emerging technologies, bibliometric indicators, terminology evolution, diffusion model, diachronic clustering.

1 Background
This paper focuses on methodological approaches for characterising the specific topics within a technological field on the basis of scientific literature data. We introduce a diachronic clustering analysis approach and some bibliometric indicators. The results are visualised with the software tool Stanalyst® [1]. We apply our methods to the field "Molecular Biology", which has grown a great deal in the last decade.

2 Problem / application
How can we identify and characterise important topics in a set of several thousand articles? Which technological aspects can be detected? Which of them are already established and which of them are new? How are the topics linked to each other? We try to answer these questions by applying our bibliometric analysis methods to a set of scientific literature data recorded in the bibliographic database PASCAL.

3 Methodology
The data for our study were extracted from the PASCAL database according to their classification categories and keywords.

The diffusion model identifies three categories of terms: established terms, terms unusual in this topic, and cross-section terms. We apply the TF-IDF indicator (term frequency-inverse document frequency), adapted to our research question, and the Gini coefficient, a measure of statistical dispersion most prominently used as a measure of inequality of income or wealth distribution.

The diachronic cluster analysis is carried out with a clustering tool that first applies the axial K-means method, a non-hierarchical clustering algorithm based on the neuronal formalism of Kohonen's self-organizing maps, and then a principal component analysis to map the obtained clusters. This tool is implemented in the information analysis platform Stanalyst®. Considering two successive time periods, our diachronic approach enables us to follow the evolution of the field over time by analysing the two resulting cluster sets and maps.

The applied methods are linked together as the following table shows.

Diffusion model: unusual terms; established terms; cross-section terms.
Diachronic cluster analysis: clusters in the first period; clusters in the second period.
Clusters in the second period: new terms; terms with roots in the first period; terms with roots in the first period.

4 Outcome/findings/results
The diachronic cluster analysis is made for two periods and shows which topics of the second period have roots in the first one and which topics are new in the second period. On the maps of both periods (see the cluster maps figure), it is possible to single out an interesting dichotomous configuration formed by two very strongly connected cluster networks that together include about two thirds of the clusters. However, these networks have quite different characteristics.
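As an aside on the two indicators named in Section 3, the following Python sketch shows their textbook forms. The adaptation of TF-IDF to the research question is not detailed in this abstract, so the code uses the standard definition tf * log(N / df). The function names, the choice of subject categories as the unit of dispersion, and the example counts are illustrative assumptions, not the implementation used in the study.

```python
import math
from typing import Sequence


def tf_idf(term_count: int, docs_with_term: int, total_docs: int) -> float:
    """Textbook TF-IDF: term frequency times log(N / df).

    Assumption: the study uses an adapted variant that is not specified
    in the abstract; this is only the standard form.
    """
    if docs_with_term <= 0 or total_docs <= 0:
        return 0.0
    return term_count * math.log(total_docs / docs_with_term)


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    0 means the values are spread evenly; the maximum, (n - 1) / n,
    is reached when a single entry holds the whole total.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical occurrence counts of two terms across four subject categories.
evenly_spread_term = [5, 5, 5, 5]
concentrated_term = [0, 0, 0, 10]

print(gini(evenly_spread_term))
print(gini(concentrated_term))
print(tf_idf(term_count=3, docs_with_term=10, total_docs=1000))
```

Under these assumptions, a term that occurs equally often in every category has a Gini coefficient of 0, while a term confined to a single category reaches the maximum value for that number of categories.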
On the one hand, the themes present in the bigger network are related to the modelling and simulation of biological phenomena. On the other hand, the smaller network is very homogeneous and deals essentially with instrumentation topics. The remaining clusters are scattered over the map with no significant links to the two networks described above.

The bigger network is located on the left side of the first period map and on the right side of the second period map. Structurally it remains the same, with two sub-networks brought together by the cluster "Computerized simulation", but its content analysis reveals some interesting thematic evolutions. In the first period, its content is focused on both the description of physical characteristics of biological structures and the theoretical aspects of physiological process modelling using, for example, neural networks. In the second period, we can detect a refocusing of the pair formed by the clusters "Neural networks" and "Stochastic processes" and an impressive densification of the resulting sub-network with the newly associated cluster "Brownian motion".

The smaller network, located on the right side of the first period map, deals with instrumentation techniques applied to measurement and therapeutic issues. On the second period map, the network is located on the left side and shows significant stability despite a reorganization of cluster contents.

[Figure: 1st period cluster map; 2nd period cluster map]

5 Conclusion
"Molecular Biology" is a broad field. By applying the complementary methods presented here, it can be characterised and described from different points of view. In addition, the two methods reveal the relationships between the different topics in the investigated field and their evolution.

6 References
FERBER, R. (2003). Information Retrieval: Suchmodelle und Data-Mining-Verfahren für Textsammlungen und das Web. Heidelberg: dpunkt.
POLANCO, X.; FRANÇOIS, C.; ROYAUTE, J.; BESAGNI, D.; ROCHE, I. (2001). Stanalyst®: An Integrated Environment for Clustering and Mapping Analysis on Science and Technology. In: Proceedings of the 8th ISSI, Sydney, July 16-20, 2001.
ROBERTSON, S. (2004). Understanding Inverse Document Frequency: On theoretical arguments for IDF. Journal of Documentation 60, no. 5, 503-520.
SCHIEBEL, E.; HÖRLESBERGER, M. (2007). About the Identification of Technology Specific Keywords in Emerging Technologies: The Case of "Magnetoelectronic". In: Torres-Salinas, D., Moed, H. F. (Eds), Proceedings of ISSI 2007, 11th International Conference of the International Society for Scientometrics and Informetrics, June 25th-27th, Madrid, 691-6 9.
SPÄRCK JONES, K.; ROBERTSON, S. (2006). Inverse Document Frequency - The IDF page. Retrieved November 22, 2006 from: http://www.soi.city.ac.uk/~ser/idf.html.
VAN RIJSBERGEN, C.J. (1979). Information Retrieval. London: Butterworths.
