Zipf's law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia's research as an example

📝 Original Info

  • Title: Zipf's law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia's research as an example
  • ArXiv ID: 1003.1018
  • Date: 2010-06-08
  • Authors: Matjaž Perc

📝 Abstract

Slovenia's Current Research Information System (SICRIS) currently hosts 86,443 publications with citation data from 8,359 researchers working on the whole plethora of social and natural sciences from 1970 till present. Using these data, we show that the citation distributions derived from individual publications have Zipfian properties in that they can be fitted by a power law $P(x) \sim x^{-\alpha}$, with $\alpha$ between 2.4 and 3.1 depending on the institution and field of research. Distributions of indexes that quantify the success of researchers rather than individual publications, on the other hand, cannot be associated with a power law. We find that for Egghe's g-index and Hirsch's h-index the log-normal form $P(x) \sim \exp[-a\ln x -b(\ln x)^2]$ applies best, with $a$ and $b$ depending moderately on the underlying set of researchers. In special cases, particularly for institutions with a strongly hierarchical constitution and research fields with high self-citation rates, exponential distributions can be observed as well. Both indexes yield distributions with equivalent statistical properties, which is a strong indicator for their consistency and logical connectedness. At the same time, differences in the assessment of citation histories of individual researchers strengthen their importance for properly evaluating the quality and impact of scientific output.
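
The two researcher-level indexes named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definitions (the h-index is the largest h such that h papers have at least h citations each; the g-index is the largest g such that the g most cited papers together have at least g² citations) and caps g at the number of papers, which is one common convention and not necessarily the one used for the SICRIS data. The function names and the citation lists are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g**2 citations.

    Assumption: g is capped at the number of papers (no fictitious
    zero-citation papers are appended).
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Two hypothetical researchers with the same h-index but different g-index:
# the second has one highly cited paper, which only the g-index rewards.
steady = [4, 4, 4, 4, 1, 1, 1]
one_hit = [30, 4, 4, 4, 1, 1, 1]
print(h_index(steady), g_index(steady))    # 4 4
print(h_index(one_hit), g_index(one_hit))  # 4 6
```

The example shows how the two indexes can agree in their overall behavior while still assessing individual citation histories differently.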

📄 Full Content

arXiv:1003.1018v1 [physics.data-an] 4 Mar 2010

Zipf’s law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia’s research as an example

Matjaž Perc∗∗
Department of Physics, Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, SI-2000 Maribor, Slovenia

Abstract

Slovenia’s Current Research Information System (SICRIS) currently hosts 86,443 publications with citation data from 8,359 researchers working on the whole plethora of social and natural sciences from 1970 till present. Using these data, we show that the citation distributions derived from individual publications have Zipfian properties in that they can be fitted by a power law $P(x) \sim x^{-\alpha}$, with $\alpha$ between 2.4 and 3.1 depending on the institution and field of research. Distributions of indexes that quantify the success of researchers rather than individual publications, on the other hand, cannot be associated with a power law. We find that for Egghe’s g-index and Hirsch’s h-index the log-normal form $P(x) \sim \exp[-a\ln x -b(\ln x)^2]$ applies best, with $a$ and $b$ depending moderately on the underlying set of researchers. In special cases, particularly for institutions with a strongly hierarchical constitution and research fields with high self-citation rates, exponential distributions can be observed as well. Both indexes yield distributions with equivalent statistical properties, which is a strong indicator for their consistency and logical connectedness. At the same time, differences in the assessment of citation histories of individual researchers strengthen their importance for properly evaluating the quality and impact of scientific output.

Keywords: Zipf’s law, citation distribution, g-index, h-index, ranking

1. Introduction

Ranking of researchers is both important and interesting.
While the importance is largely due to the determination of advancement and selection criteria that underlie faculty recruitment or the awarding of research grants and funds to individuals with the best indicators (Garfield, 1983; Adam, 2002; Ventura and Mombrú, 2006), the fact that it is interesting has many more aspects worth considering. For one, researchers seem to have a keen interest in determining who is the most cited or the most connected or the most influential of them all. Certainly this is in part to gratify the personal sense of achievement, but more intricately, there is a lot we don’t yet understand in terms of how and why certain researchers get more attention than others, and why some cannot rise above a given level of recognition. Scientific excellence is definitely a crucial factor to consider, yet that alone cannot explain all the fascinating properties that have been revealed in recent years with regard to citation distributions (Egghe and Rousseau, 1990; Laherrere and Sornette, 1998; Redner, 1998, 2005; Radicchi et al., 2008; Vieira and Gomes, 2010), indexes that quantify individual scientific output (Hirsch, 2005; Egghe, 2006, 2008a; Bornmann et al., 2008; Zhang, 2009; Guns and Rousseau, 2009; Cabrerizoa et al., 2010), the importance of first-movers (Newman, 2009) and self-citations (Fowler and Aksnes, 2007; Schreiber, 2007, 2008a), or the structure of scientific collaboration networks (Newman, 2001), to name but a few. Empirical studies are important since they provide fuel for potential attempts at modeling and related theoretical approaches aimed at deepening our understanding of citation practices, as well as for sharpening criteria and indexes that quantify individual scientific output.
Notably, one fact stands quite solid and has been pointed out on several occasions [see e.g. Redner (2005)]: the more a paper is cited, the more likely it is to attract further citations in the future. This phenomenon is by now known under different names. The Matthew effect (Merton, 1968) is likely the oldest term to describe it, but one can also come across cumulative advantage (de Solla Price, 1965, 1976) or preferential attachment (Barabási and Albert, 1999), depending on the field of research and motivation of the study. Linear preferential attachment models in particular enjoy exceptional popularity in describing the growth and setup of complex networks (Albert and Barabási, 2002; Dorogovtsev and Mendes, 2003; Pastor-Satorras and Vespignani, 2004) and have become synonymous with the power-law distributions of connections that can be observed in many of them (Faloutsos et al., 1999; Sornette, 2003; Newman, 2005; Clauset et al., 2009). There is evidence suggesting that citation statistics may obey similar rules, yet deviations from the power-law distribution keep the reasoning open to amendments (Redner, 2005), especially i

∗Electronic address: matjaz.perc@uni-mb.si; Homepage: http://www.matjazperc.com/
∗∗Supplementary tables for this paper are accessible via: http://www.matjazperc.com/sicris/stats.html
Preprint submitted to Journal of Informetrics, November 1, 2018

…(Full text truncated)…
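
To make the power-law form $P(x) \sim x^{-\alpha}$ from the abstract more concrete, the following is a minimal Python sketch of how an exponent can be estimated from data. It uses the continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$, a standard textbook technique, and is not necessarily the fitting procedure used in the paper. The data are synthetic (drawn by inverse-transform sampling), and the function names, the cutoff `x_min`, and the chosen exponent are assumptions made for illustration. Real citation counts are discrete, so this continuous estimator would only be an approximation for them.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic values from a continuous power law P(x) ~ x**(-alpha).

    Uses inverse-transform sampling; requires alpha > 1.
    """
    rng = random.Random(seed)
    exponent = -1.0 / (alpha - 1.0)
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and is never zero.
    return [x_min * (1.0 - rng.random()) ** exponent for _ in range(n)]


def fit_alpha(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Synthetic check: recover an exponent inside the 2.4-3.1 range quoted above.
data = sample_power_law(50_000, alpha=2.7, seed=1)
print(round(fit_alpha(data), 2))
```

With enough samples the estimate should land close to the exponent used to generate the data, which is a useful sanity check before applying any such estimator to empirical counts.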
