
Zipf's law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia's research as an example

February 23, 2026

Reading time: 6 minutes


#Computer Science #Physics #Statistics #Applications #Databases

📝 Original Info

Title: Zipf's law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia's research as an example
ArXiv ID: 1003.1018
Date: 2010-06-08
Authors: Matjaž Perc

📝 Abstract

Slovenia's Current Research Information System (SICRIS) currently hosts 86,443 publications with citation data from 8,359 researchers working on the whole plethora of social and natural sciences from 1970 till present. Using these data, we show that the citation distributions derived from individual publications have Zipfian properties in that they can be fitted by a power law $P(x) \sim x^{-\alpha}$, with $\alpha$ between 2.4 and 3.1 depending on the institution and field of research. Distributions of indexes that quantify the success of researchers rather than individual publications, on the other hand, cannot be associated with a power law. We find that for Egghe's g-index and Hirsch's h-index the log-normal form $P(x) \sim \exp[-a\ln x -b(\ln x)^2]$ applies best, with $a$ and $b$ depending moderately on the underlying set of researchers. In special cases, particularly for institutions with a strongly hierarchical constitution and research fields with high self-citation rates, exponential distributions can be observed as well. Both indexes yield distributions with equivalent statistical properties, which is a strong indicator for their consistency and logical connectedness. At the same time, differences in the assessment of citation histories of individual researchers strengthen their importance for properly evaluating the quality and impact of scientific output.
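The two researcher-level indexes named in the abstract can be illustrated with a minimal Python sketch. It uses the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the g most cited papers together have at least g² citations. The function names `h_index` and `g_index` and the citation counts are hypothetical and chosen only for illustration; they are not taken from SICRIS. The sketch also assumes the common convention that g cannot exceed the number of papers.

```python
from itertools import accumulate


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g**2 citations.

    Assumption: g is capped at the number of papers (no fictitious
    zero-citation papers are appended).
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    for rank, total in enumerate(accumulate(ranked), start=1):
        if total >= rank * rank:
            g = rank
        else:
            break
    return g


# Hypothetical citation records, one number per paper.
steady = [10, 8, 5, 4, 3, 0]
one_hit = [50, 1, 1]

print(h_index(steady), g_index(steady))
print(h_index(one_hit), g_index(one_hit))
```

The second record shows where the two indexes can diverge: a single highly cited paper raises the g-index through the cumulative sum, while the h-index counts that paper only once.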


📄 Full Content

arXiv:1003.1018v1 [physics.data-an] 4 Mar 2010

Zipf's law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia's research as an example

Matjaž Perc
Department of Physics, Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, SI-2000 Maribor, Slovenia

Electronic address: matjaz.perc@uni-mb.si; Homepage: http://www.matjazperc.com/
Supplementary tables for this paper are accessible via: http://www.matjazperc.com/sicris/stats.html
Preprint submitted to Journal of Informetrics, November 1, 2018

Abstract

Slovenia's Current Research Information System (SICRIS) currently hosts 86,443 publications with citation data from 8,359 researchers working on the whole plethora of social and natural sciences from 1970 till present. Using these data, we show that the citation distributions derived from individual publications have Zipfian properties in that they can be fitted by a power law $P(x) \sim x^{-\alpha}$, with $\alpha$ between 2.4 and 3.1 depending on the institution and field of research. Distributions of indexes that quantify the success of researchers rather than individual publications, on the other hand, cannot be associated with a power law. We find that for Egghe's g-index and Hirsch's h-index the log-normal form $P(x) \sim \exp[-a\ln x - b(\ln x)^2]$ applies best, with $a$ and $b$ depending moderately on the underlying set of researchers. In special cases, particularly for institutions with a strongly hierarchical constitution and research fields with high self-citation rates, exponential distributions can be observed as well. Both indexes yield distributions with equivalent statistical properties, which is a strong indicator for their consistency and logical connectedness. At the same time, differences in the assessment of citation histories of individual researchers strengthen their importance for properly evaluating the quality and impact of scientific output.

Keywords: Zipf's law, citation distribution, g-index, h-index, ranking

1. Introduction

Ranking of researchers is both important and interesting. While its importance is largely due to the determination of advancement and selection criteria that underlie faculty recruitments or the awarding of research grants and funds to individuals with the best indicators (Garfield, 1983; Adam, 2002; Ventura and Mombrú, 2006), the fact that it is interesting has many more aspects worth considering. For one, researchers seem to have a keen interest in determining who is the most cited, the most connected, or the most influential of them all. Certainly this is in part to gratify the personal sense of achievement, but more intricately, there is a lot we don't yet understand in terms of how and why certain researchers get more attention than others, and why some cannot rise above a given level of recognition. Scientific excellence is definitely a crucial factor to consider, yet that alone cannot explain all the fascinating properties that have been revealed in recent years with regard to citation distributions (Egghe and Rousseau, 1990; Laherrere and Sornette, 1998; Redner, 1998, 2005; Radicchi et al., 2008; Vieira and Gomes, 2010), indexes that quantify individual scientific output (Hirsch, 2005; Egghe, 2006, 2008a; Bornmann et al., 2008; Zhang, 2009; Guns and Rousseau, 2009; Cabrerizoa et al., 2010), the importance of first-movers (Newman, 2009) and self-citations (Fowler and Aksnes, 2007; Schreiber, 2007, 2008a), or the structure of scientific collaboration networks (Newman, 2001), to name but a few.

Empirical studies are important since they provide fuel for potential attempts at modeling and related theoretical approaches aimed at deepening our understanding of citation practices, as well as for sharpening criteria and indexes that quantify individual scientific output.

Notably, one fact stands quite solid and has been pointed out on several occasions [see e.g. Redner (2005)]: the more a paper is cited, the more likely it is to attract further citations in the future. This phenomenon is by now known under different names. The Matthew effect (Merton, 1968) is likely the oldest to describe it, but one can also come across cumulative advantage (de Solla Price, 1965, 1976) or preferential attachment (Barabási and Albert, 1999), depending on the field of research and motivation of the study. Linear preferential attachment models in particular enjoy exceptional popularity in describing the growth and setup of complex networks (Albert and Barabási, 2002; Dorogovtsev and Mendes, 2003; Pastor-Satorras and Vespignani, 2004) and have become synonymous with power-law distributions of connections that can be observed in many of them (Faloutsos et al., 1999; Sornette, 2003; Newman, 2005; Clauset et al., 2009). There is evidence suggesting that citation statistics may obey similar rules, yet deviations from the power-law distribution keep the reasoning open to amendments (Redner, 2005), especially i

…(Full text truncated)…



This content is AI-processed based on ArXiv data.


