Skewness of citation impact data and covariates of citation distributions: A large-scale empirical analysis based on Web of Science data

February 23, 2026

📝 Abstract

Using percentile shares, one can visualize and analyze the skewness in bibliometric data across disciplines and over time. The resulting figures can be interpreted intuitively and are more suitable than regression analysis for detailed analysis of the effects of independent and control variables on distributions. We show this by using percentile shares to analyze so-called “factors influencing citation impact” (FICs; e.g., the impact factor of the publishing journal) across years and disciplines. All articles (n = 2,961,789) covered by WoS in 1990 (n = 637,301), 2000 (n = 919,485), and 2010 (n = 1,405,003) are used. In 2010, nearly half of the citation impact is accounted for by the 10% most-frequently cited papers; the skewness is largest in the humanities (68.5% in the top-10% layer) and lowest in agricultural sciences (40.6%). The comparison of the effects of the different FICs (the number of cited references, number of authors, number of pages, and JIF) on citation impact shows that JIF indeed has the strongest correlation with the citation scores. However, the correlation between FICs and citation impact is lower if citations are normalized than if raw citation counts are used.

📄 Content

Accepted for publication in the Journal of Informetrics

Skewness of citation impact data and covariates of citation distributions: A large-scale empirical analysis based on Web of Science data

Lutz Bornmann* & Loet Leydesdorff**

* Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany. E-mail: bornmann@gv.mpg.de

** Amsterdam School of Communication Research (ASCoR), University of Amsterdam, PO Box 15793, 1001 NG Amsterdam, The Netherlands. E-mail: loet@leydesdorff.net

Key words: Citation impact; factors influencing citations; percentile shares; impact factors; normalization

1 Introduction

van Raan (2014) listed the skewness of citation data as one of several methodological problems in citation analysis. The skewness of bibliometric data has been a topic in this field since its beginnings in the 1920s. The issue is associated with the “laws” of Alfred Lotka, Samuel Bradford, and George Zipf: “the concentration of items on a relatively small stratum of sources” (de Bellis, 2009, p. xxiv). Since then, a large number of papers have appeared demonstrating the skewness of citation data. Seglen (1992), for example, argued that “50% of the citations and the most cited half of the articles account for nearly 90% of the citations” (p. 628). He concluded that citation distributions approximately follow an inverse power-law distribution (the number of citations larger than x is proportional to -log(x)) (Katz, 2000). Albarrán and Ruiz-Castillo (2011) showed empirically that the “existence of a power law cannot be rejected in ALL SCIENCES taken together as well as in 17 of 22 fields whose articles represent 74.7% of the total” (p. 48). Using a replication- and scale-invariant technique – the Characteristic Scores and Scales (CSS) (Glänzel, 2011) – Albarrán, Crespo, Ortuño, and Ruiz-Castillo (2011) showed that citation distributions are highly skewed: “the mean is 20 points above the median, while 9–10% of all articles in the upper tail account for about 44% of all citations” (p. 385).

In this study, we analyze the skewness of citation impact data in six major disciplines (natural sciences, engineering and technology, medical and health sciences, agricultural sciences, social sciences, and humanities) based on all articles in Web of Science (WoS) published in 1990, 2000, and 2010. First, we use percentile shares – a recently introduced visualization and analysis technique – to quantify the proportions of total citation impact that go to different groups of papers (e.g., the 10% most-frequently-cited papers). Percentile shares can be intuitively and appealingly interpreted and are especially suitable “for the detailed analysis of distributional changes” (Jann, 2016, p. 3).

In a next step, we use percentile shares to analyze covariates of the citation distributions. Journal Impact Factors (JIFs) are often used as proxies for the citation impact of papers published in the respective journals. Are JIFs indeed a factor influencing citation impact? We show the advantages of using percentile shares for a number of covariates of citation scores described in the literature as “factors influencing citation impact” (e.g., the number of authors, see Bornmann & Daniel, 2008). We compare the association of JIFs with citation scores at the level of individual papers against that of other FICs mentioned in the literature, such as the number of co-authors, the number of pages, and the number of cited references. How much does each covariate enhance the likelihood of being cited in the top-10% layer of citation scores? Finally, we address the question of whether norm
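The idea behind percentile shares can be illustrated with a minimal Python sketch. It is not the implementation used in the paper (Jann, 2016, describes a Stata routine with interpolation and standard errors); the function name `percentile_shares`, the rank-based cut-offs, and the toy citation counts are assumptions made purely for illustration. Papers are sorted by citation count, and the share of all citations that falls to each group between consecutive cut-offs is read from the cumulative distribution.

```python
import numpy as np


def percentile_shares(citations, cutoffs=(0.5, 0.9)):
    """Share of all citations received by each group of papers.

    Hypothetical helper: groups are defined by rank after sorting papers
    in ascending order of citations, so cutoffs=(0.5, 0.9) yields the
    bottom 50%, the next 40%, and the top 10% of papers. Ties at a
    cut-off are split by sort order, with no interpolation.
    """
    counts = np.sort(np.asarray(citations, dtype=float))
    total = counts.sum()
    if total == 0:
        raise ValueError("citation counts sum to zero; shares are undefined")
    n = counts.size
    # Cumulative share of citations held by the k least-cited papers.
    cumulative = np.concatenate(([0.0], np.cumsum(counts) / total))
    bounds = [0.0, *cutoffs, 1.0]
    # Number of papers lying below each boundary (simple rounding).
    ranks = [int(round(b * n)) for b in bounds]
    return [float(cumulative[hi] - cumulative[lo])
            for lo, hi in zip(ranks[:-1], ranks[1:])]


# Toy data (invented, not from WoS): ten papers with skewed citation counts.
toy_counts = [0, 0, 1, 1, 2, 2, 3, 4, 7, 30]
bottom_half, middle, top_decile = percentile_shares(toy_counts)
print(f"top 10% of papers hold {top_decile:.0%} of all citations")
```

The last element of the returned list corresponds to the "top-10% layer" discussed in the text; computing it separately per discipline or per publication year would give the kind of comparison the study reports.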

This content is AI-processed based on ArXiv data.
