The decline in the concentration of citations, 1900-2007


This paper challenges recent research (Evans, 2008) reporting that the concentration of cited scientific literature increases with the online availability of articles and journals. Using Thomson Reuters’ Web of Science, the present paper analyses changes in the concentration of citations received (two- and five-year citation windows) by papers published between 1900 and 2005. Three measures of concentration are used: the percentage of papers that received at least one citation (cited papers); the percentage of papers needed to account for 20, 50 and 80 percent of the citations; and the Herfindahl-Hirschman index. These measures are used for four broad disciplines: natural sciences and engineering, medical fields, social sciences, and the humanities. All these measures converge and show that, contrary to what was reported by Evans, the dispersion of citations is actually increasing.

💡 Research Summary

The paper conducts a longitudinal analysis of citation concentration for papers published from 1900 to 2005, directly challenging the claim made by Evans (2008) that online availability has concentrated citations on a relatively small set of papers. Using the Thomson Reuters Web of Science database, the authors compute the citations each paper received within two-year and five-year windows after publication. Three complementary measures capture different aspects of concentration: (1) the percentage of papers that received at least one citation (“cited papers”); (2) the percentage of papers needed to account for 20, 50, and 80 percent of all citations; and (3) the Herfindahl-Hirschman Index (HHI), a concentration measure for which lower values indicate greater dispersion.
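As a rough illustration of the three measures named in the abstract (percentage of cited papers, percentage of papers needed to account for a given share of citations, and the HHI), the following minimal sketch computes each from a list of per-paper citation counts. The function names and the toy data are hypothetical, and the HHI here assumes shares expressed as fractions of total citations; the paper's exact normalization is not stated in this summary.

```python
def share_cited(citations):
    """Percentage of papers that received at least one citation."""
    if not citations:
        raise ValueError("need at least one paper")
    return 100.0 * sum(1 for c in citations if c > 0) / len(citations)


def papers_needed(citations, share_percent):
    """Percentage of papers, taken from most cited to least cited,
    needed to account for `share_percent` percent of all citations."""
    total = sum(citations)
    if total == 0:
        raise ValueError("no citations to distribute")
    running = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= share_percent * total:
            return 100.0 * rank / len(citations)


def hhi(citations):
    """Herfindahl-Hirschman index: sum of squared citation shares.
    Assumption: shares are fractions, so the index ranges from 1/N
    (perfectly even) to 1 (all citations on one paper)."""
    total = sum(citations)
    if total == 0:
        raise ValueError("no citations to distribute")
    return sum((c / total) ** 2 for c in citations)


# Hypothetical toy data: ten papers, twenty citations in total.
concentrated = [10, 5, 3, 2, 0, 0, 0, 0, 0, 0]
dispersed = [2] * 10

for label, counts in (("concentrated", concentrated), ("dispersed", dispersed)):
    print(label,
          share_cited(counts),
          [papers_needed(counts, p) for p in (20, 50, 80)],
          round(hhi(counts), 4))
```

In the toy data, the dispersed set has more cited papers, needs a larger percentage of papers to reach 80 percent of citations, and has a lower HHI, which is the direction of change the paper reports over time.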

The dataset was stratified into four broad disciplinary clusters—natural sciences and engineering, medical fields, social sciences, and the humanities—to examine whether trends differed across knowledge domains. For each discipline and each year, the three metrics were calculated and plotted as time series. The results are strikingly consistent across all fields: the share of papers that are cited at least once rose dramatically (from roughly 30 % in the early 1900s to about 70 % by the early 2000s), the proportion of papers needed to capture 80 % of all citations increased from around 10 % to over 20 %, and the HHI fell from about 0.03 to 0.01. In other words, citations have become progressively more evenly distributed among a larger pool of publications.
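The per-discipline, per-year calculation can be sketched as follows for one of the measures, the percentage of cited papers, under a fixed citation window. This is only an illustration: the records are invented toy data, the function names are hypothetical, and the window rule (a citation counts if it appears between the publication year and `window` years later, inclusive) is an assumption rather than the paper's documented definition.

```python
from collections import defaultdict

# Hypothetical toy records: (discipline, publication year, years of citing papers).
PAPERS = [
    ("medical fields", 2000, [2001, 2002, 2006]),
    ("medical fields", 2000, [2004]),
    ("medical fields", 2000, []),
    ("humanities", 2000, [2005]),
    ("humanities", 2000, []),
]


def citations_in_window(pub_year, citing_years, window):
    """Count citations received within `window` years of publication.
    Assumption: the window is inclusive of the publication year."""
    return sum(1 for year in citing_years if 0 <= year - pub_year <= window)


def share_cited_by_group(papers, window):
    """Percentage of cited papers for each (discipline, publication year)."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for discipline, pub_year, citing_years in papers:
        key = (discipline, pub_year)
        total[key] += 1
        if citations_in_window(pub_year, citing_years, window) > 0:
            cited[key] += 1
    return {key: 100.0 * cited[key] / total[key] for key in total}


two_year = share_cited_by_group(PAPERS, window=2)
five_year = share_cited_by_group(PAPERS, window=5)
for key in sorted(two_year):
    print(key, round(two_year[key], 1), round(five_year[key], 1))
```

Repeating this for every publication year yields one time series per discipline and window length, which is the form in which the summary describes the results.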

These findings directly contradict Evans’s hypothesis that online accessibility creates a “rich‑get‑richer” effect in scholarly referencing. The authors argue that the expansion of digital repositories, search engines, and open‑access platforms actually broadens the effective literature pool, enabling researchers to discover and cite a wider variety of works, including newer studies, regional journals, and non‑English sources. Consequently, the citation landscape has shifted from a relatively concentrated pattern toward a more dispersed one.

Methodologically, the study’s strengths lie in its use of a massive, continuous bibliometric record, the application of multiple concentration measures that converge on the same conclusion, and the discipline‑specific breakdown that guards against overgeneralization. However, the authors acknowledge several limitations. First, Web of Science is known to be biased toward English‑language and science‑technology journals, which may underrepresent citation practices in the humanities and some social sciences. Second, the choice of two‑ and five‑year citation windows captures short‑ to medium‑term impact but may miss long‑term citation dynamics characteristic of classic works. Third, the analysis does not directly incorporate variables that quantify the degree of online availability (e.g., open‑access percentages, repository coverage), leaving the causal pathway between digital access and citation dispersion inferred rather than explicitly modeled.

In conclusion, the paper provides robust empirical evidence that the dispersion of citations has increased over the past century, even as the scholarly ecosystem has become increasingly digital. This challenges the notion that online access intensifies citation concentration and suggests that evaluation metrics relying on a small set of highly cited papers may need to be recalibrated for the modern, more distributed citation environment. Future research should integrate more granular measures of digital accessibility, consider longer citation windows, and explore the role of algorithmic recommendation systems in shaping citation behavior.
