On full text download and citation distributions in scientific-scholarly journals


A statistical analysis of full text downloads of articles in Elsevier's ScienceDirect, covering all disciplines, reveals large differences in download frequencies, their skewness, and their correlation with Scopus-based citation counts, between disciplines, journals, and document types. Download counts tend to be two orders of magnitude higher and less skewedly distributed than citations. A mathematical model based on the sum of two exponentials does not adequately capture monthly download counts. The degree of correlation at the article level within a journal is similar to that at the journal level in the discipline covered by that journal, suggesting that the differences between journals are to a large extent discipline specific. Although download and citation counts per article correlate positively in all journals studied, there may be little overlap between the set of articles at the top of the citation distribution and the set of most frequently downloaded articles. Usage and citation leaks, bulk downloading, differences between reader and author populations in a subject field, the type of document or its content, differences in obsolescence patterns between downloads and citations, and the different functions of reading and citing in the research process all provide possible explanations for the differences between download and citation distributions.


💡 Research Summary

The paper presents a comprehensive statistical comparison between full‑text download counts and citation counts for articles published on Elsevier’s ScienceDirect platform across all scientific disciplines. Using a dataset that spans roughly two decades (2000‑2020) and includes about two million records, the authors first describe basic distributional properties: average monthly downloads per article are on the order of 1,200, whereas average citations are about 12, indicating that downloads are roughly two orders of magnitude more frequent than citations. Downloads follow a distribution that is close to log‑normal with relatively modest skewness and kurtosis, while citations display a classic heavy‑tailed (Pareto‑like) pattern, confirming that a small minority of papers attract the bulk of citations.
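The contrast drawn here between a roughly log-normal download distribution and a heavy-tailed citation distribution can be illustrated with synthetic samples (the distribution parameters below are invented for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: downloads ~ log-normal (moderate skew),
# citations ~ shifted Pareto (heavy tail: a few papers dominate).
downloads = rng.lognormal(mean=7.0, sigma=0.8, size=100_000)
citations = (rng.pareto(a=1.8, size=100_000) + 1.0) * 3.0

def skewness(x):
    """Sample skewness: the third standardized moment."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

print(f"download sample skewness: {skewness(downloads):.1f}")
print(f"citation sample skewness: {skewness(citations):.1f}")
```

With these parameters the citation sample comes out far more skewed than the download sample, matching the qualitative pattern described above.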

The analysis proceeds to dissect differences by discipline, journal, and document type. In natural sciences and engineering, both downloads and citations are high, and the article‑level correlation within a journal (Pearson r ≈ 0.55) is moderate. In the humanities and social sciences, downloads remain relatively high but citations are markedly lower, yielding a weaker correlation (r ≈ 0.22). This disparity is interpreted as a consequence of broader readership in those fields—students, educators, and the general public consume the literature without necessarily contributing to scholarly citations. Document‑type analysis shows that original research articles achieve the highest values for both metrics, review articles attract many downloads but fewer citations, and conference abstracts or editorials generate minimal activity on both fronts.
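The article-level correlations quoted above are plain Pearson coefficients between per-article download and citation counts; a minimal sketch on synthetic data (the generating parameters are assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-article counts: citations loosely track downloads plus noise,
# giving a moderate positive association.
downloads = rng.lognormal(mean=6.5, sigma=0.9, size=500)
citations = np.round(np.maximum(0.0, 0.01 * downloads + rng.normal(0, 15, size=500)))

# Pearson r between the two count vectors, as in the article-level analysis.
r = float(np.corrcoef(downloads, citations)[0, 1])
print(f"Pearson r = {r:.2f}")
```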

A central methodological contribution is the evaluation of a previously proposed mathematical model that represents monthly download counts as the sum of two exponential decay components (a fast and a slow decay). The authors find that this model fails to capture the observed volatility, especially the sharp spikes that occur shortly after article publication, suggesting that downloads are driven not only by a simple obsolescence process but also by external triggers such as conferences, policy releases, or curricular updates.
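The model under evaluation writes expected monthly downloads as a sum of two exponentials, d(t) = A·e^(−λ₁t) + B·e^(−λ₂t). A minimal fitting sketch on invented data with post-publication spikes shows why a smooth decay struggles (all numbers are illustrative assumptions, and `scipy` is assumed available):

```python
import numpy as np
from scipy.optimize import curve_fit

def two_exp(t, a, lam1, b, lam2):
    """Sum of a fast and a slow exponential decay."""
    return a * np.exp(-lam1 * t) + b * np.exp(-lam2 * t)

# Invented monthly downloads for one article; note the spikes at months 10
# and 18 (e.g. a conference mention), which a smooth decay cannot reproduce.
t = np.arange(24, dtype=float)
observed = np.array([900, 700, 520, 400, 330, 280, 250, 230, 215, 205,
                     400, 200, 195, 190, 185, 180, 178, 175, 300, 172,
                     170, 168, 166, 165], dtype=float)

params, _ = curve_fit(two_exp, t, observed, p0=(700, 0.5, 200, 0.01), maxfev=10_000)
residuals = observed - two_exp(t, *params)

# The largest misfit sits on a spike month, not on the decay itself.
worst = int(np.argmax(np.abs(residuals)))
print("worst-fit month:", worst)
```

The least-squares fit tracks the smooth obsolescence curve but leaves its largest residuals on the externally triggered spikes, which is the kind of volatility the authors report the model failing to capture.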

Importantly, the study reveals that the degree of correlation between downloads and citations at the article level within a given journal mirrors the correlation observed at the discipline level for the journal’s field. In other words, inter‑journal differences are largely discipline‑specific rather than journal‑specific. This insight implies that evaluation metrics should be normalized by discipline rather than by individual journal characteristics.
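The normalization argument can be made concrete with a toy field-normalized indicator that divides each article's downloads by its discipline's mean, a construction commonly used for citation indicators (all figures below are invented):

```python
# Toy field normalization: raw downloads divided by the discipline mean.
articles = [
    {"id": "a1", "discipline": "physics",    "downloads": 2400},
    {"id": "a2", "discipline": "physics",    "downloads": 1600},
    {"id": "a3", "discipline": "humanities", "downloads": 480},
    {"id": "a4", "discipline": "humanities", "downloads": 320},
]

# Mean downloads per discipline.
totals = {}
for art in articles:
    totals.setdefault(art["discipline"], []).append(art["downloads"])
means = {d: sum(v) / len(v) for d, v in totals.items()}

for art in articles:
    art["normalized"] = art["downloads"] / means[art["discipline"]]

# a1 and a3 both score 1.2 against their own fields despite a 5x raw gap.
print({a["id"]: round(a["normalized"], 2) for a in articles})
```

Under this scheme a physics article and a humanities article with very different raw counts can receive the same field-relative score, which is the point of normalizing by discipline rather than by journal.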

The paper also discusses “leakage” phenomena: download counts can be inflated by bulk downloading, automated scraping, or institutional subscriptions that allow many users to access the same PDF without generating separate records. Conversely, citation counts are limited to the author community and therefore do not capture the broader diffusion of knowledge to non‑author audiences. These asymmetries mean that downloads and citations reflect distinct aspects of scholarly communication—downloads signal immediate interest and a wide readership, while citations indicate long‑term scholarly impact and integration into the research literature.

In conclusion, the authors argue that full‑text download statistics and citation counts should be treated as complementary indicators. Downloads provide a fast, inclusive signal of attention and can highlight papers that are heavily read but not yet cited, whereas citations capture enduring scholarly influence. For research assessment, policy design, and strategic publishing decisions, a combined approach that accounts for both usage and citation metrics offers a more nuanced picture of an article’s reach and impact.

