Testing the fairness of citation indicators for comparison across scientific domains: the case of fractional citation counts

Citation numbers are extensively used for assessing the quality of scientific research. Raw citation counts are generally misleading, especially in cross-disciplinary comparisons, since the average number of citations received depends strongly on the scientific discipline of the paper. Measuring and eliminating biases in citation patterns is therefore crucial for a fair use of citation numbers. Several numerical indicators have been introduced with this aim, but a specific statistical test for estimating the fairness of these indicators has so far been lacking. Here we present a statistical method for estimating the effectiveness of numerical indicators in suppressing citation biases. The method is simple to implement and can be easily generalized to various scenarios. As a practical example we test, in a controlled case, the fairness of the fractional citation count, which has recently been proposed as a tool for cross-discipline comparison. We show that this indicator is not able to remove biases in citation patterns and performs much worse than the rescaling of citation counts with average values.


💡 Research Summary

The paper addresses a fundamental problem in bibliometrics: raw citation counts are heavily biased by scientific discipline, making cross‑disciplinary comparisons unreliable. While many normalization schemes have been proposed, there has been no systematic statistical test to assess whether a given indicator truly eliminates disciplinary bias. Radicchi and Castellano therefore develop a simple, generalizable method for testing the “fairness” of citation indicators and apply it to evaluate the recently introduced fractional citation count (FCC) against a more established rescaled citation count (RCC).

Fairness is defined as independence of the indicator from the discipline of the paper: if the highest-scoring fraction of papers is selected according to a fair indicator, each discipline should be represented in the selected set in proportion to its share of the full corpus. The test therefore proceeds in two steps: (1) select a fixed fraction of papers with the largest indicator values; (2) compare the observed number of selected papers from each discipline with the number expected under the null hypothesis of unbiased, random selection. The comparison is performed using a chi-square-type statistic; if the statistic falls below a pre-determined significance threshold, the indicator is deemed unbiased.
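As a rough illustration, here is a minimal Python sketch of such a test. The paper records are plain dictionaries, the 10% selection fraction is an arbitrary choice, and the statistic is a simple chi-square-type sum; this is an assumption-laden sketch of the idea, not the authors' exact implementation.

```python
from collections import Counter

def fairness_statistic(papers, indicator, top_fraction=0.10):
    """Chi-square-type statistic measuring how far the top papers
    (ranked by `indicator`) deviate from the corpus-wide mix of
    disciplines. `papers` is a list of dicts with a 'discipline' key;
    `indicator` maps a paper to a numeric score (field names are
    hypothetical)."""
    n_top = max(1, int(top_fraction * len(papers)))
    top = sorted(papers, key=indicator, reverse=True)[:n_top]

    observed = Counter(p["discipline"] for p in top)
    overall = Counter(p["discipline"] for p in papers)

    statistic = 0.0
    for discipline, n_discipline in overall.items():
        # Under the null hypothesis of no bias, each discipline's share
        # of the top set equals its share of the whole corpus.
        expected = n_top * n_discipline / len(papers)
        statistic += (observed.get(discipline, 0) - expected) ** 2 / expected
    return statistic
```

A small statistic means the selected set mirrors the overall disciplinary composition; how small is small enough is decided by the significance threshold mentioned above.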

The empirical test is carried out on a comprehensive dataset of 307,992 APS (American Physical Society) papers published between 1985 and 2009. Each paper is classified by its principal PACS code, which groups papers into ten broad physics sub‑fields. Citation data are obtained from the Web of Science, covering both citations from APS journals and from 139 additional journals, thereby accounting for roughly 74 % of all citations received by the APS papers.

Two normalization schemes are examined. The FCC, proposed by Leydesdorff and Opthof, weights each incoming citation by 1/n, where n is the number of references in the citing paper. The underlying assumption is that disciplines differ mainly in typical reference‑list length, so dividing by n should automatically normalize across fields without requiring explicit field classification. The RCC, introduced by Radicchi et al., rescales a paper’s raw citation count c by the average citation count c₀ of papers published in the same discipline and year, yielding a relative indicator c_f = c/c₀. This method explicitly uses field and year information but is straightforward to compute.
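Under those definitions, both indicators are straightforward to express in code. The sketch below uses hypothetical record fields ('id', 'references', 'citations', 'discipline', 'year') and is meant only to make the two formulas concrete:

```python
def fractional_citation_count(paper, corpus):
    """FCC: each incoming citation contributes 1/n, where n is the
    number of references in the citing paper."""
    return sum(
        1.0 / len(citing["references"])
        for citing in corpus
        if paper["id"] in citing["references"]
    )

def rescaled_citation_count(paper, corpus):
    """RCC: c_f = c / c0, the raw citation count divided by the average
    count c0 over papers from the same discipline and year (assumes at
    least one such paper and c0 > 0)."""
    peers = [
        p for p in corpus
        if p["discipline"] == paper["discipline"] and p["year"] == paper["year"]
    ]
    c0 = sum(p["citations"] for p in peers) / len(peers)
    return paper["citations"] / c0
```

Note the trade-off visible in the code: the FCC needs only the citing papers' reference lists, whereas the RCC needs the discipline and year of every paper in the corpus.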

Applying the fairness test, the RCC passes: the distribution of c_f values across the ten PACS groups matches the expected unbiased distribution, and the chi‑square statistic is well below the significance threshold. In contrast, the FCC fails dramatically. Even after fractional weighting, substantial differences remain among fields, especially between mathematics‑heavy and biology‑heavy sub‑disciplines where reference‑list lengths differ most. The FCC’s chi‑square statistic exceeds the critical value by a large margin, indicating that the indicator retains a strong disciplinary bias.
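To make the pass/fail criterion concrete, a hypothetical decision rule would compare the statistic with the critical value of a chi-square distribution with (number of disciplines − 1) degrees of freedom; the 5% level and the degrees-of-freedom convention here are illustrative assumptions, not necessarily the paper's exact choices.

```python
from scipy.stats import chi2

def is_unbiased(statistic, n_disciplines, alpha=0.05):
    """True if the fairness statistic stays below the chi-square
    critical value at significance level alpha."""
    critical_value = chi2.ppf(1 - alpha, df=n_disciplines - 1)
    return statistic < critical_value
```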

The authors emphasize that their notion of fairness assumes all disciplines contribute equally to scientific progress; alternative definitions could assign different weights to fields, but such variations are outside the scope of this work. They also note that the FCC’s main advantage—no need for explicit field classification—does not compensate for its inability to remove bias, while the RCC’s reliance on accurate field/year categorization is mitigated by the high quality of PACS codes in the APS dataset.

In conclusion, the paper provides a practical statistical framework for evaluating citation‑normalization methods and demonstrates that the fractional citation count, despite its conceptual appeal, does not achieve discipline‑independent fairness. The rescaled citation count, based on dividing by field‑year averages, remains the more effective tool for cross‑disciplinary bibliometric comparisons. This work offers a template for future assessments of bibliometric indicators and informs policymakers and research evaluators seeking unbiased metrics.

