A Rejoinder on Energy versus Impact Indicators
Citation distributions are so skewed that using the mean or any other central tendency measure is ill-advised. Unlike G. Prathap’s scalar measures (Energy, Exergy, and Entropy or EEE), the Integrated Impact Indicator (I3) is based on non-parametric statistics using the (100) percentiles of the distribution. Observed values can be tested against expected ones; impact can be qualified at the article level and then aggregated.
💡 Research Summary
The paper by Leydesdorff and Opthof is a methodological rebuttal to G. Prathap’s Energy‑Exergy‑Entropy (EEE) framework for bibliometric evaluation. The authors begin by emphasizing that citation counts are highly skewed, following a long‑tailed distribution that renders central‑tendency measures such as the mean or median unreliable for assessing scientific impact. They argue that Prathap’s approach, which imports thermodynamic concepts (energy, exergy, entropy) into bibliometrics, relies on averages of Journal Citation Scores (JCS) and Field Citation Scores (FCS). This reliance on averages implicitly appeals to the Central Limit Theorem, an appeal that breaks down in the presence of extreme skewness. Moreover, the authors contend that the physical equation “Energy − Exergy = Entropy” cannot be transferred directly to citation data, because the dimensions of thermodynamic entropy (joules per kelvin) do not correspond to any meaningful bibliometric quantity.
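To make the skewness point concrete, here is a small illustrative sketch (the citation counts are invented, not data from the paper) showing how a couple of heavily cited papers can pull the mean far away from the typical paper:

```python
# Hypothetical long-tailed "citation counts": most papers rarely cited,
# a couple cited very heavily (a pattern typical of citation data).
citations = [0] * 60 + [1, 1, 2, 2, 3, 3, 4, 5, 8, 12] + [250, 900]

mean = sum(citations) / len(citations)
median = sorted(citations)[len(citations) // 2]

print(f"n = {len(citations)}, mean = {mean:.1f}, median = {median}")
# mean ≈ 16.5 while the median is 0: the mean describes almost no paper
# in the set, which is the authors' core objection to average-based metrics.
```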
In contrast, the Integrated Impact Indicator (I3) proposed by the authors is built on non‑parametric statistics. Each article is assigned a percentile rank (0–100) based on its citation count within a reference set. The I3 value is then calculated as the sum over all percentile classes of the product of the percentile value (xᵢ) and the number of papers in that class (f(xᵢ)): I3 = Σᵢ xᵢ · f(xᵢ). Because the calculation uses the actual shape of the citation distribution, it preserves information that would be lost in an average‑based approach. The step‑function nature of the integral reflects the discrete nature of citation events. Importantly, I3 enables statistical testing of observed values against theoretically derived expectations using non‑parametric tests (e.g., Mann‑Whitney U, Kruskal‑Wallis), allowing researchers to determine whether observed impact deviates significantly from what would be expected under a null model.
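A minimal sketch of this calculation (assuming a mean-rank convention for tied citation counts, which is one of several possible counting rules; all numbers are hypothetical):

```python
from bisect import bisect_left, bisect_right

def percentile_rank(value, sorted_ref):
    """Percentile (0-100) of `value` within a sorted reference distribution,
    giving tied values the mean of their ranks (one possible tie rule)."""
    below = bisect_left(sorted_ref, value)
    ties = bisect_right(sorted_ref, value) - below
    return 100.0 * (below + 0.5 * ties) / len(sorted_ref)

def i3(papers, reference):
    """Sum each paper's percentile value: grouped into percentile classes
    x_i with frequencies f(x_i), this equals I3 = sum_i x_i * f(x_i)."""
    ref = sorted(reference)
    return sum(percentile_rank(c, ref) for c in papers)

# Hypothetical reference set (e.g., all papers in a field) and one unit's papers.
field = [0, 0, 1, 1, 2, 3, 3, 4, 5, 7, 9, 15, 40, 120]
unit = [0, 2, 5, 40]

print(f"I3(unit) = {i3(unit, field):.1f}")
```

Because each article keeps its own percentile before aggregation, impact is qualified at the article level first, exactly as the abstract describes.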
The authors also discuss the limitations of traditional field delineations based on ISI Subject Categories, noting that these classifications are often imprecise and can distort field‑normalized indicators. They propose fractional counting of citations, weighting each citation by the inverse of the number of references in the citing document, as a more robust way to adjust for differences in citation potential across fields. This approach aligns with Garfield’s (1979) earlier work on citation potential and with more recent normalization techniques advocated by Moed (2010) and Leydesdorff & Bornmann (2011b).
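As an illustration of the idea (a sketch under the simple 1/NR weighting reading of fractional counting; the counts are invented):

```python
def fractional_citations(citing_reference_counts):
    """Fractionally counted citations received by one paper: each citation
    is weighted by 1 / (number of references in the citing document), so a
    citation from a review with 200 references weighs less than one from a
    short paper with 10, normalizing for citation potential at the source."""
    return sum(1.0 / nr for nr in citing_reference_counts if nr > 0)

# Hypothetical: a paper cited by three documents whose reference lists
# contain 10, 50, and 200 items, respectively.
print(fractional_citations([10, 50, 200]))  # 0.1 + 0.02 + 0.005 = 0.125
```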
A key part of the argument concerns the “rates of averages versus averages of rates” debate. The authors criticize the use of the Mean Observed Citation Rate (MOCR) divided by the Mean Expected Citation Rate (MECR) to produce a Relative Citation Rate (RCR), arguing that this quotient of two means is mathematically inconsistent and carries no error distribution, so it cannot be subjected to significance testing. Instead, they advocate directly comparing observed and expected values within the I3 framework, which yields a single percentage representing the share of total impact contributed by a given set of papers.
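A toy numerical example (invented numbers) shows how sharply the two constructions can disagree, and why a single quotient of means resists testing:

```python
# Two hypothetical papers with observed and field-expected citation counts.
observed = [10, 1]
expected = [2, 10]

# "Rate of averages": one quotient of two means (the RCR construction).
rate_of_averages = (sum(observed) / len(observed)) / (sum(expected) / len(expected))

# "Average of rates": the mean of per-paper observed/expected ratios.
average_of_rates = sum(o / e for o, e in zip(observed, expected)) / len(observed)

print(f"rate of averages = {rate_of_averages:.2f}")  # 5.5 / 6.0 ≈ 0.92
print(f"average of rates = {average_of_rates:.2f}")  # (5.0 + 0.1) / 2 = 2.55
# The two constructions give very different answers, and the quotient of
# means provides no distribution over which to run a significance test.
```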
Empirical illustrations are provided: when applying I3 to a group of seven principal investigators at the Academic Medical Center of the University of Amsterdam, the ranking of researchers changes dramatically compared with traditional average‑based metrics. The top‑ranked researcher under the average method falls to fifth place under I3, while the sixth‑ranked researcher becomes the highest‑ranked. This example underscores how average‑based indicators can misrepresent performance, especially in highly skewed datasets.
In summary, Leydesdorff and Opthof make a strong case that bibliometric evaluation should move away from thermodynamic analogies and average‑based indicators toward non‑parametric, percentile‑based measures like I3. Such measures respect the underlying distribution of citations, allow for rigorous statistical testing, and provide a flexible basis for aggregating impact across articles, journals, institutions, countries, or other units of analysis. The paper concludes that I3 offers a more reliable, transparent, and policy‑relevant tool for assessing scientific impact in the presence of skewed citation data.