Caveats for the journal and field normalizations in the CWTS ("Leiden") evaluations of research performance
The Center for Science and Technology Studies at Leiden University advocates the use of specific normalizations for assessing research performance with reference to a world average. The Journal Citation Score (JCS) and Field Citation Score (FCS) are averaged over the publications of the research group or individual researcher under study, and these averages are then used as denominators of the mean Citations per publication (CPP). Thus, this normalization is based on dividing two averages. This procedure generates a legitimate indicator only in the case of underlying normal distributions. Given the skewed distributions under study, one should instead first divide the observed by the expected values for each publication, and then average these ratios. We show the effects of the Leiden normalization for a recent evaluation in which we happened to have access to the underlying data.
💡 Research Summary
The paper provides a rigorous statistical critique of the normalization procedures employed by the Centre for Science and Technology Studies (CWTS) at Leiden University, commonly referred to as the “Leiden” or “CWTS” evaluation methodology. In the Leiden approach, the performance of a researcher or research group is assessed by first calculating the mean number of citations per publication (CPP) and then dividing this figure by the average Journal Citation Score (JCS) or by the average Field Citation Score (FCS) of the journals and fields in which the publications appear. In effect, the method computes a ratio of two averages: mean citations divided by mean JCS, or mean citations divided by mean FCS.
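In symbols (the notation here is ours, added for clarity: $c_i$ is the observed number of citations of paper $i$, $JCS_i$ the mean citation rate of the journal in which it appeared, and $n$ the number of publications), the Leiden indicator amounts to

$$
\frac{CPP}{JCSm} \;=\; \frac{\frac{1}{n}\sum_{i=1}^{n} c_i}{\frac{1}{n}\sum_{i=1}^{n} JCS_i} \;=\; \frac{\sum_{i=1}^{n} c_i}{\sum_{i=1}^{n} JCS_i},
$$

and analogously $CPP/FCSm$ with $FCS_i$ in the denominator.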
The authors argue that this operation is statistically sound only when the underlying distributions are normal (symmetrical, with finite variance). Citation data, however, are notoriously skewed: a small minority of papers receive a very large number of citations while the majority receive few or none, producing a heavy‑tailed, right‑skewed distribution. In such circumstances, the arithmetic mean is heavily influenced by outliers, and the ratio of two means does not represent the average of the individual citation‑to‑expected ratios. In other words, “the average of ratios is not equal to the ratio of averages.”
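A toy calculation makes the inequality concrete. The numbers below are invented for illustration and are not data from the paper; `e` stands for the expected citation rate of each paper's journal or field (its JCS or FCS):

```python
# Three papers with observed citations c and expected citations e
# (illustrative values only, not data from the paper).
c = [30, 2, 1]
e = [10, 1, 1]

# Leiden-style normalization: ratio of two averages.
ratio_of_averages = (sum(c) / len(c)) / (sum(e) / len(e))

# Proposed correction: average of the per-paper ratios.
average_of_ratios = sum(ci / ei for ci, ei in zip(c, e)) / len(c)

print(ratio_of_averages)  # 11 / 4 = 2.75, dominated by the one highly cited paper
print(average_of_ratios)  # (3 + 2 + 1) / 3 = 2.0
```

The single highly cited paper pulls the ratio of averages well above the average of the per-paper ratios, which is exactly the distortion the authors identify.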
To address this flaw, the authors propose an alternative normalization that respects the distributional properties of citation data. For each individual publication, one should compute the observed citation count divided by its expected citation count (the expected value being the JCS or FCS appropriate to that journal and field). These per‑paper ratios are then averaged across all publications of the unit under evaluation. This approach yields the mean of the observed‑over‑expected ratios, which is a legitimate indicator even under highly skewed distributions because it does not rely on the stability of the arithmetic mean of the denominator.
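Written out (again in our notation, with $e_i$ the expected citation rate for paper $i$, i.e., its JCS or FCS), the proposed indicator is

$$
\frac{1}{n}\sum_{i=1}^{n} \frac{c_i}{e_i},
$$

so that every paper contributes equally to the average, regardless of how highly cited its journal or field is.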
The paper substantiates the theoretical argument with an empirical case study. The authors obtained the raw citation data used in a recent Leiden evaluation (the exact evaluation is not named, but the data were accessible to the authors). They applied both the traditional Leiden normalization (ratio of averages) and their proposed per‑paper ratio averaging. The results demonstrate substantial differences. Under the traditional method, publications in a few high‑impact journals inflate the overall normalized score, sometimes pushing the final indicator well above the world average (e.g., 1.2 × world average). By contrast, the per‑paper ratio method produces scores that cluster much more tightly around 1.0, with values ranging roughly between 0.95 and 1.05 for the same set of publications. This indicates that the traditional method over‑rewards researchers who publish in a small number of highly cited venues and under‑rewards those whose work appears in lower‑cited journals, even when the latter’s papers perform well relative to their field expectations.
Beyond the technical correction, the authors discuss broader implications for research assessment policy. The Leiden “world average” benchmark treats all fields as if they share a common citation culture, ignoring well‑documented differences in citation practices across disciplines (e.g., faster citation accumulation in biomedical sciences versus slower accrual in humanities). By normalizing at the level of individual papers, the proposed method inherently adjusts for these field‑specific dynamics, because each paper’s expected citation count is already field‑ and journal‑specific.
The paper concludes with several recommendations: (1) CWTS should replace the current ratio‑of‑averages formula with the mean‑of‑ratios approach; (2) evaluation reports should present both the per‑paper ratio average and a distributional summary (e.g., median, interquartile range) to convey the skewness of the underlying data; (3) the “world average” reference point should be redefined on a field‑by‑field basis rather than as a single global figure; and (4) future methodological research should explore robust statistical alternatives, such as percentile‑based indicators (sketched below), that are less sensitive to extreme values.
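Recommendation (4) can be made concrete with a minimal sketch. The function below is our own illustration, not a method specified in the paper: it computes the share of a unit's papers that exceed the 90th percentile of their field's citation distribution, with all numbers invented.

```python
import numpy as np

def top_decile_share(unit_citations, field_citations):
    """Share of a unit's papers in the top 10% of their field.

    unit_citations  : citation counts of the unit's papers (one field).
    field_citations : citation counts of all papers in that field,
                      used to locate the 90th-percentile threshold.

    A sketch of a percentile-based indicator; threshold choice, tie
    handling, and field delineation would need care in practice.
    """
    threshold = np.percentile(field_citations, 90)
    unit = np.asarray(unit_citations)
    return float((unit > threshold).mean())

# Illustrative, invented numbers: a heavy-tailed field distribution
# and a small research group of five papers evaluated against it.
rng = np.random.default_rng(0)
field = rng.lognormal(mean=1.0, sigma=1.2, size=10_000)
group = [1, 3, 40, 5, 90]
print(top_decile_share(group, field))  # 0.4 for these invented numbers
```

Because a percentile rank depends only on a paper's position within the field distribution, not on its raw citation count, such indicators are largely immune to the extreme values that distort mean-based measures.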
Overall, the paper convincingly demonstrates that the Leiden normalization, as currently practiced, rests on a flawed statistical premise when applied to citation data. By adopting a per‑publication observed‑over‑expected averaging scheme, evaluators can obtain a more accurate, fair, and field‑sensitive measure of research performance, thereby improving the credibility of bibliometric assessments used for funding decisions, hiring, and policy formulation.