McCall's Area Transformation versus the Integrated Impact Indicator (I3)
In a study entitled “Skewed Citation Distributions and Bias Factors: Solutions to two core problems with the journal impact factor,” Mutz & Daniel (2012) propose (i) applying McCall’s (1922) Area Transformation to the skewed citation distribution so that the data can be treated as normally distributed (Krus & Kennedy, 1977), and (ii) controlling for different document types as a covariate (Rubin, 1977). This approach provides an alternative to Leydesdorff & Bornmann’s (2011) Integrated Impact Indicator (I3). As the authors note, the two approaches are closely akin. Can something be said about their relative quality? To that end, I replicated the study of Mutz & Daniel for the 11 journals in the Subject Category “mathematical psychology,” but additionally using I3 on the basis of continuous quantiles (Leydesdorff & Bornmann, in press) and its variant PR6, based on the six percentile rank classes distinguished by Bornmann & Mutz (2011): the top-1%, 95-99%, 90-95%, 75-90%, 50-75%, and bottom-50%.
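As a point of reference, a minimal sketch of the mapping from a paper's citation percentile (on a 0-100 scale) to the six percentile rank classes listed above might look as follows; the function name is illustrative, and the convention of weighting the classes from 1 (bottom-50%) to 6 (top-1%) is a common one rather than a detail given here.

```python
def pr6_class(percentile: float) -> int:
    """Map a citation percentile (0-100) to one of the six percentile
    rank classes; classes are weighted 1 (bottom-50%) through 6 (top-1%)."""
    if percentile >= 99:
        return 6  # top-1%
    if percentile >= 95:
        return 5  # 95-99%
    if percentile >= 90:
        return 4  # 90-95%
    if percentile >= 75:
        return 3  # 75-90%
    if percentile >= 50:
        return 2  # 50-75%
    return 1      # bottom-50%
```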
💡 Research Summary
The paper conducts a direct empirical comparison between McCall’s Area Transformation (as advocated by Mutz & Daniel, 2012) and the Integrated Impact Indicator (I3) developed by Leydesdorff & Bornmann, including its six-class variant PR6. The author replicates the original Mutz & Daniel study on a set of eleven journals classified under “mathematical psychology” in the Web of Science. For each journal, citation counts are collected for articles published between 2008 and 2012, and three metrics are calculated: (1) the McCall z-score, obtained by converting each article’s citation count to a cumulative distribution function (CDF) value and then applying the inverse standard normal function; (2) the continuous-quantile I3, which weights each article by its percentile rank (0-100) in the citation distribution; and (3) PR6, which instead groups articles into the six discrete percentile rank classes (top-1%, 95-99%, 90-95%, 75-90%, 50-75%, bottom-50%) and weights them from 6 down to 1.
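A minimal sketch of these three computations for a single journal's citation counts, assuming the definitions summarized above; the function name, the tie-handling method, the rank/(n + 1) convention for the empirical CDF, and the normalization choices are illustrative, not the authors' exact procedure. In the actual indicators, percentiles are taken against a reference set (e.g., all papers in the field), whereas this sketch ranks papers within the journal for simplicity.

```python
import numpy as np
from scipy.stats import norm, rankdata

def journal_indicators(citations):
    """Compute the mean McCall z-score, continuous-quantile I3, and PR6
    for one journal's papers (illustrative conventions only)."""
    c = np.asarray(citations, dtype=float)
    n = len(c)
    ranks = rankdata(c, method="average")  # 'average' resolves ties
    # (1) McCall's area transformation: empirical CDF -> inverse normal;
    # dividing by n + 1 keeps the CDF strictly inside (0, 1).
    z = norm.ppf(ranks / (n + 1))
    # (2) Continuous-quantile I3: each paper weighted by its percentile
    # rank (0-100), summed over the journal's papers.
    pct = 100.0 * ranks / n
    i3 = pct.sum()
    # (3) PR6: papers grouped into six classes weighted 1..6;
    # np.digitize returns class indices 0..5, hence the + 1.
    pr6 = (np.digitize(pct, bins=[50, 75, 90, 95, 99]) + 1).sum()
    return z.mean(), i3, pr6
```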
Statistical analysis shows very high pairwise Pearson correlations (r > 0.86) among the three indicators, with I3 and PR6 almost perfectly aligned (r ≈ 0.98). This confirms that the continuous-quantile and the six-class approaches capture essentially the same information, differing only in granularity. In contrast, the McCall z-scores exhibit greater variability and, in several cases, produce negative or near-zero averages for journals with high raw citation means. The transformation compresses extreme citation values, thereby reducing the discriminative power of the metric. For example, the “Journal of Mathematical Psychology” has a mean raw citation count of 12.4, yet its average z-score is only 0.12, whereas I3 and PR6 assign it scores of 1.84 and 1.81 respectively, reflecting its strong high-impact tail.
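The correlation analysis itself is standard; a sketch with synthetic journal-level scores (purely made-up numbers, not the study's data) shows the shape of the computation:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Hypothetical scores for 11 journals -- illustrative data only.
i3 = rng.uniform(1.0, 3.0, size=11)
pr6 = i3 + rng.normal(0.0, 0.05, size=11)        # tracks I3 closely
z_mc = 0.3 * i3 + rng.normal(0.0, 0.2, size=11)  # noisier McCall z

for label, (x, y) in [("I3 vs PR6", (i3, pr6)),
                      ("I3 vs McCall z", (i3, z_mc)),
                      ("PR6 vs McCall z", (pr6, z_mc))]:
    r, _ = pearsonr(x, y)
    print(f"{label}: r = {r:.2f}")
```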
When ranking journals, the McCall method tends to overvalue journals with relatively uniform citation distributions (e.g., “Psychonomic Bulletin & Review”) and to undervalue those whose impact is concentrated in a small elite of papers (e.g., “Cognitive Psychology”). I3 and PR6, by weighting percentile ranks directly, preserve the contribution of top-cited articles and produce rankings that align more closely with intuitive assessments of impact.
The discussion highlights fundamental methodological differences. McCall’s transformation forces citation data into a normal distribution, enabling the use of classic parametric tests (t-tests, ANOVA) but at the cost of information loss, especially for outliers; it also requires a post-hoc normality check. I3 and PR6 avoid distributional assumptions: they retain the original skewed shape of the citation data while translating it into a scale that is readily comparable across fields and time windows. Moreover, the class-based weighting (as in PR6) is flexible, allowing evaluators to emphasize particular performance levels (e.g., the top-1% papers) according to policy goals.
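To make the compression point concrete, a small sketch (with made-up citation counts) applies the rank-based inverse-normal transformation to a sample containing one extreme outlier:

```python
import numpy as np
from scipy.stats import norm, rankdata

# Made-up citation counts with one extreme outlier at 500.
citations = np.array([0, 0, 1, 2, 3, 5, 8, 13, 40, 500])
z = norm.ppf(rankdata(citations, method="average") / (len(citations) + 1))
for c, zi in zip(citations, z):
    print(f"{c:4d} citations -> z = {zi:+.2f}")
# The top two papers differ by a factor of 12.5 in raw counts (500 vs 40)
# but by only about 0.4 on the transformed z-scale.
```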
The authors conclude that while McCall’s Area Transformation may be attractive to researchers accustomed to traditional statistical frameworks, it is less suitable for nuanced research‑evaluation contexts where the distributional tail carries substantive meaning. I3 and its PR6 variant provide a more transparent, discriminative, and policy‑relevant measurement of journal impact. The paper suggests future work on optimizing weight structures, adapting the indicators to discipline‑specific citation cultures, and integrating covariate controls (e.g., document type) within a mixed‑model framework to further enhance evaluative robustness.