The Accuracy of Confidence Intervals for Field Normalised Indicators
📝 Abstract
When comparing the average citation impact of research groups, universities and countries, field normalisation reduces the influence of discipline and time. Confidence intervals for these indicators can help with attempts to infer whether differences between sets of publications are due to chance factors. Although both bootstrapping and formulae have been proposed for these, their accuracy is unknown. In response, this article uses simulated data to systematically compare the accuracy of confidence limits in the simplest possible case, a single field and year. The results suggest that the MNLCS (Mean Normalised Log-transformed Citation Score) confidence interval formula is conservative for large groups but almost always safe, whereas bootstrap MNLCS confidence intervals tend to be accurate but can be unsafe for smaller world or group sample sizes. In contrast, bootstrap MNCS (Mean Normalised Citation Score) confidence intervals can be very unsafe, although their accuracy increases with sample sizes.
📄 Content
The Accuracy of Confidence Intervals for Field Normalised Indicators1
Mike Thelwall, Ruth Fairclough
Statistical Cybermetrics Research Group, University of Wolverhampton, UK.
Keywords: Citation analysis; field normalised citation indicators; confidence intervals
1 Introduction
Citation indicators that estimate the average citation rate of articles produced by a group
are widely used in research assessment and for ranking universities, countries and
departments (Aksnes, Schneider, & Gunnarsson, 2012; Albarrán, Perianes‐Rodríguez, &
Ruiz‐Castillo, 2015; Braun, Glänzel, & Grupp, 1995; Elsevier, 2013; Fairclough & Thelwall,
2015). For example, in the U.K., they have been proposed for the national Research
Excellence Framework (REF) to cross-check peer review judgements (Stern, 2016). If average
citation indicators are to be used in such a role, then they must be calculated in a fair way
and accompanied by an estimate of statistical variability so that strong conclusions are not
drawn from small or biased differences.
Field normalised citation impact indicators adjust average citation counts for the
field and year of publication to allow fair comparisons of citation impact between sets of
articles that were published in different combinations of fields and years. For example, if
group A published 100 medical humanities articles in 2014 with an average of 4 citations
each but group B published 100 oncology articles in 2013 with an average of 30 citations
each, then it is not clear which group generated the more impactful research. Group B has two
advantages: its articles are older, with longer to attract citations, and it publishes in an area
where citations accrue rapidly. A field normalised indicator may divide by the average
number of citations for the field and year so that the normalised counts are 1 if the average
citation impact is equal to the world average. After this, it would be reasonable to compare
the field normalised values of A and B. Nevertheless, confidence intervals or statistical
hypothesis tests are needed to judge whether the difference between A and B is
likely to reflect an underlying trend rather than a random fluctuation of the data.
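The normalisation step and the percentile bootstrap confidence interval discussed above can be sketched as follows. This is a minimal illustration under the paper's simplest case (a single field and year), not the authors' implementation: the function names, the example citation counts, and the world mean of 5 are hypothetical, and the MNLCS variant would additionally apply a ln(1 + c) transformation before normalising.

```python
import random
import statistics

def mncs(citations, world_mean):
    """Mean Normalised Citation Score for one field and year:
    each citation count is divided by the world mean for that
    field and year, then the normalised values are averaged."""
    return statistics.mean(c / world_mean for c in citations)

def bootstrap_mncs_ci(citations, world_mean, level=0.95, reps=2000, seed=0):
    """Percentile bootstrap confidence interval for the MNCS:
    resample the group's citation counts with replacement,
    recompute the indicator each time, and take the empirical
    quantiles of the resampled indicator values."""
    rng = random.Random(seed)
    n = len(citations)
    resampled = sorted(
        mncs([rng.choice(citations) for _ in range(n)], world_mean)
        for _ in range(reps)
    )
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return resampled[lo_idx], resampled[hi_idx]

# Hypothetical group of 8 articles in one field and year, with a
# hypothetical world mean of 5 citations for that field and year.
group = [2, 4, 6, 8, 10, 3, 5, 7]
point = mncs(group, 5.0)  # 1.125: above the world average of 1
low, high = bootstrap_mncs_ci(group, 5.0)
```

If the interval (low, high) for group A overlaps substantially with that for group B, a difference in their point estimates should not be treated as evidence of a real underlying difference. Note that with only 8 articles the interval will be wide, which reflects the paper's concern that bootstrap intervals can be unsafe for small group or world sample sizes.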
1 Thelwall, M. & Fairclough, R. (in press). The accuracy of confidence intervals for field normalised indicators. Journal of Informetrics. doi:10.1016/j.joi.2017.03.004. This manuscript version is made available under the CC BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
The use of statistical inference or confidence intervals to compare the average citation impact is uncommon within scientometrics and there are arguments against it, such as a lack of clarity about what exactly is being sampled (Waltman, 2016). Statistical inference is typically used when data is available about a sample whereas in scientometrics, relatively complete sets of publications are normally analysed and so there is no necessity to infer population properties from a sample, at least in the obvious sense. Nevertheless, research is a social process and therefore each citation is the product of activities that are affected by processes that can be thought of as random in the sense of not predictable in advance (Williams & Bornmann, 2016). The exact citation count of an article is therefore partly a result of chance factors rather than just the quality or value of an article. For example, if two essentially identical papers are published at the same time then one may become more highly cited than the other for spurious reasons, such as the prestige of the publishing journal (Larivière & Gingras, 2010), or the extent to which the citing literature is covered by the database used for the counts (Harzing & Alakangas, 2016; Table 3 in: Kousha & Thelwall, 2008). Thus, it seems impossible to regard citation counting as precisely measuring the im