National research assessment exercises: a comparison of peer review and bibliometrics rankings
Bibliometric techniques have developed to the point of suggesting their integration with, or even their complete substitution for, classic peer review in national research assessment exercises, at least for the hard sciences. In this work we compare the university rankings produced by the first Italian evaluation exercise through peer review with the results of bibliometric simulations. The comparison reveals substantial differences between the peer-review and bibliometric rankings for both excellence and productivity.
💡 Research Summary
The paper investigates the discrepancies between peer-review-based rankings and bibliometric simulations for Italian universities, using the first national research assessment exercise (VTR, 2006) as a case study. The VTR, organized by the Ministry of Universities and Research (MIUR) and the Committee for the Evaluation of Research (CIVR), required each university to submit a limited set of research outputs (roughly one publication for every four researchers) covering the 2001–2003 period. Panels comprising 183 experts evaluated roughly 18,000 outputs, assigning each to one of four quality categories (excellent, good, acceptable, limited). The resulting quality index (QI) produced university rankings within each of the 14 University Disciplinary Areas (UDAs). The exercise involved two years of work and cost about €3.5 million.
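To make the rating-to-ranking step concrete, the sketch below computes a quality index as the weighted average of a university's submitted outputs in one UDA; the category weights and the example ratings are illustrative placeholders, not the official CIVR coefficients.

```python
# Minimal sketch of a rating-based quality index.
# NOTE: the weights below are illustrative, not the official CIVR coefficients.
WEIGHTS = {"excellent": 1.0, "good": 0.8, "acceptable": 0.6, "limited": 0.2}

def quality_index(ratings):
    """Average weight of the outputs one university submitted in one UDA."""
    return sum(WEIGHTS[r] for r in ratings) / len(ratings)

# Hypothetical example: four submitted outputs and their panel ratings.
print(quality_index(["excellent", "good", "good", "limited"]))  # -> 0.7
```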
In parallel, the authors constructed a bibliometric dataset from the Thomson Reuters Web of Science, processed through the Observatory of Public Research (ORP). This dataset includes all Italian public‑sector publications (articles, reviews, conference papers) for the same triennium, amounting to 84,289 items across eight hard‑science UDAs (Mathematics & Computer Science, Physics, Chemistry, Earth Sciences, Biology, Medicine, Agricultural & Veterinary Sciences, Industrial & Information Engineering). Each paper is assigned to the relevant university and UDA(s) based on author affiliations and disciplinary codes, allowing for multiple assignments when co‑authorship spans institutions or fields.
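As a rough illustration of this attribution step, the sketch below assigns each record to every (university, UDA) pair it matches; the records and field names are hypothetical, and counting a multi-institution, multi-field paper once per pair is an assumption consistent with the multiple-assignment rule described above.

```python
from collections import defaultdict

# Hypothetical WoS-derived records: author affiliations mapped to universities,
# subject categories mapped to one or more hard-science UDAs.
papers = [
    {"id": "WOS:001", "universities": {"Univ A", "Univ B"}, "udas": {"Physics"}},
    {"id": "WOS:002", "universities": {"Univ A"}, "udas": {"Chemistry", "Biology"}},
]

# Attribute each paper to every (university, UDA) pair it matches, so a
# co-authored or cross-field paper is counted in each relevant unit.
corpus = defaultdict(list)
for paper in papers:
    for univ in paper["universities"]:
        for uda in paper["udas"]:
            corpus[(univ, uda)].append(paper["id"])

for key, ids in sorted(corpus.items()):
    print(key, ids)
```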
The bibliometric evaluation proceeds in two scenarios. The first follows international practice: papers are ranked by the Article Impact Index (AII), defined as the ratio of a paper's citations to the median citations of all Italian papers published in the same year and WoS subject category. Papers in the top 10% of AII within each UDA are deemed "excellent." The second scenario mirrors the VTR's submission constraint, limiting the number of evaluated papers per university to the same proportion used in the peer-review exercise.
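The sketch below illustrates the first scenario under the AII definition above: each paper's citations are normalized by the median of its (publication year, WoS category) group, and the top 10% of papers by AII within each UDA are flagged as excellent. The records and the simple cut-off rule are assumptions for illustration only.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (paper_id, year, wos_category, uda, citations).
papers = [
    ("p1", 2001, "Physics, Particles & Fields", "Physics", 25),
    ("p2", 2001, "Physics, Particles & Fields", "Physics", 4),
    ("p3", 2002, "Biochemistry & Molecular Biology", "Biology", 40),
    ("p4", 2002, "Biochemistry & Molecular Biology", "Biology", 10),
]

# Median citations of papers published in the same year and WoS category.
groups = defaultdict(list)
for pid, year, cat, uda, cites in papers:
    groups[(year, cat)].append(cites)
medians = {key: statistics.median(vals) for key, vals in groups.items()}

# AII = paper citations / median citations of its (year, category) group.
aii = {pid: cites / medians[(year, cat)]
       for pid, year, cat, uda, cites in papers}

# Flag the top 10% of papers by AII within each UDA as "excellent"
# (at least one paper per UDA in this toy example).
by_uda = defaultdict(list)
for pid, _, _, uda, _ in papers:
    by_uda[uda].append(pid)

excellent = set()
for uda, pids in by_uda.items():
    ranked = sorted(pids, key=lambda p: aii[p], reverse=True)
    excellent.update(ranked[: max(1, round(0.10 * len(pids)))])

print(aii)
print(excellent)
```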
For each university i in UDA u, an "excellence indicator" is calculated as

I_i,u = (Ne_i,u / Ne_u) / (RS_i,u / RS_u),
where Ne_i,u is the number of excellent papers authored by university i, Ne_u is the total number of excellent papers nationally in UDA u, RS_i,u is the research staff of university i in that UDA, and RS_u is the national staff total. This indicator captures both the share of excellent outputs and the efficiency of production relative to staff size, thereby incorporating productivity alongside quality.
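A minimal sketch of this indicator, with purely illustrative numbers:

```python
def excellence_indicator(ne_iu, ne_u, rs_iu, rs_u):
    """I_i,u = (Ne_i,u / Ne_u) / (RS_i,u / RS_u): a university's national share
    of excellent papers in UDA u, normalized by its share of the national
    research staff in that UDA."""
    return (ne_iu / ne_u) / (rs_iu / rs_u)

# Hypothetical university: 12 of the 300 nationally excellent papers in an UDA,
# with 40 of the 2,000 national research staff. I = 0.04 / 0.02 = 2.0,
# i.e. twice the share of excellent papers expected from its staff size.
print(excellence_indicator(ne_iu=12, ne_u=300, rs_iu=40, rs_u=2000))
```

A value above 1 indicates that a university produces more than its staff-proportional share of excellent papers; a value below 1 indicates less.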
Comparative analysis reveals substantial divergence between the two ranking systems. Peer-review rankings are heavily influenced by the selection process: universities must choose which outputs to submit, and the limited sample (≈9% of total output) often fails to represent an institution's true research performance. Selection inefficiencies, strategic behavior, and disciplinary differences introduce bias, leading to cases where institutions with a high volume of high-impact work are under-ranked because they submitted fewer or less representative papers. Moreover, peer review evaluates a fixed quota of outputs per institution regardless of actual research capacity, resulting in a "bottom-up" approach that cannot capture overall productivity.
In contrast, the bibliometric approach evaluates the entire publication corpus, allowing a “top‑down” assessment that rewards both the quantity of excellent work and its efficiency per researcher. The bibliometric rankings are more stable, less costly, and less time‑consuming, as they rely on automated citation data rather than extensive expert panels. The differences are especially pronounced in fields such as Medicine and Biology, where the volume of publications is large and the peer‑review sample is a tiny fraction of the total output.
The authors conclude that while peer review remains valuable for nuanced qualitative judgment, its structural limitations—selection bias, limited coverage, high administrative cost, and inability to measure productivity—make it a suboptimal sole instrument for national research assessment in the hard sciences. Bibliometric methods, particularly when calibrated with field‑normalized citation indicators like AII, provide a more robust, cost‑effective, and comprehensive basis for ranking institutions. The paper suggests that future assessment frameworks should either adopt a fully bibliometric model for the hard sciences or develop hybrid systems that combine the qualitative strengths of peer review with the quantitative breadth of bibliometrics, ensuring that both excellence and productivity are accurately captured for policy‑making and resource allocation.