Assessing and Comparing the Coverage of Publications of Italian Universities in OpenCitations


Recent initiatives advocating responsible, transparent research assessment have intensified the call to use open research information rather than proprietary databases. This study evaluates the coverage and citation representation, within OpenCitations (a community-owned open infrastructure), of publications recorded in the Current Research Information Systems (CRIS) of six Italian universities, all of which run the IRIS software platform. Using the persistent identifiers (DOIs, PMIDs, and ISBNs) specified in the IRIS installations involved, we matched these publications against those recorded in OpenCitations Meta and extracted the related citation links from the OpenCitations Index. Results show that OpenCitations covers, on average, over 40% of IRIS publications, a share quantitatively comparable to those reported for Scopus and Web of Science in another study. However, gaps persist, particularly for publication types prevalent in the Social Sciences and Humanities, such as monographs and critical editions. Overall, the findings demonstrate the growing maturity of OpenCitations and, more broadly, of Open Science infrastructures as viable alternative sources of research information, while highlighting areas where further metadata enrichment and interoperability efforts are needed.


💡 Research Summary

This paper evaluates how well publications recorded in the Current Research Information Systems (CRIS) of six Italian universities are represented in OpenCitations, an open, community‑owned citation infrastructure. The six institutions—University of Bologna, University of Milan, University of Turin, University of Padua, University of Eastern Piedmont, and Scuola Normale Superiore in Pisa—use the IRIS CRIS platform. The authors collected IRIS extracts via seven predefined SQL queries, obtaining CSV files that contain author information, publication identifiers (DOI, PMID, ISBN), titles, dates, publishers, and a mapping of each institution’s internal publication‑type taxonomy to the national MIUR classification.

The methodology consists of three main steps. First, the authors extracted all syntactically valid persistent identifiers (PIDs) from each IRIS dump, discarding records without a DOI, PMID, or ISBN. ISBNs received an additional check: they were kept only for publication types that legitimately receive an ISBN (e.g., monographs, critical editions, translations). Second, the extracted PIDs were matched against the OpenCitations Meta dump (June 2025, >124 million bibliographic entities) to retrieve the corresponding OpenCitations Meta Identifiers (OMIDs). Third, using the OpenCitations Index dump (July 2025, >2.2 billion citation links), all citation relations involving any of the matched OMIDs were extracted. The authors also deduplicated the results: when multiple OMIDs corresponded to the same IRIS record, the entry with the most granular publication date was retained, and remaining ties were broken by selecting the highest‑valued OMID. Citation records were likewise deduplicated by their Open Citation Identifier (OCI).
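The first step and the OMID tie-breaking rule can be illustrated with a short Python sketch. This is not the authors' code: the regular expressions, the publication-type labels in `ISBN_TYPES`, the `Candidate` record, and the reading of "highest‑valued OMID" as a comparison of the numeric part after the final slash are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, NamedTuple, Tuple

# Assumed syntactic patterns; the paper's exact validation rules may differ.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
PMID_RE = re.compile(r"^[1-9]\d*$")
ISBN_RE = re.compile(r"^(?:\d{9}[\dXx]|\d{13})$")

# Hypothetical labels for publication types that legitimately carry an ISBN.
ISBN_TYPES = {"monograph", "critical edition", "translation"}


def valid_pid(scheme: str, value: str, pub_type: str) -> bool:
    """Return True if the identifier is syntactically plausible for its scheme."""
    value = value.strip()
    if scheme == "doi":
        return bool(DOI_RE.match(value))
    if scheme == "pmid":
        return bool(PMID_RE.match(value))
    if scheme == "isbn":
        digits = value.replace("-", "").replace(" ", "")
        # ISBNs are accepted only for ISBN-eligible publication types.
        return pub_type in ISBN_TYPES and bool(ISBN_RE.match(digits))
    return False


class Candidate(NamedTuple):
    iris_id: str   # identifier of the IRIS record
    omid: str      # e.g. "omid:br/0612345" (illustrative value)
    pub_date: str  # "YYYY", "YYYY-MM", "YYYY-MM-DD", or "" if missing


def date_granularity(pub_date: str) -> int:
    """0 = missing, 1 = year only, 2 = year-month, 3 = full date."""
    return len([part for part in pub_date.split("-") if part])


def omid_number(omid: str) -> int:
    """Numeric part of the OMID after the final slash (assumed ordering key)."""
    return int(omid.rsplit("/", 1)[-1])


def deduplicate(candidates: Iterable[Candidate]) -> Dict[str, Candidate]:
    """Keep one OMID per IRIS record: most granular date, then highest OMID."""
    best: Dict[str, Tuple[Tuple[int, int], Candidate]] = {}
    for cand in candidates:
        key = (date_granularity(cand.pub_date), omid_number(cand.omid))
        current = best.get(cand.iris_id)
        if current is None or key > current[0]:
            best[cand.iris_id] = (key, cand)
    return {iris_id: cand for iris_id, (_, cand) in best.items()}
```

Comparing `(granularity, OMID number)` tuples applies the two criteria in the order described above: the OMID number only matters when the date granularity is tied.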

In total, more than 1.6 million IRIS records were processed. After PID extraction and validation, 260,618 unique identifiers remained for searching in OpenCitations. On average, 40.33% of IRIS records (median 40.70%) were found in OpenCitations Meta. Coverage varies by institution: the University of Milan reaches 48.1%, while the University of Turin is at 31.8%. The matched records carry a substantial number of citation links: on average, each matched entity has roughly 33 outgoing citations and about 35 incoming citations, for a total of over 30 million citation links across the six universities. Internal citations (IRIS records citing other IRIS records) amount to roughly half a million links, which suggests that OpenCitations can support intra‑institutional citation network analyses.
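As a rough illustration of how such figures can be derived, the Python sketch below computes a coverage percentage and counts outgoing, incoming, and internal citations for a set of matched OMIDs. The function names, the tuple layout of a citation, and the sample identifiers are hypothetical, and the counting convention (an internal citation is counted as both outgoing and incoming) is an assumption rather than the paper's stated definition.

```python
from typing import Dict, Iterable, Set, Tuple

# (oci, citing_omid, cited_omid); the layout is assumed for illustration.
Citation = Tuple[str, str, str]


def coverage_percent(matched: int, total: int) -> float:
    """Share of IRIS records found in OpenCitations Meta, as a percentage."""
    return 100.0 * matched / total if total else 0.0


def count_citations(citations: Iterable[Citation],
                    matched_omids: Set[str]) -> Dict[str, int]:
    """Count citation links touching the matched set, deduplicated by OCI."""
    seen_ocis: Set[str] = set()
    counts = {"outgoing": 0, "incoming": 0, "internal": 0}
    for oci, citing, cited in citations:
        if oci in seen_ocis:
            continue  # the same citation was already counted
        seen_ocis.add(oci)
        citing_in = citing in matched_omids
        cited_in = cited in matched_omids
        if citing_in:
            counts["outgoing"] += 1
        if cited_in:
            counts["incoming"] += 1
        if citing_in and cited_in:
            counts["internal"] += 1  # both ends are IRIS records
    return counts


# Toy data with placeholder identifiers (not real OCIs or OMIDs).
matched = {"omid:br/061", "omid:br/062"}
sample = [
    ("oci:1-2", "omid:br/061", "omid:br/062"),
    ("oci:1-9", "omid:br/061", "omid:br/069"),
    ("oci:8-2", "omid:br/068", "omid:br/062"),
    ("oci:1-2", "omid:br/061", "omid:br/062"),  # duplicate OCI, skipped
    ("oci:7-8", "omid:br/067", "omid:br/068"),  # unrelated to the matched set
]
summary = count_citations(sample, matched)
```

Keeping the internal count separate from the outgoing and incoming totals makes it possible to isolate the intra-institutional citation network mentioned above.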

The analysis of unmapped records reveals systematic gaps. The most frequently missing types are journal articles, book chapters, and conference papers—despite these categories having relatively high coverage where identifiers are present. The most pronounced deficiency concerns the humanities and social sciences, where monographs, critical editions, and edited volumes are common. Many of these items are only identified by an ISBN that refers to the whole volume, while individual chapters lack DOIs, leading to their exclusion from the OpenCitations dataset. Consequently, the OpenCitations coverage for these publication types remains lower than for journal articles.

To facilitate reproducibility and future monitoring, the authors released a Python package called iris‑oc‑mapper. This command‑line tool automates the conversion of IRIS dumps to structured CSV, maps local publication types to the MIUR taxonomy, extracts and validates PIDs, performs the OpenCitations matching, deduplicates results, and generates HTML reports with visual summaries. The software is intended to enable each university to repeat the analysis autonomously and to track improvements over time.

In conclusion, the study demonstrates that OpenCitations has reached a level of maturity comparable to proprietary databases such as Scopus and Web of Science in terms of sheer coverage of Italian university outputs. However, the authors stress that further work is needed to enrich metadata, especially for ISBN‑based publications, and to improve interoperability between CRIS platforms and open citation infrastructures. By addressing these gaps, OpenCitations could become a fully viable, open alternative for responsible research assessment across all disciplines.

