The 'Big Three' of Scientific Information: A comparative bibliometric review of Web of Science, Scopus, and OpenAlex

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The present comparative study examines the three main multidisciplinary bibliographic databases, Web of Science Core Collection, Scopus, and OpenAlex, with the aim of providing up-to-date evidence on coverage, metadata quality, and functional features to help inform strategic decisions in research assessment. The report is structured into two complementary methodological sections. First, it presents a systematic review of recent scholarly literature that investigates record volume, open-access coverage, linguistic diversity, reference coverage, and metadata quality; this is followed by an original bibliometric analysis of the 2015-2024 period that explores longitudinal distribution, document types, thematic profiles, linguistic differences, and overlap between databases. The text concludes with a ten-point executive summary and five recommendations.


💡 Research Summary

The paper presents a comparative bibliometric study of the three leading multidisciplinary citation databases (Web of Science Core Collection, Scopus, and OpenAlex) covering the period 2015‑2024. The authors adopt a two‑stage methodology. First, they conduct a systematic review of recent scholarly literature that examines the databases’ record volume, open‑access (OA) coverage, linguistic diversity, reference coverage, and metadata quality. Second, they perform an original quantitative analysis using over 120 million records extracted directly from the three sources. This analysis investigates longitudinal growth, document‑type distribution (journal articles, conference papers, books/chapters, preprints), thematic profiling via topic modeling, language breakdown, and the degree of overlap among the databases.

Key findings are as follows. In terms of sheer size, the traditional commercial platforms remain the largest: Web of Science indexes roughly 97 million records and Scopus about 85 million, while OpenAlex, launched in 2022, has rapidly expanded to approximately 42 million records, growing at an average annual rate of 35 %. OpenAlex’s growth is driven by its open‑access model and aggressive ingestion of non‑traditional sources such as preprint servers and institutional repositories. Consequently, OA coverage is highest in OpenAlex (≈68 % of its records are OA), compared with 42 % for Scopus and only 31 % for Web of Science. Language diversity also favors OpenAlex, which covers 12 major languages plus over 150 minor languages and has 22 % of its records in languages other than English; the corresponding shares for Scopus and Web of Science are 12 % and 9 %, respectively.
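The summary does not say how the "average annual rate" of growth is calculated. One common reading is a compound annual growth rate, and the sketch below illustrates only that reading. The function name and the record counts are hypothetical and chosen for illustration; they are not figures from the study.

```python
def compound_annual_growth_rate(start_count: float, end_count: float, years: int) -> float:
    """Return the constant yearly rate that grows start_count into end_count."""
    if start_count <= 0 or years <= 0:
        raise ValueError("start_count and years must be positive")
    return (end_count / start_count) ** (1 / years) - 1


# Illustrative counts only (not taken from the paper): a corpus that grows
# from 1,000,000 to 2,460,375 records over three years.
rate = compound_annual_growth_rate(1_000_000, 2_460_375, 3)
print(f"{rate:.1%}")
```

A simple arithmetic mean of year-over-year changes would give a different number for the same series, so any reported growth rate should be read together with its definition.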

Citation and reference coverage show a different pattern. Web of Science provides the most complete and deepest citation network, with a citation depth extending beyond ten years and a low error rate (<1 %). Scopus offers a slightly less comprehensive citation set but still supplies a rich suite of citation metrics. OpenAlex relies on DOI‑based linking through Crossref and other open infrastructures, achieving an 85 % reference‑linkage rate; however, its citation depth and accuracy lag behind those of the commercial platforms.

Metadata quality mirrors these trends. Web of Science exhibits the highest reliability (error rate ≈0.8 %), Scopus follows with ≈1.5 % errors, while OpenAlex’s automated harvesting leads to a higher incidence of author‑name and affiliation inconsistencies (≈3 %). Document‑type analysis reveals that journal articles dominate Web of Science and Scopus (>85 % of records), whereas OpenAlex has a more balanced mix, with 71 % journal articles and a substantial 18 % share of conference papers and preprints. Thematic profiling shows that all three databases are weighted toward science, technology, and medicine (≈55 % of records), but OpenAlex captures a larger share of social‑science and humanities content (22 % vs. 15 % in Scopus and 12 % in Web of Science).

Overlap analysis indicates that about 68 % of all records are present in all three databases, while OpenAlex contributes a unique set of approximately 14 % of the total corpus, underscoring its role in surfacing publications missed by the commercial services.
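The summary does not describe how records were matched across databases. As a minimal sketch, assuming records are matched on an already-normalized identifier such as a DOI, the two shares reported above (records present in all three databases, and records found only in OpenAlex) can be computed with set operations. The function name and the toy identifiers are hypothetical.

```python
def overlap_shares(wos: set[str], scopus: set[str], openalex: set[str]) -> dict[str, float]:
    """Share of the combined corpus found in all three sets, and only in OpenAlex."""
    union = wos | scopus | openalex
    if not union:
        raise ValueError("at least one database must contain a record")
    in_all_three = wos & scopus & openalex
    only_openalex = openalex - wos - scopus
    total = len(union)
    return {
        "in_all_three": len(in_all_three) / total,
        "only_openalex": len(only_openalex) / total,
    }


# Toy identifiers for illustration; these are not real DOIs or study data.
wos = {"10.1/a", "10.1/b", "10.1/c"}
scopus = {"10.1/a", "10.1/b", "10.1/d"}
openalex = {"10.1/a", "10.1/b", "10.1/e", "10.1/f"}

shares = overlap_shares(wos, scopus, openalex)
print(shares)
```

Both shares use the union of the three sets as the denominator, which is one plausible reading of "the total corpus". Records without a shared identifier would need title or author matching, which this sketch does not attempt.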

Based on these results, the authors propose five actionable recommendations: (1) institutions should cross‑validate research‑assessment metrics across multiple databases to mitigate coverage bias; (2) OpenAlex should be leveraged to enhance visibility of non‑English and emerging‑region journals; (3) commercial providers must continue to invest in metadata curation and citation quality assurance; (4) standardized identifiers (ORCID, DOI, ISSN) should be universally adopted to facilitate de‑duplication and data integration; and (5) libraries and research offices should develop parallel expertise in AI‑driven search tools (e.g., Web of Science Research Assistant) and open‑API usage (OpenAlex) to enable sophisticated bibliometric analyses.
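Recommendation (4) depends on identifiers being comparable across sources. The following is a minimal sketch, not the authors' procedure, of DOI-based de-duplication: it assumes each record is a dictionary with a `doi` field (a hypothetical layout), strips common resolver prefixes, and lowercases the DOI, since DOIs are case-insensitive. The sample DOIs are made up.

```python
import re

# Matches common ways a DOI is prefixed: resolver URLs or a "doi:" label.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and resolver prefixes, then lowercase the DOI."""
    return _DOI_PREFIX.sub("", raw.strip()).lower()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized DOI."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_doi(record["doi"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records; the DOIs are invented for illustration.
records = [
    {"source": "wos", "doi": "10.1234/ABC.5"},
    {"source": "scopus", "doi": "https://doi.org/10.1234/abc.5"},
    {"source": "openalex", "doi": "doi:10.9999/xyz"},
]
unique = deduplicate(records)
print([r["source"] for r in unique])
```

Keeping the first record seen is an arbitrary choice; a real integration would more likely merge fields from the duplicates or prefer the source with the more reliable metadata.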

In sum, the study demonstrates that the “Big Three” databases are complementary rather than interchangeable. While Web of Science and Scopus retain their authority in citation accuracy and curated content, OpenAlex is rapidly emerging as a vital open‑science resource, offering broader language and subject coverage, higher OA representation, and a flexible, cost‑free platform. Future research evaluation frameworks and policy decisions should therefore adopt a hybrid approach that combines the strengths of both commercial and open infrastructures to build a more inclusive, transparent, and resilient scientific information ecosystem.

