Large-Scale Analysis of the Accuracy of the Journal Classification Systems of Web of Science and Scopus
Journal classification systems play an important role in bibliometric analyses. The two most important bibliographic databases, Web of Science and Scopus, each provide a journal classification system. However, no study has systematically investigated the accuracy of these classification systems. To examine and compare the accuracy of journal classification systems, we define two criteria on the basis of direct citation relations between journals and categories. We use Criterion I to select journals that have weak connections with their assigned categories, and we use Criterion II to identify journals that are not assigned to categories with which they have strong connections. If a journal satisfies either of the two criteria, we conclude that its assignment to categories may be questionable. Accordingly, we identify all journals with questionable classifications in Web of Science and Scopus. Furthermore, we perform a more in-depth analysis for the field of Library and Information Science to assess whether our proposed criteria are appropriate and whether they yield meaningful results. It turns out that, according to our citation-based criteria, Web of Science performs significantly better than Scopus in terms of the accuracy of its journal classification system.
💡 Research Summary
This paper conducts the first large-scale, systematic assessment of the journal classification systems employed by the two dominant multidisciplinary bibliographic databases, Web of Science (WoS) and Scopus. Recognizing that journal classifications are routinely used to delineate research fields, normalize citation impact, and support policy-relevant evaluations, the authors argue that the reliability of these classifications must be empirically verified.
The authors develop two citation-based criteria that exploit direct citation relations between journals and subject categories. Criterion I flags journals whose citation links to their assigned categories are weak: the proportion of a journal's outgoing citations directed to journals in an assigned category must fall below a pre-specified threshold (e.g., 10% of its total outgoing citations). Criterion II identifies journals with strong citation ties to categories to which they are not assigned: the proportion of outgoing citations directed to an unassigned category must exceed a higher threshold (e.g., 30%). If a journal meets either condition, its classification is deemed questionable. The rationale is that direct citation is a more immediate and robust indicator of topical relatedness than bibliographic coupling or co-citation, and is therefore suitable for large-scale validation.
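The two criteria can be sketched in a few lines. This is an illustrative reconstruction, not the authors' actual code: the 10% and 30% thresholds are the example values mentioned above, and the function and variable names are hypothetical.

```python
# Illustrative check of Criterion I and Criterion II, given precomputed
# citation shares for one journal. Thresholds are the example values from
# the summary (10% and 30%), not fixed constants of the method.
WEAK_LINK_THRESHOLD = 0.10    # Criterion I: assigned category gets < 10% of citations
STRONG_LINK_THRESHOLD = 0.30  # Criterion II: unassigned category gets > 30% of citations

def questionable_assignments(citation_share, assigned):
    """citation_share: dict mapping category -> share of the journal's
    outgoing citations; assigned: set of categories the journal is assigned to.
    Returns a list of (criterion, category) flags the journal triggers."""
    flags = []
    # Criterion I: weak connection with an assigned category
    for cat in assigned:
        if citation_share.get(cat, 0.0) < WEAK_LINK_THRESHOLD:
            flags.append(("criterion_I", cat))
    # Criterion II: strong connection with a category the journal is not assigned to
    for cat, share in citation_share.items():
        if cat not in assigned and share > STRONG_LINK_THRESHOLD:
            flags.append(("criterion_II", cat))
    return flags

# Hypothetical journal: assigned to Computer Science, but citing mostly LIS
shares = {"LIS": 0.45, "Computer Science": 0.05}
print(questionable_assignments(shares, assigned={"Computer Science"}))
```

In this toy case both criteria fire: the assigned category receives too few of the journal's citations (Criterion I), while an unassigned category receives many (Criterion II), mirroring the LIS/Computer Science mis-assignments discussed later in the summary.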
Data were extracted for all journals indexed in WoS and Scopus between 2015 and 2020, amounting to roughly 20,000 titles. The authors built a journal-level citation matrix, counting citations to and from every other journal, and aggregated these counts at the category level. Citation strengths were normalized by the total citation volume of each category, producing a "citation share" for every journal–category pair. The thresholds for the two criteria were chosen based on pilot analyses and sensitivity checks; the authors report that varying the thresholds within reasonable bounds does not alter the main comparative findings.
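The aggregation step above can be sketched as follows. This is a minimal illustration under simplifying assumptions: citation counts are rolled up to the category level and normalized by each journal's total outgoing citations, with a journal belonging to several categories contributing an equal fraction to each. All names and data are hypothetical.

```python
# Roll journal-to-journal citation counts up to journal-to-category
# citation shares (fractional counting for multi-category cited journals).
from collections import defaultdict

def category_citation_shares(journal_citations, journal_categories):
    """journal_citations: dict (citing_journal, cited_journal) -> count.
    journal_categories: dict journal -> set of assigned categories.
    Returns dict citing_journal -> {category: share of outgoing citations}."""
    totals = defaultdict(float)
    by_category = defaultdict(lambda: defaultdict(float))
    for (citing, cited), count in journal_citations.items():
        cats = journal_categories.get(cited, set())
        if not cats:
            continue  # cited journal has no known category assignment
        totals[citing] += count
        # Split the count equally across the cited journal's categories
        for cat in cats:
            by_category[citing][cat] += count / len(cats)
    return {j: {c: v / totals[j] for c, v in cats.items()}
            for j, cats in by_category.items()}

# Toy data: J1 cites J2 (an LIS journal) 30 times and J3 (LIS + CS) 70 times
cites = {("J1", "J2"): 30, ("J1", "J3"): 70}
cats = {"J2": {"LIS"}, "J3": {"LIS", "CS"}}
print(category_citation_shares(cites, cats))
```

The resulting shares for each journal–category pair are exactly the quantities the two criteria compare against their thresholds.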
Applying the criteria to the full dataset, the authors find that 7% of WoS-classified journals are flagged as potentially mis-classified, whereas the proportion rises to 15% for Scopus. The discrepancy is especially pronounced in multidisciplinary areas, where Scopus tends to assign journals to overly broad categories, creating spurious cross-category citation links. In contrast, WoS's categories appear more cohesive, reflecting the database's historically heuristic but expert-guided assignment process (the "Hayne-Coulson" algorithm).
To test whether the citation‑based approach yields meaningful insights, the authors conduct an in‑depth case study in the field of Library and Information Science (LIS). Within this domain, Scopus mis‑assigns about 30 journals to the “Computer Science” category despite strong citation ties to LIS journals; WoS, by contrast, shows almost no such mis‑assignments. This field‑level analysis confirms that the global patterns observed are not artefacts of aggregation but reflect genuine differences in classification accuracy.
The discussion acknowledges several limitations. First, reliance on direct citations may under‑represent low‑citation or newly launched journals, potentially biasing the assessment against emerging fields. Second, the binary nature of the thresholds introduces subjectivity; however, the authors argue that the thresholds are transparent and can be adjusted by users. Third, overlapping categories (journals can belong to multiple subjects) complicate the interpretation of “weak” versus “strong” links, a challenge the authors address through sensitivity analyses.
Future work is suggested to integrate additional signals—co‑citation, bibliographic coupling, and text‑based similarity—into a composite metric, thereby capturing both citation behaviour and topical content. Moreover, the authors call for database providers to disclose the algorithms and expert inputs underlying their classification schemes and to institute periodic, citation‑driven audits.
In conclusion, the study demonstrates that WoS’s journal classification system is significantly more accurate than that of Scopus when evaluated against direct citation evidence. This finding has practical implications for researchers who rely on these classifications for field normalization, for institutions conducting bibliometric benchmarking, and for policymakers designing research evaluation frameworks. By highlighting the need for transparent, data‑driven validation of classification systems, the paper contributes a methodological blueprint for ongoing quality assurance in scholarly metadata.