A measure of similarity between scientific journals and of diversity of a list of publications

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The aim of this note is to propose a definition of scientific diversity and, as a corollary, a measure of the “interdisciplinarity” of collaborations. In contrast to previous studies, the proposed approach consists of two steps: first, a similarity between journals is defined; second, these similarities are used to characterize the homogeneity (or, conversely, the diversity) of a publication list, which may belong to an individual or to a team.

💡 Research Summary

The paper proposes a two‑step quantitative framework for assessing scientific diversity, i.e., the degree of interdisciplinarity, of a set of publications belonging to an individual researcher or a research team. In the first step the authors construct a similarity matrix S between journals. The core idea is simple yet original: if two journals appear together on the same article, they are considered “similar” because the same research team has chosen to publish in both venues. For each article p with nₚ authors, a weight wₚ = 1/(nₚ·(nₚ − 1)) is assigned to every unordered pair of journals (j,k) that co‑occur on that article. The weight is added to S(j,k). This weighting scheme normalizes for team size, giving smaller collaborations proportionally larger influence while preventing large consortium papers from dominating the similarity scores. After processing the entire corpus, S becomes a symmetric matrix with entries ranging from 0 (never co‑published) to 1 (maximally co‑published).
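The accumulation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (each article given as an author count plus a list of journals), the function name `build_similarity`, the storage of S as a dictionary keyed by sorted journal pairs, and the decision to skip single-author articles (for which 1/(n·(n − 1)) is undefined) are all choices made for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def build_similarity(articles):
    """Accumulate journal-pair weights over a corpus.

    `articles` is an iterable of (n_authors, journals) pairs; this input
    format is an assumption made for the sketch.
    """
    S = defaultdict(float)
    for n_authors, journals in articles:
        if n_authors < 2:
            # 1 / (n * (n - 1)) is undefined for n = 1; skipping is an assumption.
            continue
        weight = 1.0 / (n_authors * (n_authors - 1))
        # Every unordered pair of distinct journals attached to the article.
        for j, k in combinations(sorted(set(journals)), 2):
            S[(j, k)] += weight
    return dict(S)


# Hypothetical toy corpus: journal names are placeholders.
corpus = [
    (2, ["J. Phys", "J. Chem"]),
    (3, ["J. Phys", "J. Chem", "J. Bio"]),
]
S = build_similarity(corpus)
```

Because each key is a sorted pair, the symmetric entry S(k,j) is implied rather than stored twice. Note that the raw sums produced here are not bounded by 1 in general; obtaining the 0-to-1 range described in the text would require a normalization step that this summary does not specify.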

In the second step the similarity matrix is used to evaluate the homogeneity of a specific publication list L. The authors define a diversity index D(L) as the average pairwise similarity among all journals represented in L:

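D(L) = 2 / (|L|(|L| − 1)) · ∑_{j<k; j,k∈L} S(j,k),

where |L| denotes the number of distinct journals in L and the sum runs over unordered pairs of journals, so that D(L) is the mean of S over those pairs.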

A high D(L) indicates that the researcher’s output is concentrated in a narrow set of closely related journals (low interdisciplinarity), whereas a low D(L) signals that the output spans journals that are only weakly linked, reflecting a more interdisciplinary profile.
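As a rough illustration of the index read as "average pairwise similarity among all journals represented in L", the sketch below averages S over unordered pairs of distinct journals. The function name `diversity_index`, the dictionary representation of S (keys are sorted journal pairs, missing pairs count as 0), and the error raised for lists with fewer than two distinct journals are assumptions of this sketch rather than details taken from the paper.

```python
from itertools import combinations


def diversity_index(journals, S):
    """Average similarity over unordered pairs of distinct journals in a list.

    `S` maps sorted (journal_a, journal_b) tuples to similarity values;
    pairs absent from `S` are treated as similarity 0 (an assumption).
    """
    distinct = sorted(set(journals))
    n = len(distinct)
    if n < 2:
        raise ValueError("need at least two distinct journals")
    total = sum(S.get((j, k), 0.0) for j, k in combinations(distinct, 2))
    # n * (n - 1) / 2 unordered pairs, hence the factor 2 in the numerator.
    return 2.0 * total / (n * (n - 1))


# Hypothetical similarities between three placeholder journals.
toy_S = {("A", "B"): 0.8, ("A", "C"): 0.1}
value = diversity_index(["A", "B", "C", "A"], toy_S)
```

In this toy case two of the three journals are strongly linked and the third is nearly unrelated, so the average lands well below the strongest pairwise similarity. Repeated journals in the list are collapsed here; weighting journals by how often they appear would be a different design choice.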

Technical strengths of the approach include: (1) Direct exploitation of co‑authorship networks, which captures real collaborative behavior rather than relying solely on citation or keyword similarity; (2) Normalization by author count, which mitigates the bias introduced by large, multi‑institutional projects; (3) Reusability of the pre‑computed similarity matrix across many users, making the method scalable to large bibliographic databases such as Scopus or Web of Science.

However, the paper also acknowledges several limitations. First, the occurrence of multiple journals on a single article is relatively rare, especially in fields where articles are typically assigned to a single venue. Consequently, the similarity matrix S can be extremely sparse, limiting its discriminative power. The authors suggest augmenting S with auxiliary information (citation links, shared keywords, or journal subject classifications) to alleviate sparsity. Second, the weight wₚ is a crude proxy for author contribution; it does not differentiate between first, corresponding, or senior authors, nor does it account for disciplinary conventions about author order. Incorporating more nuanced contribution models could improve the fidelity of the similarity scores. Third, journal subject scopes evolve over time and many journals are multidisciplinary; a static S may therefore misrepresent true intellectual distances. A dynamic, time‑aware version of S would better capture evolving interdisciplinary trends.
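One way to picture the suggested augmentation of a sparse S is a convex combination with an auxiliary similarity built from another source, such as citation links. The summary does not say how the sources would be combined, so the blending rule, the parameter `alpha`, and the function name `blend_similarity` are purely illustrative assumptions.

```python
def blend_similarity(S, A, alpha=0.5):
    """Return (1 - alpha) * S + alpha * A over the union of journal pairs.

    `S` and `A` map sorted journal pairs to similarity values; a pair
    missing from either mapping contributes 0 from that source.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    pairs = set(S) | set(A)
    return {
        p: (1.0 - alpha) * S.get(p, 0.0) + alpha * A.get(p, 0.0)
        for p in pairs
    }


# Hypothetical inputs: co-publication similarity and an auxiliary one.
co_pub = {("A", "B"): 0.4}
auxiliary = {("A", "B"): 0.8, ("A", "C"): 0.2}
blended = blend_similarity(co_pub, auxiliary, alpha=0.25)
```

With this rule, a pair that never co-occurs in the corpus (here A and C) still receives a nonzero score from the auxiliary source, which is the effect the sparsity remedy is after.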

Potential applications are numerous. The diversity index D(L) could serve as an objective metric in tenure and promotion reviews, grant evaluations, or institutional benchmarking to encourage interdisciplinary research. Editorial boards might use the similarity matrix to understand how their journal connects to other fields, informing strategic decisions about special issues or scope expansions. Moreover, tracking changes in S over successive years would provide a quantitative lens on the emergence of new interdisciplinary domains and the diffusion of ideas across traditional boundaries.

In summary, the paper introduces a novel, collaboration‑centric method for quantifying journal similarity and, by extension, the interdisciplinary breadth of a researcher’s publication portfolio. While promising, the approach requires further refinement—particularly in handling matrix sparsity, incorporating richer author contribution data, and adapting to the fluid nature of journal scopes—to become a robust tool for scientometric analysis and research policy.
