Excellence networks in science: A Web-based application based on Bayesian multilevel logistic regression (BMLR) for the identification of institutions collaborating successfully
In this study we present an application, accessible at www.excellence-networks.net, that represents networks of scientific institutions worldwide. The application is based on papers (articles, reviews and conference papers) published between 2007 and 2011. It uses the (network) data on which the SCImago Institutions Ranking is based (Scopus data from Elsevier). From these data, institutional networks have been estimated with statistical models (Bayesian multilevel logistic regression, BMLR) for a number of Scopus subject areas. Within single subject areas, we have investigated and visualized how successfully a given institution (the reference institution) has collaborated overall, compared to all other institutions in the subject area, and with which other institutions (network institutions) it has collaborated particularly successfully. The statistically estimated “best paper rate” was used as the indicator of an institution’s collaboration success. It gives the proportion of an institution’s papers that are highly cited and is generally regarded in bibliometrics as an indicator of impact.
💡 Research Summary
The paper introduces a web-based platform (www.excellence-networks.net) that visualizes worldwide collaboration networks of scientific institutions, using publications from 2007 to 2011 indexed in Scopus. The underlying data are the same institution-paper matrix employed by the SCImago Institutions Ranking, comprising articles, reviews, and conference papers across a broad range of Scopus subject areas. The central analytical engine is a Bayesian multilevel logistic regression (BMLR) model. At the first level, each paper is treated as a binary outcome indicating whether it belongs to the top 10% most-cited papers (a “best paper”). At the second level, papers are nested within institution pairs: a “reference institution” (the focal unit) and a “network institution” (its collaborator). This hierarchical structure accounts for both the number of papers a specific pair has co-authored and the field-specific citation environment. With weakly informative priors and Markov chain Monte Carlo sampling, the model yields a posterior distribution for the success probability of each institution pair, reported as an estimated best-paper rate (BPR) with a 95% credible interval.
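One way to write down the two-level structure described above is as a random-intercept logistic model. This is a minimal sketch under stated assumptions, not the authors' exact specification: the symbols y_{ij}, p_j, \beta_0, u_j and \sigma_u are introduced here for illustration, and the actual model may include further terms and particular prior choices that the summary does not state.

```latex
% y_{ij} = 1 if paper i of institution pair j is among the highly cited papers, else 0
\begin{align}
y_{ij} \mid p_j &\sim \mathrm{Bernoulli}(p_j), \\
\operatorname{logit}(p_j) &= \beta_0 + u_j, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \\
\mathrm{BPR}_j &= \operatorname{logit}^{-1}(\beta_0 + u_j) = \frac{1}{1 + e^{-(\beta_0 + u_j)}}.
\end{align}
```

In this form \beta_0 is the average log-odds of a best paper in the subject area, u_j is the deviation of pair j from that average, and weakly informative priors would be placed on \beta_0 and \sigma_u. The 95% credible interval for a pair is then read off the posterior draws of \mathrm{BPR}_j.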
The BPR serves as a quality-adjusted measure of collaboration success, going beyond raw co-authorship counts or average citation counts. Because the Bayesian framework quantifies uncertainty explicitly, each estimate comes with a credible interval, so a high rate based on only a few joint papers can be distinguished from one backed by many. The application translates these statistical outputs into a network visualization. Nodes represent institutions; node size reflects the overall BPR of the reference institution across all its collaborations within a given subject area, while node colour indicates its relative standing (e.g., top decile). Edges connect a reference institution to each network institution; edge thickness encodes the volume of joint publications, and edge colour encodes the pair-specific BPR and its credible interval. Users can select any institution to view both its overall collaboration performance and its performance with each partner, filter by subject area, time window, or citation window, and download the underlying statistics.
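The edge encoding described above could be sketched roughly as follows. The function names, the width range, and the three-way colour rule are all hypothetical choices made for illustration; the summary does not say how the application actually maps estimates to thickness and colour.

```python
def edge_width(joint_papers, max_joint_papers, min_width=1.0, max_width=8.0):
    """Scale edge thickness linearly with the number of joint papers.

    The width range (1.0 to 8.0) is an assumed, illustrative choice.
    """
    if max_joint_papers <= 0:
        return min_width
    # Clamp the share to [0, 1] so out-of-range counts cannot break the scale.
    share = min(max(joint_papers / max_joint_papers, 0.0), 1.0)
    return min_width + share * (max_width - min_width)


def edge_colour(interval_low, interval_high, reference_rate):
    """Classify a pair by where its 95% credible interval sits relative to
    the reference institution's overall best-paper rate (hypothetical rule).
    """
    if interval_low > reference_rate:
        return "above"      # whole interval lies above the overall rate
    if interval_high < reference_rate:
        return "below"      # whole interval lies below the overall rate
    return "uncertain"      # interval overlaps the overall rate


if __name__ == "__main__":
    print(edge_width(50, 100))
    print(edge_colour(0.20, 0.30, 0.15))
```

Classifying by the whole interval rather than by the point estimate is one way to keep pairs with few joint papers from being highlighted on weak evidence.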
Methodologically, the study makes three notable contributions. First, it demonstrates how hierarchical Bayesian modeling can account for the nested nature of bibliometric data, correcting for the over-dispersion and dependence that distort simple proportion calculations. Second, it adopts the best-paper rate (the proportion of highly cited papers) as a proxy for impact, thereby focusing on the quality of collaboration rather than its sheer quantity. Third, it integrates statistical inference into a user-friendly web interface, enabling policymakers, research managers, and scholars without advanced statistical training to explore the results.
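The gap between a simple proportion and a pooled estimate can be illustrated with a much simpler stand-in for the multilevel model: a conjugate Beta-Binomial update. This is not the BMLR used in the paper. The function name, the prior centred on a 10% rate, and the prior strength of 20 pseudo-papers are assumptions chosen only to show the effect.

```python
import random


def posterior_bpr(best, total, prior_rate=0.10, prior_strength=20.0,
                  draws=20000, seed=0):
    """Posterior mean and approximate 95% interval for a best-paper rate.

    Assumption: a Beta prior centred on prior_rate stands in for the pooling
    that a multilevel model performs across institution pairs.
    """
    if total < 0 or not 0 <= best <= total:
        raise ValueError("need 0 <= best <= total")
    # Conjugate update: prior pseudo-counts plus observed counts.
    a = prior_rate * prior_strength + best
    b = (1.0 - prior_rate) * prior_strength + (total - best)
    # Approximate the interval from sorted Monte Carlo draws of the posterior.
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return a / (a + b), lower, upper


if __name__ == "__main__":
    for best, total in [(2, 3), (40, 200)]:
        mean, lower, upper = posterior_bpr(best, total)
        print(f"{best}/{total}: raw={best / total:.3f} "
              f"posterior mean={mean:.3f} interval=({lower:.3f}, {upper:.3f})")
```

With these assumed settings, a pair with 2 best papers out of 3 has a raw rate of about 0.67 but a posterior mean of 4/23 ≈ 0.17, while a pair with 40 out of 200 has a raw rate of 0.20 and a posterior mean of 42/220 ≈ 0.19. The small pair also receives the wider interval, which is the behaviour that simple proportions cannot express.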
The authors acknowledge several limitations. The dataset is fixed to 2007‑2011, and the citation window (typically three to five years) may not capture long‑term influence. Reliance on Scopus introduces coverage bias (e.g., under‑representation of regional journals) and potential errors in institutional name disambiguation. The analysis treats co‑authorship as the sole indicator of collaboration, overlooking other forms such as joint grant applications, shared patents, or data‑set co‑creation. Moreover, subject classification follows Scopus’s ASJC scheme, which can obscure interdisciplinary collaborations that cross category boundaries.
Potential applications include: (a) informing national or institutional research‑policy strategies by identifying institutions that achieve high‑impact collaborations in specific fields; (b) assisting researchers and administrators in selecting new partners based on statistically validated success probabilities; and (c) guiding funding agencies to allocate resources toward collaborative configurations that demonstrably yield high‑impact outcomes. The paper suggests future extensions such as real‑time data updates, incorporation of additional collaboration metrics (grant co‑funding, patent co‑ownership, data sharing), and the development of multi‑response hierarchical models to capture the multidimensional nature of scientific partnership. By expanding the methodological toolkit and data sources, the platform could evolve into a comprehensive decision‑support system for the science of science policy.