Mapping the Geography of Science: Distribution Patterns and Networks of Relations among Cities and Institutes

Using Google Earth, Google Maps and/or network visualization programs such as Pajek, one can overlay the network of relations among addresses in scientific publications on the geographic map. We discuss the pros and cons of the various options, and provide software (freeware) for bridging existing gaps between the Science Citation Indices and Scopus, on the one side, and these various visualization tools, on the other. At the level of city names, the global map can be drawn reliably on the basis of the available address information. At the level of the names of organizations and institutes, there are problems of unification both in the ISI databases and Scopus. Pajek enables us to combine the visualization with statistical analysis, whereas Google Maps and its derivatives provide superior tools on the Internet.

💡 Research Summary

The paper presents an integrated workflow for visualizing and analyzing the geographic distribution of scientific collaboration by overlaying address information from scholarly publications onto world maps. Using address data extracted from the ISI Web of Science and Scopus, the authors first normalize and parse the raw strings to separate city names from institutional affiliations. City names are matched against the GeoNames database to obtain latitude‑longitude coordinates, which are then exported as KML/KMZ files for direct import into Google Earth or Google Maps. Institutional names, however, suffer from severe heterogeneity (e.g., “MIT” vs. “Massachusetts Institute of Technology”). To address this, the authors implement a hybrid approach that combines Levenshtein distance, Jaccard similarity, and a custom clustering algorithm to group variant spellings, followed by manual verification for ambiguous cases.
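The name-grouping step can be illustrated with a minimal sketch. This is not the authors' code: the function names, the thresholds (`max_edit_ratio`, `min_jaccard`), and the greedy grouping rule are assumptions chosen for illustration, standing in for the "custom clustering algorithm" that the summary does not specify.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-separated, lower-cased token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similar(a: str, b: str, max_edit_ratio: float = 0.2,
            min_jaccard: float = 0.6) -> bool:
    """Hypothetical rule: match if either measure passes its threshold."""
    a_n, b_n = a.lower().strip(), b.lower().strip()
    edit_ratio = levenshtein(a_n, b_n) / max(len(a_n), len(b_n), 1)
    return edit_ratio <= max_edit_ratio or jaccard(a_n, b_n) >= min_jaccard


def group_variants(names):
    """Greedy grouping: attach each name to the first group whose
    representative (its first member) it resembles."""
    groups = []
    for name in names:
        for group in groups:
            if similar(name, group[0]):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


groups = group_variants(["Univ Amsterdam", "Univ. Amsterdam",
                         "Amsterdam Univ", "MIT"])
```

Under these assumed thresholds the three Amsterdam spellings fall into one group. Note that `similar("MIT", "Massachusetts Institute of Technology")` is false: string similarity alone does not resolve acronyms, which is consistent with the summary's mention of manual verification for ambiguous cases.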

The cleaned data are used in two parallel visualizations. The first produces a geographic map where each city is represented by a node whose size reflects the number of publications (or co‑authored papers) originating from that location, and whose colour distinguishes continents. The second creates a Pajek network file in which nodes correspond to institutions and edges to co‑authorship links, weighted by the number of joint papers. Within Pajek, standard network measures (betweenness centrality, closeness centrality, clustering coefficient, and modularity) are computed, and community detection is applied, to quantify the structural properties of the global scientific collaboration network.
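As a rough illustration of the second visualization, the sketch below turns per-paper institution lists into a Pajek `.net` file with weighted edges. The function name `build_pajek_net` and the input layout (one list of institution names per paper) are assumptions; the output uses Pajek's `*Vertices` / `*Edges` sections, where each edge line reads "source target weight".

```python
from collections import Counter
from itertools import combinations


def build_pajek_net(papers):
    """Return Pajek .net text for a co-authorship network.

    papers: iterable of lists, each holding the institution names
            found in one paper's address field (assumed input layout).
    """
    papers = [sorted(set(p)) for p in papers]  # drop repeats within a paper

    # Each unordered pair of institutions on the same paper adds weight 1.
    edge_weights = Counter()
    for institutions in papers:
        for pair in combinations(institutions, 2):
            edge_weights[pair] += 1

    # Pajek numbers vertices from 1.
    labels = sorted({inst for p in papers for inst in p})
    index = {label: i for i, label in enumerate(labels, start=1)}

    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(edge_weights.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


net_text = build_pajek_net([["A", "B"], ["A", "B", "C"], ["C"]])
```

Writing `net_text` to a file with a `.net` extension should give something Pajek can open. Institutions that appear only on single-address papers still become vertices, so they show up as isolates rather than disappearing from the network.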

Analysis of the city‑level network reveals a pronounced concentration of scientific activity in a handful of megacities (e.g., New York, London, Beijing, Tokyo). These hubs exhibit high betweenness centrality and serve as bridges linking disparate regional clusters, thereby reducing the average path length of the worldwide network. In contrast, smaller or emerging research cities display lower centrality but often form tightly knit local clusters, as indicated by elevated clustering coefficients.
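To make the clustering-coefficient claim concrete, here is a minimal sketch of the local clustering coefficient for an undirected, unweighted graph: the fraction of a node's neighbour pairs that are themselves linked, C = 2e / (k(k - 1)) for a node with k neighbours and e links among them. The adjacency-dictionary layout and the toy graph are illustrative assumptions, not data from the paper.

```python
def local_clustering(adj, node):
    """Local clustering coefficient of `node`.

    adj: dict mapping each node to the set of its neighbours
         (undirected graph, so links appear in both directions).
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to connect
    # Count each linked neighbour pair once (u < v avoids double counting).
    links = sum(1 for u in neighbours for v in neighbours
                if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))


# Toy graph: A is a hub; B and C form a triangle with A; D hangs off A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
hub_value = local_clustering(adj, "A")
cluster_value = local_clustering(adj, "B")
```

In this toy graph the hub A scores 1/3 while B scores 1.0, the same qualitative contrast the summary describes between bridging hubs and tightly knit local clusters.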

At the institutional level, the network is dominated by large research organizations and flagship universities (e.g., NIH, CERN, RIKEN). These entities act as “core” nodes, generating a small‑world effect that facilitates rapid dissemination of knowledge across disciplines and borders. Modularity analysis uncovers distinct sub‑communities that correspond to disciplinary domains (physics, life sciences, engineering) and regional alliances (European Union, East‑Asia, North‑America). The authors also demonstrate temporal dynamics by animating the network over successive years, showing how emerging hubs rise and how geopolitical events reshape collaboration patterns.

For visualization, the authors integrate the Google Maps API with a JavaScript‑based interactive dashboard. Users can click on any city to open a pop‑up showing the total number of papers, the top collaborating countries, and the five most prolific institutions in that city. A time slider lets users explore the evolution of collaboration patterns year by year, providing an intuitive view of the shifting geography of science.

All software components are released as free, open‑source tools on GitHub. The package includes (1) a Python script for address parsing and normalization, (2) a KML generator for geographic mapping, (3) a Pajek macro for network construction and statistical analysis, and (4) a template for the Google Maps interactive interface. The tools are platform‑independent and can be customized for any discipline or dataset.
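A KML generator of the kind listed as component (2) can be sketched in a few lines. This is an illustrative stand-in rather than the released tool: the function name `cities_to_kml` and the input tuple layout are assumptions. One detail worth noting is that KML writes coordinates as longitude, latitude, altitude, in that order.

```python
from xml.sax.saxutils import escape


def cities_to_kml(cities):
    """Build a KML document with one placemark per city.

    cities: iterable of (name, latitude, longitude, n_papers) tuples
            (assumed layout; coordinates in decimal degrees).
    """
    placemarks = []
    for name, lat, lon, n_papers in cities:
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(name)}</name>\n"
            f"    <description>{n_papers} papers</description>\n"
            # KML order is longitude,latitude,altitude.
            f"    <Point><coordinates>{lon},{lat},0</coordinates></Point>\n"
            "  </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n"
        + "\n".join(placemarks)
        + "\n</Document>\n</kml>\n"
    )


kml_text = cities_to_kml([("Amsterdam", 52.37, 4.9, 120),
                          ("R&D Town", 10.0, 20.0, 3)])
```

Saving `kml_text` with a `.kml` extension gives a file that Google Earth can open. Scaling marker size by publication count would need additional `Style` elements, which are omitted here.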

The paper also discusses limitations. Incomplete or ambiguous address entries, inconsistent institutional naming, and multi‑affiliation authors can introduce noise and bias into both the geographic and network representations. The authors suggest future improvements such as machine‑learning‑driven name disambiguation, integration of unique identifiers like ORCID, and real‑time harvesting of publication metadata via APIs to keep the visualizations up to date.

Overall, the study demonstrates that coupling geographic information systems with network analysis provides a powerful lens for understanding the spatial structure of scientific collaboration. By making the methodology and tools openly available, the authors enable researchers, policymakers, and funding agencies to monitor, evaluate, and strategically guide the development of global research ecosystems.

📜 Original Paper Content