Which cities' paper output and citation impact are above expectation in information science? Some improvements of our previous mapping approaches


Bornmann and Leydesdorff (in press) proposed methods based on Web-of-Science data to identify field-specific excellence in cities where highly cited papers were published more frequently than can be expected. Top performers in output are cities whose authors publish a number of highly cited papers that is statistically significantly higher than can be expected for those cities. Using papers published between 1989 and 2009 in information science, improvements to the methods of Bornmann and Leydesdorff (in press) are presented here, together with an alternative mapping approach based on the I3 indicator introduced by Leydesdorff and Bornmann (in press).


💡 Research Summary

This paper builds on the earlier work of Bornmann and Leydesdorff, who introduced a statistical mapping technique to identify cities that produce more highly‑cited papers than would be expected by chance. Using a comprehensive set of information‑science articles published between 1989 and 2009, the authors refine the original methodology and add a new mapping approach based on the Integrated Impact Indicator (I3).

Data collection: The authors retrieved all “Article” records from the Social Sciences Citation Index (SSCI) for a selection of core information‑science journals (e.g., Annual Review of Information Science and Technology, Information Processing & Management, Journal of the American Society for Information Science and Technology, among others). This yielded 6,242 papers. For each paper, citations were counted from its publication year up to July 2011, providing a uniform citation window. Highly‑cited papers were defined as those in the top 10 % by citations within the same publication year and document type.
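The selection rule above — flag the top 10 % of papers within each publication-year/document-type group — can be sketched as follows. This is a minimal illustration, not the authors' actual code; the record fields (`year`, `doc_type`, `cits`) and the tie-breaking by simple sort order are assumptions.

```python
from collections import defaultdict

def mark_highly_cited(papers, share=0.10):
    """Flag the top `share` of papers within each (year, doc_type)
    group, ranked by citation count. Field names are illustrative."""
    groups = defaultdict(list)
    for i, p in enumerate(papers):
        groups[(p["year"], p["doc_type"])].append(i)
    flags = [False] * len(papers)
    for idx in groups.values():
        k = max(1, round(share * len(idx)))  # at least one paper per group
        top = sorted(idx, key=lambda i: papers[i]["cits"], reverse=True)[:k]
        for i in top:
            flags[i] = True
    return flags
```

A real implementation would also have to resolve ties at the 10 % boundary; the paper handles this via Rousseau's “≤” correction, discussed below.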

Statistical test for “top‑10 % papers”: For each city the expected number of top‑10 % papers is simply 10 % of the total papers authored from that city. When the expected count is at least five, a two‑proportion z‑test (Sheskin, 2007) is applied to compare observed versus expected counts. The sign of the z‑value indicates whether the city exceeds (positive) or falls below (negative) expectation. Absolute z > 1.96 denotes p < 0.05, > 2.58 denotes p < 0.01, and > 3.29 denotes p < 0.001; these significance levels are marked with *, **, and *** respectively. The visual output uses circles whose radii are |observed – expected| + 1, coloured green for excess, red/orange for deficit, and grey or lime‑green when the expected count is too low for a reliable test.
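The z-test and the star notation described above can be sketched in a few lines. This is a hedged simplification: it uses a one-sample test of the observed share against the expected 10 % share, whereas the paper applies the two-proportion formulation from Sheskin (2007); the function name and interface are hypothetical.

```python
import math

def z_test_city(observed_top, n_papers, p_expected=0.10):
    """Test whether a city's observed count of top-10% papers exceeds
    expectation. Returns (z, stars), or None when the expected count
    is below five (too small for a reliable test)."""
    expected = p_expected * n_papers
    if expected < 5:
        return None
    p_obs = observed_top / n_papers
    z = (p_obs - p_expected) / math.sqrt(p_expected * (1 - p_expected) / n_papers)
    # Significance markers as in the paper: *, **, ***
    stars = "***" if abs(z) > 3.29 else "**" if abs(z) > 2.58 else "*" if abs(z) > 1.96 else ""
    return z, stars
```

For example, a city with 20 top-10 % papers out of 100 (expected: 10) yields z ≈ 3.33, significant at the 0.001 level.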

I3‑based mapping: The Integrated Impact Indicator (I3) replaces simple averages with an integration of the citation distribution. Each paper receives a percentile score (0–100) based on its position within the citation rank of its field; a top‑1 % paper gets 100 points, an average paper 50 points, etc. Summing these scores across a set yields the I3 value. The expected I3 for a city is proportional to its number of papers, allowing a direct comparison of observed versus expected impact. The same z‑test is used to assess the significance of the I3 deviation. Additionally, the authors introduce RI3R = I3 / n (impact per paper) to compare average impact while still testing against the expected value; a minimum of five papers per city is required to avoid small‑sample bias.
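The I3 logic above — percentile scores summed over a city's papers, with the expectation proportional to paper count — can be sketched as follows. This is illustrative only: the paper computes percentiles within proper field/year reference sets via dedicated software, and the per-paper ratio shown here merely mimics the RI3R idea.

```python
def percentile_scores(citations):
    """0-100 percentile score for each paper in a reference set,
    using '<=' counting (share of papers cited at most as often)."""
    n = len(citations)
    return [100.0 * sum(1 for d in citations if d <= c) / n for c in citations]

def i3_for_city(all_citations, city_indices):
    """Observed I3 for a city's papers, the expected I3 (proportional
    to the city's paper count), and impact per paper (RI3R-style)."""
    scores = percentile_scores(all_citations)
    observed = sum(scores[i] for i in city_indices)
    expected = sum(scores) * len(city_indices) / len(all_citations)
    per_paper = observed / len(city_indices)
    return observed, expected, per_paper
```

A city holding the two most-cited papers of a ten-paper set thus scores well above its proportional expectation.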

Mapping workflow: The authors provide a step‑by‑step pipeline using custom executables (cities1.exe, cities2.exe, topcity4.exe, etc.). Cities1 extracts city names from the WoS data; the list is geocoded via GPS Visualizer or the Sci2 tool to obtain latitude/longitude. Cities2 merges the geocoded data with the original records. Topcity4 prompts the user for the percentile threshold (10 % in this study) and the minimum city size (five papers). It then produces a “ztest.txt” file for mapping and a “ucities.dbf” file for further statistical inspection. The resulting files can be uploaded to GPS Visualizer, which generates an interactive Google Map where each city is represented by a coloured, sized circle as described above.
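The circle sizing and colouring rule that topcity4.exe writes into “ztest.txt” can be sketched as below. This is a simplification of the scheme described earlier: the actual output also distinguishes red/orange and lime-green shades, and the function name is hypothetical.

```python
def map_symbol(observed, expected, min_expected=5.0):
    """Radius and colour for a city's circle on the Google Map:
    radius is |observed - expected| + 1; green marks an excess of
    top-10% papers, red a deficit, grey an unreliable (small) case."""
    radius = abs(observed - expected) + 1
    if expected < min_expected:
        colour = "grey"       # expected count too low for the z-test
    elif observed > expected:
        colour = "green"      # more top-10% papers than expected
    else:
        colour = "red"        # fewer top-10% papers than expected
    return radius, colour
```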

I3 calculation: The same article set is processed with ISI.exe (to create relational DBF files) and isi2i3.exe, which computes the percentile fields i3f (field‑normalized) and i3j (journal‑normalized) as well as the six‑class NSF percentile ranks (r6f, r6j). Rousseau's “≤” correction is applied, ensuring that papers with citation counts equal to the threshold are included in the top percentile, which is especially important for small reference sets.
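The effect of Rousseau's “≤” correction can be seen in a toy percentile function: with strict “<” counting, the most-cited paper in a five-paper set never reaches the 100th percentile, while “≤” counting places it there. A minimal sketch (the real computation in isi2i3.exe operates on full field/year reference sets):

```python
def percentile(citations, c, leq=True):
    """Percentile rank of citation count c within a reference set.
    leq=True uses Rousseau's '<=' counting; leq=False uses strict '<'."""
    n = len(citations)
    op = (lambda d: d <= c) if leq else (lambda d: d < c)
    return 100.0 * sum(1 for d in citations if op(d)) / n
```

With the reference set [1, 2, 3, 4, 5], the paper with 5 citations ranks at the 100th percentile under “≤” counting but only at the 80th under strict counting, so it would be missed by a top-10 % threshold.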

Improvements over the original method:

  1. Multi‑year handling – topcity4 can analyse a 20‑year span in a single run, whereas earlier tools required yearly splits.
  2. Inclusion of I3 and RI3R provides a complementary view of impact that accounts for the skewed nature of citation distributions, rather than relying solely on counts of highly‑cited papers.
  3. Updated software accommodates the newer WoS interface (version 5) and automates several previously manual steps (e.g., geocoding).

Limitations and future work: The authors acknowledge issues such as city‑name disambiguation, the use of integer counting (each address counted once per paper regardless of author count), and the lack of Bonferroni correction despite multiple comparisons (they argue aggregation, not repeated testing, mitigates the problem). They also note that RI3R favours smaller cities because the denominator (number of papers) can be small, prompting the five‑paper threshold.

Overall, the paper delivers a robust, reproducible workflow for visualising both the quantity (top‑10 % papers) and quality (I3 impact) of scientific output at the city level, illustrated with the information‑science field as a case study. The tools and scripts are freely available online, enabling other researchers to apply the methodology to different disciplines or time periods.

