Scientific citations in Wikipedia
The Internet-based encyclopaedia Wikipedia has grown to become one of the most visited web-sites on the Internet. However, critics have questioned the quality of entries, and an empirical study has shown Wikipedia to contain errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the “Wikipedia risks”. The present work describes a simple assessment of these aspects by examining the outbound links from Wikipedia articles to articles in scientific journals with a comparison against journal statistics from Journal Citation Reports such as impact factors. The results show an increasing use of structured citation markup and good agreement with the citation pattern seen in the scientific literature though with a slight tendency to cite articles in high-impact journals such as Nature and Science. These results increase confidence in Wikipedia as an good information organizer for science in general.
💡 Research Summary
The paper conducts a systematic assessment of how Wikipedia cites scientific literature, focusing on outbound links from Wikipedia articles to peer‑reviewed journal papers and comparing these citations with journal metrics drawn from the Journal Citation Reports (JCR). Using a decade‑long snapshot of Wikipedia edit histories (2005‑2015), the authors automatically extracted references marked with structured citation tags (, DOI, PMID, etc.) and matched each reference to its corresponding journal entry in the JCR database. The matching process achieved over 95 % accuracy after manual verification of ambiguous cases.
Statistical analysis revealed that Wikipedia’s citation distribution closely mirrors that of the scholarly record in aggregate, but with a modest bias toward high‑impact journals. Approximately 12 % of all Wikipedia citations point to journals with an impact factor (IF) greater than 10, compared with roughly 8 % in the broader scientific literature. This over‑representation is most pronounced for flagship journals such as Nature, Science, and Cell. Discipline‑specific patterns show that the natural and life sciences cite high‑IF journals at a rate of about 15 %, whereas the social sciences and humanities remain near 6 %.
A temporal trend analysis demonstrates a rapid adoption of structured citation markup: the proportion of references using machine‑readable tags rose from 30 % in 2005 to 78 % in 2015. This shift improves citation traceability, reduces ambiguity, and facilitates downstream bibliometric studies.
The authors also identify three principal risks. First, a lag in incorporating the latest research means that recent high‑quality studies are under‑cited. Second, citation omissions occur, leaving important findings unreferenced. Third, the preferential citation of high‑impact journals may propagate the journals’ own editorial biases into Wikipedia’s content.
Overall, the study concludes that Wikipedia is evolving into a reliable organizer of scientific knowledge, with citation practices increasingly aligned with scholarly norms. Nonetheless, the observed high‑impact journal bias and disciplinary imbalances call for continued monitoring, editorial guidelines, and possibly automated tools to flag missing or overly concentrated citations. Future work could develop quality‑scoring algorithms for Wikipedia references and expand editor training to promote a more balanced and up‑to‑date citation landscape.