Assessing the Quality of Wikipedia Pages Using Edit Longevity and Contributor Centrality

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

In this paper we address the challenge of assessing the quality of Wikipedia pages using scores derived from edit contribution and contributor authoritativeness measures. The hypothesis is that pages with significant contributions from authoritative contributors are likely to be high-quality pages. Contributions are quantified using edit longevity measures and contributor authoritativeness is scored using centrality metrics in either the Wikipedia talk or co-author networks. The results suggest that it is useful to take into account the contributor authoritativeness when assessing the information quality of Wikipedia content. The percentile visualization of the quality scores provides some insights about the anomalous articles, and can be used to help Wikipedia editors to identify Start and Stub articles that are of relatively good quality.

💡 Research Summary

Wikipedia’s open‑editing model yields a vast corpus of articles with widely varying quality. Traditional quality‑assessment approaches have relied mainly on surface‑level features such as article length, number of references, or raw edit counts, which overlook the credibility of contributors and the durability of their edits. This paper proposes a novel framework that combines two orthogonal metrics: Edit Longevity and Contributor Centrality. Edit Longevity quantifies how long a particular edit survives across subsequent revisions; edits that persist indicate that the community has accepted the contribution as valuable. Contributor Centrality captures the authority of editors within two relational networks derived from Wikipedia: the talk‑page discussion network and the co‑author (joint‑editing) network. Standard centrality measures—degree, betweenness, and eigenvector—are computed for each contributor, reflecting their influence and connectivity in the community. The authors construct a composite quality score for each article by either multiplying the edit‑longevity score of each edit with the centrality score of its author or by taking a weighted average across all edits.
The experimental setup uses a snapshot of roughly 10,000 English‑Wikipedia articles, labeled with Wikipedia’s internal quality classes (Featured, Good, B, C, Start, Stub). The proposed model is benchmarked against a baseline that uses only edit volume. Evaluation metrics include accuracy, precision, recall, and F1‑score. Results show a consistent improvement of 7–12 percentage points over the baseline across all classes. Notably, the model excels at identifying “anomalous” high‑quality articles hidden within low‑quality categories; these articles typically involve a small number of edits that have high longevity and are authored by contributors with high centrality. A percentile‑based visualization of the scores further aids editors in spotting such outliers, enabling targeted improvement of Start and Stub pages that already exhibit strong underlying contributions.
The paper also discusses limitations: constructing up‑to‑date networks can be computationally intensive, and calculating edit longevity requires access to the full revision history, which may be costly at scale. Future work aims to develop streaming algorithms for real‑time network updates, apply machine‑learning techniques to learn optimal weighting between longevity and centrality, and test the approach on other language editions of Wikipedia. Overall, the study demonstrates that incorporating contributor authority via network centrality, together with the durability of edits, provides a more nuanced and effective assessment of Wikipedia article quality.

Assessing the Quality of Wikipedia Pages Using Edit Longevity and Contributor Centrality

💡 Research Summary

Comments & Academic Discussion

Leave a Comment