Model of Wikipedia growth based on information exchange via reciprocal arcs
We show how reciprocal arcs significantly influence the structural organization of Wikipedias, online encyclopedias. It is shown that random addition of reciprocal arcs in the static network cannot explain the observed reciprocity of Wikipedias. A model of Wikipedia growth based on preferential attachment and on information exchange via reciprocal arcs is presented. An excellent agreement between in-degree distributions of our model and real Wikipedia networks is achieved without fitting the distributions, but by merely extracting a small number of model parameters from the measurement of real networks.
💡 Research Summary
The paper investigates how reciprocal arcs—i.e., bidirectional links—shape the topology of Wikipedia, one of the largest online encyclopedias. The authors begin by constructing directed graphs for several language editions of Wikipedia, where nodes represent articles and directed arcs represent hyperlinks from one article to another. Empirical measurements reveal a surprisingly high reciprocity (the fraction of arcs that are part of a bidirectional pair) ranging from 0.20 to 0.30, far above the value expected if reciprocal arcs were placed at random. At the same time, the in‑degree distribution follows a power‑law tail with exponent γ between 2.1 and 2.5. These observations cannot be reproduced by classic static models that simply add random reciprocal arcs to a network.
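The reciprocity measure defined above can be illustrated with a minimal sketch. It assumes the graph is given as a list of (source, target) pairs; the function name `reciprocity` and the toy graph are hypothetical and are not taken from the paper.

```python
def reciprocity(arcs):
    """Fraction of arcs (u, v) whose reverse arc (v, u) is also present."""
    # Assumption: self-loops and duplicate arcs are ignored.
    arc_set = {(u, v) for u, v in arcs if u != v}
    if not arc_set:
        return 0.0
    mutual = sum(1 for (u, v) in arc_set if (v, u) in arc_set)
    return mutual / len(arc_set)


# Toy graph: A<->B is a reciprocal pair, A->C and C->D are one-way.
toy_arcs = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
print(reciprocity(toy_arcs))  # 2 of the 4 arcs are reciprocated
```

Each arc of a bidirectional pair is counted once, so a graph made only of mutual links has reciprocity 1.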
To explain the data, the authors propose a growth model that couples preferential attachment with a mechanism of information exchange that creates reciprocal arcs during network expansion. The model proceeds in two steps for each new node: (1) the new node attaches to an existing node i with probability proportional to i’s current in‑degree k_i (standard preferential attachment), thereby creating a single directed arc from the new node to i; (2) with probability p, the target node i of this newly created arc immediately adds a reverse arc back to the new node. This second step captures the empirical behavior that an editor who creates a new article often also edits an existing article it links to, so that the existing article links back. The probability p is not a free fitting parameter; it is derived from observable network quantities: p = r / ⟨k_out⟩, where r is the measured reciprocity and ⟨k_out⟩ is the average out‑degree of the real Wikipedia network.
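The two-step growth rule can be sketched as a short simulation. This is an illustrative reading of the description, not the authors' implementation, and it rests on several assumptions: the reverse arc runs from the chosen existing article back to the new one, the seed network is a single reciprocal pair, and attachment weights are k_i + 1 rather than k_i so that articles with zero in-degree can still be chosen. The name `grow_network` and the value p = 0.25 are hypothetical.

```python
import random


def grow_network(n_nodes, p, seed=0):
    """Grow a directed network with preferential attachment plus reverse arcs.

    Returns (arcs, in_degree). Assumes n_nodes >= 2.
    """
    rng = random.Random(seed)
    # Assumed seed network: nodes 0 and 1 joined by a reciprocal pair.
    arcs = [(0, 1), (1, 0)]
    in_degree = [1, 1]
    # Urn trick: node i appears in_degree[i] + 1 times, so a uniform draw
    # selects i with probability proportional to k_i + 1 (assumed offset).
    urn = [0, 0, 1, 1]
    for new in range(2, n_nodes):
        # Step 1: the new node links to an existing node chosen preferentially.
        target = rng.choice(urn)
        arcs.append((new, target))
        in_degree.append(0)
        in_degree[target] += 1
        urn.append(target)  # target gained one in-link
        urn.append(new)     # offset entry for the new node
        # Step 2: with probability p the chosen node links back.
        if rng.random() < p:
            arcs.append((target, new))
            in_degree[new] += 1
            urn.append(new)
    return arcs, in_degree


arcs, in_degree = grow_network(n_nodes=10_000, p=0.25)
print(len(arcs), max(in_degree))
```

Keeping an urn of node labels avoids recomputing attachment probabilities at every step; the cost is one list entry per arc plus one per node.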
Analytical treatment using master equations yields a closed‑form relationship between p and the power‑law exponent of the in‑degree distribution: γ = 2 + 1/(1‑p). When p → 0 the model reduces to the classic Barabási‑Albert case with γ ≈ 3; as p increases, γ approaches 2, producing a heavier‑tailed distribution that matches the empirical exponents. The model also predicts that the addition of reciprocal arcs raises the clustering coefficient and shortens the average path length, reflecting a more tightly knit structure.
The authors validate the model by extracting r and ⟨k_out⟩ from several Wikipedia language editions, computing the corresponding p, and then simulating networks up to one million nodes. The simulated in‑degree distributions overlay the real data almost perfectly. Quantitatively, Kolmogorov‑Smirnov distances are below 0.02 and Kullback‑Leibler divergences are under 0.01, indicating an excellent fit. Moreover, simulated clustering coefficients and average shortest‑path lengths deviate from the empirical values by less than 5 %. These results demonstrate that the simple two‑step growth rule captures both the degree heterogeneity and the high reciprocity observed in real Wikipedia graphs, outperforming static random‑reciprocity models.
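The Kolmogorov-Smirnov distance mentioned above is the largest gap between two empirical cumulative distribution functions. The sketch below shows one way to compute it for two samples of in-degrees; it is a generic two-sample statistic, not necessarily the exact procedure used in the paper, and the name `ks_distance` and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every distinct value in either sample.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical in-degree samples, for illustration only.
simulated = [0, 1, 1, 2, 3, 5, 8]
observed = [0, 1, 2, 2, 3, 5, 9]
print(ks_distance(simulated, observed))
```

A distance of 0 means the two empirical distributions coincide at every observed value, and 1 means they do not overlap at all.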
Beyond Wikipedia, the authors argue that any collaborative platform where new entities are created alongside mutual references—such as GitHub repositories (fork‑pull relationships) or Q&A sites like Stack Overflow (question‑answer links)—could be described by the same mechanism. They suggest future extensions that incorporate multiple arc types (topic‑based vs. user‑based links), time‑varying p, and community detection to explore how reciprocity interacts with modular structure.
In summary, the paper introduces a parsimonious yet powerful growth model that integrates preferential attachment with information‑exchange‑driven reciprocal arcs. By grounding the model parameters in measurable network statistics, it reproduces the empirical in‑degree distribution, reciprocity, clustering, and path‑length characteristics of Wikipedia without resorting to ad‑hoc fitting. This work advances our theoretical understanding of how bidirectional connections emerge naturally in collaborative knowledge‑building systems and provides a versatile framework for studying other online networks where mutual linking is a fundamental process.