An Analytical Model of Information Dissemination for a Gossip-based Protocol
We develop an analytical model of information dissemination for a gossiping protocol that combines both pull and push approaches. With this model we analyse how fast an item is replicated through a network, how fast it spreads, and how fast it covers the network. We also determine the optimal size of the exchange buffer, to obtain fast replication. Our results are confirmed by large-scale simulation experiments.
💡 Research Summary
The paper presents a rigorous analytical model for a gossip-based information dissemination protocol that combines both push and pull mechanisms, often referred to as a “shuffle” protocol. The authors focus on large‑scale, fully connected networks where each node maintains a cache of size c and periodically initiates an exchange with a randomly selected neighbor. During an exchange, each node selects s items uniformly at random from its cache, sends them to the partner, receives s items in return, discards duplicates, and, if the cache overflows, evicts items that it had sent out (but not those it just received). This push‑pull hybrid is known to outperform pure push or pure pull schemes.
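The exchange step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `shuffle_exchange` and `_merge` are hypothetical, caches are modelled as lists of distinct item ids, s ≤ c is assumed, and the choice of which sent items to evict on overflow (here, a random subset) is an assumption the summary does not spell out.

```python
import random

def _merge(cache, sent, received, c):
    """Merge received items into cache, evicting only items that were sent."""
    have = set(cache)
    # Discard duplicates: keep only received items the node does not hold yet.
    new_items = [x for x in received if x not in have]
    overflow = len(cache) + len(new_items) - c
    if overflow > 0:
        # Assumption: evict a random subset of the sent items (sent is
        # already in random order), never the items just received.
        evict = set(sent[:overflow])
        cache = [x for x in cache if x not in evict]
    return cache + new_items

def shuffle_exchange(cache_a, cache_b, s, c, rng):
    """One push-pull exchange; returns the two updated caches."""
    sent_a = rng.sample(cache_a, min(s, len(cache_a)))
    sent_b = rng.sample(cache_b, min(s, len(cache_b)))
    new_a = _merge(cache_a, sent_a, sent_b, c)
    new_b = _merge(cache_b, sent_b, sent_a, c)
    return new_a, new_b

if __name__ == "__main__":
    rng = random.Random(0)
    a, b = shuffle_exchange(list(range(10)), list(range(5, 15)), 3, 10, rng)
    print(sorted(a), sorted(b))
```

Because at most s new items arrive and s items were sent, evicting only sent items is always enough to bring a cache back to size c.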
To capture the stochastic dynamics of a single item d, the authors define four possible joint states of the two interacting nodes (00, 01, 10, 11) indicating the presence or absence of d in each cache before the exchange. They introduce transition probabilities P(a₂b₂ | a₁b₁) and decompose them into two elementary probabilities:
* P_select – the probability that a given item is selected into the exchange buffer, which under uniform random selection equals s/c.
* P_drop – the probability that an item present in the exchange buffer of one node is overwritten by an incoming item from the partner. This depends on how many of the partner’s s items are already known to the node and can be approximated as s/(c − k), where k is the number of items already shared.
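The value s/c for P_select follows from a counting argument: of the C(c, s) equally likely exchange buffers, C(c−1, s−1) contain a given item. The short check below confirms this with exact arithmetic; the function name `p_select` is illustrative, and a full cache of c distinct items is assumed.

```python
from fractions import Fraction
from math import comb

def p_select(s: int, c: int) -> Fraction:
    """Exact probability that one fixed item is among s items drawn
    uniformly without replacement from a full cache of c items."""
    if not 0 < s <= c:
        raise ValueError("require 0 < s <= c")
    # Buffers containing the item / all possible buffers.
    return Fraction(comb(c - 1, s - 1), comb(c, s))

if __name__ == "__main__":
    for s, c in [(3, 10), (5, 10), (50, 100)]:
        # The ratio simplifies to s/c for every valid pair.
        assert p_select(s, c) == Fraction(s, c)
        print(s, c, p_select(s, c))
```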
Using these elementary probabilities, the authors construct a Markov chain that describes the evolution of the joint state after each shuffle. By solving the chain they obtain closed‑form expressions for two key performance metrics:
- Replication (replica count) – defined as the fraction of nodes that hold a copy of d at a given time. In the long run, because the total number of cache slots in the system is N·c and there are n different items, the expected number of copies per item converges to N·c/n, that is, a fraction c/n of the N nodes. Thus replication stabilizes at c/n independent of the network size.
- Coverage – defined as the fraction of nodes that have ever seen d. The authors derive a differential‑equation approximation for the growth of coverage over discrete rounds. The growth rate is proportional to s/c and to the current uncovered fraction, leading to an S‑shaped curve: rapid initial expansion followed by a slowdown as the remaining uncovered nodes become scarce.
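To make the two metrics concrete, the following toy simulation tracks replication and coverage of a single item over several rounds. It is a sketch under stated assumptions rather than the paper's simulator: all names and default parameters are hypothetical, every cache starts full, the tracked item 0 starts at a single node, and the eviction choice among sent items is arbitrary. The slot-counting argument shows up as an invariant: caches stay full, so the replication fraction averaged over all n items is exactly c/n.

```python
import random

def exchange(a, b, s, c, rng):
    """In-place push-pull shuffle between two full caches (sets of ids)."""
    sent_a = rng.sample(sorted(a), s)
    sent_b = rng.sample(sorted(b), s)
    for cache, sent, received in ((a, sent_a, sent_b), (b, sent_b, sent_a)):
        new_items = [x for x in received if x not in cache]
        overflow = max(0, len(cache) + len(new_items) - c)
        cache.difference_update(sent[:overflow])  # evict only sent items
        cache.update(new_items)

def mean_replication(caches, n):
    """Average, over all n items, of the fraction of nodes holding the item."""
    return sum(len(cache) for cache in caches) / (len(caches) * n)

def simulate(N=200, n=50, c=10, s=5, rounds=30, seed=1):
    rng = random.Random(seed)
    # Full caches drawn from items 1..n-1; item 0 is the tracked item d.
    caches = [set(rng.sample(range(1, n), c)) for _ in range(N)]
    caches[0].pop()
    caches[0].add(0)
    seen = {0}  # nodes that have ever held item 0
    replication, coverage = [], []
    for _ in range(rounds):
        for i in range(N):
            j = rng.randrange(N - 1)
            j += j >= i  # uniformly random partner other than i
            exchange(caches[i], caches[j], s, c, rng)
            seen.update(k for k in (i, j) if 0 in caches[k])
        replication.append(sum(0 in cache for cache in caches) / N)
        coverage.append(len(seen) / N)
    return caches, replication, coverage

if __name__ == "__main__":
    caches, replication, coverage = simulate()
    print("mean replication over all items:", mean_replication(caches, 50))
    print("replication of item 0 per round:", replication)
    print("coverage of item 0 per round:", coverage)
```

Coverage can only grow, while the replication of an individual item fluctuates around its long-run level; only the average over all items is pinned to c/n in every round.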
A central contribution of the paper is the analytical determination of the optimal exchange buffer size s. By formulating a joint objective that balances fast replication (which benefits from larger s) against the risk of cache overflow and item loss (which grows with s), they show that the optimal s lies around c/2. This result quantifies the intuitive trade‑off: a buffer that is too small yields insufficient dissemination opportunities, while a buffer that is too large causes excessive overwriting and reduces overall efficiency.
The theoretical results are validated through extensive simulations on networks ranging from 10⁴ to 10⁶ nodes. The simulations implement the exact shuffle protocol and compare empirical replication curves, coverage trajectories, and convergence times against the analytical predictions. The agreement is excellent across a wide range of parameters: different numbers of items n, cache sizes c, and buffer sizes s. Moreover, the simulations confirm that choosing s ≈ c/2 consistently yields the fastest coverage and highest steady‑state replication, matching the analytical optimum.
In summary, the paper makes four major contributions:
- A precise probabilistic model of push‑pull gossip exchanges, expressed via state‑transition probabilities derived from elementary selection and drop probabilities.
- Closed‑form expressions for replication and coverage dynamics, including the asymptotic replica count c/n and an S‑shaped coverage growth function.
- An analytical optimality condition for the exchange buffer size, showing that s ≈ c/2 maximizes dissemination speed while minimizing loss.
- Comprehensive empirical validation on massive simulated networks, demonstrating that the model accurately predicts the protocol's behavior in simulation.
These insights are directly applicable to the design of wireless sensor networks, mobile ad‑hoc networks, and distributed caching systems where gossip‑based dissemination is employed. By providing a mathematically grounded framework, the work enables system designers to predict performance, tune parameters, and achieve efficient, reliable data spread without resorting to costly trial‑and‑error experiments.