Fountain Codes Based Distributed Storage Algorithms for Large-scale Wireless Sensor Networks

We consider large-scale sensor networks with n nodes, k of which are in possession of (e.g., have sensed or collected in some other way) k information packets. In scenarios in which network nodes are vulnerable because of, for example, limited energy or a hostile environment, it is desirable to disseminate the acquired information throughout the network so that each of the n nodes stores one (possibly coded) packet and the original k source packets can be recovered later in a computationally simple way from any (1 + \epsilon)k nodes for some small \epsilon > 0. We developed two distributed algorithms for solving this problem based on simple random walks and Fountain codes. Unlike all previously developed schemes, our solution is truly distributed, that is, nodes do not know n, k, or the connectivity in the network, except in their own neighborhoods, and they do not maintain any routing tables. In the first algorithm, all the sensors know n and k. In the second algorithm, each sensor estimates these parameters through the random walk dissemination. We present an analysis of the communication/transmission and encoding/decoding complexity of these two algorithms, and provide extensive simulation results as well.

💡 Research Summary

The paper addresses the problem of reliable data storage in large‑scale wireless sensor networks (WSNs) where k source packets must be preserved despite node failures caused by limited energy or hostile environments. The authors propose two fully distributed algorithms that disseminate the k packets throughout an n‑node network so that each node stores exactly one (possibly coded) packet, and the original k packets can be recovered from any (1 + ε)k nodes with high probability. Both algorithms rely on simple random walks for packet propagation and on Fountain codes (LT or Raptor) for encoding.

In the first algorithm, each sensor is assumed to know the global parameters n and k. Random walks of length O(n log n) ensure that every node receives a sufficiently diverse set of source packets, after which each node selects a degree according to a predefined distribution, linearly combines the received packets, and stores the resulting coded symbol. Recovery is performed by collecting (1 + ε)k coded symbols, forming a sparse linear system, and solving it with algorithms whose complexity is O(k log k), far lower than that of traditional Reed‑Solomon decoding.

The second algorithm removes the need for a priori knowledge of n and k. Nodes estimate these values locally by monitoring visit frequencies, duplicate counts, and other statistics gathered during the random‑walk dissemination. Once the estimates converge, the same encoding procedure as in the first algorithm is applied.

The authors analytically bound the communication cost (total transmissions O(n log n)), the encoding/decoding complexity, and the probability of successful recovery as a function of ε and the estimation error. Extensive simulations over a range of network sizes (hundreds to thousands of nodes) and packet loss rates (0%–30%) confirm that both schemes achieve near‑optimal recovery probability (> 95% for ε ≈ 0.1) while keeping the number of transmitted messages low.

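
The dissemination, encoding, and recovery steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the function names (`ideal_soliton`, `disseminate`, `encode`, `peel_decode`), the ring-like topology, the walk-length constant 3, the use of the Ideal Soliton distribution as the "predefined distribution", packets modeled as 32-bit integers, and XOR as the linear combination. The decoder is a naive peeling loop chosen for readability; it does not attain the O(k log k) bound.

```python
import math
import random


def ideal_soliton(k):
    """Ideal Soliton distribution; p[d] is the probability of degree d (index 0 unused)."""
    p = [0.0] * (k + 1)
    p[1] = 1.0 / k
    for d in range(2, k + 1):
        p[d] = 1.0 / (d * (d - 1))
    return p


def disseminate(neighbors, source_nodes, walk_len, rng):
    """Run one simple random walk per source packet.

    Returns, for every node, the set of source indices that visited it.
    """
    seen = [set() for _ in neighbors]
    for s, start in enumerate(source_nodes):
        node = start
        seen[node].add(s)
        for _ in range(walk_len):
            node = rng.choice(neighbors[node])  # uniform step to a neighbor
            seen[node].add(s)
    return seen


def encode(seen, packets, rng):
    """Each node draws a degree d and XORs up to d of the packets it received."""
    k = len(packets)
    dist = ideal_soliton(k)
    stored = []
    for received in seen:
        d = rng.choices(range(1, k + 1), weights=dist[1:])[0]
        chosen = rng.sample(sorted(received), min(d, len(received)))
        value = 0
        for i in chosen:
            value ^= packets[i]
        stored.append((frozenset(chosen), value))
    return stored


def peel_decode(symbols, k):
    """Naive peeling decoder: resolve degree-1 symbols, strip them from the rest."""
    symbols = [(set(idx), val) for idx, val in symbols]  # mutable local copies
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for pos, (idx, val) in enumerate(symbols):
            # Remove every source that has already been recovered.
            for i in [i for i in idx if i in recovered]:
                idx.discard(i)
                val ^= recovered[i]
            symbols[pos] = (idx, val)
            if len(idx) == 1:
                (i,) = idx
                if i not in recovered:
                    recovered[i] = val
                    progress = True
    return recovered


# Hypothetical small network: n nodes on a ring, each linked to 4 nearby nodes.
rng = random.Random(1)
n, k = 40, 8
neighbors = [[(v + o) % n for o in (-2, -1, 1, 2)] for v in range(n)]
packets = [rng.getrandbits(32) for _ in range(k)]
source_nodes = rng.sample(range(n), k)
walk_len = int(3 * n * math.log(n))  # walk length of order n log n

seen = disseminate(neighbors, source_nodes, walk_len, rng)
stored = encode(seen, packets, rng)
recovered = peel_decode(stored, k)
print(f"recovered {len(recovered)} of {k} source packets from {n} nodes")
```

Every value the decoder returns equals the corresponding source packet, but the decoder may stop before all k packets are recovered. Practical LT codes usually replace the Ideal Soliton distribution with the Robust Soliton distribution to make such stalls unlikely.
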
The work’s key contributions are (1) a truly distributed storage mechanism that requires no routing tables or global knowledge, (2) the integration of Fountain coding to reduce redundancy and decoding effort, and (3) a practical self‑estimation technique that makes the approach viable in real, dynamic WSN deployments. The paper concludes with suggestions for future extensions, including handling mobility, asynchronous walks, and hardware‑level validation.