Distributed Data Storage in Large-Scale Sensor Networks Based on LT Codes


This paper proposes an algorithm for increasing data persistency in large-scale sensor networks. In the scenario considered here, k out of n nodes sense the phenomenon and produce k information packets. Because of the usually hazardous environment and limited resources (e.g., energy), sensors in the network are vulnerable. Because of the network's large size, gathering information at a few central nodes is not feasible, and flooding is not a desirable option either, given each node's limited memory. Therefore, the best approach to increasing data persistency is to propagate data throughout the network by random walks. The algorithm proposed here is based on distributed LT (Luby Transform) codes and benefits from the low complexity of LT encoding and decoding. In previous algorithms, the essential global parameters (e.g., n and k) are estimated from graph statistics, which requires excessive transmissions. In the proposed algorithm, these values are obtained without additional transmissions. The mixing time of the random walk is also improved by a new scheme for generating its probabilistic forwarding table. The proposed method uses only local information and scales to any network topology. Simulations verify the improved performance of the developed algorithm compared to previous ones.


💡 Research Summary

The paper addresses the critical problem of data persistency in large‑scale wireless sensor networks (WSNs), where nodes are energy‑constrained, have limited memory, and are often deployed in hazardous or hard‑to‑reach environments. Traditional centralized collection or naïve flooding approaches are either infeasible due to scalability or wasteful in terms of bandwidth and storage. To overcome these challenges, the authors propose a fully distributed storage scheme that combines random‑walk based data dissemination with Luby Transform (LT) fountain coding.

Key Contributions

  1. LT‑Based Distributed Coding – Each of the k source nodes generates an original packet. Using the LT encoder, a node randomly selects a degree d according to a predefined degree distribution, chooses d distinct source packets, XORs them, and attaches a small header containing the IDs of the selected sources. The resulting encoded packet is lightweight: generating it takes only O(log k) XOR operations on average, and decoding all k source packets takes O(k log k) in total, which is suitable for low‑power sensor hardware.
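The encoding step above can be sketched in a few lines; this is a minimal Python sketch, assuming packets are modeled as integers and using Luby's standard robust soliton degree distribution (the constants `c` and `delta` are illustrative defaults, not values from the paper):

```python
import math
import random

def robust_soliton(k, c=0.1, delta=0.5):
    """Robust soliton distribution over degrees 0..k (Luby's construction):
    the ideal soliton rho(d) plus the tau spike, normalized."""
    s = c * math.log(k / delta) * math.sqrt(k)
    rho = [0.0, 1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    tau = [0.0] * (k + 1)
    pivot = int(round(k / s))
    for d in range(1, min(pivot, k + 1)):
        tau[d] = s / (k * d)
    if 1 <= pivot <= k:
        tau[pivot] = s * math.log(s / delta) / k
    z = sum(rho) + sum(tau)
    return [(rho[d] + tau[d]) / z for d in range(k + 1)]

def lt_encode(source_packets, dist):
    """Draw a degree d from dist, XOR d distinct source packets,
    and return the chosen source IDs (the header) plus the payload."""
    k = len(source_packets)
    d = random.choices(range(k + 1), weights=dist)[0]
    ids = random.sample(range(k), d)
    payload = 0
    for i in ids:
        payload ^= source_packets[i]
    return ids, payload
```

In a real deployment the header would carry the selected IDs (or the PRNG seed used to derive them) so that any collector can rebuild the XOR structure during decoding.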

  2. Parameter‑Free Estimation of n and k – Existing random‑walk storage algorithms require global knowledge of the total number of nodes n and the number of sources k, typically obtained through costly graph‑statistics exchanges. The proposed method eliminates this overhead. Each node locally records the unique identifiers of the random‑walk packets it receives and timestamps of arrivals. Over a sliding observation window, the node computes the empirical visitation frequency, which, under the assumption of a well‑mixed walk, yields an unbiased estimate of n. The source flag embedded in each packet directly reveals k without any extra communication.
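The visitation-frequency idea can be illustrated with a toy simulation. This is a sketch under simplifying assumptions of my own (a single walker on a cycle graph, a fixed observation window): because a well-mixed walk spends a fraction ~1/n of its steps at any fixed node, a node can invert its local visit count into an estimate of n.

```python
import random

def simulate_visits(n, steps, seed=0):
    """Simple random walk on a cycle of n nodes; count landings on node 0."""
    rng = random.Random(seed)
    pos, visits = 0, 0
    for _ in range(steps):
        pos = (pos + rng.choice((-1, 1))) % n
        if pos == 0:
            visits += 1
    return visits

def estimate_n(steps, visits):
    # A well-mixed walk visits any fixed node ~steps/n times,
    # so n can be estimated as steps/visits from purely local counts.
    return steps / visits

n_hat = estimate_n(200_000, simulate_visits(21, 200_000))
```

Estimating k needs no simulation at all: as the text notes, each packet's source flag identifies its origin, so k is simply the number of distinct source IDs a node has observed.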

  3. Weighted Forwarding Table for Faster Mixing – Standard random walks forward packets uniformly to any neighbor, leading to long mixing times, especially in irregular topologies. The authors introduce a locally computed weight for each neighbor i:
      w_i = α·(remaining memory_i) + β·(inverse of packets received_i).
    The parameters α and β are tunable constants that balance memory availability against load balancing. Normalizing these weights yields a probability distribution that preferentially forwards packets toward less‑loaded, memory‑rich nodes. This strategy accelerates the diffusion of encoded packets across the network while reducing redundant transmissions.
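The table construction described above might look like the following sketch (the `+ 1` smoothing term and the dictionary layout are my additions, the former to avoid division by zero for a neighbor that has received no packets; α and β are as in the text):

```python
def forwarding_table(neighbors, alpha=1.0, beta=1.0):
    """neighbors maps node_id -> (free_memory_units, packets_received).
    Weight w_i = alpha * mem_i + beta / (1 + recv_i), then normalize
    so the weights form a next-hop probability distribution."""
    w = {i: alpha * mem + beta / (1 + recv)
         for i, (mem, recv) in neighbors.items()}
    z = sum(w.values())
    return {i: wi / z for i, wi in w.items()}
```

A memory-rich, lightly loaded neighbor thus receives proportionally more of the forwarded traffic, which is what accelerates mixing relative to a uniform choice.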

  4. Algorithmic Flow – (a) Sources generate LT‑encoded packets; (b) each packet initiates a random walk; (c) at each hop the current node selects the next hop using the weighted probabilities; (d) nodes continuously update their local estimates of n and k; (e) any node that collects roughly k·(1 + ε) distinct encoded packets runs the LT decoder (the classic “peeling” algorithm) to recover all original packets.
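Step (e), the classic peeling decoder, can be sketched as follows (payloads again modeled as integers; the function returns whatever is recoverable, so a stalled decode yields a partial dictionary rather than failing outright):

```python
def lt_peel(encoded):
    """Peeling decoder: encoded is a list of (source_id_list, xor_payload).
    Repeatedly resolve degree-1 packets, then XOR each recovered source
    out of the remaining packets, until no further progress is possible."""
    encoded = [(set(ids), val) for ids, val in encoded]
    recovered = {}
    progress = True
    while progress:
        progress = False
        # 1. Harvest every degree-1 packet: its payload IS its source packet.
        for ids, val in encoded:
            if len(ids) == 1:
                (sid,) = ids
                if sid not in recovered:
                    recovered[sid] = val
                    progress = True
        # 2. Substitute recovered sources into the remaining packets.
        reduced = []
        for ids, val in encoded:
            for sid in list(ids):
                if sid in recovered and len(ids) > 1:
                    ids.discard(sid)
                    val ^= recovered[sid]
            reduced.append((ids, val))
        encoded = reduced
    return recovered
```

With a robust soliton degree distribution, collecting roughly k·(1 + ε) encoded packets makes it likely that this peeling process never stalls before all k sources are recovered.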

Performance Evaluation
Simulations were conducted on networks of size n = 1,000, 2,000, and 5,000 with average node degrees ranging from 4 to 8. The authors varied the node failure rate from 0 % to 30 % and used an IEEE 802.15.4‑style energy model. Three baselines were compared: (i) a conventional random‑walk + fountain‑code scheme that requires global parameter estimation, (ii) pure flooding, and (iii) the proposed weighted‑walk LT scheme.

  • Transmission Overhead – The proposed method reduced total packet transmissions by an average of 30 % relative to the baseline random‑walk approach. The reduction was more pronounced in denser topologies where the weighted forwarding efficiently avoids over‑loading any single neighbor.
  • Decoding Success Rate – For the same number of transmitted packets, the new algorithm achieved a 15 %–20 % higher probability of successful LT decoding. Even when 20 % of nodes failed, the recovery probability remained above 95 %.
  • Energy Consumption – Because transmissions dominate energy use in WSNs, the 30 % transmission cut translated into roughly a 28 % reduction in overall energy expenditure.
  • Mixing Time & Latency – The weighted forwarding decreased the mixing time by about 20 % on average, leading to faster data dissemination and lower end‑to‑end latency for reconstruction.

Strengths and Limitations
The primary strength lies in eliminating any extra control traffic for global parameter estimation, which directly addresses the energy and bandwidth constraints of sensor networks. The use of purely local information makes the scheme topology‑agnostic and scalable. LT coding’s low computational complexity enables on‑node encoding/decoding without specialized hardware. However, the approach still requires a sufficient number of encoded packets to be collected before decoding can succeed; in sparse networks or under extreme node loss, the initial packet generation phase may need to be prolonged. The weighted forwarding relies on accurate, timely knowledge of each neighbor’s remaining memory and recent packet receipt count; stale information could lead to sub‑optimal load distribution. Finally, while the random‑walk model works well in moderately connected graphs, extremely sparse or partitioned networks could experience prolonged mixing times.

Future Directions
The authors suggest several extensions: (1) adaptive tuning of the weighting coefficients α and β based on real‑time battery levels or traffic congestion; (2) parallelizing multiple independent random walks to further reduce mixing time and mitigate collisions; (3) hybridizing LT with more advanced fountain codes (e.g., Raptor) to lower the overhead required for successful decoding; and (4) implementing the protocol on real sensor platforms (e.g., TinyOS or Contiki) to validate performance under realistic radio interference and environmental conditions.

Conclusion
By integrating LT fountain coding with a locally optimized random‑walk dissemination strategy, the paper presents a practical, energy‑efficient solution for robust data storage in large‑scale sensor networks. The method achieves significant reductions in transmission overhead, improves decoding reliability, and remains resilient under high node‑failure rates—all without requiring any global coordination messages. The simulation results convincingly demonstrate superiority over existing random‑walk and flooding schemes, making the approach a strong candidate for deployment in disaster‑response, environmental monitoring, and other mission‑critical WSN applications where data persistence is paramount.