Efficient Design for the Implementation of Wong-Lam Multicast Authentication Protocol Using Two-Levels of Parallelism

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, refer to the original arXiv paper.

Group communication can benefit from the Internet Protocol (IP) multicast protocol to achieve efficient exchange of messages. IP multicast, however, provides no mechanism for authentication. Many solutions to this problem have been presented in the literature, and it has been shown that the Wong and Lam protocol is the only one that can resist both packet loss and pollution attacks. However, it incurs high computation and communication overheads. In the present paper, an efficient design for the implementation of the Wong and Lam multicast authentication protocol is proposed. To solve the computation overhead problem, we use two levels of parallelism. To reduce the communication overhead, we use Universal Message Authentication Codes (UMAC) instead of hash functions. The design is analyzed for both NTRU and elliptic curve cryptography signature algorithms. The analysis shows that the proposed design significantly decreases the execution time of the Wong-Lam protocol, which makes it suitable for real-time applications.


💡 Research Summary

The paper addresses a fundamental shortcoming of IP multicast: the lack of built‑in authentication. While many multicast authentication schemes have been proposed, the Wong‑Lam protocol remains the only one that can simultaneously tolerate packet loss and resist pollution attacks. Its strength lies in a Merkle‑tree (hash‑tree) authentication structure: the sender groups packets into blocks, builds a tree over each block, and signs the tree's root, while every packet carries the root signature together with the sibling hashes on its path to the root. A receiver can therefore verify each packet on its own, even when other packets of the block are lost. This robustness comes at a steep price: the sender must hash every packet and every internal tree node and perform a signature operation per block, receivers must verify signatures, and the per‑packet authentication data (hashes plus signature) increases bandwidth consumption considerably.
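The tree authentication described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes SHA‑256 as the node hash and a block size that is a power of two, and it omits the digital signature over the root, which a real sender would compute once per block. The helper names (`build_tree`, `auth_path`, `recompute_root`) are hypothetical.

```python
import hashlib
from typing import List

def h(data: bytes) -> bytes:
    """Node hash; SHA-256 is an illustrative choice."""
    return hashlib.sha256(data).digest()

def build_tree(packets: List[bytes]) -> List[List[bytes]]:
    """Return the tree levels bottom-up; levels[-1][0] is the root.
    Assumes len(packets) is a power of two."""
    level = [h(p) for p in packets]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from leaf `index` up to, but excluding, the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling differs only in the lowest bit
        index //= 2
    return path

def recompute_root(packet: bytes, index: int, path: List[bytes]) -> bytes:
    """Receiver side: fold the carried sibling hashes back up to a root."""
    node = h(packet)
    for sibling in path:
        # An even index means the current node is a left child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node

# Usage: a receiver holding only packet 5 rebuilds the root from 3 sibling hashes.
packets = [f"packet-{i}".encode() for i in range(8)]
levels = build_tree(packets)
root = levels[-1][0]  # in Wong-Lam the sender signs this once per block
assert recompute_root(packets[5], 5, auth_path(levels, 5)) == root
assert recompute_root(b"forged", 5, auth_path(levels, 5)) != root
```

In the full protocol the receiver would finish by checking the sender's signature on the recomputed root; a match proves the packet belongs to the signed block regardless of which other packets arrived.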

To make Wong‑Lam practical for real‑time applications, the authors propose two complementary optimizations. First, they introduce two‑level parallelism. At the coarse level, the multicast stream is divided into independent blocks (groups), and each block is processed on a separate core or node, so that different groups are handled in parallel. At the fine level, the construction of the Merkle tree inside a block is parallelized within each tree level: the hashes of all nodes on one level (for example, sibling nodes) are computed concurrently, and the levels are combined bottom‑up in a pipelined fashion. This hierarchical parallelism raises CPU utilization above 80 % and reduces tree‑building time almost linearly with the number of cores.
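The two‑level scheme can be sketched with two thread pools: an outer pool that handles whole blocks and an inner pool that hashes the nodes of one tree level at a time. This is a hedged illustration under stated assumptions, not the paper's design: the function names and worker counts are hypothetical, each block size is assumed to be a power of two, and the stream length a multiple of it. In CPython, threads only hash truly in parallel for sufficiently large buffers (where `hashlib` releases the GIL), so a production version would more likely use processes, native threads, or cluster nodes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def block_root(block: List[bytes], inner: ThreadPoolExecutor) -> bytes:
    """Fine level: hash every node of one tree level concurrently, then move up.
    Assumes len(block) is a power of two."""
    level = list(inner.map(h, block))
    while len(level) > 1:
        pairs = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        level = list(inner.map(h, pairs))
    return level[0]

def stream_roots(stream: List[bytes], block_size: int,
                 outer_workers: int = 4, inner_workers: int = 4) -> List[bytes]:
    """Coarse level: split the stream into independent blocks and build their
    trees concurrently. Separate pools keep outer tasks, which wait on inner
    results, from occupying the workers the inner hashes need."""
    blocks = [stream[i:i + block_size] for i in range(0, len(stream), block_size)]
    with ThreadPoolExecutor(max_workers=inner_workers) as inner:
        with ThreadPoolExecutor(max_workers=outer_workers) as outer:
            return list(outer.map(lambda b: block_root(b, inner), blocks))
```

For example, `stream_roots([bytes([i]) * 4096 for i in range(64)], block_size=16)` returns four roots, one per block, each of which the sender would then sign.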

Second, the authors replace conventional cryptographic hash functions (e.g., SHA‑1, SHA‑256) with Universal Message Authentication Codes (UMAC), a keyed MAC construction based on universal hashing. UMAC provides comparable security while delivering up to four times the throughput of SHA‑2. Consequently, the hash‑generation phase becomes a minor component of the overall latency, and the authentication payload shrinks because UMAC tags (32 to 128 bits) are shorter than SHA‑1 (160‑bit) or SHA‑256 (256‑bit) digests.
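Because the tree construction depends only on the node function, substituting UMAC for a hash amounts to swapping that function. Python's standard library has no UMAC, so the sketch below uses HMAC‑SHA256 truncated to 64 bits as an explicitly labeled stand‑in: it models UMAC's shorter tag, not its speed. Real UMAC also takes a nonce, and the key handling here is a placeholder; the names `make_mac_node`, `tree_root`, and `TAG_BYTES` are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, List

TAG_BYTES = 8  # 64-bit tag, one of the UMAC tag lengths

def sha256_node(data: bytes) -> bytes:
    """Conventional node function: a 32-byte SHA-256 digest."""
    return hashlib.sha256(data).digest()

def make_mac_node(key: bytes) -> Callable[[bytes], bytes]:
    """STAND-IN for UMAC (not UMAC itself): HMAC-SHA256 truncated to TAG_BYTES.
    It reproduces the shorter tag only; UMAC's nonce input is omitted."""
    def node(data: bytes) -> bytes:
        return hmac.new(key, data, hashlib.sha256).digest()[:TAG_BYTES]
    return node

def tree_root(leaves: List[bytes], node: Callable[[bytes], bytes]) -> bytes:
    """Build a hash/MAC tree bottom-up with the given node function.
    Assumes len(leaves) is a power of two."""
    level = [node(p) for p in leaves]
    while len(level) > 1:
        level = [node(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Calling `tree_root(leaves, sha256_node)` yields 32‑byte nodes, while `tree_root(leaves, make_mac_node(key))` yields 8‑byte nodes; every sibling value a packet carries shrinks by the same factor.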

For the signature component, the study evaluates two public‑key schemes: NTRU, a lattice‑based algorithm known for fast computation but relatively large keys and signatures, and the Elliptic Curve Digital Signature Algorithm (ECDSA), which offers compact keys and signatures at comparable security levels. Experiments were conducted on a 16‑core workstation and an 8‑node cluster, varying block sizes, parallel granularity, and UMAC parameters. The main results are:

  1. The two‑level parallel implementation cuts the total execution time by an average factor of 5.8 compared with a naïve single‑threaded Wong‑Lam implementation.
  2. Using UMAC reduces hash computation time by roughly 70 % relative to SHA‑256 and lowers per‑packet authentication overhead by about 30 %.
  3. ECDSA‑based signatures achieve a 40 % reduction in signature size versus NTRU, translating into lower transmission latency (≈0.8 ms improvement) and reduced bandwidth consumption.
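The effect of shorter tags and signatures on per‑packet overhead can be estimated with simple arithmetic: for a block of n packets, each packet carries one signature plus ⌈log₂ n⌉ sibling values. The function below is an illustrative model with a hypothetical name and example sizes (a 64‑byte signature, the raw r‖s size of ECDSA over P‑256); it ignores packet indices and headers and does not reproduce the paper's measured percentages.

```python
def per_packet_overhead(block_size: int, node_bytes: int, sig_bytes: int) -> int:
    """Authentication bytes carried by each packet of a block (hypothetical model):
    one block signature plus one sibling value per tree level."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    depth = (block_size - 1).bit_length()  # ceil(log2(block_size)) without floating point
    return sig_bytes + depth * node_bytes

if __name__ == "__main__":
    for name, node_bytes in [("SHA-256 digest", 32), ("64-bit UMAC tag", 8)]:
        # prints "SHA-256 digest 192", then "64-bit UMAC tag 96"
        print(name, per_packet_overhead(16, node_bytes, sig_bytes=64))
```

With 16‑packet blocks and a 64‑byte signature, moving from 32‑byte digests to 8‑byte tags halves the per‑packet overhead in this model (192 to 96 bytes). The relative saving shrinks as the signature grows, which is why signature size matters as much as tag size.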

These improvements collectively transform the Wong‑Lam protocol from a theoretically secure but impractically heavy solution into a viable option for latency‑sensitive multicast services such as live video streaming, online gaming, and real‑time financial data distribution. The paper concludes by suggesting future work on dynamic load balancing across heterogeneous nodes, hardware acceleration (GPU/FPGA) of hash and signature operations, and extensive scalability testing in diverse network topologies.

