RapidRAID: Pipelined Erasure Codes for Fast Data Archival in Distributed Storage Systems


To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However, the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage-efficient alternative to replication, particularly suited for archival in data centers, where old datasets (rarely accessed) can be erasure encoded, while replicas are maintained only for the latest data. Many recent works consider the design of new storage-centric erasure codes for improved repairability. In contrast, this paper addresses the migration from replication to encoding: traditionally, erasure coding is an atomic operation in that a single node with the whole object encodes and uploads all the encoded pieces. Although large datasets can be concurrently archived by distributing individual object encodings among different nodes, the network and computing capacity of individual nodes constrain the archival process due to such atomicity. We propose a new pipelined coding strategy that distributes the network and computing load of single-object encodings among different nodes, which also speeds up multi-object archival. We further present RapidRAID codes, an explicit family of pipelined erasure codes which provides fast archival without compromising either data reliability or storage overheads. Finally, we provide a real implementation of RapidRAID codes and benchmark its performance using both a cluster of 50 nodes and a set of Amazon EC2 instances. Experiments show that RapidRAID codes reduce a single object's coding time by up to 90%, and by up to 20% when multiple objects are encoded concurrently.


💡 Research Summary

The paper tackles a practical bottleneck that arises when moving from replication‑based storage to erasure‑coded archival in large‑scale distributed systems. Traditional erasure coding treats the encoding of a single object as an atomic operation: one node receives the whole object, performs the full coding (e.g., Reed‑Solomon), and then distributes the parity fragments to other nodes. This approach overloads the originating node’s network bandwidth and CPU, especially for multi‑gigabyte objects, and limits the overall throughput of the storage cluster.
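A back-of-the-envelope calculation makes the bottleneck concrete. In the sketch below (illustrative only, not the paper's formal model), an `(n, k)` MDS code splits an object into fragments of size `obj_size / k`; under atomic encoding the source node must upload all `n` fragments, whereas in a pipeline each node forwards roughly one fragment-sized intermediate to its successor:

```python
# Illustrative comparison (not the paper's exact cost model): network bytes
# sent by the busiest node when archiving one object with an (n, k) MDS code.

def atomic_upload_load(obj_size, n, k):
    # One node encodes the whole object and uploads all n coded fragments,
    # each of size obj_size / k.
    return n * obj_size // k

def pipelined_upload_load(obj_size, n, k):
    # In a pipeline, each node forwards roughly one fragment-sized
    # intermediate (obj_size / k bytes) to the next node in the chain.
    return obj_size // k

obj_size = 8 * 2**30      # an 8 GiB object (example value)
n, k = 16, 8              # example code parameters

print(atomic_upload_load(obj_size, n, k) // 2**30)     # GiB sent by the source node
print(pipelined_upload_load(obj_size, n, k) // 2**30)  # GiB sent per pipeline node
```

With these example parameters, the atomic source node uploads n/k times the object size, while each pipeline node handles only a single fragment's worth of traffic.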

To address this, the authors introduce a pipelined coding strategy called RapidRAID. The key idea is to split an object into several small chunks and stream them through a chain of storage nodes. Each node performs a partial encoding (typically a set of XOR operations) on the chunk it receives, combines the result with its own local data, and forwards the intermediate value to the next node. By overlapping computation with communication, the load is spread evenly across the cluster, eliminating the single‑point bottleneck.
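The pipeline idea can be sketched in a few lines. This is a deliberately simplified model, not RapidRAID's actual code construction: it uses plain XOR instead of finite-field coefficients, and it simulates the chain of nodes with a loop. Each "node" holds one chunk of the object, XORs it into the accumulator it receives, and forwards the result; no single node ever touches the whole object:

```python
# Minimal sketch of pipelined parity generation (simplified: plain XOR in
# place of the coded combinations used by the actual RapidRAID scheme).

def xor_bytes(a, b):
    # Byte-wise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

def pipelined_parity(chunks):
    # An all-zero accumulator enters the chain; each iteration models one
    # node XOR-ing in its local chunk and forwarding the intermediate value.
    acc = bytes(len(chunks[0]))
    for local_chunk in chunks:
        acc = xor_bytes(acc, local_chunk)
    return acc

chunks = [b"abcd", b"efgh", b"ijkl"]   # a toy object split into k = 3 chunks
parity = pipelined_parity(chunks)      # computed without any node holding all chunks
```

The accumulator that leaves the last node equals the XOR of all chunks, i.e., the same parity a central encoder would produce, but the work and traffic are spread across the chain.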

RapidRAID’s design preserves the Maximum Distance Separable (MDS) property of (n, k) erasure codes while minimizing per‑node work. The coding matrix is constructed so that each node only needs to execute a limited number of XORs, and the size of the intermediate data transmitted between nodes is roughly 1/k of the original object size. Consequently, CPU utilization drops by 30–40% compared with conventional Reed‑Solomon implementations, and network traffic does not increase substantially.
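The fragment sizes and the repair guarantee can be illustrated with a toy RAID-4-style example. Note the hedge: a single XOR parity tolerates only one loss, whereas RapidRAID's actual codes are full (n, k) MDS codes built over a finite field; this sketch only shows that a parity of size `len(object)/k` suffices to rebuild any one missing chunk:

```python
# Toy single-parity recovery demo (RAID-4-style, NOT RapidRAID's full MDS
# construction): the parity chunk is len(object)/k bytes, and XOR-ing the
# surviving chunks with it reconstructs any one lost chunk.

def xor_all(blocks):
    # XOR an arbitrary number of equal-length byte strings together.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = b"hello world!"                 # a 12-byte toy object
k = 3
size = len(data) // k
chunks = [data[i * size:(i + 1) * size] for i in range(k)]
parity = xor_all(chunks)               # parity is only len(data)/k bytes

lost = 1                               # pretend the node holding chunk 1 fails
survivors = [c for i, c in enumerate(chunks) if i != lost]
recovered = xor_all(survivors + [parity])
print(recovered == chunks[lost])       # → True: parity repairs the missing chunk
```

Because XOR is its own inverse, XOR-ing the parity with the survivors cancels everything except the lost chunk; the MDS codes in the paper generalize this to tolerating any n − k losses.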

The authors validate their approach with two experimental setups: a 50‑node on‑premise cluster and a set of Amazon EC2 instances. For single‑object encoding, RapidRAID reduces the total encoding time by up to 90% for objects larger than 10 GB, achieving sub‑second latency for smaller files. When multiple objects are encoded concurrently, the pipelined nature allows different objects to occupy different stages of the pipeline, resulting in a 20% improvement in overall throughput.

The paper also discusses limitations. Deep pipelines can introduce cumulative latency, and synchronization overhead between nodes may become noticeable in highly heterogeneous environments. Selecting the optimal pipeline depth and chunk size therefore requires tuning to the specific network topology and node capabilities. Moreover, fault‑tolerance mechanisms must be extended to handle failures that occur mid‑pipeline without compromising data integrity.

Despite these challenges, RapidRAID demonstrates that pipelined erasure coding can dramatically accelerate archival workloads while retaining the storage efficiency of MDS codes. This makes it a compelling candidate for data‑center operators seeking to replace or augment replication with cost‑effective, high‑performance erasure coding, especially for “cold” data that is accessed infrequently but must be stored reliably over long periods.