A Non-MDS Erasure Code Scheme For Storage Applications


This paper investigates redundancy and self-repair against node failures in distributed storage systems under various strategies. In the replication method, access to a single replica node suffices to reconstruct a lost node, whereas in MDS erasure-coded systems, which are optimal in the redundancy-reliability tradeoff, repairing even a single node failure requires recovering the entire stored data. Regenerating codes, in turn, yield a tradeoff curve between storage capacity and repair bandwidth. The current paper proposes a new storage code. Specifically, we propose a non-MDS (2k, k) code that tolerates any three node failures and, more importantly, repairs a single node failure through access to only three nodes.


💡 Research Summary

The paper introduces a novel non‑Maximum‑Distance‑Separable (non‑MDS) erasure‑coding scheme for distributed storage, denoted as a (2k, k) XOR‑based code. The system consists of 2k storage nodes: k systematic nodes S₁,…,S_k each store a data fragment d_i, and k parity nodes P₁,…,P_k each store the XOR of all data fragments except the one associated with the same index (p_i = ⊕_{j≠i} d_j). The fragments are of equal size M/k, so the total stored data is 2M, identical to a (2k, k) MDS code in terms of raw storage.
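As a concrete illustration of this layout, here is a minimal sketch under the summary's definitions (the helper names and the choice k = 4 are ours, not from the paper):

```python
# Sketch of the (2k, k) layout described above, assuming the parity rule
# p_i = XOR_{j != i} d_j from the summary; names are illustrative.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(fragments):
    """Return the k parity packets p_i = XOR of every fragment except d_i."""
    total = fragments[0]
    for d in fragments[1:]:
        total = xor_bytes(total, d)
    # XOR-ing d_i back out of the full sum leaves exactly the j != i terms.
    return [xor_bytes(total, d) for d in fragments]

# Example: k = 4 equal-size fragments (each of size M/k).
data = [bytes([v] * 4) for v in (1, 2, 3, 4)]
parity = encode(data)
# 2k = 8 nodes in total, so raw storage is 2M, as stated.
```

Node i's partition is then the pair (data[i], parity[i]).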

The main contributions are twofold. First, the code can tolerate any three simultaneous node failures. The authors argue that this holds for any combination of three nodes, and even for up to k‑1 failures provided that each failed node belongs to a distinct partition (a partition being the pair (S_i, P_i)). Second, a single node failure can be repaired by contacting only three other nodes, regardless of the value of k. The repair procedure depends on whether the partner parity node of the failed systematic node is still alive.
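The failure-tolerance claims can be brute-force checked for a small k under the parity rule above. The script below (our own sanity check, not from the paper) models each node as a column of the k × 2k generator matrix over GF(2) and verifies that erasing any 3 columns preserves full rank; it also exhibits the partition caveat, since erasing both members of two partitions (4 nodes) is unrecoverable:

```python
from itertools import combinations

def gf2_rank(vectors):
    """Rank over GF(2) of bit-packed vectors (Python ints)."""
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

def column(k, c):
    """Column c of the k x 2k generator: e_c for systematic node c,
    all-ones-except-row-(c - k) for parity node c - k."""
    return (1 << c) if c < k else ((1 << k) - 1) ^ (1 << (c - k))

def tolerates(k, erased):
    """True if all k data fragments survive erasing the given node set."""
    cols = [column(k, c) for c in range(2 * k) if c not in erased]
    return gf2_rank(cols) == k

k = 5
# Any three simultaneous failures are survivable...
assert all(tolerates(k, set(e)) for e in combinations(range(2 * k), 3))
# ...but losing both members of two partitions (e.g. S_0, S_1, P_0, P_1) is not.
assert not tolerates(k, {0, 1, k, k + 1})
```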

If the parity node P_i is available, the failed systematic node S_i can be reconstructed by downloading:

  1. the parity packet p_i from P_i,
  2. the parity packet p_j from the parity node P_j of a different partition,
  3. the data fragment d_j from that same partition's systematic node S_j.

Since p_i ⊕ p_j = d_i ⊕ d_j under the parity definition above, the newcomer can compute d_i = p_i ⊕ p_j ⊕ d_j. The total download volume is 3·(M/k), i.e., three times the size of a single fragment, which is far smaller than the naive MDS repair that requires downloading the whole file (M).
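The three-node repair can be checked against the parity definition p_i = ⊕_{l≠i} d_l: for any j ≠ i, p_i ⊕ p_j = d_i ⊕ d_j, so a lost d_i is rebuilt from p_i, p_j, and d_j alone. A minimal self-contained sketch (k = 4, toy fragments; names ours):

```python
# Consistency check of the three-node repair, assuming the summary's
# parity rule p_i = XOR_{l != i} d_l.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Toy layout with k = 4 fragments.
data = [bytes([v] * 4) for v in (1, 2, 3, 4)]
total = data[0]
for d in data[1:]:
    total = xor_bytes(total, d)
parity = [xor_bytes(total, d) for d in data]  # p_i = XOR_{l != i} d_l

def repair_data_node(parity_i, parity_j, data_j):
    """Rebuild a lost d_i from three downloads: p_i, p_j, and d_j.
    Works because p_i XOR p_j = d_i XOR d_j."""
    return xor_bytes(xor_bytes(parity_i, parity_j), data_j)

# Suppose S_0 fails: contact P_0, plus P_1 and S_1 from another partition.
restored = repair_data_node(parity[0], parity[1], data[1])
```

Each download is one fragment of size M/k, so the repair traffic is 3·(M/k) regardless of k.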

If the parity node P_i has also failed, the repair proceeds in two stages. First, p_i is reconstructed by contacting a subset of the remaining nodes: 2m parity nodes and (k‑1‑2m) systematic nodes drawn from the other k‑1 partitions, where m ranges from 0 to ⌊(k‑1)/2⌋. This works because the XOR of an even number of parity packets equals the XOR of the corresponding data fragments, so combining it with the remaining k‑1‑2m fragments reproduces p_i. The number of possible node-selection patterns for this stage is Σ_{m=0}^{⌊(k‑1)/2⌋} C(k‑1, 2m) = 2^{k‑2}. Once p_i is recovered, the same three-node repair as above recovers d_i. The total bandwidth for repairing a parity node together with its partner systematic node is (k‑1)·(M/k) + 3·(M/k) = (k+2)·(M/k). For the example (n, k) = (10, 5), repairing the two failed packets consumes 7·(M/5) of bandwidth, i.e., 3.5·(M/5) per packet.
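The selection count and the bandwidth arithmetic above are easy to verify numerically (a sanity script of ours, with costs in units of M/k):

```python
from math import comb

def parity_repair_patterns(k):
    """Count the (2m parity, k-1-2m systematic) node selections
    available for rebuilding a lost parity packet p_i."""
    return sum(comb(k - 1, 2 * m) for m in range((k - 1) // 2 + 1))

# The sum of even-index binomials of k - 1 is 2^(k-2), as stated.
for k in range(3, 12):
    assert parity_repair_patterns(k) == 2 ** (k - 2)

# Two-stage repair cost, counted in fragments of size M/k:
k = 5
stage1 = k - 1   # rebuild p_i from k - 1 nodes
stage2 = 3       # then the three-node repair of d_i
assert stage1 + stage2 == k + 2   # 7 fragments for (n, k) = (10, 5)
```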

The authors compare their scheme with classical MDS codes, regenerating codes (both MSR and MBR points), and existing XOR‑based array codes. They note that while MSR codes achieve the optimal storage‑bandwidth trade‑off, they require contacting at least d ≥ k nodes (often d = k+1 or more) and involve random linear network coding over large finite fields. In contrast, the proposed code uses only XOR operations over the binary field, making it computationally cheap and suitable for low‑power or hardware‑constrained environments.

However, the paper also reveals several limitations. The storage overhead is 2× (total stored data = 2M), which is higher than the optimal MDS point for the same reliability level. Fault tolerance is limited: although any three nodes can fail, the code cannot survive arbitrary k failures, unlike a (2k, k) MDS code that tolerates k failures. Moreover, the cheap repair guarantee assumes that the partner parity node of a failed systematic node is alive; if both members of a partition fail simultaneously, the repair must fall back to the more expensive two-stage process, increasing bandwidth and latency. The partition concept also introduces additional metadata management and may lead to load imbalance if failures are not uniformly distributed across partitions.

The paper does not provide experimental evaluation, simulations, or real‑world benchmarks. Consequently, the impact of network latency, node selection overhead, and the effect of uneven failure patterns on repair time remain unclear. The combinatorial analysis of possible node‑selection sets (2^{k‑2} for parity repair, 2^{k‑1} for data repair) is presented, but practical algorithms for choosing a “good” subset are not discussed.

In summary, the work demonstrates that a simple XOR‑based non‑MDS code can achieve very low repair bandwidth for single‑node failures (three‑node contact) while tolerating any three simultaneous failures. The trade‑off is reduced fault tolerance and higher total storage compared with optimal MDS or regenerating codes. The scheme may be attractive as a local repair layer in hierarchical storage systems, where fast, low‑cost repairs are needed for the most common single‑node failures, while more robust MDS or regenerating codes handle rare, larger failure events. Careful system‑level design, including partition management and failure‑model analysis, would be required to integrate this code into production‑grade distributed storage platforms.

