Repair Pipelining for Erasure-Coded Storage: Algorithms and Evaluation
📝 Original Paper Info
- Title: Repair Pipelining for Erasure-Coded Storage: Algorithms and Evaluation
- ArXiv ID: 1908.01527
- Date: 2020-11-23
- Authors: Xiaolu Li, Zuoru Yang, Jinhong Li, Runhui Li, Patrick P. C. Lee, Qun Huang, Yuchong Hu
📝 Abstract
We propose repair pipelining, a technique that speeds up the repair performance in general erasure-coded storage. By carefully scheduling the repair of failed data in small-size units across storage nodes in a pipelined manner, repair pipelining reduces the single-block repair time to approximately the same as the normal read time for a single block in homogeneous environments. We further design different extensions of repair pipelining algorithms for heterogeneous environments and multi-block repair operations. We implement a repair pipelining prototype, called ECPipe, and integrate it as a middleware system into two versions of Hadoop Distributed File System (HDFS) (namely HDFS-RAID and HDFS-3) as well as Quantcast File System (QFS). Experiments on a local testbed and Amazon EC2 show that repair pipelining significantly improves the performance of degraded reads and full-node recovery over existing repair techniques.
💡 Summary & Analysis
The paper presents repair pipelining, a technique for improving repair performance in erasure-coded storage systems. It addresses a key challenge in distributed file systems: node failures trigger recovery operations that are time-consuming and bandwidth-intensive, degrading overall system performance. Repair pipelining divides the data to be repaired into small fixed-size units and schedules their transmission and partial decoding across the helper nodes in a pipelined fashion, so the repair time of a single block approaches the normal read time of a single block.

The authors extend this approach to heterogeneous environments and multi-block repair, maintaining robust performance across different conditions. They implement a prototype called ECPipe and integrate it into two versions of the Hadoop Distributed File System (HDFS) as well as the Quantcast File System (QFS). Experiments on a local testbed and Amazon EC2 show that repair pipelining outperforms existing repair techniques in both degraded reads and full-node recovery. The work is particularly relevant to large-scale cloud storage, where rapid recovery and sustained data availability are paramount.
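To make the slice-level scheduling concrete, below is a minimal sketch of single-block repair along a chain of helpers. It is not the authors' ECPipe code: it assumes a simple parity-style code in which the lost block equals the bitwise XOR of the k helper blocks (real Reed-Solomon repair would weight each slice with Galois-field coefficients), the names `SLICE_SIZE`, `split_into_slices`, and `pipelined_repair` are illustrative, and the loop runs sequentially only to show the per-slice data flow rather than real network concurrency.

```python
# Sketch of single-block repair pipelining (illustrative, not ECPipe).
# Assumption: the lost block is the XOR of the k surviving helper blocks,
# as in a (k+1, k) parity code; RS codes use GF coefficients instead.

SLICE_SIZE = 4 * 1024  # small repair unit; pipelining happens at this granularity


def split_into_slices(block: bytes, slice_size: int = SLICE_SIZE) -> list[bytes]:
    """Split a helper block into fixed-size slices, the unit of pipelining."""
    return [block[i:i + slice_size] for i in range(0, len(block), slice_size)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pipelined_repair(helper_blocks: list[bytes]) -> bytes:
    """Repair the lost block slice by slice along a chain of helpers.

    Slice s flows helper 0 -> helper 1 -> ... -> helper k-1 -> requestor.
    In a real deployment, while helper i processes slice s, helper i-1 is
    already forwarding slice s+1, so every link stays busy and the total
    repair time approaches the normal read time of one block.
    """
    per_helper = [split_into_slices(b) for b in helper_blocks]
    repaired_slices = []
    for s in range(len(per_helper[0])):
        partial = per_helper[0][s]                    # helper 0 starts the chain
        for h in range(1, len(per_helper)):
            partial = xor(partial, per_helper[h][s])  # each helper adds its share
        repaired_slices.append(partial)               # last helper delivers slice s
    return b"".join(repaired_slices)


if __name__ == "__main__":
    import os

    k, block_len = 4, 64 * 1024
    helpers = [os.urandom(block_len) for _ in range(k)]
    # In a (k+1, k) parity code, any lost block is the XOR of the k survivors;
    # construct such a block so the repair result can be checked.
    lost_block = helpers[0]
    for b in helpers[1:]:
        lost_block = xor(lost_block, b)
    assert pipelined_repair(helpers) == lost_block
    print("repaired", len(lost_block), "bytes from", k, "helpers")
```

In contrast, a conventional repair would have the requestor fetch all k full blocks before decoding, so its completion time grows with k; the chained, slice-at-a-time flow above is what keeps the repair time close to a single block read.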
📄 Full Paper Content (ArXiv Source)
📊 Paper Figures