Liquid Cloud Storage
A liquid system provides durable object storage based on spreading redundantly generated data across a network of hundreds to thousands of potentially unreliable storage nodes. A liquid system uses a combination of a large code, lazy repair, and a flow storage organization. We show that a liquid system can be operated to enable flexible and essentially optimal combinations of storage durability, storage overhead, repair bandwidth usage, and access performance.
💡 Research Summary
The paper “Liquid Cloud Storage” introduces a novel paradigm for durable object storage in distributed systems, designed to overcome the “repair bottleneck” inherent in traditional architectures. This bottleneck refers to the high network bandwidth required to quickly repair data lost from node failures in systems using small erasure codes (like Reed-Solomon) to maintain high durability.
The core innovation is the “Liquid System,” built upon three interconnected principles:
- Large Codes: Instead of small (n, k, r) parameters (e.g., (14, 10, 4)) that spread an object’s fragments across a handful of nodes, liquid systems employ very large codes. With a fountain code such as RaptorQ, a single object is encoded into a vast number of fragments, with n in the hundreds or thousands and ideally equal to the number of storage nodes M. This provides extreme dispersion: ideally, each node holds one fragment of each object.
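To make the contrast concrete, here is a minimal sketch (our own illustration, with an assumed cluster size and assumed large-code parameters) of what a single node failure costs each object under a small code versus a large code with one fragment per node:

```python
# Illustrative sketch, not from the paper: fraction of an object's fragments
# lost when one node fails, and the storage overhead, for small vs. large codes.

M = 3000  # number of storage nodes in the cluster (assumed value)

def loss_per_node_failure(n, k):
    """Each object keeps one fragment per node it touches, so one node
    failure removes 1/n of its fragments; the object stays readable as
    long as any k of its n fragments survive. Returns (loss, overhead)."""
    return 1 / n, (n - k) / n

small = loss_per_node_failure(n=14, k=10)   # e.g. a (14, 10, 4) Reed-Solomon code
large = loss_per_node_failure(n=M, k=2250)  # large code with n == M (assumed k)

print(f"small code: {small[0]:.1%} of an object's fragments per node failure")
print(f"large code: {large[0]:.3%} of an object's fragments per node failure")
```

With the large code, any single failure touches only a tiny sliver of every object's redundancy, which is what makes deferring repair safe.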
- Lazy Repair: Contrary to the reactive repair strategy of small-code systems—which immediately regenerates a lost fragment upon a node failure—liquid systems adopt a lazy, background repair process. Lost fragments from many failed nodes are aggregated into a “liquid” flow and repaired at a steady, controlled rate. Repair is triggered not for each individual loss but when a significant portion of an object’s redundancy is depleted. This dramatically improves repair efficiency (amount of data read per fragment regenerated) and flattens the peak repair bandwidth requirement.
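The efficiency gain from lazy repair can be sketched in a few lines (a back-of-envelope model of ours, with assumed parameters, not code from the paper): a reactive small-code system reads k fragments to regenerate each single lost fragment, while a lazy system reads k fragments of an object once and regenerates all of its missing fragments in one pass.

```python
# Hedged sketch: fragments read per fragment regenerated, reactive vs. lazy.

def reactive_read_per_fragment(k):
    # Reactive repair: read k surviving fragments to rebuild each lost one.
    return k

def lazy_read_per_fragment(k, lost):
    # Lazy repair: wait until `lost` fragments of an object are missing,
    # then read k fragments once and regenerate all `lost` of them together.
    return k / lost

k = 2250  # assumed large-code k
print(f"reactive: {reactive_read_per_fragment(k)} fragments read per repair")
print(f"lazy:     {lazy_read_per_fragment(k, lost=700):.1f} fragments read per repair")
```

Amortizing one read of k fragments over many regenerated fragments is the source of the flattened, steady repair bandwidth the summary describes.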
- Flow Storage Organization: This is the architectural framework comprising a standard client interface (e.g., S3 API), an access tier of proxy servers that handle encoding/decoding using the RaptorQ codec, and a storage tier of servers that hold the fragments on disk. This separation allows for scalable and efficient data access and management.
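The tiered flow can be sketched as follows (all names hypothetical; a trivial chunker stands in for the RaptorQ codec, which really produces coded repair fragments rather than placeholders):

```python
# Toy sketch of the access-tier flow: object -> fragments -> one per node.

def encode(obj: bytes, k: int, n: int) -> list[bytes]:
    """Split obj into k source fragments and append n - k stand-in repair
    fragments (a real codec like RaptorQ would generate coded fragments)."""
    size = -(-len(obj) // k)  # ceiling division
    source = [obj[i * size:(i + 1) * size] for i in range(k)]
    repair = [b"<coded>" for _ in range(n - k)]  # placeholder, not real coding
    return source + repair

def store(fragments, nodes):
    """Storage tier placement: exactly one fragment per storage node."""
    return {node: frag for node, frag in zip(nodes, fragments)}

nodes = [f"node-{i}" for i in range(8)]
placement = store(encode(b"hello liquid storage", k=5, n=8), nodes)
print(len(placement))  # one fragment on each of the 8 nodes
```

The point of the separation is that proxies do all the CPU-heavy coding work while storage servers only read and write opaque fragments.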
The paper argues that this combination enables flexible and near-optimal trade-offs between storage overhead, repair bandwidth, durability (measured as Mean Time To Data Loss - MTTDL), and access performance. The large code size simplifies data placement (effectively one placement group for the entire system) and minimizes the impact of any single failure domain. Lazy repair capitalizes on the law of large numbers; with many fragments per object, the system can tolerate many simultaneous losses without immediate risk, allowing for slow, efficient repair.
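The law-of-large-numbers effect can be checked with a short hypergeometric calculation (our construction, with assumed parameters): if f of M nodes have failed, an object is unreadable only when more than n - k of its n fragments sit on failed nodes.

```python
# Back-of-envelope sketch (assumptions ours): probability an object is
# unreadable when f of M nodes are down, i.e. more than n - k of its
# n fragments land on the failed nodes (hypergeometric tail).

from math import comb

def loss_prob(M, f, n, k):
    total = comb(M, f)
    lost = 0
    for j in range(n - k + 1, min(n, f) + 1):  # j = object's fragments on failed nodes
        lost += comb(n, j) * comb(M - n, f - j)
    return lost / total

# 10% of a 3000-node cluster down at once (assumed scenario):
print(f"small code (14, 10):     {loss_prob(3000, 300, 14, 10):.2%}")
print(f"large code (3000, 2250): {loss_prob(3000, 300, 3000, 2250):.2%}")
```

With the large code the loss probability is exactly zero until more than n - k nodes fail simultaneously, which is why slow repair does not endanger durability.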
Key practical advantages highlighted include:
- Elimination of unnecessary repair for transient node failures, saving bandwidth.
- No need for proactive sector failure scrubbing, as the system’s inherent lazy repair and large redundancy naturally handle sector errors.
- Potential for improved read access performance due to high parallelism from fragments spread across all nodes.
- Resilience against adversarial or correlated failure patterns when repair is properly regulated.
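The read-parallelism advantage in the list above can be illustrated with a small latency sketch (our own toy model with simulated latencies, not a measurement from the paper): because a reader needs only the first k of n fragments to arrive, slow nodes do not delay the read.

```python
# Toy sketch: with fragments spread across all nodes, a read completes when
# the k-th fastest fragment arrives, so stragglers are simply ignored.

import random

def read_latency(latencies, k):
    """An object read finishes when the k-th fastest fragment lands."""
    return sorted(latencies)[k - 1]

random.seed(0)
node_latencies = [random.uniform(5, 50) for _ in range(100)]  # ms, simulated
completion = read_latency(node_latencies, k=75)
print(f"read completes at the 75th-fastest node: {completion:.1f} ms")
print(f"slowest node (irrelevant to the read):   {max(node_latencies):.1f} ms")
```

This first-k-wins behavior is what the summary means by high parallelism from fragments spread across all nodes.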
The paper positions liquid systems relative to information-theoretic limits on the trade-off between storage overhead and repair bandwidth, drawing on prior work in that area.