Building Resilient Cloud Over Unreliable Commodity Infrastructure
Cloud Computing has emerged as a successful computing paradigm for efficiently utilizing managed compute infrastructure such as high speed rack-mounted servers, connected with high speed networking, and reliable storage. Usually such infrastructure is dedicated, physically secured and has reliable power and networking infrastructure. However, much of our idle compute capacity is present in unmanaged infrastructure like idle desktops, lab machines, physically distant server machines, and laptops. We present a scheme to utilize this idle compute capacity on a best-effort basis and provide high availability even in face of failure of individual components or facilities. We run virtual machines on the commodity infrastructure and present a cloud interface to our end users. The primary challenge is to maintain availability in the presence of node failures, network failures, and power failures. We run multiple copies of a Virtual Machine (VM) redundantly on geographically dispersed physical machines to achieve availability. If one of the running copies of a VM fails, we seamlessly switchover to another running copy. We use Virtual Machine Record/Replay capability to implement this redundancy and switchover. In current progress, we have implemented VM Record/Replay for uniprocessor machines over Linux/KVM and are currently working on VM Record/Replay on shared-memory multiprocessor machines. We report initial experimental results based on our implementation.
💡 Research Summary
The paper addresses the problem of harnessing idle, unmanaged compute resources—such as desktop PCs, lab machines, remote servers, and laptops—to provide cloud‑style services with high availability, despite the inherent unreliability of the underlying commodity infrastructure. The authors propose a two‑layer solution: (1) redundant execution of each virtual machine (VM) on multiple geographically dispersed physical hosts, and (2) a record‑and‑replay mechanism that captures the complete execution state of a VM (CPU registers, memory page changes, disk and network I/O) in a sequential log.
In the normal operating mode, each VM instance runs simultaneously on several hosts. The hypervisor (KVM on Linux) intercepts events and writes them to a log stream that is either transmitted in real time to the other replicas or stored as periodic snapshots. A lightweight heartbeat protocol exchanges liveness information among the replicas; if a host fails to respond within a configurable timeout, it is declared dead. The system then selects the replica that holds the most recent log and immediately begins replaying that log to reconstruct the VM’s state. Because replay proceeds deterministically from the recorded events, the fail‑over can be completed in a matter of seconds, far faster than traditional live migration, which typically requires tens of seconds to minutes to copy an entire memory image and restart the VM.
The authors have implemented the record/replay subsystem for single‑core VMs on Linux/KVM. Their measurements show an average CPU overhead of 5–10 % during recording, and a modest network bandwidth consumption of 2–3 Mbps for log propagation. In fault‑injection experiments, the system restored service within roughly 2 seconds after a host failure. The paper also discusses the challenges of extending the technique to shared‑memory multiprocessor (SMP) VMs, where maintaining memory consistency and cache coherence across cores adds significant complexity. Ongoing work includes per‑core logging, explicit memory‑ordering points, and scalable log compression.
Security considerations are addressed by authenticating replica‑to‑replica communication with TLS and protecting log integrity using hash chains and digital signatures, preventing an adversary from tampering with the recorded execution trace. Access control is enforced consistently across all hosts via role‑based policies.
Limitations identified by the authors include: (a) the current prototype’s focus on single‑core workloads, which limits applicability to database servers and other multi‑threaded applications; (b) the growth of log storage, which can increase storage costs and management overhead unless compression or incremental transmission techniques are employed; and (c) the risk of simultaneous loss of all replicas in a large‑scale disaster (e.g., regional power outage). To mitigate the latter, the authors suggest deploying replicas across multiple continents and incorporating geo‑redundant storage.
In summary, the paper presents a practical framework for turning otherwise wasted commodity compute capacity into a resilient cloud service. By leveraging VM redundancy and deterministic record/replay, the system achieves rapid fail‑over without the need for expensive, highly reliable data‑center hardware. The experimental results demonstrate modest performance penalties and sub‑second recovery times for single‑core workloads. Future work will focus on extending the approach to multi‑core VMs, improving log scalability, and strengthening disaster‑recovery guarantees, thereby moving the concept closer to production‑grade deployment in enterprise environments.
Comments & Academic Discussion
Loading comments...
Leave a Comment