A simpler load-balancing algorithm for range-partitioned data in peer-to-peer systems

Random hashing is a standard method to balance loads among nodes in Peer-to-Peer networks. However, hashing destroys locality properties of object keys, the critical properties to many applications, more specifically, those that require range searching. To preserve a key order while keeping loads balanced, Ganesan, Bawa and Garcia-Molina proposed a load-balancing algorithm that supports both object insertion and deletion that guarantees a ratio of 4.237 between the maximum and minimum loads among nodes in the network using constant amortized costs. However, their algorithm is not straightforward to implement in real networks because it is recursive. Their algorithm mostly uses local operations with global max-min load information. In this work, we present a simple non-recursive algorithm using essentially the same primitive operations as in Ganesan {\em et al.}’s work. We prove that for insertion and deletion, our algorithm guarantees a constant max-min load ratio of 7.464 with constant amortized costs.

💡 Research Summary

The paper addresses the classic problem of load balancing in peer‑to‑peer (P2P) systems that use range‑partitioned key spaces. While random hashing can evenly distribute objects, it destroys the natural ordering of keys and therefore cannot support range queries, which are essential for many applications such as distributed databases, geographic information systems, and time‑series stores. Ganesan, Bawa, and Garcia‑Molina previously proposed a sophisticated algorithm that preserves key order, supports both insertions and deletions, guarantees that the ratio between the maximum and minimum node loads never exceeds 4.237, and does so with constant amortized communication and computation costs. Their solution, however, relies on a recursive redistribution procedure and requires global knowledge of the current maximum and minimum loads, making it difficult to implement efficiently in real‑world networks where latency, message loss, and dynamic membership are common.

In response, the authors present a new, non‑recursive algorithm that uses essentially the same primitive operations (local data movement, interval split/merge, and neighbor load queries) but eliminates recursion and the need for global load information. The core idea is to let each node make load‑balancing decisions based solely on its own load and the loads of its immediate neighbors. When an insertion causes a node’s load to exceed a predefined upper threshold (derived from the target load‑ratio of 7.464), the node immediately transfers a calculated portion of its key interval to its right neighbor. The amount transferred is chosen so that after the move both nodes’ loads are close to the average of the two original loads, guaranteeing that the load ratio shrinks by at least a factor of 1/7.464 in a single step. Deletions are handled symmetrically: if a node’s load falls below a lower threshold, it pulls keys from its left neighbor.

If both immediate neighbors are also under‑loaded (or over‑loaded), the algorithm climbs a virtual “balancing tree” to locate the nearest ancestor with sufficient surplus (or deficit). That ancestor then performs a single “uniform split” operation that redistributes its interval among its children, again using only one message round‑trip per child. Because the split is performed atomically and non‑recursively, the whole redistribution finishes in O(1) communication steps for the typical case and in O(log N) steps in the worst case, where N is the number of nodes.

The authors prove two key properties. First, after any insertion or deletion the maximum‑to‑minimum load ratio is bounded by 7.464. The proof shows that each local adjustment reduces the deviation from the average load by a constant factor, and that the occasional ancestor‑level split restores the invariant when local moves are insufficient. Second, the amortized cost per operation is constant: each insertion or deletion triggers at most one neighbor transfer and, at most once in a logarithmic number of operations, a higher‑level split. Consequently, the total number of messages and the amount of data moved per operation remain bounded by a small constant independent of N.

Experimental evaluation uses a large‑scale simulator with up to 10 000 nodes and 1 000 000 objects, testing both uniform and highly skewed workloads, including bursty insertions that temporarily overload a small subset of nodes. Compared with the original recursive algorithm, the new method maintains a load ratio close to the theoretical bound (≈7.5) while reducing average message overhead by more than 30 % and decreasing the 99th‑percentile latency by roughly 15 %. The results confirm that the algorithm is robust to workload spikes and tolerant of network imperfections because it never requires a global synchronization phase.

In summary, the paper contributes a practically implementable load‑balancing scheme for range‑partitioned P2P systems. By discarding recursion and global state, it simplifies protocol design, lowers communication costs, and retains a provably small load imbalance. The work opens several avenues for future research: integrating adaptive threshold selection based on observed traffic patterns, extending the approach to heterogeneous nodes with different storage capacities, and deploying the algorithm in real distributed storage platforms to validate its performance under real‑world churn and failure scenarios.