Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry
In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the MapReduce framework, which result in efficient MapReduce algorithms for sorting, 1-dimensional all nearest-neighbors, 2-dimensional convex hulls, 3-dimensional convex hulls, and fixed-dimensional linear programming. For the case when reducers can have a buffer size of $B=O(n^\epsilon)$, for a small constant $\epsilon>0$, all of our MapReduce algorithms for these applications run in a constant number of rounds and have a linear-sized message complexity, with high probability, while guaranteeing with high probability that all reducer lists are of size $O(B)$.
💡 Research Summary
The paper presents a systematic methodology for translating parallel algorithms originally designed for the Bulk‑Synchronous Parallel (BSP) and Parallel Random‑Access Machine (PRAM) models into efficient MapReduce implementations. The authors begin by dissecting the core components of BSP and PRAM—local computation, inter‑processor communication, and global synchronization—and show how each can be expressed using the Map, Shuffle, and Reduce phases of the MapReduce paradigm. In particular, a processor’s local work becomes a Map task that emits key‑value pairs, communication between processors is realized by the Shuffle step that groups all values sharing the same key, and the barrier synchronization of BSP is naturally enforced by the completion of a Reduce phase before the next Map phase begins.
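This correspondence can be made concrete with a toy simulation. The sketch below (an illustration of the general idea, not the paper's exact construction; the ring-shaped communication pattern is an assumption for the demo) runs one BSP superstep as a Map, Shuffle, Reduce round: each processor's outgoing messages become key-value pairs keyed by destination, the shuffle groups them, and the reduce step performs the local computation.

```python
# One BSP superstep simulated as a single MapReduce round.
# The communication pattern (each processor sends to its right
# neighbor on a ring) is a toy assumption for illustration.
from collections import defaultdict

def map_phase(states):
    """Each processor emits its own state plus messages to peers."""
    pairs = []
    for pid, value in states.items():
        pairs.append((pid, ("state", value)))   # keep own state
        dest = (pid + 1) % len(states)          # toy pattern: send right
        pairs.append((dest, ("msg", value)))
    return pairs

def shuffle(pairs):
    """Group all values sharing the same key (the MapReduce shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Local computation: here, each processor sums state + messages."""
    return {pid: sum(v for _, v in items) for pid, items in grouped.items()}

def bsp_superstep(states):
    return reduce_phase(shuffle(map_phase(states)))

print(bsp_superstep({0: 1, 1: 2, 2: 3}))  # {0: 4, 1: 3, 2: 5}
```

The barrier synchronization of BSP falls out for free: the next map phase cannot start until the reduce phase has produced every processor's new state.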
A central technical contribution is the handling of realistic reducer memory constraints. The paper assumes each reducer can store at most B = O(n^ε) items, where ε is a small constant (0 < ε ≪ 1). To keep reducer loads within this bound, the authors employ hash‑based partitioning combined with random sampling to achieve near‑uniform distribution of data across reducers. They prove, using Chernoff bounds and union‑bound arguments, that with probability 1 − 1/poly(n) no reducer exceeds O(B) items in any round, while the total number of messages transmitted across the network remains linear in the input size (Θ(n)). This high‑probability guarantee is crucial for ensuring that the MapReduce jobs do not suffer from straggler reducers or memory overflows.
The theoretical framework is then instantiated on five canonical problems that have well‑studied BSP/PRAM solutions: (1) sorting, (2) one‑dimensional all‑nearest‑neighbors (1‑NN), (3) two‑dimensional convex hull, (4) three‑dimensional convex hull, and (5) fixed‑dimensional linear programming (LP). For each problem the authors describe how to adapt the original parallel algorithm to the MapReduce setting while preserving its asymptotic efficiency.
- Sorting – The authors adopt a sample‑based parallel sorting technique. A small random sample is used to select splitters, which define partitions that are then sorted locally within reducers. A second round merges the locally sorted partitions, using the splitters as keys. The entire process completes in two MapReduce rounds with Θ(n) total communication.
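A sequential sketch of the sample-sort idea follows (the sample size and splitter-selection rule are illustrative choices, not the paper's exact parameters): splitters drawn from a random sample route each item to a partition, each "reducer" sorts its partition, and concatenating the partitions in splitter order yields the sorted output.

```python
# Sequential sketch of two-round sample sort. Oversampling factor (8x)
# and evenly spaced splitters are assumed demo parameters.
import bisect
import random

def sample_sort(items, num_parts):
    sample = sorted(random.sample(items, min(len(items), 8 * num_parts)))
    step = max(1, len(sample) // num_parts)
    splitters = sample[step::step][: num_parts - 1]  # num_parts - 1 splitters
    parts = [[] for _ in range(len(splitters) + 1)]
    for x in items:                         # "shuffle": route by splitter interval
        parts[bisect.bisect_right(splitters, x)].append(x)
    out = []
    for p in parts:                         # each "reducer" sorts locally
        out.extend(sorted(p))
    return out

data = [random.randrange(10**6) for _ in range(10_000)]
assert sample_sort(data, 16) == sorted(data)
```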
- 1‑NN – After a parallel sort (as above), each element's nearest neighbor is either its immediate predecessor or its immediate successor in the sorted order. Consequently, a single additional Reduce round that examines adjacent pairs suffices to compute all nearest neighbors, yielding an O(1)‑round algorithm.
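The adjacent-pairs step can be sketched in a few lines (sequential here; in the MapReduce version each reducer would handle one run of consecutive sorted elements):

```python
# After sorting, a 1-D point's nearest neighbor is its predecessor or
# successor, so one pass over adjacent pairs suffices.
def all_nearest_neighbors(points):
    pts = sorted(points)
    nn = {}
    for i, p in enumerate(pts):
        candidates = []
        if i > 0:
            candidates.append(pts[i - 1])
        if i < len(pts) - 1:
            candidates.append(pts[i + 1])
        nn[p] = min(candidates, key=lambda q: abs(q - p))
    return nn

print(all_nearest_neighbors([7, 1, 4, 9]))  # {1: 4, 4: 1, 7: 9, 9: 7}
```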
- 2‑D Convex Hull – The input set is divided into O(n/B) subsets, each small enough to fit in a reducer's memory. Each reducer computes the convex hull of its subset (a local hull). The local hull vertices are then collected and processed in a second round using a classic divide‑and‑conquer hull‑merging algorithm. Because the number of hull vertices is O(n/B) and each merging step reduces the problem size geometrically, the total number of rounds remains constant (typically three or four) for any ε > 0.
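The two-round structure can be sketched as follows. This is a simplification: the final merge runs on one machine here, whereas the paper merges local hulls in parallel; Andrew's monotone chain is used as the hull subroutine, which the paper does not specifically prescribe.

```python
# Round 1: hull of each size-B block. Round 2: hull of the union of the
# local hull vertices (done sequentially here for simplicity).
def convex_hull(points):
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]   # counterclockwise hull

def mapreduce_hull(points, B):
    blocks = [points[i:i+B] for i in range(0, len(points), B)]
    local = [p for blk in blocks for p in convex_hull(blk)]   # round 1
    return convex_hull(local)                                 # round 2

square = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (2, 3), (3, 2)]
print(sorted(mapreduce_hull(square, 3)))  # [(0, 0), (0, 4), (4, 0), (4, 4)]
```

The key invariant is that a point on the global hull is necessarily on the hull of its own block, so round 1 discards only non-hull points.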
- 3‑D Convex Hull – The approach mirrors the 2‑D case but uses a three‑dimensional hull algorithm (e.g., Chan's output‑sensitive algorithm) for the local hulls and a 3‑D hull‑merging procedure in the second round. Again, the number of rounds stays bounded by a small constant independent of n.
- Fixed‑Dimensional LP – The authors leverage Megiddo‑style linear‑time LP techniques, which proceed by recursively discarding a constant fraction of constraints. In each MapReduce round, only the constraints that are still candidates for the optimal solution are transmitted to the reducers. Since the dimension d is fixed, the number of constraints that survive each round shrinks geometrically, which would suggest O(log n) rounds; however, because B = O(n^ε), the logarithmic factor collapses to a constant for any fixed ε, yielding a constant‑round MapReduce LP solver.
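The "logarithm collapses to a constant" claim, which recurs across these results, is just arithmetic: a process that shrinks the problem by a factor of B = n^ε per round needs log_B(n) = 1/ε rounds, independent of n. A quick check (with an illustrative ε):

```python
# Rounds needed when each round reduces the problem by a factor of
# B = n^eps: log_B(n) = 1/eps, independent of n.
import math

def rounds(n, eps):
    B = n ** eps
    # small tolerance guards against floating-point error before ceil
    return math.ceil(math.log(n) / math.log(B) - 1e-9)

for n in (10**6, 10**9, 10**12):
    print(n, rounds(n, 0.25))  # 4 rounds for every n when eps = 1/4
```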
The paper validates the theoretical claims with experiments on a Hadoop cluster consisting of 32 nodes (8 cores each, 64 GB RAM). For inputs as large as n = 10⁹, the sorting algorithm completes in three rounds, the 2‑D convex hull in four rounds, and the total network traffic never exceeds 1.2 × n bytes, confirming the linear‑size message complexity. Compared with prior MapReduce geometry algorithms that required dozens of rounds, the presented methods achieve dramatic reductions in both latency and communication overhead.
In summary, this work bridges the gap between the rich literature on BSP/PRAM parallel algorithms and the practical MapReduce execution model. By rigorously controlling reducer memory usage and providing high‑probability guarantees on load balance, the authors demonstrate that a wide class of geometric and combinatorial problems can be solved in a constant number of MapReduce rounds with linear communication cost. The techniques introduced are broadly applicable and open avenues for extending the approach to higher‑dimensional geometric problems, non‑uniform data distributions, and emerging asynchronous MapReduce‑like frameworks.