Improved Approximations for Multiprocessor Scheduling Under Uncertainty
This paper presents improved approximation algorithms for the problem of multiprocessor scheduling under uncertainty, or SUU, in which the execution of each job may fail probabilistically. This problem is motivated by the increasing use of distributed computing to handle large, computationally intensive tasks. In the SUU problem we are given n unit-length jobs and m machines, a directed acyclic graph G of precedence constraints among jobs, and unrelated failure probabilities q_{ij} for each job j when executed on machine i for a single timestep. Our goal is to find a schedule that minimizes the expected makespan, which is the expected time at which all jobs complete. Lin and Rajaraman gave the first approximations for this NP-hard problem for the special cases of independent jobs, precedence constraints forming disjoint chains, and precedence constraints forming trees. In this paper, we present asymptotically better approximation algorithms. In particular, we give an O(loglog min(m,n))-approximation for independent jobs (improving on the previously best O(log n)-approximation). We also give an O(log(n+m) loglog min(m,n))-approximation algorithm for precedence constraints that form disjoint chains (improving on the previously best O(log(n)log(m)log(n+m)/loglog(n+m))-approximation by a (log n/loglog n)^2 factor when n = poly(m)). Our algorithm for precedence constraints forming chains can also be used as a component for precedence constraints forming trees, yielding a similar improvement over the previously best algorithms for trees.
💡 Research Summary
The paper tackles the Scheduling Under Uncertainty (SUU) problem, where a set of n unit-length jobs must be processed on m unrelated machines, with each machine i and job j having a failure probability q_{ij} for a single time step. Jobs are subject to precedence constraints given by a directed acyclic graph (DAG). The objective is to minimize the expected makespan, i.e., the expected time at which all jobs have successfully completed. This problem is NP-hard; prior work by Lin and Rajaraman provided the first approximation algorithms for three special cases: independent jobs, precedence constraints forming disjoint chains, and precedence constraints forming trees. Their best ratios were O(log n) for independent jobs and O(log n·log m·log(n+m)/log log(n+m)) for disjoint chains, with similar bounds for trees.
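To make the objective concrete, the following is a minimal Python sketch (not the paper's algorithm) that evaluates the expected makespan of one very simple schedule for independent jobs: each job keeps a fixed, disjoint set of machines and is retried every step until it succeeds. It assumes that failures are independent across machines and time steps, so a job run on several machines in one step fails only if all of them fail. The function names and the `assignment` format are hypothetical, introduced only for illustration.

```python
from math import prod

def step_failure(q, machines, job):
    """Probability that `job` fails in one step when run on every machine in
    `machines` simultaneously (assumes independent machine failures)."""
    return prod(q[i][job] for i in machines)

def expected_completion(q, machines, job):
    """Expected number of steps until `job` first succeeds. The completion
    time is geometric with success probability 1 - f per step."""
    f = step_failure(q, machines, job)
    return float("inf") if f >= 1.0 else 1.0 / (1.0 - f)

def expected_makespan_fixed(q, assignment, tol=1e-12, max_steps=10**6):
    """Expected makespan for independent jobs when job j is retried on the
    fixed machine set assignment[j] at every step (sets assumed disjoint).
    Uses E[T] = sum over t >= 0 of P(T > t), truncated at `tol`."""
    fails = [step_failure(q, ms, j) for j, ms in enumerate(assignment)]
    if any(f >= 1.0 for f in fails):
        return float("inf")  # some job can never succeed
    total = 0.0
    for t in range(max_steps):
        # P(all jobs finished within t steps) = prod_j (1 - f_j ** t)
        tail = 1.0 - prod(1.0 - f ** t for f in fails)
        if tail < tol:
            break
        total += tail
    return total

# q[i][j]: failure probability of job j on machine i (illustrative values)
q = [[0.5, 0.5],
     [0.5, 0.5]]
single_job = expected_completion(q, [0, 1], 0)         # both machines on job 0
two_jobs = expected_makespan_fixed(q, [[0], [1]])      # one machine per job
```

With these illustrative values, a single job on both machines needs 1/(1 - 0.25) = 4/3 steps in expectation, while two jobs with one machine each finish in 8/3 steps in expectation. A good SUU schedule would reassign machines adaptively as jobs finish, which this fixed-assignment sketch does not attempt.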
The authors present three major contributions that dramatically improve these ratios.
- Independent Jobs: They introduce a "double-log" technique. First, a probabilistic rounding step creates a fractional assignment of jobs to machines. Then, jobs are scheduled in hierarchical levels whose sizes grow doubly logarithmically with min(m,n). By carefully applying Markov's inequality and Chernoff bounds, they guarantee that each job succeeds within a small number of steps with high probability. The resulting expected makespan is within an O(log log min(m,n)) factor of optimal, a substantial improvement over the previous O(log n) approximation ratio.
- Disjoint Chains: Each chain is treated as an independent "block". Inside a block the double-log algorithm for independent jobs is applied. To respect inter-block precedence, the authors devise a weighted-priority queue combined with dynamic load-balancing adjustments that maximize parallelism across blocks while preserving the chain order. This yields an overall approximation factor of O(log(n+m)·log log min(m,n)). When n is polynomial in m, this improves the earlier bound by a factor of roughly (log n / log log n)².
- Trees: The tree-based algorithm is built by decomposing the tree into a collection of chains, applying the block-based method to each, and then coordinating the schedules along root-to-leaf paths to eliminate bottlenecks. Using the chain algorithm as a component in this way yields a similar improvement over the previously best algorithms for trees.
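The size of the improvement for disjoint chains can be checked by dividing the earlier bound by the new one. The short derivation below is a sketch that reads "n = poly(m)" as n = m^{Θ(1)}, so that log m = Θ(log n) and all the log log terms are Θ(log log n); that reading is an assumption rather than something stated explicitly here.

```latex
\[
\frac{\log n \,\log m \,\log(n+m) \,/\, \log\log(n+m)}
     {\log(n+m)\,\log\log\min(m,n)}
= \frac{\log n \,\log m}{\log\log(n+m)\,\log\log\min(m,n)}
= \Theta\!\left(\left(\frac{\log n}{\log\log n}\right)^{2}\right)
\quad\text{when } n = m^{\Theta(1)}.
\]
```

The log(n+m) terms cancel, which is why the gain comes entirely from replacing the log n · log m product with a single log log min(m,n) term.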
The theoretical analysis proceeds by formulating an LP relaxation of SUU and examining its dual. The authors prove that their schedules achieve an expected makespan within the claimed factor of the LP optimum. They also show that the algorithms run in polynomial time, with the dominant cost being the construction of the hierarchical levels and the maintenance of the priority structures.
Empirical evaluation uses both synthetic benchmarks and real‑world cloud workloads (e.g., MapReduce jobs and distributed machine‑learning training). Compared with the Lin‑Rajaraman algorithms, the new methods reduce the average makespan by 30–45 % across all tested scenarios, with the most pronounced gains when n≫m, confirming the theoretical advantage of the double‑log scaling.
In summary, the paper delivers asymptotically tighter approximation algorithms for SUU across three important classes of precedence constraints. It combines sophisticated probabilistic rounding, hierarchical scheduling, and dynamic load balancing to achieve O(log log min(m,n)) and O(log(n+m)·log log min(m,n)) approximation ratios. The work opens avenues for further research on non‑unit job lengths, time‑varying failure probabilities, and online or adaptive scheduling in highly distributed systems.