Scheduling Dags under Uncertainty
This paper introduces a parallel scheduling problem in which a directed acyclic graph modeling $t$ tasks and their dependencies needs to be executed on $n$ unreliable workers. Worker $i$ executes task $j$ correctly with probability $p_{i,j}$. The goal is to find a regimen $\Sigma$ that dictates how workers are assigned to tasks (possibly in parallel and redundantly) throughout execution, so as to minimize the expected completion time. This fundamental parallel scheduling problem arises in grid computing and project management. We give a polynomial-time algorithm for the problem restricted to the case where both the dag width and the number of workers are bounded by constants. These two restrictions may appear too severe, but both are required. Specifically, we demonstrate that the problem is NP-hard with a constant number of workers when the dag width can grow, and is also NP-hard with constant dag width when the number of workers can grow. When both the dag width and the number of workers are unconstrained, the problem is inapproximable within any factor less than 5/4, unless P=NP.
💡 Research Summary
The paper studies a parallel scheduling problem under uncertainty, where a set of $t$ tasks with precedence constraints is represented by a directed acyclic graph (DAG) and must be executed by $n$ unreliable workers. Worker $i$ completes task $j$ correctly with probability $p_{i,j}$. A "regimen" $\Sigma$ specifies, at each moment, which workers are assigned to which tasks; assignments may be parallel and redundant (multiple workers can work on the same task simultaneously). The objective is to minimize the expected makespan, i.e., the expected time until all tasks in the DAG are successfully completed.
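To make the objective concrete, the following Python sketch computes the minimum expected makespan of a tiny instance by exhaustive dynamic programming over sets of completed tasks. It is an illustration under stated assumptions, not necessarily the authors' algorithm: time is assumed to advance in unit steps, every worker is assigned to exactly one eligible task per step, a task finishes in a step if at least one worker assigned to it succeeds, and successes are independent. The names `min_expected_makespan`, `preds`, and `p` are hypothetical. The running time is exponential in the number of tasks and workers, so it is only suitable for toy inputs.

```python
from itertools import product


def min_expected_makespan(preds, p):
    """Minimum expected number of unit steps to finish every task.

    preds[j] is the set of tasks that must finish before task j (a DAG).
    p[i][j] is the probability that worker i succeeds on task j in one step.
    Assumptions (not from the source): unit-length steps, independent
    successes, and every worker is assigned to some eligible task each step.
    """
    t, n = len(preds), len(p)
    best = {frozenset(range(t)): 0.0}  # all tasks done: nothing left to wait for

    def solve(done):
        if done in best:
            return best[done]
        # Eligible tasks: unfinished, with all predecessors finished.
        eligible = [j for j in range(t) if j not in done and preds[j] <= done]
        value = float("inf")
        for assign in product(eligible, repeat=n):  # assign[i] = task of worker i
            fail = {}  # fail[j] = probability that every worker on j fails
            for i, j in enumerate(assign):
                fail[j] = fail.get(j, 1.0) * (1.0 - p[i][j])
            tasks = list(fail)
            stay = 1.0  # probability that no task finishes in this step
            for j in tasks:
                stay *= fail[j]
            if stay >= 1.0:
                continue  # this assignment can never make progress
            total = 1.0  # one step elapses, then we continue from the new state
            for outcome in product([False, True], repeat=len(tasks)):
                if not any(outcome):
                    continue  # the self-loop is handled by dividing by (1 - stay)
                prob = 1.0
                for j, ok in zip(tasks, outcome):
                    prob *= (1.0 - fail[j]) if ok else fail[j]
                if prob == 0.0:
                    continue
                finished = {j for j, ok in zip(tasks, outcome) if ok}
                total += prob * solve(done | finished)
            value = min(value, total / (1.0 - stay))
        best[done] = value
        return value

    return solve(frozenset())
```

For example, `min_expected_makespan([set(), {0}], [[0.5, 0.5]])` models a two-task chain with a single worker that succeeds half the time on each task. Under the assumptions above, redundancy helps on a single task: two workers with success probability 0.5 each give a lower expected time than one such worker, because the step fails only if both fail.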
Model and Formalization
The authors formalize the problem as follows: the input consists of the DAG, the probability matrix $\{p_{i,j}\}$, and the number of workers $n$. A state is the set of tasks already completed. From any state, the regimen chooses a multiset of worker-task pairs among the currently "available" tasks (those whose predecessors are already finished). When a worker attempts a task, it succeeds with its individual probability; failures cause the task to remain unfinished and may be retried later, possibly by a different worker. The expected completion time of a regimen $\Sigma$, denoted E