Approximation algorithms for energy, reliability and makespan optimization problems
In this paper, we consider the problem of scheduling an application on a parallel computational platform. The application is a particular task graph, either a linear chain of tasks or a set of independent tasks. The platform is made of identical processors, whose speed can be dynamically modified. It is also subject to failures: if a processor is slowed down to decrease the energy consumption, it has a higher chance of failing. Therefore, the scheduling problem requires re-executing or replicating tasks (i.e., executing the same task twice, either on the same processor or on two distinct processors) in order to increase the reliability. It is a tri-criteria problem: the goal is to minimize the energy consumption, while enforcing a bound on the total execution time (the makespan) and a constraint on the reliability of each task. Our main contribution is to propose approximation algorithms for these particular classes of task graphs. For linear chains, we design a fully polynomial-time approximation scheme. However, we show that there exists no constant-factor approximation algorithm for independent tasks unless P=NP, and in this case we propose an approximation algorithm with a relaxation on the makespan constraint.
💡 Research Summary
The paper tackles a tri‑objective scheduling problem on a homogeneous parallel platform whose processors can dynamically adjust their speed (DVFS). Each processor’s power consumption grows quadratically with its speed, while the probability of failure rises as the speed is lowered, reflecting a realistic trade‑off between energy savings and reliability. To meet per‑task reliability requirements, the scheduler may either re‑execute a task on the same processor or replicate it on two distinct processors; both actions increase energy use and may affect the overall makespan. The objective is to minimise total energy consumption while respecting a hard deadline on the makespan and a lower bound on the reliability of every task.
Two classes of task graphs are considered: (i) a linear chain of tasks, where precedence constraints force a fixed order, and (ii) a set of independent tasks, where any task may be placed on any processor at any time. The authors first formalise the problem mathematically, defining the speed‑dependent execution time, power, and failure probability functions, and expressing the reliability of a task as the complement of the product of failure probabilities of its executions (single, re‑executed, or replicated).
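To make this model concrete, here is a small Python sketch of speed-dependent cost functions of the kind described above. The exponential failure law and all constants (`lam0`, `d`, `s_min`, `s_max`) are illustrative assumptions, not the paper's exact definitions; only the general shape, execution time inversely proportional to speed, failure probability rising as speed drops, and reliability as the complement of the product of failure probabilities, follows the text.

```python
import math

def exec_time(w: float, s: float) -> float:
    """Execution time of a task of work w run at speed s."""
    return w / s

def energy(w: float, s: float) -> float:
    """Dynamic energy: power quadratic in speed (s**2) times time (w/s)."""
    return w * s

def failure_prob(w: float, s: float, lam0: float = 1e-5, d: float = 3.0,
                 s_max: float = 1.0, s_min: float = 0.1) -> float:
    """Failure probability of a single execution, under an assumed
    exponential model whose rate grows as the speed is lowered."""
    rate = lam0 * 10 ** (d * (s_max - s) / (s_max - s_min))
    return 1.0 - math.exp(-rate * w / s)

def reliability(w: float, speeds: list[float]) -> float:
    """Reliability of a task executed once per entry in `speeds`
    (re-execution or replication): the complement of the product
    of the failure probabilities of its executions."""
    p_all_fail = 1.0
    for s in speeds:
        p_all_fail *= failure_prob(w, s)
    return 1.0 - p_all_fail
```

With these definitions, adding a second execution (re-execution or replication) strictly increases reliability, at the price of the extra execution's energy.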
For the linear‑chain case, the authors show that the problem, despite being NP‑hard, admits a Fully Polynomial‑Time Approximation Scheme (FPTAS). The scheme discretises the continuous speed domain into a geometric grid with factor (1 + ε), computes for each grid point the minimum speed that satisfies the reliability constraint of a task, and then uses dynamic programming to select a combination of speeds and replication decisions that minimises energy while keeping the total execution time within the deadline. The algorithm runs in time polynomial in the number of tasks n and 1/ε, and yields a (1 + ε)‑approximate solution for any ε > 0. This result is significant because it provides a practically efficient method to obtain arbitrarily close to optimal energy consumption for chain‑structured workloads, which are common in pipeline‑style applications.
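The grid-plus-dynamic-programming idea can be sketched as follows. This toy version discretizes the speed range into a geometric (1 + ε) grid and the deadline into buckets, then minimizes energy over the chain; it deliberately omits the reliability and replication dimension, so it illustrates only the energy-time trade-off, not the full FPTAS. All names and constants are assumptions.

```python
import math

def fptas_chain(works, deadline, eps, s_min=0.1, s_max=1.0):
    """DP sketch for a linear chain: pick one grid speed per task so that
    total time fits the (bucketed) deadline and energy is minimized.
    Returns the minimum energy found, or inf if no schedule fits."""
    # Geometric speed grid with ratio (1 + eps).
    speeds = []
    s = s_min
    while s < s_max:
        speeds.append(s)
        s *= 1.0 + eps
    speeds.append(s_max)

    # Discretize the deadline into B buckets so the DP table is polynomial.
    B = max(1, int(len(works) / eps))
    bucket = deadline / B

    INF = float("inf")
    # dp[t] = minimum energy of the tasks seen so far using <= t buckets.
    dp = [0.0] * (B + 1)
    for w in works:
        new_dp = [INF] * (B + 1)
        for sp in speeds:
            t_cost = math.ceil((w / sp) / bucket)  # time in buckets (rounded up)
            e_cost = w * sp                        # energy = (sp**2 power) * (w/sp time)
            for t in range(t_cost, B + 1):
                cand = dp[t - t_cost] + e_cost
                if cand < new_dp[t]:
                    new_dp[t] = cand
        dp = new_dp
    return dp[B]
```

For example, `fptas_chain([1.0, 1.0], deadline=4.0, eps=0.1)` runs both tasks near speed 0.5 (the slowest grid speed that still fits the deadline), which is where energy is minimized.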
In contrast, for independent tasks the authors prove a strong hardness result: unless P = NP, no constant‑factor approximation algorithm exists for the original problem. The proof reduces from a variant of the knapsack problem and shows that any algorithm achieving a bounded approximation ratio would solve an NP‑complete decision problem. Consequently, the paper relaxes the makespan constraint, allowing the schedule to exceed the deadline by a factor (1 + δ) for any fixed δ > 0. Under this relaxed deadline, the authors present a polynomial‑time approximation algorithm. The algorithm proceeds as follows: (1) compute for each task the minimum speed that satisfies its reliability requirement; (2) sort tasks by an energy‑efficiency metric (e.g., work divided by the speed‑adjusted execution time); (3) assign tasks greedily to the fastest available processors, lowering speeds where slack time remains; and (4) optionally replicate tasks on the most energy‑efficient processor pair to boost reliability when needed. The algorithm runs in O(n log n) time and guarantees that the resulting energy consumption is within a factor (1 + ε) of the optimum, while the makespan exceeds the original bound by at most a factor of (1 + δ).
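Steps (1)-(3) above resemble a standard list-scheduling loop, which can be sketched as follows. This is a simplified illustration, not the paper's algorithm: each task runs at its reliability-induced minimum speed, tasks are placed in longest-processing-time order on the least-loaded processor, and the relaxed deadline is checked at the end; the slack-driven speed lowering and the replication step (4) are omitted, and all names are assumptions.

```python
import heapq

def greedy_schedule(works, min_speeds, p, deadline, delta):
    """Assign independent tasks to p identical processors.
    works[i] / min_speeds[i] is task i's execution time at the minimum
    speed meeting its reliability requirement. Returns the assignment,
    the makespan, and whether the relaxed deadline (1+delta)*deadline
    is respected."""
    times = [w / s for w, s in zip(works, min_speeds)]
    # Longest tasks first (LPT order).
    order = sorted(range(len(works)), key=lambda i: times[i], reverse=True)

    # Min-heap of (current load, processor id): always take the least loaded.
    heap = [(0.0, j) for j in range(p)]
    heapq.heapify(heap)
    assignment = [None] * len(works)
    for i in order:
        load, j = heapq.heappop(heap)
        assignment[i] = j
        heapq.heappush(heap, (load + times[i], j))

    makespan = max(load for load, _ in heap)
    return assignment, makespan, makespan <= (1.0 + delta) * deadline
```

The sort dominates the running time, matching the O(n log n) bound quoted above; the heap keeps each placement at O(log p).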
The authors complement their theoretical contributions with extensive simulations. For linear chains, the FPTAS with ε = 0.05 produces energy values within 2 % of the exact optimum and runs in a few seconds even for thousands of tasks. For independent tasks, allowing a 10 % deadline relaxation yields energy savings of 20–30 % compared with a naïve schedule that respects the strict deadline, and the replication mechanism reliably meets the per‑task reliability thresholds.
In summary, the paper makes three major contributions: (1) a rigorous formulation of energy‑reliability‑makespan trade‑offs in DVFS‑enabled parallel systems; (2) an FPTAS for chain‑structured workloads, providing arbitrarily close energy optimality with polynomial runtime; and (3) a hardness proof for independent tasks together with a deadline‑relaxed approximation algorithm that achieves provable energy guarantees. These results are directly relevant to energy‑constrained high‑performance computing, cloud data‑centres, and edge‑computing platforms where reliability cannot be sacrificed. The work also opens several avenues for future research, such as extending the techniques to heterogeneous processors, incorporating more detailed failure models, or exploring online versions of the problem where task arrivals are not known a priori.