Optimizing Latency and Reliability of Pipeline Workflow Applications
Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple application patterns such as pipeline graphs. The problem is even more complex when processors are subject to failure during the execution of the application. In this paper, we study the complexity of a bi-criteria mapping which aims at optimizing the latency (i.e., the response time) and the reliability (i.e., the probability that the computation will be successful) of the application. Latency is minimized by using faster processors, while reliability is increased by replicating computations on a set of processors. However, replication increases latency (additional communications, slower processors). The application fails to be executed only if all the processors fail during execution. While simple polynomial algorithms can be found for fully homogeneous platforms, the problem becomes NP-hard when tackling heterogeneous platforms. This is yet another illustration of the additional complexity added by heterogeneity.
💡 Research Summary
The paper tackles the challenging problem of mapping pipeline‑structured workflow applications onto heterogeneous computing platforms while simultaneously optimizing two conflicting quality‑of‑service metrics: latency (the end‑to‑end response time) and reliability (the probability that the entire computation completes successfully). A pipeline consists of a sequence of stages; each stage can be executed on one or more processors. The authors formalize a bi‑criteria mapping problem in which a stage may be replicated on a set of processors to increase the chance that at least one replica survives a failure, but replication also incurs extra communication overhead and may force the use of slower processors, thereby increasing latency.
Problem formulation
The input consists of N pipeline stages and M heterogeneous processors. Processor i is characterized by a processing speed s_i (inverse of execution time per unit of work) and a failure probability p_i that is assumed to be independent of other processors and constant during the execution of the workflow. For each stage s a replication set R_s ⊆ {1,…,M} is chosen. The latency contributed by stage s is modeled as the maximum execution time among its replicas plus a term C_rep(R_s) that captures the cost of synchronising the replicas (e.g., result merging, barrier synchronisation). The overall latency L(R) is the sum of the per‑stage latencies. The reliability of stage s is 1 − ∏_{i∈R_s} p_i, i.e., the probability that at least one replica does not fail. Assuming independence across stages, the overall reliability of the pipeline is the product of the stage reliabilities: R(R) = ∏_{s=1}^{N} (1 − ∏_{i∈R_s} p_i). The bi‑criteria objective is either (i) to minimise L(R) subject to a reliability constraint R(R) ≥ τ, (ii) to maximise R(R) subject to a latency bound L(R) ≤ L_max, or (iii) to minimise a weighted sum L(R) + α·(1 − R(R)), where α reflects the designer’s preference.
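The latency and reliability model above can be sketched in a few lines of Python. This is an illustrative helper, not the paper's code: the function name, the simplified per‑extra‑replica synchronisation cost `c_rep` (standing in for C_rep(R_s)), and the example values are all assumptions.

```python
from math import prod

def latency_and_reliability(mapping, work, speeds, fail_probs, c_rep=0.1):
    """Evaluate a replication mapping under the model above (sketch).

    mapping[s] is the set of processors replicating stage s.
    Per-stage latency = max execution time over the replicas plus an
    assumed fixed synchronisation cost c_rep per extra replica; the
    overall latency is the sum over stages. Stage reliability is
    1 - prod of replica failure probabilities; overall reliability is
    the product over stages (independent failures assumed).
    """
    latency = 0.0
    reliability = 1.0
    for s, replicas in enumerate(mapping):
        latency += max(work[s] / speeds[i] for i in replicas)
        latency += c_rep * (len(replicas) - 1)
        reliability *= 1.0 - prod(fail_probs[i] for i in replicas)
    return latency, reliability
```

For example, with two stages where the second is replicated on both processors, the replicated stage pays the slower processor's execution time plus one synchronisation cost, but its failure probability drops to the product p_0·p_1.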
Homogeneous vs. heterogeneous platforms
When all processors share the same speed and failure probability (the homogeneous case), the two objectives decouple. Latency minimisation reduces to a simple greedy assignment of each stage to the fastest processor, while reliability maximisation reduces to assigning the minimum number of replicas needed to satisfy the reliability threshold. Both sub‑problems admit polynomial‑time algorithms, and the authors provide explicit O(N·M) procedures for them.
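In the homogeneous case the replica count per stage can be computed in closed form. The sketch below, which is an illustrative reconstruction rather than the authors' exact O(N·M) procedure, finds the smallest k such that 1 − p^k meets the per‑stage share τ^(1/N) of an overall reliability target τ, assuming independent, identical failure probabilities.

```python
from math import ceil, log

def min_replicas_homogeneous(p, n_stages, tau):
    """Smallest replica count k per stage so that an n_stages-long
    pipeline of identical stages meets overall reliability tau.

    Each stage must individually reach tau**(1/n_stages); with k
    replicas a stage succeeds with probability 1 - p**k, so we need
    p**k <= 1 - tau**(1/n_stages).
    """
    per_stage = tau ** (1.0 / n_stages)
    return ceil(log(1.0 - per_stage) / log(p))
```

Because all processors are interchangeable, this per‑stage count immediately yields a valid mapping, which is why the homogeneous problem stays polynomial.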
In contrast, the heterogeneous case is shown to be computationally intractable. By constructing a polynomial‑time reduction from the classic Partition problem (and alternatively from 3‑SAT), the authors prove that deciding whether there exists a mapping that simultaneously respects given latency and reliability bounds is NP‑hard. The reduction exploits the fact that a fast processor may have a high failure probability, whereas a slow processor may be very reliable; thus the decision of whether to replicate, and on which processors, encodes a subset‑sum‑like selection problem.
Algorithmic contributions
Given the NP‑hardness, the paper proposes two practical solution approaches:
- Approximation via Lagrangian relaxation – The weighted‑sum objective is relaxed by introducing a Lagrange multiplier λ for the reliability term, yielding a continuous optimisation problem that can be solved efficiently. The resulting fractional solution is then rounded to an integral replication set using a deterministic rounding scheme that guarantees a constant‑factor approximation (the authors prove a 2‑approximation bound for latency while preserving reliability within a factor of (1‑ε) of the target).
- Two‑phase greedy heuristic – This heuristic is designed for large‑scale, real‑time scenarios. In Phase 1, each stage is assigned to the single fastest processor (no replication). In Phase 2, the algorithm iteratively examines stages whose current reliability contribution falls below the global target. For each such stage, it selects the processor that yields the largest marginal increase in stage reliability per unit increase in latency and adds it to the replication set, stopping when either the reliability target is met or a pre‑specified latency budget is exhausted. Empirical evaluation shows that this simple method typically incurs only a 10‑15 % increase in latency while boosting overall reliability by 25‑35 % compared with the non‑replicated baseline.
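The two‑phase heuristic can be sketched as follows. This is a simplified variant under stated assumptions: it scans all stages for candidate replicas rather than only under‑performing ones, uses an assumed per‑replica synchronisation cost `c_rep`, and its tie‑breaking may differ from the paper's.

```python
from math import prod

def two_phase_greedy(work, speeds, fail_probs, rel_target, lat_budget, c_rep=0.1):
    """Sketch of the two-phase heuristic described above."""
    n, m = len(work), len(speeds)
    fastest = max(range(m), key=lambda i: speeds[i])
    mapping = [{fastest} for _ in range(n)]          # Phase 1: no replication

    def stage_latency(s):
        reps = mapping[s]
        return max(work[s] / speeds[i] for i in reps) + c_rep * (len(reps) - 1)

    def stage_rel(s):
        return 1.0 - prod(fail_probs[i] for i in mapping[s])

    latency = sum(stage_latency(s) for s in range(n))
    # Phase 2: greedily add the replica with the best marginal
    # reliability gain per unit of added latency.
    while prod(stage_rel(s) for s in range(n)) < rel_target:
        best = None                                   # (gain/cost, s, i, d_lat)
        for s in range(n):
            for i in set(range(m)) - mapping[s]:
                old_lat, old_rel = stage_latency(s), stage_rel(s)
                mapping[s].add(i)
                d_lat = stage_latency(s) - old_lat
                d_rel = stage_rel(s) - old_rel
                mapping[s].remove(i)
                if latency + d_lat <= lat_budget and d_rel > 0:
                    ratio = d_rel / max(d_lat, 1e-12)
                    if best is None or ratio > best[0]:
                        best = (ratio, s, i, d_lat)
        if best is None:                              # latency budget exhausted
            break
        _, s, i, d_lat = best
        mapping[s].add(i)
        latency += d_lat
    return mapping, latency
```

On a one‑stage example with a fast‑but‑unreliable and a slow‑but‑reliable processor, Phase 1 picks the fast processor alone, and Phase 2 adds the slow one as a replica once the reliability target cannot be met without it.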
Experimental evaluation
The authors implement the models and algorithms on three representative heterogeneous environments:
- CPU‑GPU hybrid – GPUs provide high throughput but have higher failure rates due to thermal and power constraints.
- Multi‑cloud VM mix – Instances from different providers and availability zones exhibit diverse cost‑performance‑reliability profiles.
- Edge‑cloud hierarchy – Low‑power edge devices are highly reliable (low p_i) but slow, whereas cloud servers are fast but less reliable.
For each scenario, pipelines of length N = 10, 30, 60, 100 stages are generated with random stage workloads, with processor speeds and failure probabilities drawn from realistic distributions.