A reliability- and latency-driven task allocation framework for workflow applications in the edge-hub-cloud continuum


A growing number of critical workflow applications leverage a streamlined edge-hub-cloud architecture, which diverges from the conventional edge computing paradigm. An edge device, in collaboration with a hub device and a cloud server, often suffices for their reliable and efficient execution. However, task allocation in this streamlined architecture is challenging due to device limitations and diverse operating conditions. Given the inherent criticality of such workflow applications, where reliability and latency are vital yet conflicting objectives, an exact task allocation approach is typically required to ensure optimal solutions. As no existing method holistically addresses these issues, we propose an exact multi-objective task allocation framework to jointly optimize the overall reliability and latency of a workflow application in the specific edge-hub-cloud architecture. We present a comprehensive binary integer linear programming formulation that considers the relative importance of each objective. It incorporates time redundancy techniques, while accounting for crucial constraints often overlooked in related studies. We evaluate our approach using a relevant real-world workflow application, as well as synthetic workflows varying in structure, size, and criticality. In the real-world application, our method achieved average improvements of 84.19% in reliability and 49.81% in latency over baseline strategies, across relevant objective trade-offs. Overall, the experimental results demonstrate the effectiveness and scalability of our approach across diverse workflow applications for the considered system architecture, highlighting its practicality with runtimes averaging between 0.03 and 50.94 seconds across all examined workflows.


💡 Research Summary

The paper addresses the problem of allocating the tasks of safety‑critical workflow applications across a streamlined three‑tier architecture consisting of an edge device, a hub device, and a cloud server. Unlike conventional edge‑cloud hierarchies, the edge‑hub‑cloud continuum features a hub that is less powerful than an edge server but positioned closer to the edge. This imposes a distinct set of constraints: limited CPU, memory, storage, and battery energy on the edge and hub devices, together with variable communication bandwidth and latency between tiers. The authors argue that for critical workflows, reliability and latency must be optimized simultaneously, even though these objectives inherently conflict. Moreover, existing literature either focuses on a single objective, ignores important resource constraints, or relies on heuristics that cannot guarantee optimal solutions.

To overcome these gaps, the authors propose an exact multi‑objective optimization framework based on Binary Integer Linear Programming (BILP). The methodology consists of two graph‑transformation steps. First, the original workflow task graph (TG) is converted into an intermediate Edge‑Hub‑Cloud graph (EG) that enumerates all feasible placements of each task on the three devices, while embedding device‑specific energy consumption, reliability, and communication models. Second, the EG is transformed into a Reliability‑aware Edge‑Hub‑Cloud graph (REG) that incorporates time‑redundancy techniques (dual or triple execution) required to meet per‑task reliability requirements. This two‑step transformation makes every admissible allocation and redundancy configuration explicit, allowing the BILP solver to select exactly one placement and one redundancy level per task.
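The idea of making every placement and redundancy option explicit can be illustrated with a minimal Python sketch. It is not the authors' TG-to-EG-to-REG transformation: the function names, the per-run reliability values, and the threshold are hypothetical, and it assumes that repeated executions fail independently and that a task succeeds if at least one execution succeeds.

```python
from itertools import product

DEVICES = ("edge", "hub", "cloud")
REDUNDANCY_LEVELS = (1, 2, 3)  # single, dual, or triple execution


def replicated_reliability(single_run_reliability: float, executions: int) -> float:
    """Probability that at least one of `executions` independent runs succeeds."""
    return 1.0 - (1.0 - single_run_reliability) ** executions


def expand_task(task, reliability_on, min_reliability):
    """List every (task, device, executions, reliability) option meeting the threshold."""
    options = []
    for device, k in product(DEVICES, REDUNDANCY_LEVELS):
        r = replicated_reliability(reliability_on[device], k)
        if r >= min_reliability:
            options.append((task, device, k, r))
    return options


# Hypothetical per-run reliabilities for one task (illustrative values only).
options = expand_task(
    "t1",
    {"edge": 0.90, "hub": 0.95, "cloud": 0.99},
    min_reliability=0.985,
)
for option in options:
    print(option)
```

With these illustrative numbers, a single execution on the edge or hub falls below the threshold, so only their dual and triple variants survive, while the cloud qualifies at every redundancy level. An exact solver would then pick one surviving option per task.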

The BILP objective is a weighted sum of two terms: (i) overall reliability, expressed as the complement of the product of failure probabilities of the selected executions, and (ii) total latency, which aggregates computation time, communication delay, and additional delay introduced by redundant executions. The weights are user‑defined, enabling exploration of reliability‑centric, latency‑centric, or balanced trade‑offs. The constraint set is comprehensive: each task must be assigned once; device capacities for CPU, memory, storage, and energy must not be exceeded; communication links must respect bandwidth limits; precedence relations of the workflow DAG must be honored; and each task’s minimum reliability threshold must be satisfied by the chosen redundancy level.
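A weighted-sum selection of this kind can be sketched in Python on a toy instance. Exhaustive enumeration stands in for the BILP solver here, so this is not the paper's formulation. All task names, option values, and memory caps are hypothetical. The sketch also assumes a simple chain of tasks whose latencies add up, a workflow reliability equal to the product of per-task reliabilities, and a single memory constraint per device.

```python
from itertools import product
from math import prod

# Hypothetical options per task: (device, executions, latency_s, reliability, memory_mb)
OPTIONS = {
    "capture": [("edge", 1, 1.0, 0.90, 50), ("edge", 2, 2.0, 0.99, 50)],
    "preprocess": [
        ("edge", 1, 4.0, 0.90, 200),
        ("hub", 1, 2.0, 0.95, 200),
        ("hub", 2, 4.0, 0.9975, 200),
    ],
    "analyze": [("hub", 1, 6.0, 0.95, 400), ("cloud", 1, 3.0, 0.99, 400)],
}
MEMORY_CAP_MB = {"edge": 200, "hub": 700, "cloud": 10_000}  # assumed capacities


def best_allocation(w_reliability: float, w_latency: float):
    """Return (cost, allocation, reliability, latency) minimizing the weighted sum."""
    tasks = list(OPTIONS)
    # Normalize latency by the worst case so both terms lie in [0, 1].
    worst_latency = sum(max(o[2] for o in OPTIONS[t]) for t in tasks)
    best = None
    for combo in product(*(OPTIONS[t] for t in tasks)):
        # Capacity constraint: memory used on each device must fit its cap.
        used = {}
        for device, _, _, _, memory in combo:
            used[device] = used.get(device, 0) + memory
        if any(used[d] > MEMORY_CAP_MB[d] for d in used):
            continue
        reliability = prod(o[3] for o in combo)
        latency = sum(o[2] for o in combo)
        cost = w_reliability * (1.0 - reliability) + w_latency * latency / worst_latency
        if best is None or cost < best[0]:
            best = (cost, dict(zip(tasks, combo)), reliability, latency)
    return best


reliability_first = best_allocation(w_reliability=1.0, w_latency=0.0)
latency_first = best_allocation(w_reliability=0.0, w_latency=1.0)
print(reliability_first[1], reliability_first[2], reliability_first[3])
print(latency_first[1], latency_first[2], latency_first[3])
```

Shifting the weights moves the chosen allocation between redundant executions (higher reliability, higher latency) and single executions (lower latency). In both cases the edge memory cap rules out running preprocessing next to capture on the edge device. Enumeration only works for tiny instances; the point of a BILP formulation is to hand the same kind of selection to an exact solver.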

Experimental evaluation is conducted on two fronts. The first uses a real‑world UAV‑based power‑line inspection workflow comprising image capture, preprocessing, transmission, cloud‑based AI analysis, and feedback. Six different weight configurations are tested and compared against three baselines: a simple round‑robin allocation, an energy‑minimizing heuristic, and a state‑of‑the‑art reliability‑latency heuristic from the literature. The proposed exact method achieves an average reliability improvement of 84.19 % and a latency reduction of 49.81 % relative to the baselines, with the reliability‑focused configuration reaching over 95 % success probability.

The second set of experiments employs synthetic workflows with varying DAG structures and node counts ranging from 10 to 200. Across 30 generated instances, the BILP solver produces optimal solutions in 0.03 s to 50.94 s, demonstrating scalability for offline design‑time planning. Compared to heuristic baselines, the exact approach consistently delivers at least a 30 % gain in a composite reliability‑latency metric while satisfying all resource constraints.

Key contributions of the paper are: (1) a novel two‑step graph transformation that makes the full design space of placements and time‑redundancy options explicit; (2) a comprehensive multi‑objective BILP formulation that simultaneously accounts for memory, storage, energy, computation, communication, and reliability constraints; (3) integration of dual and triple execution techniques within the optimization to meet task‑level reliability requirements; (4) thorough empirical validation on both a realistic UAV workflow and a broad suite of synthetic DAGs, confirming significant performance improvements and practical solution times.

Limitations are acknowledged: the exact BILP approach, while feasible for offline planning, may become computationally intensive for very large or dynamically changing workloads, making it unsuitable for real‑time reallocation. Additionally, the energy model for redundant executions assumes static parameters and may not capture all runtime variations. Future work is suggested to explore problem decomposition, parallel BILP solving, hybrid exact‑heuristic schemes, and online adaptation mechanisms to extend applicability to dynamic edge‑hub‑cloud environments.

