Scalable Hierarchical Scheduling for Malleable Parallel Jobs on Multiprocessor-based Systems


The proliferation of multi-core and multiprocessor-based computer systems has led to explosive growth in parallel applications and hence the need for efficient schedulers. In this paper, we study hierarchical scheduling for malleable parallel jobs on multiprocessor-based systems, which arises in many distributed and multilayered computing environments. We propose a hierarchical scheduling algorithm, named AC-DS, that consists of a feedback-driven adaptive scheduler, a desire aggregation scheme and an efficient resource allocation policy. From a theoretical perspective, we show that AC-DS has scalable performance regardless of the number of hierarchical levels. In particular, we prove that AC-DS achieves $O(1)$-competitiveness with respect to the overall completion time of the jobs, or the makespan. A detailed malleable job model is developed to experimentally evaluate the effectiveness of the proposed scheduling algorithm. The results verify the scalability of AC-DS and demonstrate that AC-DS outperforms other strategies for a wide range of parallel workloads.


💡 Research Summary

The paper addresses the challenge of scheduling malleable parallel jobs in modern multi-core and multiprocessor environments that are organized as hierarchical, tree-structured systems such as grids and clouds. Existing work mainly focuses on two-level hierarchies and does not consider scalability as the number of levels grows. To fill this gap, the authors propose a general hierarchical scheduling framework and a concrete algorithm called AC-DS, which combines an adaptive A-Control scheduler, a Desire-Sum aggregation scheme, and a Dynamic Equi-Partitioning (DEQ) allocation policy.

System model
A set of n jobs arrives online; each job is malleable, i.e., its parallelism h_i(t) may vary during execution. If a_i(t) processors are allocated at time t, the execution rate is Γ_i(t)=min{a_i(t),h_i(t)}. Each job has total work w_i and span l_i (the time needed with one processor and with infinitely many processors, respectively). The computing platform is modeled as a rooted tree with K arbitrary levels and a total of P processors at the root. The objective is to allocate processors from the root down to the leaf jobs, without any knowledge of future arrivals, release times, or job characteristics, so as to minimize the makespan (overall completion time).
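This execution-rate model is straightforward to simulate. Below is a minimal Python sketch (illustrative only, not from the paper; `alloc_fn` and `par_fn` are hypothetical callbacks standing in for a_i(t) and h_i(t)) that integrates Γ_i(t)=min{a_i(t),h_i(t)} over small time steps to obtain a job's completion time:

```python
def simulate_job(work, alloc_fn, par_fn, dt=0.01):
    """Completion time of one malleable job under the model above.

    work     -- total work w_i
    alloc_fn -- hypothetical callback: processors a_i(t) allotted at time t
    par_fn   -- hypothetical callback: instantaneous parallelism h_i(t)

    Work is consumed at rate Gamma_i(t) = min{a_i(t), h_i(t)}: processors
    beyond the job's current parallelism contribute nothing.
    """
    t, done = 0.0, 0.0
    while done < work:
        done += min(alloc_fn(t), par_fn(t)) * dt
        t += dt
    return t
```

For example, `simulate_job(10, lambda t: 4, lambda t: 2, dt=1.0)` returns 5.0: with four processors but parallelism capped at two, a job of work 10 runs at rate 2 and finishes in five time units.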

Algorithmic components

  1. A‑Control (AC) – adaptive bottom‑level scheduler
    At the end of each scheduling quantum q, the scheduler measures the amount of work w_i(q) completed and the reduction in span l_i(q). The average parallelism A_i(q)=w_i(q)/l_i(q) is computed and used as the processor “desire” for the next quantum: d_i(q+1)=A_i(q). The first quantum starts with a desire of 1. This simple feedback mechanism assumes that the average parallelism over a short interval is a good predictor of near‑future resource needs.

  2. Desire‑Sum (DS) – intermediate‑level aggregation
    Each internal node n_i^k receives the desires of its m children at the level below, sums them, and reports the aggregate desire d_i^k = Σ_j d_j^{k−1} to its parent. Quantum lengths may differ across levels, but the authors assume that a higher-level quantum is an integer multiple of the immediate lower-level quantum, ensuring that only the most recent desires are used when a higher-level quantum expires.

  3. Dynamic Equi‑Partitioning (DEQ) – resource allocation policy
    When a node receives a total of a_i^k processors from its parent, DEQ distributes them among its children as follows: children whose desire does not exceed the current equal share (the remaining processors divided by the number of still-unsatisfied children) are satisfied first; their allocations are subtracted, the equal share is recomputed, and the process repeats. If no child can be fully satisfied, the remaining processors are divided equally among all unsatisfied children. Fractional processor allocation is allowed for analytical convenience, effectively modeling time-sharing.
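The three components above can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the authors' code; the function names are my own, and fractional allocations model time-sharing as in the paper:

```python
def next_desire(work_done, span_reduced):
    """A-Control feedback: the desire for quantum q+1 is the average
    parallelism observed in quantum q, A_i(q) = w_i(q) / l_i(q)."""
    return work_done / span_reduced

def aggregate_desire(child_desires):
    """Desire-Sum: an internal node reports the sum of its children's
    desires up to its parent."""
    return sum(child_desires)

def deq_allocate(total, desires):
    """Dynamic Equi-Partitioning: split `total` (possibly fractional)
    processors among children according to their desires."""
    alloc = {c: 0.0 for c in desires}
    unsatisfied = dict(desires)
    avail = float(total)
    while unsatisfied and avail > 0:
        share = avail / len(unsatisfied)
        fits = [c for c, d in unsatisfied.items() if d <= share]
        if not fits:
            # no child can be fully satisfied: split the rest equally
            for c in unsatisfied:
                alloc[c] = share
            return alloc
        # satisfy the modest children first, then recompute the share
        for c in fits:
            alloc[c] = float(unsatisfied.pop(c))
            avail -= alloc[c]
    return alloc
```

For example, `deq_allocate(10, {'a': 2, 'b': 3, 'c': 20})` satisfies `a` and `b` fully and gives the remaining five processors to `c`; and when the aggregate desire is below `total`, no processor beyond that desire is handed out, consistent with the allocation property the analysis relies on.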

Theoretical analysis
The authors prove that AC-DS achieves an O(1) competitive ratio with respect to the optimal offline scheduler, regardless of the tree depth K. The proof hinges on three lemmas: (i) a job receiving at least half of its average parallelism runs at ≥ ½ of its optimal speed; (ii) DS and DEQ together never allocate more processors than the total desire while satisfying the maximum number of children; (iii) these properties propagate up the tree, guaranteeing that the makespan of AC-DS is bounded by a constant factor of the optimal makespan.

Experimental evaluation
A new malleable job model is built by extending an existing model with generic parallelism variations (gradual increase, sudden drops, periodic oscillations). Three hierarchical strategies are compared: AC‑DS, Equal‑Partition (static equal split), and Fixed‑Quota (pre‑assigned quotas). Simulations are run on trees with 2–6 levels and varying numbers of jobs. Results show that AC‑DS consistently yields lower makespans; the advantage grows with the number of levels, where AC‑DS’s makespan increase is almost flat while the baselines degrade noticeably. Processor utilization stays between 85 % and 95 % under AC‑DS, and the algorithm automatically throttles over‑allocation when job desires exceed available resources.

Contributions

  • A general hierarchical, non‑clairvoyant scheduling framework that works for any number of levels.
  • The AC‑DS algorithm that combines a simple feedback‑driven demand estimator, a linear aggregation scheme, and a fair yet efficient allocation policy, achieving constant‑factor optimality.
  • A comprehensive experimental methodology based on a realistic malleable workload model, demonstrating scalability and robustness of AC‑DS.

The paper concludes with suggestions for future work, including handling heterogeneous cores, network latency, dynamic restructuring of the hierarchy, and incorporating energy‑aware objectives.

