Scheduling multiple divisible loads on a linear processor network


Min, Veeravalli, and Barlas have recently proposed strategies to minimize the overall execution time of one or several divisible loads on a heterogeneous linear network, using one or more installments. We show on a very simple example that their approach does not always produce a solution and that, when it does, the solution is often suboptimal. We also show how to find an optimal schedule for any instance, once the number of installments per load is given. Then, we formally state that any optimal schedule has an infinite number of installments under a linear cost model such as the one assumed in the original papers. Therefore, such a cost model cannot be used to design practical multi-installment strategies. Finally, extensive simulations confirm that the linear programming approach always produces the best solution, while the schedules of the original papers can be far from optimal.


💡 Research Summary

The paper revisits the problem of scheduling multiple divisible loads on a heterogeneous linear processor network, a problem previously addressed by Min, Veeravalli, and Barlas. Those authors proposed algorithms that split each load into several “installments” and allocate the installments across the processors so as to minimize the overall makespan, assuming a linear cost model in which communication time is proportional to the amount of data transmitted and computation time is proportional to the amount of work processed.

The authors of the present work first demonstrate, with a minimal two‑processor, two‑load example, that the earlier algorithms are not universally reliable. In certain parameter configurations the algorithm fails to produce any feasible schedule, and when it does produce a schedule it can be far from optimal. The failure stems from the way the original method determines installment fractions: the derived equations can become inconsistent with the constraints that each processor’s communication and computation capacities must not be exceeded, and the ordering constraints of installments are not always respected.

Next, the paper provides a theoretical analysis of the linear cost model itself. Under the assumption that both communication and computation costs are strictly linear, the authors prove that any schedule that truly minimizes the makespan would require an infinite number of installments. The intuition is that dividing a load into ever‑smaller pieces lets communication overlap computation ever more tightly, so each additional installment strictly reduces the makespan, which approaches a limit that no finite schedule attains. This result reveals a fundamental mismatch between the model and real systems, where non‑linear factors such as packet headers, network switching delays, and protocol overhead impose a lower bound on the granularity of installments. Consequently, a model that permits arbitrarily many installments cannot be used to design practical multi‑installment strategies.
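This intuition can be checked numerically. The sketch below uses a deliberately minimal setting that is not the paper's multi-processor linear network: one master sends a load of size `W` to a single worker over one link, split into `k` equal installments, with purely linear costs (`c` time units per unit of data sent, `w` per unit computed). All names and parameter values are illustrative. Under this model, every increase in `k` strictly shrinks the makespan, which approaches but never reaches its limit of `w * W`.

```python
def makespan(W, c, w, k):
    """Makespan for a load W split into k equal installments.

    The link delivers the chunks back to back; the worker processes
    chunks in order, starting each one as soon as it has both arrived
    and the previous chunk is finished. Costs are strictly linear.
    """
    chunk = W / k
    recv_done = 0.0   # time the current chunk finishes arriving
    comp_done = 0.0   # time the worker finishes the previous chunk
    for _ in range(k):
        recv_done += c * chunk
        comp_done = max(comp_done, recv_done) + w * chunk
    return comp_done

W, c, w = 100.0, 1.0, 2.0
for k in (1, 2, 5, 50):
    # Makespan strictly decreases toward w * W = 200 but never reaches it.
    print(k, makespan(W, c, w, k))
```

With these (arbitrary) parameters the makespan is `c*W/k + w*W`, so any finite number of installments can still be improved upon, which is exactly the pathological behavior the theorem describes.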

To overcome both the algorithmic shortcomings and the unrealistic model, the authors propose a new optimization framework. They fix, a priori, the number of installments k_i for each load i (e.g., three or five) and then formulate the allocation of data amounts and the timing of each installment as a linear programming (LP) problem. The decision variables are the data volume assigned to each installment and the start times of the communication and computation phases. The objective function minimizes the makespan, while the constraints enforce:

  1. Capacity constraints – at any moment, the total communication load on a link does not exceed its bandwidth, and the total computation load on a processor does not exceed its processing speed;
  2. Conservation constraints – the sum of installment volumes for a given load equals the total size of that load;
  3. Precedence constraints – installment j of a load can start only after installment j-1 has completed, and the linear topology imposes additional ordering to avoid link contention.

Because all constraints are linear, the problem can be solved to optimality with standard LP solvers in polynomial time, even for networks with dozens of processors and tens of loads.
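As a concrete, deliberately tiny illustration of such an LP, the sketch below schedules one divisible load over a two-worker chain with a single installment per worker. The parameter names and values are invented for the example, and SciPy's general-purpose `linprog` stands in for whatever LP solver one prefers; the paper's full formulation has many more variables and precedence constraints.

```python
from scipy.optimize import linprog

# Toy single-installment instance on a 2-worker chain (illustrative
# parameters, not from the paper): worker i receives its share over
# the link at cost c_i per unit and computes it at cost w_i per unit.
# The link is shared, so worker 2's data is sent after worker 1's.
W = 1.0               # total load
c1, w1 = 1.0, 5.0
c2, w2 = 2.0, 3.0

# Variables: x = [alpha1, alpha2, T] (load shares and makespan).
obj = [0.0, 0.0, 1.0]                    # minimize T
A_ub = [[c1 + w1, 0.0,      -1.0],       # worker 1 finishes by T
        [c1,      c2 + w2,  -1.0]]       # worker 2 waits for both comms
b_ub = [0.0, 0.0]
A_eq = [[1.0, 1.0, 0.0]]                 # shares cover the whole load
b_eq = [W]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3)
print(res.x)   # optimal shares alpha1, alpha2 and makespan T
```

At the optimum both finish-time constraints are tight (both workers end exactly at T), which is the classical structure of optimal divisible-load schedules; fixing k_i > 1 installments simply adds more volume and start-time variables with the same kind of linear constraints.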

The experimental section evaluates the proposed LP‑based method against the original algorithms and against a naïve single‑installment baseline. Simulations cover a wide range of heterogeneity (different processor speeds and link bandwidths), varying numbers of loads, and different fixed installment counts. The results show that the original algorithms either fail to find a feasible schedule or produce makespans that are 12 %–25 % larger on average, with worst‑case gaps exceeding 40 %. In contrast, the LP approach always yields a feasible schedule and achieves the optimal makespan for the given installment count. Moreover, limiting the number of installments to a modest value (3–5) incurs negligible loss compared with the theoretical optimum that would require infinitely many installments, confirming that the linear model’s pathological behavior is avoided in practice.

Finally, the authors discuss the implications of their findings. The proof that optimal schedules under a strictly linear model need infinitely many installments demonstrates that the model is unsuitable for realistic system design. Incorporating non‑linear cost components—such as fixed per‑message overhead, latency, and contention effects—will bound the useful number of installments and make the problem well‑posed. The LP formulation presented here can be readily extended to handle such additional terms, possibly by moving to a mixed‑integer programming (MIP) framework if discrete decisions (e.g., whether to send an extra installment) become relevant.

In summary, the paper provides a rigorous critique of earlier divisible‑load scheduling work, establishes the theoretical limits of the linear cost assumption, and offers a practical, provably optimal scheduling method based on linear programming. The approach is scalable, robust to heterogeneity, and directly applicable to modern distributed computing environments such as cloud, edge, and high‑performance clusters where large divisible workloads must be partitioned and dispatched efficiently.

