Minimizing Makespan in Sublinear Time via Weighted Random Sampling

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We consider the classical makespan minimization scheduling problem, where $n$ jobs must be scheduled on $m$ identical machines. Using weighted random sampling, we develop two sublinear-time approximation schemes: one for the case where $n$ is known and one for the case where $n$ is unknown. Both algorithms not only give a $(1+3ε)$-approximation to the optimal makespan but also generate a sketch schedule. Our first algorithm, which targets the case where $n$ is known and draws samples in a single round of weighted random sampling, has a running time of $\tilde{O}\left(\tfrac{m^5}{ε^4} \sqrt{n} + A\left(\left\lceil \tfrac{m}{ε} \right\rceil, ε\right)\right)$, where $A(\mathcal{N}, α)$ is the time complexity of any $(1+α)$-approximation scheme for makespan minimization on $\mathcal{N}$ jobs. The second algorithm addresses the case where $n$ is unknown. It uses adaptive weighted random sampling, that is, it draws samples in several rounds, adjusting the number of samples after each round, and runs in sublinear time $\tilde{O}\left(\tfrac{m^5}{ε^4} \sqrt{n} + A\left(\left\lceil \tfrac{m}{ε} \right\rceil, ε\right)\right)$. We also provide an implementation that generates a weighted random sample using $O(\log n)$ uniform random samples.


💡 Research Summary

The paper tackles the classic makespan minimization problem on m identical parallel machines, where n jobs with processing times p₁,…,pₙ must be assigned so that the completion time of the last job (the makespan) is as small as possible. This problem is NP‑hard, and a wealth of polynomial‑time approximation schemes (PTAS) exist, but they all require reading the entire input. In massive data settings, even a linear scan may be infeasible, motivating sublinear‑time algorithms that read only a tiny fraction of the data while still delivering a provable approximation guarantee.

The authors’ key insight is to replace uniform random sampling—commonly used in sublinear algorithms—with weighted random sampling (WRS), where each job j is sampled with probability proportional to its processing time pⱼ (the weight). This ensures that large jobs, which dominate the makespan, are far more likely to appear in the sample than under uniform sampling, thereby avoiding the “miss the heavy job” pitfall that can force uniform methods to draw a near-linear number of samples.
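A toy numeric check (not from the paper) makes this concrete: with one job carrying half the total weight among many unit jobs, a few dozen weighted samples find the heavy job almost surely, while the same number of uniform samples almost surely misses it.

```python
def miss_probability(p_times, heavy_idx, s, weighted):
    """Probability that s i.i.d. samples never draw job heavy_idx."""
    if weighted:
        q = p_times[heavy_idx] / sum(p_times)  # proportional to processing time
    else:
        q = 1.0 / len(p_times)                 # uniform over jobs
    return (1.0 - q) ** s

# 10_000 unit jobs plus one job of length 10_000: the heavy job holds
# half the total weight, so weighted sampling almost surely captures it.
jobs = [1.0] * 10_000 + [10_000.0]
p_uniform = miss_probability(jobs, len(jobs) - 1, 50, weighted=False)
p_weighted = miss_probability(jobs, len(jobs) - 1, 50, weighted=True)
```

Under uniform sampling the miss probability stays above 99%, while under weighted sampling it drops below 10⁻¹⁵.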

The overall framework consists of two stages:

  1. Sketch Construction – Using O(√n) weighted samples, the algorithm builds a compact representation ˜I of the original instance. Jobs are bucketed into geometric intervals Iₖ = (p′_max·(1−δ)ᵏ, p′_max·(1−δ)^{k−1}], where p′_max is an upper bound on the true maximum processing time. For each interval the algorithm estimates the number of jobs nₖ (denoted ˜nₖ) and records the interval’s upper bound ˜pₖ. The estimate is obtained via a “birth‑death paradox” argument: the probability that a sampled job falls into interval Iₖ equals the total weight of that interval divided by the overall weight, so the observed sample count yields an unbiased estimator of nₖ. Large jobs are guaranteed to appear in the sample; medium jobs are estimated group‑wise; very small jobs are ignored because their total processing time contributes at most a β₂‑fraction of the optimal makespan. The resulting sketch satisfies an (α, β₁, β₂)‑property: (i) each ˜nₖ approximates nₖ within (1±α), (ii) the optimal makespan of the sketch is within (1±β₁) of the true optimum, and (iii) the omitted small jobs contribute at most β₂·OPT.
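The bucketing and count estimation above can be sketched roughly as follows. This is an illustrative simplification, not the paper's exact procedure: `build_sketch` and its arguments are hypothetical, and it assumes the total weight (total processing time) is available, whereas the paper estimates it from the samples.

```python
import math

def build_sketch(sample, p_max_ub, total_weight, delta):
    """Illustrative sketch construction: bucket each weighted sample into
    the geometric intervals I_k = (p_max_ub*(1-d)^k, p_max_ub*(1-d)^(k-1)]
    and turn the per-bucket sample counts into estimated job counts n~_k."""
    s = len(sample)
    counts = {}
    for p in sample:
        # p lies in I_k  <=>  k-1 <= log_{1-delta}(p / p_max_ub) < k
        k = math.floor(math.log(p / p_max_ub, 1 - delta)) + 1
        counts[k] = counts.get(k, 0) + 1
    sketch = {}
    for k, c in counts.items():
        upper = p_max_ub * (1 - delta) ** (k - 1)   # recorded bound p~_k
        est_bucket_weight = (c / s) * total_weight  # E[c/s] = W_k / W
        sketch[k] = (upper, est_bucket_weight / upper)  # (p~_k, n~_k)
    return sketch
```

Dividing the estimated bucket weight by the bucket's upper bound converts a weight estimate into a job-count estimate, which is where the (1±α) guarantee on ñₖ comes from.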

  2. Makespan Approximation from the Sketch – Any existing (1+α) approximation algorithm A for the full problem can be used as a black box. The meta‑algorithm extracts the h(m,δ)=⌈m/δ⌉ largest jobs from the sketch, runs A on them to obtain a schedule S with makespan T₀ ≤ (1+δ)·OPT_{large}. Let P = Σ ˜nₖ·˜pₖ be the total estimated work. The final estimate is T = (1+δ)·max(T₀, P/m). The authors prove that T ≤ (1+ε)·OPT and that the remaining (non‑large) jobs can be inserted into the schedule using simple List Scheduling without exceeding T. Thus the algorithm delivers a (1+ε)‑approximation while only accessing O(√n) weighted samples.
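A minimal sketch of this meta-step, assuming the large-job makespan T₀ and the estimated total work P have already been computed by the black-box PTAS and the sketch, respectively; the List Scheduling helper shows how the remaining jobs would be inserted greedily.

```python
import heapq

def list_schedule(machine_loads, jobs):
    """Greedy List Scheduling: place each job on the currently least-loaded
    machine; returns the resulting makespan."""
    heap = list(machine_loads)
    heapq.heapify(heap)
    for p in jobs:
        heapq.heappush(heap, heapq.heappop(heap) + p)
    return max(heap)

def makespan_estimate(t_large, total_work, m, delta):
    """Final estimate T = (1+delta) * max(T0, P/m) from the meta-algorithm."""
    return (1 + delta) * max(t_large, total_work / m)
```

The max with P/m guards against the case where the aggregate work, rather than any single large job, determines the optimum.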

Two scenarios are considered:

  • Known n – A single‑round weighted sampling suffices. The total running time is
      ˜O(m⁵·ε⁻⁴·√n + A(⌈m/ε⌉, ε)).
    The first term accounts for sampling, interval construction, and the birth‑death estimation; the second term is the cost of invoking the black‑box PTAS on the O(m/ε) largest jobs.

  • Unknown n – An adaptive sampling scheme is introduced. The algorithm starts with a small sample size, checks whether the current estimate of the total number of jobs is reliable, and if not doubles the sample size and repeats. This geometric increase continues until the birth‑death estimates become stable, after which the same sketch‑construction and meta‑algorithm are applied. The overall complexity remains the same asymptotically.
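The doubling loop might look like the sketch below. The stopping rule used here—two consecutive estimates agreeing within a relative tolerance—is a simplified stand-in for the paper's actual stability test, and the function and parameter names are hypothetical.

```python
def adaptive_sample_size(draw_sample, estimate, s0=16, tol=0.05, max_rounds=30):
    """Geometric doubling (illustration only): double the sample size until
    two consecutive estimates agree within relative tolerance tol."""
    s = s0
    prev = estimate(draw_sample(s))
    for _ in range(max_rounds):
        s *= 2
        cur = estimate(draw_sample(s))
        if prev > 0 and abs(cur - prev) <= tol * prev:
            return s, cur
        prev = cur
    return s, prev
```

Because the sample size grows geometrically, the total work across all rounds is dominated by the final round, which is why the asymptotic complexity matches the known-n case.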

A practical contribution is an implementation detail: a weighted sample can be generated using only O(log n) uniform random numbers by maintaining a cumulative weight array and performing binary search on a uniformly drawn value. This keeps the sampling overhead low.
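A minimal version of such a sampler, using Python's `bisect` on a prefix-sum array: each draw costs one uniform random number plus an O(log n) binary search. Note that building the cumulative array itself takes linear preprocessing, so this is only a sketch of the per-draw cost, not of the paper's full construction.

```python
import bisect
import itertools
import random

def make_weighted_sampler(weights, rng=random.random):
    """Return a closure that draws an index with probability proportional
    to its weight: precompute cumulative weights once, then binary-search
    a uniform draw scaled to the total weight."""
    prefix = list(itertools.accumulate(weights))
    total = prefix[-1]
    def draw():
        # first index whose cumulative weight reaches the scaled uniform value
        return bisect.bisect_left(prefix, rng() * total)
    return draw
```

With weights [1, 3], a uniform value landing in the first quarter of the total weight selects index 0, and anything above selects index 1, matching the 1:3 sampling ratio.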

Complexity discussion – The dominant factor m⁵·ε⁻⁴·√n reflects the cost of obtaining enough weighted samples to guarantee, with high probability, that all “large” jobs are captured and that the birth‑death estimator has low variance. While sublinear in n (√n), the dependence on m and ε is polynomial and can become prohibitive for large machine counts or very tight approximation factors. The second term, A(⌈m/ε⌉, ε), is essentially the runtime of any existing PTAS on a reduced instance of size O(m/ε); modern PTAS’s achieve near‑linear time in that reduced size, so this term is usually modest.

Strengths

  • The use of weighted sampling directly addresses the core difficulty of uniform sampling in scheduling: the need to see the heaviest jobs.
  • The sketch abstraction is clean, with explicit (α,β₁,β₂) guarantees that make the downstream approximation analysis straightforward.
  • By treating any PTAS as a black box, the method inherits the best known approximation ratios without reinventing the scheduling core.
  • The adaptive version removes the need for prior knowledge of n, which is valuable in streaming or database contexts.

Limitations and open questions

  • The m⁵·ε⁻⁴ factor may limit practicality; reducing the exponent on m (perhaps via more clever sampling or concentration bounds) would broaden applicability.
  • The birth‑death estimator assumes that the weight distribution is not pathological; heavy‑tailed distributions could still require more samples to achieve the desired confidence.
  • Ignoring small jobs relies on β₂ being sufficiently tiny; the paper does not provide concrete constants, leaving the practitioner to tune δ and β₂.
  • The transition from a “sketch schedule” to a concrete full schedule is only sketched; an explicit algorithm with empirical evaluation would strengthen the contribution.
  • No experimental results are presented; real‑world workloads (e.g., cloud batch jobs) could reveal hidden constants and overheads, especially the cost of maintaining cumulative weight structures for massive n.

Future directions

  • Investigate whether the sampling cost can be reduced to O(√n · polylog m) by using more sophisticated importance‑sampling techniques or by exploiting structure in the processing‑time distribution.
  • Extend the framework to related objectives such as minimizing the sum of completion times, energy‑aware scheduling, or multi‑objective trade‑offs.
  • Develop incremental update procedures for the sketch when jobs arrive or depart, enabling truly online sublinear scheduling.
  • Conduct a thorough empirical study on large‑scale synthetic and production traces to validate the theoretical guarantees and to calibrate the constants (δ, β₁, β₂) for practical ε values.

In summary, the paper introduces a novel sublinear‑time approach to makespan minimization by leveraging weighted random sampling to construct a compact, provably accurate sketch of the job set. It bridges the gap between sublinear data access and high‑quality scheduling approximations, and it opens a promising line of research at the intersection of streaming algorithms, importance sampling, and classic combinatorial optimization.

