[Paper Review] Scheduling Distributed Clusters of Parallel Machines: Primal-Dual and LP-based Approximation Algorithms
This paper presents the first constant-factor approximation algorithms for scheduling jobs across distributed clusters of parallel machines so as to minimize weighted average completion time. It introduces a fast combinatorial algorithm based on a mapping to the special case of one machine per cluster, and an LP-rounding algorithm that achieves a 2-approximation when all machines have unit speed and subjobs are divided into equally sized tasks. Both extend prior work on concurrent open shop scheduling.
The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed processing not only on multiple machines, but on multiple clusters. We consider a scheduling problem to minimize weighted average completion time of N jobs on M distributed clusters of parallel machines. In keeping with the scale of the problems motivating this work, we assume that (1) each job is divided into M "subjobs" and (2) distinct subjobs of a given job may be processed concurrently. When each cluster is a single machine, this is the NP-Hard concurrent open shop problem. A clear limitation of such a model is that a serial processing assumption sidesteps the issue of how different tasks of a given subjob might be processed in parallel. Our algorithms explicitly model clusters as pools of resources and effectively overcome this issue. Under a variety of parameter settings, we develop two constant factor approximation algorithms for this problem. The first algorithm uses an LP relaxation tailored to this problem from prior work. This LP-based algorithm provides strong performance guarantees. Our second algorithm exploits a surprisingly simple mapping to the special case of one machine per cluster. This mapping-based algorithm is combinatorial and extremely fast. These are the first constant factor approximations for this problem.
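To make the objective concrete, here is a minimal sketch for the one-machine-per-cluster special case (concurrent open shop) mentioned in the abstract. It assumes every cluster processes jobs in the same order; the function name `weighted_completion_time` and the `proc[j][i]` layout are illustrative choices, not notation from the paper.

```python
from typing import Sequence


def weighted_completion_time(
    order: Sequence[int],
    proc: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Sum of w_j * C_j when every cluster (one machine each) runs jobs in `order`.

    proc[j][i] is the processing time of job j's subjob on cluster i.
    A job completes when the last of its nonempty subjobs completes.
    """
    num_clusters = len(proc[0])
    elapsed = [0.0] * num_clusters  # time at which each cluster becomes free
    total = 0.0
    for j in order:
        completion = 0.0
        for i in range(num_clusters):
            elapsed[i] += proc[j][i]
            if proc[j][i] > 0:  # clusters with no work for job j do not delay it
                completion = max(completion, elapsed[i])
        total += weights[j] * completion
    return total


proc = [[3.0, 1.0], [1.0, 2.0]]  # two jobs, two clusters
weights = [1.0, 2.0]
print(weighted_completion_time([0, 1], proc, weights))  # job 0 first
print(weighted_completion_time([1, 0], proc, weights))  # job 1 first
```

In this toy instance, running the heavier-weighted job 1 first gives the smaller objective, which is the kind of ordering decision the approximation algorithms have to make.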
Motivation & Objective
- To address the challenge of scheduling large-scale jobs across multiple clusters of parallel machines, motivated by exabyte-scale data processing needs.
- To extend prior models by allowing heterogeneous machine speeds, release times, and weighted job completion times.
- To develop the first constant-factor approximation algorithms for this generalized distributed scheduling problem.
- To overcome limitations of prior heuristics like SWAG, which lack worst-case performance guarantees.
- To provide both LP-based and combinatorial algorithms with strong theoretical performance bounds.
Proposed method
- Proposes a new problem formulation, concurrent cluster scheduling, in which each job is split into subjobs processed across m clusters, each with multiple parallel machines.
- Introduces a primal-dual inspired LP relaxation tailored to the problem, using a modified constraint set to model subjob completion and machine speed variations.
- Develops a combinatorial algorithm via a surprisingly simple mapping to the single-machine-per-cluster case; it runs in O(n² + nm) time for n jobs and m clusters.
- Applies LP rounding with a novel lower bound on job completion time Cj, not explicitly in the LP, to strengthen approximation guarantees.
- Uses a transformation from P||∑wjLj to CC||∑wjCj to prove optimality equivalence between solutions in the transformed and original problems.
- Employs constraint modification in the LP relaxation to capture structural properties of the cluster scheduling problem, enabling tighter bounds.
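The mapping-based pipeline in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: it assumes unit-speed machines, it assumes the mapping collapses each cluster into one surrogate machine by dividing a subjob's total work by the number of machines, and it swaps in a simple weight-to-bottleneck greedy ordering where the paper uses a primal-dual method. All function names are hypothetical.

```python
import heapq
from typing import List, Sequence

# tasks[j][i] is the list of task durations of job j's subjob on cluster i.
Tasks = Sequence[Sequence[Sequence[float]]]


def surrogate_times(tasks: Tasks, machines: Sequence[int]) -> List[List[float]]:
    """ASSUMED mapping: subjob work divided by the cluster's machine count."""
    return [[sum(sub) / machines[i] for i, sub in enumerate(job)] for job in tasks]


def greedy_order(surrogate: Sequence[Sequence[float]], weights: Sequence[float]) -> List[int]:
    """Stand-in ordering rule (NOT the paper's): largest weight / bottleneck load first."""
    def ratio(j: int) -> float:
        bottleneck = max(surrogate[j])
        return weights[j] / bottleneck if bottleneck > 0 else float("inf")
    return sorted(range(len(weights)), key=ratio, reverse=True)


def list_schedule(order: Sequence[int], tasks: Tasks, machines: Sequence[int]) -> List[float]:
    """List-schedule each cluster in `order`; each task goes to the earliest-free machine."""
    free_at = [[0.0] * m for m in machines]  # one min-heap of machine free times per cluster
    completion = [0.0] * len(tasks)
    for j in order:
        for i, sub in enumerate(tasks[j]):
            for duration in sorted(sub, reverse=True):  # longest task first (a design choice)
                start = heapq.heappop(free_at[i])
                finish = start + duration
                heapq.heappush(free_at[i], finish)
                completion[j] = max(completion[j], finish)
    return completion


machines = [2, 1]  # cluster 0 has two machines, cluster 1 has one
tasks = [[[2.0, 2.0], [1.0]], [[1.0], [3.0]]]
weights = [1.0, 1.0]
order = greedy_order(surrogate_times(tasks, machines), weights)
completion = list_schedule(order, tasks, machines)
print(order, completion, sum(w * c for w, c in zip(weights, completion)))
```

The point of the sketch is the structure: reduce to one machine per cluster, choose a single permutation there, then expand it back into a schedule on the real parallel machines. The approximation guarantees reported in the review apply to the paper's own mapping and ordering rule, not to this stand-in.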
Experimental results
Research questions
- RQ1: Can we design a constant-factor approximation algorithm for scheduling jobs across distributed clusters of parallel machines with heterogeneous speeds and release times?
- RQ2: How does the performance of single-permutation schedules compare to multi-permutation schedules in concurrent cluster scheduling?
- RQ3: Can a combinatorial algorithm achieve strong approximation ratios without relying on LP relaxation?
- RQ4: What is the worst-case loss in optimality when restricting to single-permutation schedules in the unit-speed case?
- RQ5: Can implicit constraint modifications in LP relaxations improve approximation performance in scheduling problems?
Key findings
- The paper presents the first constant-factor approximation algorithms for the concurrent cluster scheduling problem, extending beyond the NP-hard concurrent open shop model.
- The LP-based algorithm achieves a 2-approximation when all machines are of unit speed and subjobs are divided into equally sized tasks.
- The combinatorial algorithm, based on a mapping to the single-machine-per-cluster case, runs in O(n² + nm) time for n jobs and m clusters and provides a constant-factor approximation.
- Theoretical analysis shows that single-permutation schedules can incur up to a 1.2x optimality gap compared to globally optimal solutions, though a 3-approximation is always achievable.
- The LP relaxation is strengthened by incorporating an implicit lower bound on job completion time Cj, which is critical for achieving tight approximation ratios.
- The approach demonstrates that constraints redundant in standard LP relaxations can become essential when modeling cluster-specific scheduling structures.
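For orientation, the standard completion-time LP relaxation for concurrent open shop (one machine per cluster), as commonly stated in the scheduling literature, is written out below. It is shown only as a reference point: the paper's relaxation for clusters of parallel machines modifies constraints of this kind, and its exact form is not reproduced here. The symbols p_{ij} (processing time of job j on machine i), w_j, and C_j are notation assumed for this sketch.

```latex
\begin{aligned}
\min \quad & \sum_{j=1}^{N} w_j C_j \\
\text{s.t.} \quad & \sum_{j \in S} p_{ij} C_j \;\ge\; \frac{1}{2}\left[\Big(\sum_{j \in S} p_{ij}\Big)^{2} + \sum_{j \in S} p_{ij}^{2}\right]
\qquad \forall\, S \subseteq \{1,\dots,N\},\ \ i = 1,\dots,M .
\end{aligned}
```

Each constraint says that the jobs in S cannot all finish early on machine i: in any feasible schedule the left-hand side is at least the value obtained by running exactly those jobs back to back from time zero.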
This review was created by AI and reviewed by human editors.