QUICK REVIEW

[Paper Review] Scheduling Distributed Clusters of Parallel Machines: Primal-Dual and LP-based Approximation Algorithms

Riley Murray, Megan Chao | arXiv (Cornell University) | Jan 1, 2016

Scheduling and Optimization Algorithms | 1 reference | 1 citation

TL;DR

This paper presents the first constant-factor approximation algorithms for scheduling jobs across distributed clusters of parallel machines to minimize weighted average completion time. It introduces a fast combinatorial algorithm based on a mapping to the special case of one machine per cluster, and an LP-rounding algorithm with strong theoretical guarantees that achieves a 2-approximation when all machines run at unit speed and subjobs are divided into equally sized tasks. Both extend prior work on concurrent open shop scheduling.

ABSTRACT

The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed processing not only on multiple machines, but on multiple clusters. We consider a scheduling problem to minimize weighted average completion time of N jobs on M distributed clusters of parallel machines. In keeping with the scale of the problems motivating this work, we assume that (1) each job is divided into M "subjobs" and (2) distinct subjobs of a given job may be processed concurrently. When each cluster is a single machine, this is the NP-Hard concurrent open shop problem. A clear limitation of such a model is that a serial processing assumption sidesteps the issue of how different tasks of a given subjob might be processed in parallel. Our algorithms explicitly model clusters as pools of resources and effectively overcome this issue. Under a variety of parameter settings, we develop two constant factor approximation algorithms for this problem. The first algorithm uses an LP relaxation tailored to this problem from prior work. This LP-based algorithm provides strong performance guarantees. Our second algorithm exploits a surprisingly simple mapping to the special case of one machine per cluster. This mapping-based algorithm is combinatorial and extremely fast. These are the first constant factor approximations for this problem.
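In symbols, the objective described in the abstract can be sketched as follows. The notation is assumed here for illustration and may differ from the paper's: w_j is the weight of job j, C_{j,i} is the time at which the subjob of job j on cluster i finishes, and C_j is the completion time of job j. Because subjobs of a job may run concurrently, a job is taken to finish when its last subjob finishes. Minimizing the weighted average differs from minimizing the weighted sum only by a constant factor, so the sum is shown.

```latex
\[
  \min \sum_{j=1}^{N} w_j C_j
  \qquad \text{where} \qquad
  C_j = \max_{1 \le i \le M} C_{j,i}.
\]
% An algorithm is an \alpha-approximation if it runs in polynomial time and
% always returns a schedule whose objective value is at most \alpha times optimal:
\[
  \sum_{j=1}^{N} w_j C_j^{\mathrm{ALG}} \;\le\; \alpha \sum_{j=1}^{N} w_j C_j^{\mathrm{OPT}}.
\]
```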

Motivation & Objective

To address the challenge of scheduling large-scale jobs across multiple clusters of parallel machines, motivated by exabyte-scale data processing needs.
To extend prior models by allowing heterogeneous machine speeds, release times, and weighted job completion times.
To develop the first constant-factor approximation algorithms for this generalized distributed scheduling problem.
To overcome limitations of prior heuristics like SWAG, which lack worst-case performance guarantees.
To provide both LP-based and combinatorial algorithms with strong theoretical performance bounds.

Proposed method

Proposes a new problem formulation—concurrent cluster scheduling—where each job is split into subjobs processed across m clusters, each with multiple parallel machines.
Introduces a primal-dual inspired LP relaxation tailored to the problem, using a modified constraint set to model subjob completion and machine speed variations.
Develops a combinatorial algorithm via a surprisingly simple mapping to the one-machine-per-cluster case; the algorithm runs in O(n² + nm) time for n jobs and m clusters.
Applies LP rounding with a novel lower bound on job completion time Cj, not explicitly in the LP, to strengthen approximation guarantees.
Uses a transformation from P||∑wjLj to CC||∑wjCj to prove optimality equivalence between solutions in the transformed and original problems.
Employs constraint modification in the LP relaxation to capture structural properties of the cluster scheduling problem, enabling tighter bounds.
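The one-machine-per-cluster special case that the mapping targets is the concurrent open shop problem. As a minimal sketch of how a schedule is scored in that special case, the function below evaluates the weighted sum of completion times when every machine processes jobs in the same order. The function name, argument layout, and sample data are hypothetical and chosen for illustration; this is not the paper's algorithm, only the objective it optimizes.

```python
from typing import List, Sequence


def weighted_completion_time(
    perm: Sequence[int],
    proc: List[List[int]],
    weights: Sequence[int],
) -> int:
    """Score a permutation schedule for concurrent open shop.

    perm    -- order in which every machine processes the jobs (assumed shared)
    proc    -- proc[j][i] is the processing time of job j on machine i
    weights -- weights[j] is the weight of job j
    """
    num_machines = len(proc[0])
    elapsed = [0] * num_machines  # time used so far on each machine
    total = 0
    for j in perm:
        for i in range(num_machines):
            elapsed[i] += proc[j][i]
        # A job finishes when its last subjob finishes; machines on which
        # the job has no work are ignored.
        completion = max(
            (elapsed[i] for i in range(num_machines) if proc[j][i] > 0),
            default=0,
        )
        total += weights[j] * completion
    return total


# Hypothetical instance: two jobs, two single-machine clusters.
proc = [[2, 1], [1, 3]]
weights = [1, 3]
print(weighted_completion_time([0, 1], proc, weights))
print(weighted_completion_time([1, 0], proc, weights))
```

On this toy instance the two orderings give different objective values, which is the quantity any approximation algorithm for the problem tries to keep close to optimal.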

Experimental results

Research questions

RQ1: Can we design a constant-factor approximation algorithm for scheduling jobs across distributed clusters of parallel machines with heterogeneous speeds and release times?
RQ2: How does the performance of single-permutation schedules compare to multi-permutation schedules in concurrent cluster scheduling?
RQ3: Can a combinatorial algorithm achieve strong approximation ratios without relying on LP relaxation?
RQ4: What is the worst-case loss in optimality when restricting to single-permutation schedules in the unit-speed case?
RQ5: Can implicit constraint modifications in LP relaxations improve approximation performance in scheduling problems?

Key findings

The paper presents the first constant-factor approximation algorithms for the concurrent cluster scheduling problem, extending beyond the NP-hard concurrent open shop model.
The LP-based algorithm achieves a 2-approximation when all machines are of unit speed and subjobs are divided into equally sized tasks.
The combinatorial algorithm, based on a mapping to the single-machine-per-cluster case, runs in O(n² + nm) time and provides a constant-factor approximation.
Theoretical analysis shows that single-permutation schedules can incur up to a 1.2x optimality gap compared to globally optimal solutions, though a 3-approximation is always achievable.
The LP relaxation is strengthened by incorporating an implicit lower bound on job completion time Cj, which is critical for achieving tight approximation ratios.
The approach demonstrates that constraints redundant in standard LP relaxations can become essential when modeling cluster-specific scheduling structures.
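To make the notion of a single-permutation schedule concrete, the sketch below evaluates one job ordering on clusters that each contain several unit-speed machines. It rests on assumptions not taken from the paper: every cluster processes jobs in the same order, and each task goes to whichever machine in its cluster becomes free first (a simple list-scheduling rule). The function name and data layout are hypothetical, and the paper's own task-assignment rule may differ.

```python
import heapq
from typing import List, Sequence, Tuple


def evaluate_permutation(
    perm: Sequence[int],
    tasks: List[List[List[int]]],
    machines: Sequence[int],
    weights: Sequence[int],
) -> Tuple[int, List[int]]:
    """Score a single-permutation schedule on clusters of unit-speed machines.

    perm     -- job order shared by all clusters
    tasks    -- tasks[j][i] lists the task durations of job j on cluster i
    machines -- machines[i] is the number of machines in cluster i
    weights  -- weights[j] is the weight of job j
    """
    completion = [0] * len(tasks)
    for i, count in enumerate(machines):
        # Min-heap of the times at which each machine in cluster i becomes free.
        free_at = [0] * count
        heapq.heapify(free_at)
        for j in perm:
            for duration in tasks[j][i]:
                start = heapq.heappop(free_at)  # earliest-free machine
                finish = start + duration
                heapq.heappush(free_at, finish)
                # A job completes when its last task on any cluster finishes.
                completion[j] = max(completion[j], finish)
    objective = sum(weights[j] * completion[j] for j in perm)
    return objective, completion


# Hypothetical instance: cluster 0 has two machines, cluster 1 has one.
tasks = [
    [[2, 2], [1]],  # job 0: two tasks on cluster 0, one task on cluster 1
    [[1], [3]],     # job 1: one task on each cluster
]
print(evaluate_permutation([0, 1], tasks, machines=[2, 1], weights=[1, 3]))
print(evaluate_permutation([1, 0], tasks, machines=[2, 1], weights=[1, 3]))
```

A multi-permutation schedule would let each cluster use its own job order; restricting to one shared order is simpler to search over but, per the findings above, can cost some optimality.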


This review was created by AI and reviewed by human editors.