QUICK REVIEW

[Paper Review] Scheduling Distributed Clusters of Parallel Machines: Primal-Dual and LP-based Approximation Algorithms

Riley Murray, Megan Chao | arXiv (Cornell University) | Jan 1, 2016
Scheduling and Optimization Algorithms · 1 reference · 1 citation
TL;DR

This paper presents the first constant-factor approximation algorithms for scheduling jobs across distributed clusters of parallel machines to minimize weighted average completion time. It introduces a combinatorial algorithm based on a novel mapping to the one-machine-per-cluster special case, and an LP-rounding algorithm that achieves a 2-approximation when all machines have unit speed and subjobs are divided into equally sized tasks. Both extend prior work on concurrent open shop scheduling.

ABSTRACT

The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed processing not only on multiple machines, but on multiple clusters. We consider a scheduling problem to minimize weighted average completion time of N jobs on M distributed clusters of parallel machines. In keeping with the scale of the problems motivating this work, we assume that (1) each job is divided into M "subjobs" and (2) distinct subjobs of a given job may be processed concurrently. When each cluster is a single machine, this is the NP-Hard concurrent open shop problem. A clear limitation of such a model is that a serial processing assumption sidesteps the issue of how different tasks of a given subjob might be processed in parallel. Our algorithms explicitly model clusters as pools of resources and effectively overcome this issue. Under a variety of parameter settings, we develop two constant factor approximation algorithms for this problem. The first algorithm uses an LP relaxation tailored to this problem from prior work. This LP-based algorithm provides strong performance guarantees. Our second algorithm exploits a surprisingly simple mapping to the special case of one machine per cluster. This mapping-based algorithm is combinatorial and extremely fast. These are the first constant factor approximations for this problem.
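
As a rough illustration of the problem the abstract describes, the Python sketch below evaluates the weighted sum of completion times for one fixed job order (the weighted average differs only by a constant factor). It rests on assumptions that are not taken from the paper: unit-speed machines, no release times, and a simple list-scheduling rule in which each task goes to the earliest-free machine of its cluster, longest task first within a subjob. The function name `weighted_completion_time` and the toy instance are hypothetical.

```python
import heapq


def weighted_completion_time(jobs, weights, machines_per_cluster, order):
    """Evaluate one job order under simple list scheduling.

    jobs[j][i] is the list of task sizes of job j's subjob on cluster i.
    machines_per_cluster[i] is the number of unit-speed machines in cluster i.
    Returns (objective, completion_times).
    """
    completion = [0.0] * len(jobs)
    for i, num_machines in enumerate(machines_per_cluster):
        # Min-heap of the times at which each machine of cluster i becomes free.
        free_at = [0.0] * num_machines
        heapq.heapify(free_at)
        for j in order:
            # Assumed rule: longest task first, onto the earliest-free machine.
            for task in sorted(jobs[j][i], reverse=True):
                start = heapq.heappop(free_at)
                finish = start + task
                heapq.heappush(free_at, finish)
                # A job completes only when its last task on any cluster does.
                completion[j] = max(completion[j], finish)
    objective = sum(w * c for w, c in zip(weights, completion))
    return objective, completion


# Hypothetical toy instance: 2 jobs, 2 clusters (2 machines and 1 machine).
jobs = [
    [[2, 2], [1]],  # job 0: two tasks on cluster 0, one task on cluster 1
    [[3], [2]],     # job 1: one task on each cluster
]
weights = [1, 2]
machines_per_cluster = [2, 1]

for order in ([0, 1], [1, 0]):
    total, times = weighted_completion_time(jobs, weights, machines_per_cluster, order)
    print(order, total, times)
```

On this toy instance the order [1, 0] yields an objective of 10 versus 12 for [0, 1], which shows how much the choice of permutation matters even on a two-job example.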

Motivation & Objective

  • To address the challenge of scheduling large-scale jobs across multiple clusters of parallel machines, motivated by exabyte-scale data processing needs.
  • To extend prior models by allowing heterogeneous machine speeds, release times, and weighted job completion times.
  • To develop the first constant-factor approximation algorithms for this generalized distributed scheduling problem.
  • To overcome limitations of prior heuristics like SWAG, which lack worst-case performance guarantees.
  • To provide both LP-based and combinatorial algorithms with strong theoretical performance bounds.

Proposed method

  • Proposes a new problem formulation, concurrent cluster scheduling, in which each job is split into subjobs processed across m clusters, each with multiple parallel machines.
  • Introduces a primal-dual inspired LP relaxation tailored to the problem, using a modified constraint set to model subjob completion and machine speed variations.
  • Develops a combinatorial algorithm via a surprising mapping to the single-machine-per-cluster case, enabling O(n² + nm) time complexity.
  • Applies LP rounding with a novel lower bound on job completion time Cj, not explicitly in the LP, to strengthen approximation guarantees.
  • Uses a transformation from P||∑wjLj to CC||∑wjCj to prove optimality equivalence between solutions in the transformed and original problems.
  • Employs constraint modification in the LP relaxation to capture structural properties of the cluster scheduling problem, enabling tighter bounds.
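
The review does not spell out the mapping behind the combinatorial algorithm, so the following Python sketch is only one plausible reading. It assumes each subjob is collapsed to a single processing time equal to its total work divided by the aggregate speed of its cluster, and it then orders jobs with the well-known primal-dual greedy rule for concurrent open shop (repeatedly place last the job with the smallest weight-to-processing-time ratio on the currently most loaded machine). The names `map_to_single_machine` and `greedy_order` are hypothetical, and the greedy rule is a stand-in for whatever single-machine routine the paper actually uses.

```python
def map_to_single_machine(task_sizes, cluster_speeds):
    """Collapse each cluster to one machine (assumed mapping).

    task_sizes[j][i] is the list of task sizes of job j on cluster i.
    cluster_speeds[i] is the list of machine speeds in cluster i.
    Returns proc[j][i] = total work of the subjob / total speed of the cluster.
    """
    return [
        [sum(tasks) / sum(cluster_speeds[i]) for i, tasks in enumerate(job)]
        for job in task_sizes
    ]


def greedy_order(proc, weights):
    """Greedy permutation for concurrent open shop, built from last to first.

    Assumes at least one machine. Each round costs O(n + m), so the whole
    loop is O(n^2 + nm) for n jobs and m machines.
    """
    n, m = len(proc), len(proc[0])
    w = [float(x) for x in weights]  # residual weights, updated each round
    remaining = set(range(n))
    load = [sum(proc[j][i] for j in range(n)) for i in range(m)]
    reversed_order = []
    while remaining:
        mu = max(range(m), key=lambda i: load[i])  # most loaded machine
        candidates = [j for j in remaining if proc[j][mu] > 0]
        if candidates:
            last = min(candidates, key=lambda j: (w[j] / proc[j][mu], j))
            theta = w[last] / proc[last][mu]
        else:
            # No work left anywhere, so the remaining order is arbitrary.
            last, theta = min(remaining), 0.0
        remaining.remove(last)
        for k in remaining:
            w[k] -= theta * proc[k][mu]  # dual-style weight update
        for i in range(m):
            load[i] -= proc[last][i]
        reversed_order.append(last)
    return reversed_order[::-1]


# Hypothetical toy instance: 2 jobs, 2 clusters with unit-speed machines.
task_sizes = [
    [[2, 2], [1]],
    [[3], [2]],
]
cluster_speeds = [[1, 1], [1]]
proc = map_to_single_machine(task_sizes, cluster_speeds)
print(proc)
print(greedy_order(proc, [1, 2]))
```

The resulting permutation would then be turned into an actual schedule by list scheduling on each cluster. Keeping the mapping and the ordering step as separate functions makes it easy to swap in a different single-machine routine if the paper's differs from the one assumed here.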

Experimental results

Research questions

  • RQ1: Can we design a constant-factor approximation algorithm for scheduling jobs across distributed clusters of parallel machines with heterogeneous speeds and release times?
  • RQ2: How does the performance of single-permutation schedules compare to multi-permutation schedules in concurrent cluster scheduling?
  • RQ3: Can a combinatorial algorithm achieve strong approximation ratios without relying on LP relaxation?
  • RQ4: What is the worst-case loss in optimality when restricting to single-permutation schedules in the unit-speed case?
  • RQ5: Can implicit constraint modifications in LP relaxations improve approximation performance in scheduling problems?

Key findings

  • The paper presents the first constant-factor approximation algorithms for the concurrent cluster scheduling problem, extending beyond the NP-hard concurrent open shop model.
  • The LP-based algorithm achieves a 2-approximation when all machines are of unit speed and subjobs are divided into equally sized tasks.
  • The combinatorial algorithm, based on a mapping to the single-machine-per-cluster case, runs in O(n² + nm) time and provides a constant-factor approximation.
  • Theoretical analysis shows that single-permutation schedules can incur up to a 1.2x optimality gap compared to globally optimal solutions, though a 3-approximation is always achievable.
  • The LP relaxation is strengthened by incorporating an implicit lower bound on job completion time Cj, which is critical for achieving tight approximation ratios.
  • The approach demonstrates that constraints redundant in standard LP relaxations can become essential when modeling cluster-specific scheduling structures.

This review was created by AI and reviewed by human editors.