[Paper Review] "Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
Short-Dot introduces a coding-theory-inspired method for computing large linear transforms in distributed systems: it replaces long dot products with a larger number of short, sparse ones, so that the outputs of any K of P processors suffice to recover Ax and stragglers can be ignored.
Faced with saturation of Moore's law and increasing dimension of data, system designers have increasingly resorted to parallel and distributed computing. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers" that reduce the speed of computation because the fusion node has to wait for all processors to finish. To combat the effect of stragglers, recent literature introduces redundancy in computations across processors, e.g., using repetition-based strategies or erasure codes. The fusion node can exploit this redundancy by completing the computation using outputs from only a subset of the processors, ignoring the stragglers. In this paper, we propose a novel technique, which we call "Short-Dot", to introduce redundant computations in a coding-theory-inspired fashion, for computing linear transforms of long vectors. Instead of computing long dot products as required in the original linear transform, we construct a larger number of redundant and short dot products that can be computed faster and more efficiently at individual processors. Compared to similar schemes that introduce redundancy to tackle stragglers, Short-Dot reduces the cost of computation, storage and communication, since shorter portions are stored and computed at each processor, and shorter portions of the input are communicated to each processor. We demonstrate through probabilistic analysis as well as experiments that Short-Dot offers significant speed-up compared to existing techniques. We also derive trade-offs between the length of the dot products and the resilience to stragglers (number of processors to wait for) for any such strategy, and compare them to those achieved by our strategy.
Motivation & Objective
- Motivate fast computation of high-dimensional linear transforms under straggler-induced latency.
- Develop a coding strategy that reduces per-processor dot-product length while preserving recoverability of A x.
- Characterize fundamental trade-offs between dot-product length and straggler resilience.
- Provide analysis and empirical results showing performance gains over existing schemes.
Proposed method
- Construct a P × N matrix F such that any K rows of F can be linearly combined to recover the M rows of A, while each row of F has at most s = (N/P)(P−K+M) nonzero entries.
- Build F offline by multiplying a P × K matrix B with A stacked on top of K−M appended rows; the appended rows are chosen to force the required zeros in F while preserving the recovery property.
- Distribute the short dot products to the P processors; each processor stores only the nonzero entries of its row of F and computes the dot product with the matching entries of x.
- The fusion node uses the first K responses to recover Ax via a linear combination determined by the corresponding rows of B.
- Provide a theoretical bound showing the sparsity limit and near-optimality for large N, and compare with MDS and repetition strategies.
- Analyze computation time under a shifted-exponential model to compare Short-Dot with uncoded, repetition, and MDS schemes.
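The construction in the list above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the function names `short_dot_encode` and `short_dot_decode` are hypothetical, B is drawn as a random Gaussian matrix (whose square submatrices are invertible with probability 1), the zeros of F are placed in a cyclic pattern, and N is assumed to be divisible by P.

```python
import numpy as np

def short_dot_encode(A, P, K, rng):
    """Return (F, B): F is P x N with sparse rows, and any K rows of F recover A."""
    M, N = A.shape
    assert M <= K <= P and N % P == 0
    # Assumption: random Gaussian B; every square submatrix is invertible w.p. 1.
    B = rng.standard_normal((P, K))
    Z = np.zeros((K - M, N))  # appended rows, solved column by column
    for j in range(N):
        if K > M:
            # Cyclic pattern: column j of F must vanish in these K - M rows.
            zero_rows = (j + np.arange(K - M)) % P
            # Solve B[zero_rows, :M] A[:, j] + B[zero_rows, M:] z = 0 for z.
            Z[:, j] = -np.linalg.solve(B[zero_rows, M:], B[zero_rows, :M] @ A[:, j])
    F = B @ np.vstack([A, Z])
    # Replace the numerically tiny entries by exact zeros.
    for j in range(N):
        F[(j + np.arange(K - M)) % P, j] = 0.0
    return F, B

def short_dot_decode(B, finished, results, M):
    """Recover Ax from the dot products returned by the K processors in `finished`."""
    # F[finished] x = B[finished] [A; Z] x, so invert the K x K block of B.
    stacked = np.linalg.solve(B[finished, :], results)
    return stacked[:M]  # the first M entries are Ax; the rest are Zx

rng = np.random.default_rng(0)
M, N, P, K = 2, 12, 6, 4
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)

F, B = short_dot_encode(A, P, K, rng)
s = N * (P - K + M) // P                  # per-row sparsity bound from the text
row_nnz = np.count_nonzero(F, axis=1)     # actual nonzeros in each row of F

finished = np.array([5, 0, 3, 2])         # any K processors, in any order
results = F[finished] @ x                 # each entry is one short dot product
Ax_hat = short_dot_decode(B, finished, results, M)
```

In this toy setting each row of F touches at most s = 8 of the N = 12 entries of x, and the fusion node needs only K = 4 of the P = 6 responses. A real deployment would ship only the nonzero entries of each row and the matching entries of x to each processor.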
Experimental results
Research questions
- RQ1: Can Ax be recovered, for any A, from K out of P short, sparse dot products with controlled sparsity per processor?
- RQ2: What is the fundamental trade-off between the length of the dot products and the number of processors to wait for (K)?
- RQ3: How does Short-Dot perform relative to uncoded, repetition, and MDS-based strategies under straggler conditions?
- RQ4: Under what conditions is Short-Dot near-optimal in terms of sparsity and resilience?
- RQ5: What are the expected computation-time benefits of Short-Dot in large-scale settings?
Key findings
- Short-Dot achieves a per-processor dot-product length of s = (N/P)(P−K+M) while ensuring that any K rows of F span the rows of A, so Ax can be recovered from any K responses.
- The paper proves existence of F with the desired properties and derives a lower bound on average sparsity; Short-Dot's sparsity is near-optimal for large N and M>1.
- Under the shifted-exponential straggling model, Short-Dot yields lower expected computation time than uncoded, repetition, and MDS strategies in key regimes, including a regime where M=Θ(P) and certain sub-linear cases.
- Short-Dot can offer asymptotically faster computation time, with gains scaling as log(P) or P-related factors depending on M and P.
- The approach reduces storage and communication load per processor since each dot-product is shorter and input subsets are communicated.
- Experimental results indicate Short-Dot outperforms existing strategies in straggler-prone environments.
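The computation-time comparison in the findings above can be made concrete with a small calculation. The sketch below assumes the shifted-exponential model takes the form Pr(T ≤ t) = 1 − exp(−μ(t/l − 1)) for t ≥ l, where l is the dot-product length at a processor; under that assumption the expected time until K of P processors finish is l(1 + (H_P − H_{P−K})/μ), with H_n the n-th harmonic number (a standard result for order statistics of exponentials). The function names and the work split used for the uncoded baseline (MN/P per processor, wait for all P) are illustrative assumptions; repetition is omitted.

```python
from math import fsum

def harmonic(n):
    """n-th harmonic number H_n, with H_0 = 0."""
    return fsum(1.0 / i for i in range(1, n + 1))

def expected_kth_finish(length, P, K, mu=1.0):
    """Expected time until K of P processors finish a job of the given length."""
    return length * (1.0 + (harmonic(P) - harmonic(P - K)) / mu)

def short_dot_time(N, M, P, K, mu=1.0):
    # Each processor computes a dot product of length s = N(P-K+M)/P; wait for K.
    return expected_kth_finish(N * (P - K + M) / P, P, K, mu)

def mds_time(N, M, P, mu=1.0):
    # Each processor computes a full-length dot product; wait for any M of P.
    return expected_kth_finish(N, P, M, mu)

def uncoded_time(N, M, P, mu=1.0):
    # Assumption: work is split evenly (MN/P per processor); wait for all P.
    return expected_kth_finish(M * N / P, P, P, mu)

N, M, P = 1200, 10, 100
# Scan the design parameter K and keep the value with the lowest expected time.
best_K = min(range(M, P + 1), key=lambda K: short_dot_time(N, M, P, K))
best_short_dot = short_dot_time(N, M, P, best_K)
mds = mds_time(N, M, P)
uncoded = uncoded_time(N, M, P)
```

Setting K = M makes the Short-Dot rows fully dense (s = N), which reproduces the MDS expression, so optimizing over K can never do worse than MDS under this model.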
This review was created by AI and reviewed by human editors.