QUICK REVIEW

[Paper Review] Coded Sparse Matrix Multiplication

Sinong Wang, Jiashang Liu|arXiv (Cornell University)|Feb 9, 2018

Stochastic Gradient Optimization Techniques|16 references|73 citations

TL;DR

Introduces the sparse code, a coding scheme for distributed computation of C = A^T B that preserves sparsity, achieves a near-optimal recovery threshold of Theta(mn), and decodes in time nearly linear in nnz(C).

ABSTRACT

In a large-scale and distributed matrix multiplication problem $C=A^{\intercal}B$, where $C\in\mathbb{R}^{r\times t}$, the coded computation plays an important role to effectively deal with "stragglers" (distributed computations that may get delayed due to few slow or faulty processors). However, existing coded schemes could destroy the significant sparsity that exists in large-scale machine learning problems, and could result in much higher computation overhead, i.e., $O(rt)$ decoding time. In this paper, we develop a new coded computation strategy, we call \emph{sparse code}, which achieves near \emph{optimal recovery threshold}, \emph{low computation overhead}, and \emph{linear decoding time} $O(nnz(C))$. We implement our scheme and demonstrate the advantage of the approach over both uncoded and current fastest coded strategies.

Motivation & Objective

Address the straggler problem in large-scale distributed matrix multiplication.
Preserve input/output sparsity to reduce computation and communication overhead.
Design a coding scheme with near-optimal recovery threshold and low decoding complexity.
Develop and analyze a degree distribution and decoding algorithm tailored for sparse matrices.
Empirically benchmark against uncoded schemes and existing coded strategies.
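
The sparsity objective above can be illustrated with a small back-of-the-envelope calculation. This is only a sketch under simplifying assumptions that are not taken from the paper: every block has the same density p, nonzero positions are independent across blocks, and no entries cancel. The function name `expected_density` and the sample values of p and mn are hypothetical. It compares a combination that mixes all mn blocks with one that mixes roughly ln(mn) blocks, the average degree the review reports for the sparse code.

```python
import math


def expected_density(p, degree):
    """Expected density of a linear combination of `degree` blocks.

    Assumes each block has density p, nonzero positions are independent
    across blocks, and no cancellation occurs (illustrative model only).
    """
    return 1.0 - (1.0 - p) ** degree


p = 0.001   # hypothetical density of a single block
mn = 100    # hypothetical number of blocks

# A code that mixes every block into each task.
dense_combination = expected_density(p, mn)

# A code that mixes about ln(mn) blocks into each task.
sparse_combination = expected_density(p, round(math.log(mn)))

print(f"all {mn} blocks mixed: density ~ {dense_combination:.4f}")
print(f"~ln(mn) blocks mixed: density ~ {sparse_combination:.4f}")
```

Under this model the low-degree combination stays close to the density of a single block, while mixing every block makes each task far denser, which is the overhead the sparse code is designed to avoid.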

Proposed method

Define the (P,S)-sparse code, in which each worker computes a weighted sum of block products A_i^T B_j with weights drawn from a finite set S.
Use a degree distribution P, the Wave Soliton distribution, to decide how many terms participate in each coded task.
Form a coefficient matrix M from the weights and employ a hybrid decoding algorithm that combines peeling (graph-based) decoding with Gaussian elimination.
Introduce a rooting step to recover blocks via linear combinations when peeling stalls, ensuring decoding completes with high probability.
Prove near-optimal recovery threshold K = Theta(mn) with high probability by linking the full-rank condition of M to the existence of a perfect matching in a random bipartite graph.
Show that the decoding complexity is O(nnz(C) ln(mn)), which scales with nnz(C) rather than with rt, the full size of C.
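
The encoding side of the method can be sketched as follows. This is a minimal illustration, not the authors' implementation. The probabilities used for the Wave Soliton distribution (tau/(mn) for degree 1, tau/70 for degree 2, tau/(k(k-1)) for degree k >= 3, with tau = 35/18) are an assumption about the paper's definition and should be checked against it. The function names, the weight set, and the worker count are hypothetical. Each row of the coefficient matrix M is stored as a dictionary mapping a flat block index to its weight, where index idx stands for the block product A_i^T B_j with (i, j) = divmod(idx, n).

```python
import random


def wave_soliton(d):
    """Return [p_1, ..., p_d] for d = m*n blocks (assumed form, see lead-in)."""
    if d < 3:
        raise ValueError("this sketch assumes at least 3 blocks")
    tau = 35.0 / 18.0
    probs = [tau / d, tau / 70.0]
    probs += [tau / (k * (k - 1)) for k in range(3, d + 1)]
    return probs


def sample_coefficient_row(d, probs, weight_set, rng):
    """One row of M: draw a degree from P, then that many blocks and weights."""
    degree = rng.choices(range(1, d + 1), weights=probs, k=1)[0]
    chosen = rng.sample(range(d), degree)
    return {idx: rng.choice(weight_set) for idx in chosen}


def build_coefficient_matrix(m, n, num_workers, weight_set, seed=0):
    """Sparse coefficient matrix M, one row (dict) per worker."""
    rng = random.Random(seed)
    d = m * n
    probs = wave_soliton(d)
    return [sample_coefficient_row(d, probs, weight_set, rng)
            for _ in range(num_workers)]


# Hypothetical setup: a 3 x 3 block grid, 12 workers, weights from S = {1, 2, 3}.
M = build_coefficient_matrix(m=3, n=3, num_workers=12, weight_set=[1, 2, 3])
for worker, row in enumerate(M):
    print(worker, sorted(row.items()))
```

Worker k would then compute the sum of w * A_i^T B_j over the entries of row k. Because most rows have low degree, each task touches only a few blocks.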

Experimental results

Research questions

RQ1: Can coded computation mitigate stragglers in sparse, large-scale matrix multiplication without destroying sparsity?
RQ2: What recovery threshold and decoding complexity are achievable for sparse inputs, and how can decoding be made nearly linear in nnz(C)?
RQ3: How can a degree distribution and decoding procedure be designed to ensure full-rank decoding with high probability?
RQ4: How does the sparse code compare to existing schemes (uncoded, sparse MDS, product code, polynomial code, LT code) in practice?

Key findings

The sparse code achieves recovery threshold Theta(mn) with high probability.
Decoding time is nearly linear in nnz(C): O(nnz(C) ln(mn)).
Average per-row degree in the coefficient matrix is O(ln(mn)), yielding sparse M with alpha = O(ln(mn)).
The Wave Soliton distribution enables near-optimal recovery with a constant number of rooting steps.
The Schwartz-Zippel lemma is used to relate the full-rank condition of M to the existence of a perfect matching in a random bipartite graph, which underpins the full-rank proof.
Experimental results on large sparse matrices show significant time improvements over uncoded, LT code, sparse MDS, product code, and polynomial code baselines.
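
The decoding behaviour behind these findings can be sketched with a toy hybrid decoder. It is an illustration under stated assumptions rather than the paper's algorithm: scalars stand in for the block products A_i^T B_j, and when peeling stalls the sketch falls back to ordinary Gaussian elimination on the residual system, which is a simplification of the rooting step. All function names and the sample data are hypothetical.

```python
from fractions import Fraction


def peel(rows, values, num_blocks):
    """Peeling phase: resolve rows that involve a single unknown block."""
    rows = [dict(r) for r in rows]
    values = [Fraction(v) for v in values]
    recovered = {}
    progress = True
    while progress and len(recovered) < num_blocks:
        progress = False
        for r, row in enumerate(rows):
            if len(row) == 1:
                (j, w), = row.items()
                x = values[r] / w
                recovered[j] = x
                # Subtract the recovered block from every row that uses it.
                for r2, other in enumerate(rows):
                    if j in other:
                        values[r2] -= other.pop(j) * x
                progress = True
    return recovered, rows, values


def solve_residual(rows, values, unknowns):
    """Fallback: Gauss-Jordan elimination on what peeling left behind."""
    cols = sorted(unknowns)
    aug = [[Fraction(row.get(j, 0)) for j in cols] + [v]
           for row, v in zip(rows, values) if row]
    for c in range(len(cols)):
        pivot = next((r for r in range(c, len(aug)) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("rank deficient: more worker results are needed")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [x / scale for x in aug[c]]
        for r in range(len(aug)):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return {j: aug[i][-1] for i, j in enumerate(cols)}


def hybrid_decode(rows, values, num_blocks):
    """Peel first, then solve any blocks that peeling could not reach."""
    recovered, rest_rows, rest_values = peel(rows, values, num_blocks)
    unknowns = set(range(num_blocks)) - set(recovered)
    if unknowns:
        recovered.update(solve_residual(rest_rows, rest_values, unknowns))
    return [recovered[j] for j in range(num_blocks)]


# Toy instance: four unknown blocks with true values 5, 0, 7, 2.
rows = [{0: 1}, {0: 2, 1: 1}, {2: 1, 3: 1}, {2: 1, 3: 2}, {1: 1, 2: 1, 3: 1}]
values = [5, 10, 9, 11, 9]
decoded = hybrid_decode(rows, values, num_blocks=4)
print([int(x) for x in decoded])
```

In this instance peeling recovers blocks 0 and 1, then stalls because every remaining row still contains two unknowns, and the fallback solves for blocks 2 and 3. With real sparse blocks, each peeling subtraction touches only the nonzeros of the recovered block, which is the intuition behind a decoding cost that scales with nnz(C).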

This review was created by AI and reviewed by human editors.