QUICK REVIEW

[Paper Review] Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

Jack Kosaian, K. V. Rashmi|arXiv (Cornell University)|Jun 4, 2018

Stochastic Gradient Optimization Techniques|19 references|41 citations

TL;DR

The paper learns encoding and decoding neural networks to create erasure codes that provide resilience for non-linear computations, enabling approximate reconstruction of unavailable neural-network inference outputs.

ABSTRACT

Machine learning algorithms are typically run on large-scale, distributed compute infrastructure that routinely faces a number of unavailabilities such as failures and temporary slowdowns. Adding redundant computations using coding-theoretic tools called "codes" is an emerging technique to alleviate the adverse effects of such unavailabilities. A code consists of an encoding function that proactively introduces redundant computation and a decoding function that reconstructs unavailable outputs using the available ones. Past work focuses on using codes to provide resilience for linear computations and specific iterative optimization algorithms. However, computations performed for a variety of applications including inference on state-of-the-art machine learning algorithms, such as neural networks, typically fall outside this realm. In this paper, we propose taking a learning-based approach to designing codes that can handle non-linear computations. We present carefully designed neural network architectures and a training methodology for learning encoding and decoding functions that produce approximate reconstructions of unavailable computation results. We present extensive experimental results demonstrating the effectiveness of the proposed approach: we show that our learned codes can accurately reconstruct 64-98% of the unavailable predictions from neural-network based image classifiers on the MNIST, Fashion-MNIST, and CIFAR-10 datasets. To the best of our knowledge, this work proposes the first learning-based approach for designing codes, and also presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation. Our results show that learning can be an effective technique for designing codes, and that learned codes are a highly promising approach for bringing the benefits of coding to non-linear computations.

Motivation & Objective

Provide resilience for non-linear computations in distributed ML inference, reducing the latency caused by failures and temporary slowdowns.
Propose a learning-based approach to design encoding and decoding functions that work for any differentiable function F.
Develop neural network architectures (MLPEncoder, ConvEncoder, decoding network) to implement E and D.
Train E and D jointly by backpropagating through the base model F using appropriate losses.
Evaluate effectiveness on neural-network classifiers across MNIST, Fashion-MNIST, and CIFAR-10.
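
The gap these objectives target can be seen in a toy example. The sketch below is an illustration only and is not taken from the paper: `linear_f`, `nonlinear_f`, and `recover_second_output` are hypothetical names, and the hand-designed sum parity stands in for a classical linear code. Subtraction recovers the missing output exactly when F is linear and fails when F is non-linear, which is why the encoder and decoder are learned instead.

```python
# Toy illustration: a sum parity recovers a missing output exactly
# only when the computation F is linear.

def linear_f(x):
    # Hypothetical linear computation: F(x) = 3x.
    return 3.0 * x

def nonlinear_f(x):
    # Hypothetical non-linear computation: F(x) = x^2.
    return x * x

def recover_second_output(f, x1, x2):
    """Try to recover F(x2) from F(x1) and F(parity) by subtraction."""
    parity = x1 + x2           # hand-designed encoder: P = X1 + X2
    return f(parity) - f(x1)   # hand-designed decoder: F(P) - F(X1)

x1, x2 = 2.0, 5.0
linear_error = abs(recover_second_output(linear_f, x1, x2) - linear_f(x2))
nonlinear_error = abs(recover_second_output(nonlinear_f, x1, x2) - nonlinear_f(x2))
print(linear_error, nonlinear_error)  # prints: 0.0 20.0
```

For the squared function the cross term 2 * x1 * x2 = 20 is what the subtraction decoder cannot remove.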

Proposed method

Represent encoding and decoding as neural networks trained end-to-end.
Use a three-stage pipeline over the input data: encode the inputs into parity inputs, apply F to all data and parity inputs, and decode the unavailable outputs from the available ones.
Train E and D by backpropagating losses through F, using either F-based loss (MSE-Base or KL-Base) or label-based loss (XENT-Label).
Employ two encoding architectures, MLPEncoder (fully-connected) and ConvEncoder (dilated convolutions), to generate the r parities.
Use a 3-layer MLP for decoding that takes all F(Xi) and F(Pj) as input (with zeros in place of unavailable outputs) and outputs reconstructions of the unavailable F(Xi).
Handle multi-channel inputs by encoding channels independently and combining parity channels.
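
The data flow described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `base_model`, `encode`, `build_decoder_input`, `mse_base`, and `xent_label` are hypothetical names, the element-wise sum is a placeholder for the learned encoder, and the loss formulas are the usual definitions suggested by the names MSE-Base and XENT-Label (KL-Base is omitted). No training is performed, since backpropagating through F would need an automatic-differentiation framework.

```python
import math

def base_model(x):
    """Stand-in for the base model F: two features -> three class scores."""
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1]), math.tanh(0.5 * x[0])]

def encode(inputs):
    """Placeholder for the learned encoder E: element-wise sum, one parity (r = 1)."""
    return [[sum(column) for column in zip(*inputs)]]

def build_decoder_input(data_outputs, parity_outputs, unavailable):
    """Concatenate all F(X_i) and F(P_j), zeroing the unavailable ones."""
    flat = []
    for index, output in enumerate(data_outputs + parity_outputs):
        if index in unavailable:
            flat.extend([0.0] * len(output))
        else:
            flat.extend(output)
    return flat

def mse_base(reconstruction, base_output):
    """Assumed form of MSE-Base: mean squared error against F(X_i)."""
    return sum((r - b) ** 2 for r, b in zip(reconstruction, base_output)) / len(base_output)

def xent_label(reconstruction, label):
    """Assumed form of XENT-Label: softmax cross-entropy against the true class."""
    peak = max(reconstruction)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in reconstruction))
    return log_norm - reconstruction[label]

inputs = [[0.2, -0.1], [0.7, 0.4]]                  # k = 2 data inputs
parities = encode(inputs)                           # r = 1 parity input
data_outputs = [base_model(x) for x in inputs]      # F(X_1), F(X_2)
parity_outputs = [base_model(p) for p in parities]  # F(P_1)

# Suppose F(X_2) (index 1) is unavailable when decoding starts.
decoder_in = build_decoder_input(data_outputs, parity_outputs, unavailable={1})

# A trained decoder D would map decoder_in to an estimate of F(X_2);
# an all-zero guess is used here only to show how the losses are evaluated.
guess = [0.0, 0.0, 0.0]
print(len(decoder_in), mse_base(guess, data_outputs[1]), xent_label(guess, label=0))
```

The decoder input always has the same length, (k + r) times the output size of F, regardless of which outputs are missing, so a single fixed-size network can serve every unavailability pattern.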

Experimental results

Research questions

RQ1: Can a learning-based approach design encoding and decoding functions that provide resilience for non-linear (differentiable) computations?
RQ2: How accurately can learned codes reconstruct unavailable outputs for neural-network inference across different datasets and base models?
RQ3: What architectures and training losses enable effective end-to-end learning of E and D through a non-linear F?
RQ4: How does the amount of redundancy (k and r) impact reconstruction quality for inference tasks?

Key findings

Learned codes can accurately reconstruct 64-98% of unavailable predictions.
ResNet-18 classifiers: 98.87% (MNIST), 92.06% (Fashion-MNIST), 80.84% (CIFAR-10) recovered under studied settings.
With 20% redundancy (k=5, r=1), overall prediction accuracy improves from 84.12% to 90.59% (CIFAR-10) and from 89.28% to 98.75% (MNIST) in their scenario.
The approach demonstrates the first learning-based design of erasure codes and resilience for non-linear computations (inference).
Evaluations used two base models (MLP and ResNet-18) across three datasets, demonstrating robustness of learned codes.
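
The 20% redundancy figure quoted above follows from the code parameters, assuming redundancy is measured as parity inputs per data input:

```latex
\text{redundancy} = \frac{r}{k} = \frac{1}{5} = 20\%
```

For comparison, replicating every input once would add 100% extra computation.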

This review was created by AI and reviewed by human editors.