QUICK REVIEW

[Paper Review] Soft Weight-Sharing for Neural Network Compression

Karen Ullrich, Edward Meeds|arXiv (Cornell University)|Feb 13, 2017
Advanced Neural Network Applications | Citations: 81
One-Line Summary

The paper uses a learned mixture of Gaussians prior over weights (soft weight-sharing) to achieve simultaneous pruning and quantization during retraining, enabling competitive compression without multi-stage pruning/quantization pipelines.

ABSTRACT

The success of deep learning in numerous application domains created the desire to run and train them on mobile devices. This however, conflicts with their computationally, memory and energy intense nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) propose a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of soft weight-sharing (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.

Research Motivation and Objectives

  • Motivate neural network compression for on-device deployment by reducing memory and energy demands.
  • Propose an empirical Bayes prior over weights that promotes clustering and pruning.
  • Demonstrate that soft weight-sharing achieves competitive compression with minimal accuracy loss.
  • Show how MDL and bits-back insights relate compression to probabilistic modeling and coding.
  • Provide practical retraining and post-processing steps to realize compression in real networks.

Proposed Method

  • Model weights with a mixture of Gaussians prior p(w) = product_i sum_j pi_j N(w_i | mu_j, sigma_j^2).
  • Train weights and mixture parameters (mu_j, sigma_j, pi_j) together via maximum likelihood (empirical Bayes).
  • Optimize the objective L = Le + tau * Lc, where Le is the error term (the negative log-likelihood of the data), Lc = KL(q(w)||p(w)) is the complexity term, and tau weights the two.
  • Use factorized Dirac posteriors during retraining with soft weight-sharing to encourage clustering around mixture components.
  • Fix a zero component to enforce pruning and allow other components to merge when pressure from the error term is low.
  • Apply gradient-based optimization (Adam) to update weights and mixture parameters; use small tau to weight the prior.
  • Post-process by assigning weights to the mean of the most responsible component and merging near-duplicate components.
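The steps listed for the proposed method can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`component_log_joint`, `neg_log_prior`, `objective`, `quantize`) and all numeric values are hypothetical, and it assumes that with point-estimate (Dirac) weights the complexity term Lc reduces, up to a constant, to the negative log of the mixture prior. Gradient updates with Adam and the merging of near-duplicate components are left out.

```python
import numpy as np

def component_log_joint(w, pi, mu, sigma):
    """log(pi_j) + log N(w_i | mu_j, sigma_j^2), shape (n_weights, n_components)."""
    w = np.asarray(w, dtype=float).reshape(-1, 1)   # flatten weights to a column
    pi, mu, sigma = (np.asarray(a, dtype=float) for a in (pi, mu, sigma))
    return (np.log(pi)
            - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
            - 0.5 * ((w - mu) / sigma) ** 2)

def neg_log_prior(w, pi, mu, sigma):
    """-log p(w) = -sum_i log sum_j pi_j N(w_i | mu_j, sigma_j^2)."""
    lj = component_log_joint(w, pi, mu, sigma)
    m = lj.max(axis=1, keepdims=True)               # log-sum-exp for stability
    log_p = m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))
    return -log_p.sum()

def objective(error_loss, w, pi, mu, sigma, tau):
    """L = Le + tau * Lc, with Lc taken as -log p(w) (assumption stated above)."""
    return error_loss + tau * neg_log_prior(w, pi, mu, sigma)

def quantize(w, pi, mu, sigma):
    """Snap each weight to the mean of its most responsible component."""
    # The responsibility normalizer is shared across components,
    # so the argmax of the log joint equals the argmax of the responsibility.
    idx = component_log_joint(w, pi, mu, sigma).argmax(axis=1)
    return np.asarray(mu, dtype=float)[idx]

# Illustrative values only: component 0 is the zero (pruning) component.
pi = np.array([0.90, 0.05, 0.05])
mu = np.array([0.0, -0.5, 0.5])
sigma = np.array([0.05, 0.05, 0.05])
w = np.array([0.01, -0.48, 0.52, 0.002])

w_q = quantize(w, pi, mu, sigma)
loss = objective(error_loss=0.1, w=w, pi=pi, mu=mu, sigma=sigma, tau=0.005)
```

In this toy setting the two weights near zero are assigned to the zero component, which corresponds to pruning, while the other two snap to the nearest nonzero mean, which corresponds to quantization.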

Experimental Results

Research Questions

  • RQ1: Can a learned Gaussian mixture prior over weights induce simultaneous quantization and pruning during retraining?
  • RQ2: How does soft weight-sharing relate to MDL and bits-back principles in neural network compression?
  • RQ3: What compression rates and accuracy trade-offs are achievable on standard models (e.g., LeNet variants, ResNet) using this approach?
  • RQ4: How can hyper-parameters and priors be configured to avoid premature component collapse and achieve scalable compression?

Key Findings

Model           Method              Top-1 Error [%]   Δ [%]   |W| [10^6]   |W_≠0|/|W| [%]   CR
LeNet-300-100   Han et al. (2015a)  1.64 → 1.58        0.06   0.2          8.0              40
LeNet-300-100   Guo et al. (2016)   2.28 → 1.99       -0.29   -            1.8              56
LeNet-300-100   Ours                1.89 → 1.94       -0.05   -            4.3              64
LeNet-5-Caffe   Han et al. (2015a)  0.80 → 0.74       -0.06   0.4          8.0              39
LeNet-5-Caffe   Guo et al. (2016)   0.91 → 0.91        0.00   0.9          0.9              108
LeNet-5-Caffe   Ours                0.88 → 0.97        0.09   -            0.5              162
ResNet (light)  Ours                6.48 → 8.50        2.02   2.7          6.6              45
  • Achieved competitive compression rates on MNIST models, with notable pruning and quantization effects during retraining.
  • On LeNet-300-100, observed up to 96% pruning in the first layer and overall compression rate of 64x with minimal accuracy drop (0.9811 to 0.9806).
  • On LeNet-5-Caffe, achieved a final compression rate of 162x with a modest increase in top-1 error (0.88% to 0.97%).
  • For a light ResNet model (2.7M parameters), demonstrated a compression rate of 45x with 6.6% of weights remaining nonzero and 8.50% top-1 error after compression (from 6.48%).
  • Hyper-parameter optimization (Bayesian optimization via Spearmint) explored 13 settings, balancing accuracy loss against compression rate.
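The accuracy figures quoted in the findings can be cross-checked against the table's error percentages, and the nonzero-weight percentages can be read as pruned fractions. The helper names below are hypothetical and the snippet is only an arithmetic check on the numbers already listed in the table.

```python
def accuracy_from_error(error_pct):
    """Convert a top-1 error in percent to an accuracy in [0, 1]."""
    return 1.0 - error_pct / 100.0

def pruned_pct_from_nonzero(nonzero_pct):
    """Convert the share of nonzero weights (percent) to the pruned share (percent)."""
    return 100.0 - nonzero_pct

# LeNet-300-100 (Ours): top-1 error 1.89% before and 1.94% after compression.
acc_before = round(accuracy_from_error(1.89), 4)
acc_after = round(accuracy_from_error(1.94), 4)

# ResNet (light): 6.6% of weights remain nonzero after compression.
resnet_pruned = round(pruned_pct_from_nonzero(6.6), 1)
```

Here `acc_before` and `acc_after` come out to 0.9811 and 0.9806, matching the LeNet-300-100 accuracies quoted in the list, and `resnet_pruned` is 93.4, meaning about 93.4% of the light ResNet's weights are pruned.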


This review was generated by AI and checked by a human editor.