QUICK REVIEW

[Paper Review] Soft Weight-Sharing for Neural Network Compression

Karen Ullrich, Edward Meeds|arXiv (Cornell University)|Feb 13, 2017

Advanced Neural Network Applications | Cited by 81

One-line summary

The paper uses a learned mixture of Gaussians prior over weights (soft weight-sharing) to achieve simultaneous pruning and quantization during retraining, enabling competitive compression without multi-stage pruning/quantization pipelines.

ABSTRACT

The success of deep learning in numerous application domains created the desire to run and train them on mobile devices. This however, conflicts with their computationally, memory and energy intense nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) propose a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of soft weight-sharing (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.

Motivation and objectives

Motivate neural network compression for on-device deployment by reducing memory and energy demands.
Propose an empirical Bayes prior over weights that promotes clustering and pruning.
Demonstrate that soft weight-sharing achieves competitive compression with minimal accuracy loss.
Show how MDL and bits-back insights relate compression to probabilistic modeling and coding.
Provide practical retraining and post-processing steps to realize compression in real networks.

Proposed method

Model weights with a mixture of Gaussians prior p(w) = product_i sum_j pi_j N(w_i | mu_j, sigma_j^2).
Train weights and mixture parameters (mu_j, sigma_j, pi_j) together via maximum likelihood (empirical Bayes).
Optimize the objective L = Le + tau * Lc, where Le is the error term (the negative log-likelihood of the data) and Lc = KL(q(w)||p(w)) is the complexity term.
Use factorized Dirac posteriors during retraining with soft weight-sharing to encourage clustering around mixture components.
Fix a zero component to enforce pruning and allow other components to merge when pressure from the error term is low.
Apply gradient-based optimization (Adam) to update weights and mixture parameters; use small tau to weight the prior.
Post-process by assigning weights to the mean of the most responsible component and merging near-duplicate components.
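
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy mixture parameters, and the toy weights are all hypothetical. It also assumes that, with Dirac posteriors, the complexity term can be evaluated as -log p(w) under the mixture prior, and it leaves out gradient updates, any priors on the mixture parameters, and component merging.

```python
import math

def gaussian_pdf(w, mu, sigma):
    """Density of N(w | mu, sigma^2)."""
    z = (w - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density(w, pis, mus, sigmas):
    """sum_j pi_j * N(w | mu_j, sigma_j^2) for a single weight."""
    return sum(pi * gaussian_pdf(w, mu, s) for pi, mu, s in zip(pis, mus, sigmas))

def complexity_loss(weights, pis, mus, sigmas):
    """-log p(w) for the factorized mixture prior (assumed form of Lc).

    A real implementation would use a log-sum-exp to avoid underflow.
    """
    return -sum(math.log(mixture_density(w, pis, mus, sigmas)) for w in weights)

def total_loss(error_loss, weights, pis, mus, sigmas, tau):
    """L = Le + tau * Lc, with Le supplied by the caller."""
    return error_loss + tau * complexity_loss(weights, pis, mus, sigmas)

def responsibilities(w, pis, mus, sigmas):
    """Posterior probability of each component given one weight."""
    joint = [pi * gaussian_pdf(w, mu, s) for pi, mu, s in zip(pis, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

def quantize(weights, pis, mus, sigmas):
    """Post-processing: snap each weight to the mean of its most responsible component."""
    snapped = []
    for w in weights:
        r = responsibilities(w, pis, mus, sigmas)
        best = max(range(len(r)), key=r.__getitem__)
        snapped.append(mus[best])
    return snapped

# Toy example (hypothetical values): component 0 is the zero component used for pruning.
pis = [0.9, 0.05, 0.05]
mus = [0.0, -0.5, 0.5]
sigmas = [0.05, 0.05, 0.05]
weights = [0.01, -0.48, 0.52, -0.02]

quantized = quantize(weights, pis, mus, sigmas)
pruned_fraction = sum(1 for q in quantized if q == 0.0) / len(quantized)
```

In this toy setup the two weights near zero are snapped to the zero component, which is how the zero component produces pruning, while the other two are snapped to the nearest nonzero cluster mean, which is the quantization step.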

Experimental results

Research questions

RQ1: Can a learned Gaussian mixture prior over weights induce simultaneous quantization and pruning during retraining?
RQ2: How does soft weight-sharing relate to MDL and bits-back principles in neural network compression?
RQ3: What compression rates and accuracy trade-offs are achievable on standard models (e.g., LeNet variants, ResNet) using this approach?
RQ4: How can hyper-parameters and priors be configured to avoid premature component collapse and achieve scalable compression?

Key findings

Model	Method	Top-1 Error [%]	Δ [%]	|W| [10^6]	|W≠0|/|W| [%]	CR
LeNet-300-100	Han et al. (2015a)	1.64 → 1.58	0.06	0.2	8.0	40
LeNet-300-100	Guo et al. (2016)	2.28 → 1.99	-0.29	0.2	1.8	56
LeNet-300-100	Ours	1.89 → 1.94	-0.05	0.2	4.3	64
LeNet-5-Caffe	Han et al. (2015a)	0.80 → 0.74	-0.06	0.4	8.0	39
LeNet-5-Caffe	Guo et al. (2016)	0.91 → 0.91	0.00	0.4	0.9	108
LeNet-5-Caffe	Ours	0.88 → 0.97	0.09	0.4	0.5	162
ResNet (light)	Ours	6.48 → 8.50	2.02	2.7	6.6	45

Achieved competitive compression rates on MNIST models, with notable pruning and quantization effects during retraining.
On LeNet-300-100, observed up to 96% pruning in the first layer and overall compression rate of 64x with minimal accuracy drop (0.9811 to 0.9806).
On LeNet-5-Caffe, achieved a final compression rate of 162x with a modest increase in top-1 error (0.88% to 0.97%).
For a light ResNet model (2.7M parameters), achieved a compression rate of 45x with 6.6% of weights nonzero, at the cost of top-1 error rising from 6.48% to 8.50%.
Hyper-parameter optimization (Bayesian optimization via Spearmint) explored 13 settings, balancing accuracy loss against compression rate.

This review was generated by AI and checked by a human editor.