[Paper Review] Soft Weight-Sharing for Neural Network Compression
The paper uses a learned mixture of Gaussians prior over weights (soft weight-sharing) to achieve simultaneous pruning and quantization during retraining, enabling competitive compression without multi-stage pruning/quantization pipelines.
The success of deep learning in numerous application domains created the desire to run and train them on mobile devices. This, however, conflicts with their computationally, memory and energy intense nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) proposes a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of soft weight-sharing (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.
Research Motivation and Objectives
- Motivate neural network compression for on-device deployment by reducing memory and energy demands.
- Propose an empirical Bayes prior over weights that promotes clustering and pruning.
- Demonstrate that soft weight-sharing achieves competitive compression with minimal accuracy loss.
- Show how MDL and bits-back insights relate compression to probabilistic modeling and coding.
- Provide practical retraining and post-processing steps to realize compression in real networks.
Proposed Method
- Model weights with a mixture of Gaussians prior p(w) = product_i sum_j pi_j N(w_i | mu_j, sigma_j^2).
- Train weights and mixture parameters (mu_j, sigma_j, pi_j) together via maximum likelihood (empirical Bayes).
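- Optimize the objective L = Le + tau * Lc, where Le is the error term (the negative log-likelihood of the data) and Lc = KL(q(w)||p(w)) is the complexity term.
- Use factorized Dirac posteriors during retraining, so that Lc reduces, up to a constant, to the negative log-prior -log p(w); this is the soft weight-sharing penalty that encourages weights to cluster around the mixture components.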
- Fix a zero component to enforce pruning and allow other components to merge when pressure from the error term is low.
- Apply gradient-based optimization (Adam) to update weights and mixture parameters; use small tau to weight the prior.
- Post-process by assigning weights to the mean of the most responsible component and merging near-duplicate components.
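
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`log_mixture_prior`, `objective`, `quantize`) and the demo values are hypothetical, the error term is passed in as a plain number, and the merging of near-duplicate components is left out. It evaluates the mixture log-prior, combines it with an error term as Le + tau * Lc with Lc taken as -log p(w), and snaps each weight to the mean of its most responsible component.

```python
import numpy as np

def log_mixture_prior(w, pi, mu, sigma):
    """Return (sum_i log p(w_i), responsibilities) for a 1-D Gaussian mixture prior."""
    w = np.asarray(w, dtype=float)[:, None]           # shape (n_weights, 1)
    pi, mu, sigma = (np.asarray(a, dtype=float) for a in (pi, mu, sigma))
    # log(pi_j) + log N(w_i | mu_j, sigma_j^2), shape (n_weights, n_components)
    log_comp = (np.log(pi)
                - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
                - 0.5 * ((w - mu) / sigma) ** 2)
    # Numerically stable log-sum-exp over components
    m = log_comp.max(axis=1, keepdims=True)
    log_pw = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    resp = np.exp(log_comp - log_pw[:, None])         # each row sums to 1
    return log_pw.sum(), resp

def objective(error_term, w, pi, mu, sigma, tau):
    """L = Le + tau * Lc, with Lc taken as the negative log-prior (assumption)."""
    log_prior, _ = log_mixture_prior(w, pi, mu, sigma)
    return error_term - tau * log_prior

def quantize(w, pi, mu, sigma):
    """Assign every weight to the mean of its most responsible component."""
    _, resp = log_mixture_prior(w, pi, mu, sigma)
    return np.asarray(mu, dtype=float)[resp.argmax(axis=1)]

# Hypothetical demo values: component 0 is the zero (pruning) component.
pi_demo = [0.9, 0.05, 0.05]
mu_demo = [0.0, -0.5, 0.5]
sigma_demo = [0.05, 0.05, 0.05]
w_demo = [0.01, -0.02, 0.48, -0.55, 0.03]

print(quantize(w_demo, pi_demo, mu_demo, sigma_demo))
```

In this toy setting the three small weights fall to the zero component (pruning) and the other two snap to the nearest non-zero mean (quantization). In actual retraining, the weights and the mixture parameters would be updated jointly by a gradient-based optimizer, which this sketch does not do.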
Experimental Results
Research Questions
- RQ1: Can a learned Gaussian mixture prior over weights induce simultaneous quantization and pruning during retraining?
- RQ2: How does soft weight-sharing relate to MDL and bits-back principles in neural network compression?
- RQ3: What compression rates and accuracy trade-offs are achievable on standard models (e.g., LeNet variants, ResNet) using this approach?
- RQ4: How can hyper-parameters and priors be configured to avoid premature component collapse and achieve scalable compression?
Key Findings
| Model | Method | Top-1 Error [%] | Δ [%] | \|W\| [10^6] | \|W≠0\|/\|W\| [%] | CR |
|---|---|---|---|---|---|---|
| LeNet-300-100 | Han et al. (2015a) | 1.64 → 1.58 | 0.06 | 0.2 | 8.0 | 40 |
| LeNet-300-100 | Guo et al. (2016) | 2.28 → 1.99 | -0.29 | | 1.8 | 56 |
| LeNet-300-100 | Ours | 1.89 → 1.94 | -0.05 | | 4.3 | 64 |
| LeNet-5-Caffe | Han et al. (2015a) | 0.80 → 0.74 | -0.06 | 0.4 | 8.0 | 39 |
| LeNet-5-Caffe | Guo et al. (2016) | 0.91 → 0.91 | 0.00 | | 0.9 | 108 |
| LeNet-5-Caffe | Ours | 0.88 → 0.97 | 0.09 | | 0.5 | 162 |
| ResNet (light) | Ours | 6.48 → 8.50 | 2.02 | 2.7 | 6.6 | 45 |
- Achieved competitive compression rates on MNIST models, with notable pruning and quantization effects during retraining.
- On LeNet-300-100, observed up to 96% pruning in the first layer and an overall compression rate of 64x with a minimal accuracy drop (0.9811 to 0.9806).
- On LeNet-5-Caffe, achieved a final compression rate of 162x with a modest increase in top-1 error (0.88% to 0.97%).
- For a light ResNet model (2.7M parameters), reached a compression rate of 45x with 6.6% of the weights remaining nonzero, while top-1 error rose from 6.48% to 8.50%.
- Hyper-parameter optimization (Bayesian optimization via Spearmint) explored 13 settings, balancing accuracy loss against compression rate.
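
As a quick sanity check on how the reported figures relate, the sketch below converts top-1 error into accuracy and nonzero-weight density into pruned fraction. The helper names are hypothetical, and it assumes that 4.3, 0.5 and 6.6 are the nonzero-weight percentages of the "Ours" rows. It does not reproduce the compression rate (CR) itself, which depends on how the sparse, quantized weights are stored.

```python
def accuracy_from_error(top1_error_percent):
    """Convert a top-1 error in percent into accuracy as a fraction."""
    return 1.0 - top1_error_percent / 100.0

def pruned_fraction(nonzero_percent):
    """Percentage of weights set to zero, given the percentage left nonzero."""
    return 100.0 - nonzero_percent

# (error before [%], error after [%], nonzero weights [%]) for the "Ours" rows
ours = {
    "LeNet-300-100": (1.89, 1.94, 4.3),
    "LeNet-5-Caffe": (0.88, 0.97, 0.5),
    "ResNet (light)": (6.48, 8.50, 6.6),
}

for model, (before, after, nonzero) in ours.items():
    print(f"{model}: accuracy {accuracy_from_error(before):.4f} -> "
          f"{accuracy_from_error(after):.4f}, "
          f"pruned {pruned_fraction(nonzero):.1f}%")
```

For LeNet-300-100 this gives the 0.9811 to 0.9806 accuracy change quoted above.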
This review was generated by AI and checked by a human editor.