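[Paper Explained] Soft Weight-Sharing for Neural Network Compression
The paper applies soft weight-sharing with a learned Gaussian mixture prior over the weights, so that pruning and quantization happen simultaneously during retraining. This yields competitive compression without a multi-stage pruning/quantization pipeline.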
The success of deep learning in numerous application domains created the desire to run and train them on mobile devices. This however, conflicts with their computationally, memory and energy intense nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) propose a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of soft weight-sharing (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.
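Research Motivation and Objectives
- Advance neural network compression for on-device deployment by reducing memory and energy requirements.
- Propose an empirical Bayes prior over the weights that promotes clustering and pruning.
- Show that soft weight-sharing achieves competitive compression with minimal accuracy loss.
- Show how MDL and bits-back insights connect compression to probabilistic modeling and coding.
- Provide practical retraining and post-processing steps for compressing real networks.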
Proposed Method
- Model weights with a mixture of Gaussians prior p(w) = product_i sum_j pi_j N(w_i | mu_j, sigma_j^2).
- Train weights and mixture parameters (mu_j, sigma_j, pi_j) together via maximum likelihood (empirical Bayes).
- Minimize the objective L = Le + tau * Lc, where Le is the error term (the negative log-likelihood of the data) and Lc = KL(q(w)||p(w)) is the complexity term.
- Use factorized Dirac posteriors during retraining with soft weight-sharing to encourage clustering around mixture components.
- Fix a zero component to enforce pruning and allow other components to merge when pressure from the error term is low.
- Apply gradient-based optimization (Adam) to update weights and mixture parameters; use small tau to weight the prior.
- Post-process by assigning weights to the mean of the most responsible component and merging near-duplicate components.
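The steps above can be illustrated with a minimal NumPy sketch. It assumes that, for a point-estimate (Dirac) posterior, the complexity term Lc reduces to -log p(w) up to a constant, and it treats the error term as a number supplied by the caller. The function names (`log_mixture_prior`, `objective`, `quantize`) and the toy mixture values are hypothetical and chosen for illustration only; they are not taken from the paper's code. Merging of near-duplicate components and the gradient updates themselves are not shown.

```python
import numpy as np

def log_mixture_prior(w, pi, mu, sigma):
    """Return sum_i log sum_j pi_j N(w_i | mu_j, sigma_j^2) and the per-component terms."""
    w = np.asarray(w, dtype=float).reshape(-1, 1)           # shape (I, 1)
    log_norm = -0.5 * np.log(2.0 * np.pi * sigma ** 2)      # shape (J,)
    log_comp = np.log(pi) + log_norm - 0.5 * ((w - mu) / sigma) ** 2  # shape (I, J)
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return log_p.sum(), log_comp

def objective(error_loss, w, pi, mu, sigma, tau):
    """L = Le + tau * Lc, with Lc taken as -log p(w) (assumption: Dirac posterior)."""
    log_p, _ = log_mixture_prior(w, pi, mu, sigma)
    return error_loss + tau * (-log_p)

def quantize(w, pi, mu, sigma):
    """Snap each weight to the mean of its most responsible component."""
    _, log_comp = log_mixture_prior(w, pi, mu, sigma)
    # Responsibilities are a per-weight normalization of exp(log_comp),
    # so they share the same argmax as log_comp.
    idx = log_comp.argmax(axis=1)
    return mu[idx].reshape(np.shape(w)), idx

# Toy example: component 0 is pinned at zero and acts as the pruning component.
pi = np.array([0.9, 0.05, 0.05])
mu = np.array([0.0, -0.5, 0.5])
sigma = np.array([0.05, 0.05, 0.05])
w = np.array([0.01, -0.48, 0.52, -0.02])

w_q, idx = quantize(w, pi, mu, sigma)
sparsity = float(np.mean(w_q == 0.0))   # fraction of weights snapped to zero
loss = objective(error_loss=0.0, w=w, pi=pi, mu=mu, sigma=sigma, tau=0.005)
```

In this toy setting, weights near zero are assigned to the zero component and are effectively pruned, while the remaining weights collapse onto a small set of shared values. This is the combined pruning and quantization effect the method aims for.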
Experimental Results
Research Questions
- RQ1: Can a learned Gaussian mixture prior over weights induce simultaneous quantization and pruning during retraining?
- RQ2: How does soft weight-sharing relate to MDL and bits-back principles in neural network compression?
- RQ3: What compression rates and accuracy trade-offs are achievable on standard models (e.g., LeNet variants, ResNet) using this approach?
- RQ4: How can hyper-parameters and priors be configured to avoid premature component collapse and achieve scalable compression?
Key Findings
- Achieved competitive compression rates on MNIST models, with notable pruning and quantization effects during retraining.
- On LeNet-300-100, observed up to 96% pruning in the first layer and an overall compression rate of 64x with a minimal accuracy drop (0.9811 to 0.9806).
- On LeNet-5-Caffe, achieved a final compression rate of 162x with a modest accuracy increase in the reported setup.
- For a light ResNet model (2.7M parameters), demonstrated a compression rate of 45x with 6.6% of weights remaining nonzero and 8.50% top-1 error after compression (up from 6.48%).
- Hyper-parameter optimization (Bayesian optimization via Spearmint) explored 13 settings, balancing accuracy loss against compression rate.
This explainer was generated by AI and reviewed by a human editor.