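[Paper Review] A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM
This paper proposes a unified ADMM-based framework that jointly performs DNN weight pruning and weight clustering/quantization. It achieves substantial compression with no accuracy loss: for example, 167× pruning on LeNet-5 and 24.7× on AlexNet. When pruning is combined with clustering, weight storage is reduced by up to 1,910×.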
Many model compression techniques for Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering and quantization. Weight pruning exploits redundancy in the number of weights in DNNs, while weight clustering/quantization exploits redundancy in the number of bits used to represent each weight. The two can be combined to exploit the maximum degree of redundancy. However, the literature lacks a systematic investigation in this direction. In this paper, we fill this gap and develop a unified, systematic framework for DNN weight pruning and clustering/quantization using the Alternating Direction Method of Multipliers (ADMM), a powerful technique in optimization theory for dealing with non-convex optimization problems. DNN weight pruning, clustering/quantization, and their combinations can all be solved in a unified manner. To further improve performance within this framework, we adopt multiple techniques, including iterative weight quantization and retraining, joint weight clustering training and centroid updating, and weight clustering retraining. The proposed framework achieves significant improvements on the individual weight pruning and clustering/quantization problems as well as on their combinations. For weight pruning alone, we achieve 167x weight reduction in LeNet-5, 24.7x in AlexNet, and 23.4x in VGGNet, without any accuracy loss. For the combination of DNN weight pruning and clustering/quantization, we achieve 1,910x and 210x storage reduction of weight data on LeNet-5 and AlexNet, respectively, without accuracy loss. Our code and models are released at http://bit.ly/2D3F0np
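One compact way to state the unified problem described in the abstract is the standard scaled-form ADMM splitting below. This is a hedged reconstruction in common notation, not a transcription of the paper: $N$ is the number of layers, $f$ the training loss, $\rho_i$ a per-layer penalty, $\alpha_i$ a per-layer sparsity budget, and $q_{i,1},\dots,q_{i,M_i}$ the allowed quantization levels or centroids. The paper's own symbols may differ. The indicator $g_i$ encodes whichever constraint (pruning or clustering/quantization) is active for layer $i$.

```latex
\begin{aligned}
&\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\}} \; f\big(\{\mathbf{W}_i\}_{i=1}^{N},\{\mathbf{b}_i\}_{i=1}^{N}\big) + \sum_{i=1}^{N} g_i(\mathbf{Z}_i)
\quad \text{s.t. } \mathbf{W}_i = \mathbf{Z}_i,\; i = 1,\dots,N,\\[4pt]
&g_i(\mathbf{Z}_i) = \begin{cases} 0, & \mathbf{Z}_i \in \mathcal{S}_i \\ +\infty, & \text{otherwise,} \end{cases}
\qquad
\mathcal{S}_i = \{\mathbf{W}_i : \operatorname{card}(\mathbf{W}_i) \le \alpha_i\}
\;\text{ or }\;
\{\mathbf{W}_i : \text{every entry} \in \{q_{i,1},\dots,q_{i,M_i}\}\},\\[8pt]
&\{\mathbf{W}_i^{k+1},\mathbf{b}_i^{k+1}\} = \arg\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\}} \; f\big(\{\mathbf{W}_i\},\{\mathbf{b}_i\}\big) + \sum_{i=1}^{N} \frac{\rho_i}{2}\big\|\mathbf{W}_i - \mathbf{Z}_i^{k} + \mathbf{U}_i^{k}\big\|_F^2,\\
&\mathbf{Z}_i^{k+1} = \Pi_{\mathcal{S}_i}\big(\mathbf{W}_i^{k+1} + \mathbf{U}_i^{k}\big),\\
&\mathbf{U}_i^{k+1} = \mathbf{U}_i^{k} + \mathbf{W}_i^{k+1} - \mathbf{Z}_i^{k+1}.
\end{aligned}
```

The first update is ordinary training with an added quadratic penalty. The second is a Euclidean projection $\Pi_{\mathcal{S}_i}$ onto the constraint set. The third is the scaled dual update.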
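Motivation and Goals
- Address the lack of systematic research on combining weight pruning with weight clustering/quantization.
- Develop a unified ADMM-based optimization framework that performs pruning, clustering/quantization, and their combination within a single formulation.
- Demonstrate substantial model-size reduction on standard networks while preserving accuracy.
- Provide practical training procedures, including iterative quantization/retraining and centroid updating, to improve performance.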
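Proposed Method
- Formulate DNN compression as a joint optimization problem in which the pruning and clustering/quantization constraints are encoded with indicator functions.
- Apply ADMM to decompose the problem into subproblems: (i) DNN training with a quadratic penalty term, (ii) projection onto the pruning constraint set, and (iii) projection onto the clustering/quantization constraint set.
- Use Euclidean projections to enforce sparsity (keeping the α largest-magnitude weights) and to map each weight to its nearest fixed quantization level or cluster centroid.
- Iteratively update the dual variables and retrain to recover accuracy. The steps can optionally be applied in sequence: prune first, then cluster/quantize.
- Provide iterative weight quantization and retraining, plus dynamic centroid updating for clustering-based compression.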
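The three subproblems in the method list above can be sketched on a toy problem. This is a minimal NumPy illustration under stated assumptions, not the authors' released code. A quadratic loss stands in for the DNN training loss. A few gradient steps stand in for SGD retraining with the quadratic penalty. `kmeans_step` is a generic k-means centroid update used only as a stand-in for the paper's centroid updating. All function names are hypothetical.

```python
import numpy as np


def project_prune(w, alpha):
    """Euclidean projection onto {w : nnz(w) <= alpha}: keep the alpha largest-magnitude entries."""
    flat = np.asarray(w, dtype=float).reshape(-1)
    z = np.zeros_like(flat)
    if alpha > 0:
        keep = np.argsort(np.abs(flat))[-alpha:]
        z[keep] = flat[keep]
    return z.reshape(np.shape(w))


def project_quantize(w, levels):
    """Euclidean projection onto a fixed level set: snap every entry to its nearest level."""
    flat = np.asarray(w, dtype=float).reshape(-1)
    lv = np.asarray(levels, dtype=float)
    nearest = np.argmin(np.abs(flat[:, None] - lv[None, :]), axis=1)
    return lv[nearest].reshape(np.shape(w))


def kmeans_step(w, centroids):
    """One k-means-style update (hypothetical stand-in for centroid updating)."""
    flat = np.asarray(w, dtype=float).reshape(-1)
    c = np.asarray(centroids, dtype=float).copy()
    assign = np.argmin(np.abs(flat[:, None] - c[None, :]), axis=1)
    for k in range(len(c)):
        members = flat[assign == k]
        if members.size:  # an empty cluster keeps its previous centroid
            c[k] = members.mean()
    return c


def admm(w_target, project, rho=1.0, lr=0.1, outer=50, inner=20):
    """Scaled-form ADMM on the toy loss f(w) = 0.5 * ||w - w_target||^2, with w constrained to a set S.

    `project` is the Euclidean projection onto S (pruning or quantization).
    """
    w_target = np.asarray(w_target, dtype=float)
    w = w_target.copy()
    z = project(w)
    u = np.zeros_like(w)
    for _ in range(outer):
        # (i) W-step: gradient descent on f(w) + (rho/2) * ||w - z + u||^2,
        #     standing in for DNN retraining with the quadratic penalty.
        for _ in range(inner):
            w = w - lr * ((w - w_target) + rho * (w - z + u))
        # (ii) Z-step: Euclidean projection of w + u onto the constraint set.
        z = project(w + u)
        # (iii) Dual update.
        u = u + w - z
    # Final hard projection so the returned weights satisfy the constraint exactly.
    return project(w)


target = np.array([3.0, -0.1, 0.5, -2.0, 0.05])
print(admm(target, lambda v: project_prune(v, 2)))
print(admm([0.9, -0.4, 0.1], lambda v: project_quantize(v, [-1.0, -0.5, 0.0, 0.5, 1.0])))
```

Because the projection is passed in as a function, the same loop covers pruning, quantization, or (applied in sequence) prune-then-quantize. In a prune-then-cluster pipeline, one would pass only the surviving nonzero weights to `kmeans_step`.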
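Experimental Results
Research Questions
- RQ1: Can pruning and weight clustering/quantization be optimized jointly within a unified ADMM framework?
- RQ2: What compression ratios can be achieved without accuracy loss on common DNNs under joint pruning and clustering/quantization?
- RQ3: Does pruning first and then clustering/quantizing perform better than handling both simultaneously?
- RQ4: How do iterative quantization and retraining affect final accuracy and storage efficiency?
- RQ5: What practical guidelines exist for choosing per-layer pruning and clustering/quantization settings to maximize compression?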
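Key Findings
- 167× weight reduction on LeNet-5 with no accuracy loss (pruning only).
- 24.7× weight reduction on AlexNet with no accuracy loss (pruning only).
- 23.4× weight reduction on VGGNet with no accuracy loss (pruning only).
- Joint pruning and clustering/quantization achieves 1,910× weight-storage reduction on LeNet-5 and 210× on AlexNet with no accuracy loss (excluding the indices needed to store the pruned, sparse weights).
- Including indices, the overall model-size reduction is 623× (LeNet-5) and 90× (AlexNet).
- For LeNet-5, the joint method quantizes weights to about 2.4 bits per layer on average while achieving 88× pruning.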
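The gap between the index-free and index-inclusive figures above comes from how the compressed weights are counted. The sketch below uses a simplified accounting. It assumes 32-bit dense weights, a fixed number of bits per surviving weight, an optional fixed number of index bits per surviving weight, and no codebook overhead. The paper's exact bookkeeping may differ, so this sketch is not expected to reproduce the reported ratios. `Layer` and `storage_reduction` are hypothetical names, and the example layer is made up.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_weights: int      # dense weight count
    pruning_ratio: float  # dense count / surviving (nonzero) count
    bits: int             # bits per surviving weight after clustering/quantization


def storage_reduction(layers, index_bits=0, dense_bits=32):
    """Dense weight storage divided by compressed storage (codebook overhead ignored)."""
    dense = sum(layer.num_weights * dense_bits for layer in layers)
    compressed = sum(
        (layer.num_weights / layer.pruning_ratio) * (layer.bits + index_bits)
        for layer in layers
    )
    return dense / compressed


# Illustrative, made-up layer: 1,000 weights, 50x pruning, 4-bit weights.
fc = Layer("fc", num_weights=1000, pruning_ratio=50.0, bits=4)
print(storage_reduction([fc]))                # 400.0
print(storage_reduction([fc], index_bits=4))  # 200.0
```

Charging index bits to every surviving weight shrinks the ratio. This matches the direction of the findings, where the totals drop once indices are included (for example, 1,910× versus 623× on LeNet-5).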
This summary was generated by AI and reviewed by human editors.