QUICK REVIEW

[Paper Review] Transform Quantization for CNN (Convolutional Neural Network) Compression

Sean I. Young, Zhe Wang|arXiv (Cornell University)|Sep 2, 2020

Advanced Neural Network Applications|References: 97|Cited by: 74

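One-Sentence Summary

This paper proposes a transform quantization method for post-training compression of convolutional neural network (CNN) weights that jointly optimizes a learned decorrelating transform and the bit-depth allocation within a rate-distortion framework, achieving state-of-the-art results and compressing models such as AlexNet, ResNet and DenseNet to very low bit-rates (1 to 2 bits, with retraining) with minimal accuracy loss.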

ABSTRACT

In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
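
As a rough guide to what "optimal bit-depth allocation" means in a rate-distortion setting, the classical textbook high-rate formulation can be written as below. This is a sketch under stated assumptions, not the paper's own derivation: the symbols ($N$ coefficient groups with variances $\sigma_i^2$, bit depths $b_i$, average budget $\bar{b}$, quantizer constant $c$) are introduced here for illustration only, and the paper measures distortion at the network output, so its exact expressions may differ.

```latex
\begin{align}
  \min_{b_1,\dots,b_N} \; & \sum_{i=1}^{N} D_i(b_i)
  \quad \text{subject to} \quad \frac{1}{N}\sum_{i=1}^{N} b_i = \bar{b}, \\
  D_i(b_i) &\approx c\,\sigma_i^2\, 2^{-2 b_i}, \\
  b_i^{\star} &= \bar{b} + \frac{1}{2}\log_2
      \frac{\sigma_i^2}{\left(\prod_{j=1}^{N}\sigma_j^2\right)^{1/N}} .
\end{align}
```

Under this model, groups with larger variance receive more bits, and every group ends up with the same distortion at the optimum. Decorrelation matters because it spreads the variances $\sigma_i^2$ apart, which lowers their geometric mean and therefore the total distortion at a fixed budget.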

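Motivation and Objectives

Address the sub-optimal performance of existing CNN quantization methods, which ignore the joint statistics of weights and activations.
Enable efficient, low bit-rate compression of already trained CNN models without requiring fine-tuning.
Unify dimensionality reduction, quantization and pruning within a single rate-distortion optimization framework.
Derive an End-to-end Learned Transform (ELT) that decorrelates the weights and enables optimal bit-depth allocation, maximizing compression gain.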

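Proposed Method

Formulates CNN weight compression as a rate-distortion optimization problem: minimize output distortion subject to a bit-rate constraint.
Applies a learned decorrelating transform (the ELT) to the weight matrices to reduce redundancy and enable efficient quantization.
Relates weight perturbations to output error through a first-order Taylor approximation, and optimizes the bit-depth allocation of the transform coefficients to minimize distortion.
Uses the transform-domain representation to quantize insignificant coefficients to zero, which mimics pruning.
Derives the optimal transform as a generalized eigenvalue decomposition of the product of the weight and activation covariance matrices.
Supports mixed-precision inference by assigning different bit depths to different transform channels.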
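
The pipeline described above (decorrelate, allocate bits per transform channel, quantize, invert) can be sketched in NumPy as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: it uses a plain Karhunen-Loève transform (KLT) of the weights as a stand-in for the ELT, ignores activation statistics and output sensitivity entirely, and applies the classical high-rate allocation rule with a simple uniform quantizer. All function names (`klt_basis`, `allocate_bits`, `quantize_row`, `transform_quantize`) are hypothetical.

```python
import numpy as np

def klt_basis(W):
    """Orthonormal basis (columns) that decorrelates the rows of W.

    Plain KLT of the weights only; a stand-in for the paper's ELT.
    """
    second_moment = W @ W.T / W.shape[1]
    eigvals, eigvecs = np.linalg.eigh(second_moment)
    return eigvecs[:, np.argsort(eigvals)[::-1]]  # largest variance first

def allocate_bits(variances, avg_bits, max_bits=8):
    """Classical high-rate rule: avg_bits + 0.5 * log2(variance / geometric mean).

    Rounding and clipping to [0, max_bits] mean the realized average
    only approximates avg_bits.
    """
    log_var = np.log2(np.maximum(variances, 1e-12))
    bits = avg_bits + 0.5 * (log_var - log_var.mean())
    return np.clip(np.round(bits), 0, max_bits).astype(int)

def quantize_row(x, bits):
    """Uniform mid-rise quantizer with 2**bits levels; 0 bits zeroes the row."""
    scale = np.max(np.abs(x))
    if bits == 0 or scale == 0:
        return np.zeros_like(x)  # behaves like pruning the whole channel
    levels = 2 ** bits
    step = 2 * scale / levels
    idx = np.clip(np.floor((x + scale) / step), 0, levels - 1)
    return (idx + 0.5) * step - scale

def transform_quantize(W, avg_bits):
    """Transform, allocate bits per transform channel, quantize, and invert."""
    U = klt_basis(W)
    C = U.T @ W  # decorrelated coefficients, one transform channel per row
    bits = allocate_bits(np.mean(C ** 2, axis=1), avg_bits)
    C_hat = np.stack([quantize_row(c, b) for c, b in zip(C, bits)])
    return U @ C_hat, bits

# Synthetic, strongly correlated "weights" (made-up data for illustration).
rng = np.random.default_rng(0)
scales = 2.0 ** (-np.arange(8.0))
W = rng.standard_normal((8, 8)) @ (scales[:, None] * rng.standard_normal((8, 512)))
W_hat, bits = transform_quantize(W, avg_bits=3)
print("bits per transform channel:", bits)
print("mean squared error:", np.mean((W - W_hat) ** 2))
```

Channels that receive zero bits are dropped entirely, which is the sense in which transform-domain quantization can mimic pruning, and the per-channel `bits` vector is what a mixed-precision deployment would consume.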

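Experimental Results

Research Questions

RQ1: Can the joint statistics of weights and activations be exploited post-training to improve CNN compression efficiency?
RQ2: Does jointly optimizing the transform and the bit-depth allocation outperform scalar quantization alone?
RQ3: Does the learned transform outperform classical transforms such as the DCT or KLT for low bit-rate CNN compression?
RQ4: How does the proposed framework perform across CNN architectures, in both fine-tuned and non-fine-tuned settings?
RQ5: What is the theoretical relationship between the optimal transform and classical transforms such as the KLT or SVD?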

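Key Findings

Transform quantization achieves state-of-the-art CNN compression in both fine-tuned and non-fine-tuned settings, across models including AlexNet, ResNet and DenseNet.
With fine-tuning, the method compresses models to 1 to 2 bits per weight while maintaining high accuracy, significantly outperforming prior methods.
The End-to-end Learned Transform (ELT) achieves compression gain close to the theoretical optimum, with an intra-layer transform coding gain of up to 19.8 dB for AlexNet.
The optimal bit-depth allocation, derived by minimizing output distortion, outperforms uniform bit allocation, particularly at low bit-rates.
The framework supports efficient inference in the transform domain and allows mixed-precision deployment on specialized hardware.
Theoretical analysis shows that the optimal transform is equivalent to a generalized eigenvalue decomposition of the product of the weight and activation covariance matrices.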
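
To give a feel for what a transform coding gain in dB measures, the sketch below computes the classical textbook quantity: the ratio of the arithmetic mean to the geometric mean of the transform-coefficient variances. Treat this as an assumption for illustration; the paper's layer-level definition may differ in detail, and `coding_gain_db` is a hypothetical helper name.

```python
import numpy as np

def coding_gain_db(variances):
    """Classical transform coding gain in dB.

    Ratio of the arithmetic mean to the geometric mean of the
    (strictly positive) transform-coefficient variances.
    """
    v = np.asarray(variances, dtype=float)
    arithmetic_mean = v.mean()
    geometric_mean = np.exp(np.mean(np.log(v)))
    return 10.0 * np.log10(arithmetic_mean / geometric_mean)

# Equal variances give no gain; unequal variances give a positive gain.
print(coding_gain_db([1.0, 1.0, 1.0, 1.0]))
print(coding_gain_db([100.0, 1.0]))
```

Under the same high-rate model, each additional bit lowers distortion by about 6.02 dB, so a gain of 19.8 dB would correspond to roughly 3.3 bits saved per coefficient at equal distortion.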

This review was generated by AI and reviewed by a human editor.