QUICK REVIEW

[Paper Review] Discrimination-aware Channel Pruning for Deep Neural Networks

Zhuangwei Zhuang, Mingkui Tan|arXiv (Cornell University)|Oct 28, 2018

Advanced Neural Network Applications|References: 44|Cited by: 162

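ONE-SENTENCE SUMMARY

Discrimination-aware Channel Pruning (DCP) adds discrimination-aware losses at intermediate layers to guide channel pruning, balancing preserved discriminative power against feature-map reconstruction, and reports higher accuracy than existing pruning methods at a given pruning rate on ImageNet and CIFAR.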

ABSTRACT

Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Last, we propose a greedy algorithm to conduct channel selection and parameter optimization in an iterative way. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy.

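MOTIVATION AND OBJECTIVES

Make channel pruning efficient while preserving each layer's discriminative power, rather than only minimizing reconstruction error.
Introduce discrimination-aware losses at intermediate layers to strengthen local discriminative representations.
Formulate sparse channel selection as an ℓ2,0-norm constrained problem and solve it with a greedy optimization method.
Show that DCP matches or exceeds the accuracy of existing pruning methods at similar pruning rates.
Validate the method on a large-scale dataset (ILSVRC-12) and on smaller datasets (CIFAR-10, LFW).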
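
For readers unfamiliar with the ℓ2,0-norm constraint mentioned in the objectives, the block below is a sketch of how such a constraint is commonly written. The notation is assumed for illustration and is not taken from this review: W is a layer's weight tensor with c input channels, W_{:,k,:,:} is the slice acting on input channel k, κ_l is a channel budget for layer l, and L(W) stands for whatever pruning objective is being minimized.

```latex
% Assumed notation: the l_{2,0} norm counts the input channels whose weights are not all zero.
\|W\|_{2,0} \;=\; \sum_{k=1}^{c} \Omega\!\left( \left\| W_{:,k,:,:} \right\|_F^2 \right),
\qquad
\Omega(a) \;=\;
\begin{cases}
1, & a \neq 0 \\
0, & a = 0
\end{cases}

% Channel pruning as a constrained problem: keep at most \kappa_l channels in layer l.
\min_{W} \; \mathcal{L}(W)
\quad \text{s.t.} \quad \|W\|_{2,0} \le \kappa_l
```

Under this reading, a channel counts as kept as soon as any of its weights is nonzero, so the constraint caps the number of surviving channels rather than the number of individual weights.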

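PROPOSED METHOD

Insert multiple discrimination-aware losses at selected intermediate layers to increase their discriminative power.
Balance the reconstruction loss and the discrimination-aware loss through the joint objective L(W) = L_M(W) + λ L_S^p(W), where L_M is the reconstruction loss, L_S^p is the p-th discrimination-aware loss, and λ is a trade-off weight.
Formulate channel pruning as an ℓ2,0-norm constrained optimization problem and solve it with a greedy algorithm that iteratively selects channels by gradient norm.
Prune stage by stage: first fine-tune with the discrimination-aware loss, then prune the layers of the corresponding stage using L_S^p and L_M.
Use a two-step greedy procedure: (i) select the channel with the largest gradient norm, then (ii) optimize W on the selected channels with SGD while the remaining channels are held at zero.
Apply a stopping criterion based on relative loss improvement to determine each layer's pruning level automatically.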
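
The greedy procedure and stopping criterion described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the "layer" is a plain linear map O = X @ W in which each row of W plays the role of one channel, the discrimination-aware loss is cross-entropy through a fixed linear head V, weights start at zero, and full-batch gradient descent stands in for SGD. The names `loss_and_grad`, `greedy_select`, `eps`, and `max_channels` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grad(W, X, Y_ref, V, labels, lam):
    """Return L(W) = L_M(W) + lam * L_S(W) and dL/dW for the toy layer O = X @ W."""
    n = X.shape[0]
    O = X @ W
    # L_M: reconstruction error against the reference (unpruned) outputs.
    diff = O - Y_ref
    loss_m = 0.5 * np.sum(diff ** 2) / n
    # L_S: cross-entropy of a fixed linear head V applied to O (assumed stand-in).
    P = softmax(O @ V)
    loss_s = -np.mean(np.log(P[np.arange(n), labels] + 1e-12))
    dlogits = P.copy()
    dlogits[np.arange(n), labels] -= 1.0
    dO = diff / n + lam * (dlogits / n) @ V.T
    return loss_m + lam * loss_s, X.T @ dO


def greedy_select(X, Y_ref, V, labels, lam=1.0, max_channels=None,
                  eps=1e-3, steps=200, lr=0.05):
    """Greedily add channels by gradient norm, refitting the kept channels each round."""
    c, m = X.shape[1], Y_ref.shape[1]
    max_channels = c if max_channels is None else max_channels
    W = np.zeros((c, m))                 # assumption: start from all-zero weights
    chosen = np.zeros(c, dtype=bool)
    loss0, _ = loss_and_grad(W, X, Y_ref, V, labels, lam)
    prev = loss0
    while chosen.sum() < max_channels:
        # Step (i): pick the unselected channel with the largest gradient norm.
        _, G = loss_and_grad(W, X, Y_ref, V, labels, lam)
        norms = np.linalg.norm(G, axis=1)
        norms[chosen] = -np.inf
        chosen[int(np.argmax(norms))] = True
        # Step (ii): optimize only the selected rows; the other rows stay at zero.
        mask = chosen[:, None].astype(float)
        for _ in range(steps):
            _, G = loss_and_grad(W, X, Y_ref, V, labels, lam)
            W -= lr * G * mask
        # Stop once the relative loss improvement falls below eps.
        cur, _ = loss_and_grad(W, X, Y_ref, V, labels, lam)
        if abs(prev - cur) / loss0 <= eps:
            break
        prev = cur
    return [int(k) for k in np.flatnonzero(chosen)], W
```

Calling `greedy_select(X, Y_ref, V, labels, lam=1.0)` returns the sorted indices of the kept channels and a weight matrix whose unselected rows are exactly zero. Leaving `max_channels` unset lets the relative-improvement test alone decide how many channels survive, which mirrors the automatic per-layer pruning level described in the list.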

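EXPERIMENTAL RESULTS

Research Questions

RQ1: Can discrimination-aware losses at intermediate layers reliably identify the channels that are truly discriminative, rather than relying on the reconstruction criterion alone?
RQ2: Does combining the reconstruction loss with the discrimination-aware loss improve pruning performance on deep networks relative to existing methods?
RQ3: How does DCP perform across architectures (ResNet-18/50, VGGNet) and datasets (CIFAR-10, ILSVRC-12, LFW) at different pruning rates?
RQ4: How do the trade-off parameter λ and the stopping condition affect pruning results and accuracy?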

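Key Findings

On ILSVRC-12, ResNet-50 pruned by DCP with 30% of channels removed improves top-1 accuracy over the baseline by 0.39%.
With ResNet-50 pruned by 50%, DCP outperforms ThiNet by 0.81% in top-1 and 0.51% in top-5.
On CIFAR-10, DCP outperforms several baseline methods on VGGNet and ResNet-56 in both accuracy and the reduction in parameters/FLOPs.
On CIFAR-10, MobileNet variants pruned by DCP with 30% of channels removed achieve higher accuracy than random pruning and the baseline methods.
The LFW experiments show that pruned SphereNet-4 models remain competitive in accuracy while parameters and FLOPs drop substantially (for example, a 3.66x speedup at 98.30% LFW accuracy).
Ablation studies indicate that a larger λ (more weight on the discrimination-aware loss) generally improves pruning performance, and that the stopping condition effectively determines the pruning level.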

This review was generated by AI and reviewed by human editors.