QUICK REVIEW

[Paper Review] Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

Jianbo Ye, Xin Lu|arXiv (Cornell University)|Jan 31, 2018

Advanced Neural Network Applications | Cited by 206

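Summary in Brief

This paper proposes a channel pruning method for CNNs that does not rely on the assumption that smaller-norm features are less informative. It gates channels by sparsifying the batchnorm gamma parameters end to end with ISTA, and it introduces a gamma-W rescaling trick; after fine-tuning, the resulting compact models reach competitive accuracy.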

ABSTRACT

Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely-used practice in relevant work assumes that a smaller-norm parameter or feature plays a less informative role at the inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on direct simplification of the channel-to-channel computation graph of a CNN without the need of performing a computationally difficult and not-always-useful task of making high-dimensional tensors of CNN structured sparse. Our approach takes two stages: first to adopt an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant, and then to prune those constant channels from the original neural network by adjusting the biases of their impacting layers such that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We experimented our approach through several image learning benchmarks and demonstrate its interesting aspects and competitive performance.

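Research Motivation and Objectives

Question the assumption, common in CNN pruning, that smaller-norm parameters or features are less informative.
Propose a channel pruning method that directly simplifies the channel-to-channel computation graph.
Avoid imposing structured sparsity on high-dimensional tensors; sparsify the batchnorm scaling parameters (gamma) instead.
Achieve end-to-end pruning that needs minimal extra parameters and is easy to reproduce.
Demonstrate effectiveness on CIFAR-10 and on ImageNet-scale pretrained networks.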

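Proposed Method

Model the CNN as a channel-to-channel information flow, with a gate on each channel controlled by the gamma of batch normalization.
Sparsify gamma with ISTA (Iterative Shrinkage-Thresholding Algorithm) during end-to-end training, driving channels on the paths to be pruned toward constant outputs.
Apply the gamma-W rescaling trick to speed up pruning during training, and restore the original scaling after pruning.
When gamma[k] becomes zero, absorb the channel's constant output into the biases of the subsequent layers so that the function is preserved and pruning needs no new parameters.
Fine-tune the resulting compact model to recover any slight loss in performance.
Provide practical guidance for tuning the hyperparameters (mu, rho, alpha), along with truncation/post-processing steps for removing constant channels.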
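
The sparsification and bias-absorption steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (soft_threshold, ista_step, prune_constant_channels, forward) are hypothetical, mu is read as the step size and rho as the L1 penalty weight, and the layer after BN + ReLU is taken to be a dense layer or 1x1 convolution so that padding effects do not arise.

```python
import numpy as np

def soft_threshold(x, eta):
    """Proximal operator of eta * ||x||_1 (the shrinkage step in ISTA)."""
    return np.sign(x) * np.maximum(np.abs(x) - eta, 0.0)

def ista_step(gamma, grad, mu, rho):
    """One ISTA update: gradient step on the loss, then shrink by mu * rho.

    Assumption: mu is the step size and rho the L1 penalty weight.
    """
    return soft_threshold(gamma - mu * grad, mu * rho)

def forward(x_hat, gamma, beta, W_next, b_next):
    """Assumed pipeline: BN affine -> ReLU -> dense layer (or 1x1 conv)."""
    return W_next @ np.maximum(gamma * x_hat + beta, 0.0) + b_next

def prune_constant_channels(gamma, beta, W_next, b_next):
    """Drop channels with gamma == 0 and fold their constant output into b_next."""
    keep = gamma != 0
    const_out = np.maximum(beta[~keep], 0.0)       # relu(beta) of the dead channels
    b_new = b_next + W_next[:, ~keep] @ const_out  # contribution absorbed into the bias
    return gamma[keep], beta[keep], W_next[:, keep], b_new, keep

# Illustrative values only (not taken from the paper).
rng = np.random.default_rng(0)
gamma = ista_step(np.array([0.8, 0.02, -0.5, 0.01]), np.zeros(4), mu=0.1, rho=0.5)
beta = np.array([0.1, 0.3, -0.2, -0.4])
W_next, b_next = rng.normal(size=(3, 4)), rng.normal(size=3)
x_hat = rng.normal(size=4)  # normalized activations entering the BN affine step

before = forward(x_hat, gamma, beta, W_next, b_next)
g2, be2, W2, b2, keep = prune_constant_channels(gamma, beta, W_next, b_next)
after = forward(x_hat[keep], g2, be2, W2, b2)
print(keep, np.allclose(before, after))
```

The final check compares the layer output before and after pruning: once gamma[k] is exactly zero, channel k emits relu(beta[k]) regardless of the input, so removing it and shifting the next layer's bias leaves the output unchanged in this simplified setting.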

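Experimental Results

Research Questions

RQ1: Can channel pruning be achieved effectively by sparsifying the batchnorm scaling parameters (gamma) rather than relying on weight norms?
RQ2: Does end-to-end ISTA-based gamma sparsification yield compact CNNs with competitive accuracy on standard benchmarks?
RQ3: How does the proposed gamma-W rescaling trick affect pruning speed and stability, especially for pretrained models?
RQ4: What is the practical effect on model size and computation (FLOPs/parameters) after pruning CIFAR-10 and ImageNet-scale networks?
RQ5: How robust is the method across architectures and pretraining scenarios (for example ResNet and Inception-like modules)?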

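Key Findings

On CIFAR-10 (ConvNet and ResNet-20), the method achieves substantial reductions in parameters and channels while keeping accuracy competitive.
On ImageNet (ILSVRC2012), the pruned ResNet-101 model is substantially compressed, with only a small increase in Top-5 error (less than 0.5%).
The gamma-W rescaling trick speeds up pruning of pretrained models, so that pruning completes in a fraction of the original training time.
In a segmentation example, pruning substantially reduces parameters and FLOPs while maintaining or improving mean IoU across multiple datasets.
Pruning effectiveness is tied to over-parameterization: aggressive pruning of a saturated network can degrade performance, whereas a favorable trade-off is observed when starting from an over-parameterized base.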
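
To make the link between removed channels and the parameter/FLOP reductions concrete, here is a small arithmetic sketch. The helper conv_cost and the layer sizes are hypothetical and chosen only for illustration; they are not figures from the paper. It counts the weights and multiply-accumulates of a single bias-free k x k convolution.

```python
def conv_cost(c_in, c_out, k, h_out, w_out):
    """Return (weight parameters, multiply-accumulates) of a k x k convolution without bias."""
    params = c_in * c_out * k * k
    macs = params * h_out * w_out  # each weight is applied once per output position
    return params, macs

# Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 32x32 output map.
full_params, full_macs = conv_cost(64, 64, 3, 32, 32)
# Same layer after half of its input and half of its output channels are pruned.
pruned_params, pruned_macs = conv_cost(32, 32, 3, 32, 32)

print(pruned_params / full_params, pruned_macs / full_macs)
```

Because a layer's cost scales with the product of its input and output channel counts, pruning channels on both sides of a layer shrinks that layer's cost roughly quadratically.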

This review was generated by AI and reviewed by a human editor.