QUICK REVIEW

[Paper Digest] Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

Jianbo Ye, Xin Lu|arXiv (Cornell University)|Feb 1, 2018
Advanced Neural Network Applications|References: 12|Cited by: 153
One-sentence summary

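This paper proposes a channel pruning method for CNNs that does not rely on the "smaller norm is less informative" assumption; instead, it learns channel gates on the batch-normalization scale parameters via end-to-end ISTA and prunes the channels that become constant.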

ABSTRACT

Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely-used practice in relevant work assumes that a smaller-norm parameter or feature plays a less informative role at the inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on direct simplification of the channel-to-channel computation graph of a CNN without the need of performing a computationally difficult and not-always-useful task of making high-dimensional tensors of CNN structured sparse. Our approach takes two stages: first to adopt an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant, and then to prune those constant channels from the original neural network by adjusting the biases of their impacting layers such that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We experimented our approach through several image learning benchmarks and demonstrate its interesting aspects and competitive performance.

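Motivation and objectives

  • Question the reliance on the smaller-norm assumption in channel pruning of CNNs.
  • Introduce a gate-based pruning mechanism that removes channels without adding parameters or changing the computation graph.
  • Enable end-to-end training that yields multiple compact models from a single training run.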

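Proposed method

  • Model the CNN as a channel-to-channel information flow in which each channel is gated by its BN scale parameter gamma.
  • Sparsify gamma with ISTA; as gamma approaches zero, the channel's output effectively becomes constant.
  • Apply the gamma-W rescaling trick, which adjusts the corresponding weights, to stabilize training and speed up sparsification.
  • Absorb the pruned constant channels into the subsequent layers to preserve the network's function, then fine-tune the resulting compact model.
  • Prune and fine-tune in rounds to obtain models with different trade-offs in inference performance.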
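
The ISTA step on gamma can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the quadratic toy loss, the `target` vector, and the names `soft_threshold`, `ista_step`, `lr`, and `rho` are all assumptions chosen for illustration. It shows only the generic mechanism, a gradient step followed by soft-thresholding, which can drive some gates to exactly zero.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Proximal operator of thresh * |x|: shrink toward zero, set small values to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def ista_step(gamma, grad, lr, rho):
    """One ISTA update: gradient step on the loss, then soft-thresholding for the penalty rho * |gamma|."""
    return soft_threshold(gamma - lr * grad, lr * rho)

# Hypothetical toy loss: 0.5 * ||gamma - target||^2, whose gradient is gamma - target.
target = np.array([1.0, 0.05, -0.8, 0.02])
gamma = np.ones(4)
for _ in range(200):
    gamma = ista_step(gamma, gamma - target, lr=0.1, rho=0.1)

pruned = gamma == 0.0  # gates that reached exactly zero mark candidate channels for removal
```

In this toy setting the gates whose targets are smaller in magnitude than `rho` end at exactly zero, while the others settle at a shrunken nonzero value. In a real network the gradient would come from backpropagation through the task loss rather than from a closed-form expression.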

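Experimental results

Research questions

  • RQ1: Can effective channel pruning be achieved by directly sparsifying the BN gamma parameters, without relying on the smaller-norm heuristic?
  • RQ2: How does end-to-end training with ISTA on gamma achieve reliable channel pruning while preserving the network's function?
  • RQ3: What practical gains in model size and computation does the method deliver on CIFAR-10, ImageNet, and segmentation tasks?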
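
For RQ2, the point about preserving the network's function can be made concrete with a small sketch. It assumes a BN, then ReLU, then linear (or 1x1 convolution) pattern, under which a channel with gamma equal to zero emits the constant relu(beta) and can be folded into the next layer's bias. The function `absorb_constant_channels` and all tensor names are hypothetical; for spatial convolutions with zero padding the equality would hold only approximately near the borders.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def absorb_constant_channels(W_next, b_next, beta, keep):
    """Drop zero-gated channels and fold their constant output into the next layer's bias.

    W_next: (out, in) weights of the following layer; b_next: (out,) bias;
    beta: (in,) BN shifts; keep: (in,) boolean mask of surviving channels.
    """
    const = relu(beta[~keep])                  # a pruned channel emits relu(beta) for every input
    b_new = b_next + W_next[:, ~keep] @ const  # move that constant contribution into the bias
    return W_next[:, keep], b_new

rng = np.random.default_rng(0)
x_hat = rng.normal(size=(5, 4))                # normalized activations: batch of 5, 4 channels
gamma = np.array([0.9, 0.0, -0.7, 0.0])        # two gates already driven to zero
beta = np.array([0.1, 0.3, -0.2, -0.5])
W_next = rng.normal(size=(3, 4))
b_next = rng.normal(size=3)

full = relu(x_hat * gamma + beta) @ W_next.T + b_next
keep = gamma != 0.0
W_small, b_small = absorb_constant_channels(W_next, b_next, beta, keep)
compact = relu(x_hat[:, keep] * gamma[keep] + beta[keep]) @ W_small.T + b_small
```

Here `full` and `compact` agree up to floating-point error, so the two-channel model reproduces the four-channel one before any fine-tuning.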

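Key findings

  • The proposed method achieves significant parameter and FLOP reductions on the CIFAR-10 and ImageNet benchmarks while maintaining competitive accuracy.
  • A pruned ResNet-101 on ImageNet achieves more than 2.5x compression after fine-tuning, with a moderate loss in accuracy.
  • On CIFAR-10, pruning ResNet-20 reduces channels by about 37%, with an accuracy loss of about 1% in some configurations.
  • Pruning segmentation models can substantially reduce parameters and FLOPs, and sometimes improves IOU on several datasets.
  • The gamma-W rescaling trick and ISTA-based sparsification lead to faster convergence and more robust pruning performance.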
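
The digest does not spell out the gamma-W rescaling trick, so the following is only a sketch under stated assumptions: a BN, then ReLU, then linear pattern, with the BN shift beta scaled together with gamma (an assumption made here so that the function is preserved exactly), and a hypothetical factor `alpha` of 0.01. Because ReLU is positively homogeneous, scaling the gate by `alpha` and the following weights by `1 / alpha` leaves the output unchanged while changing the relative scale of gate and weights.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rescale(gamma, beta, W_next, alpha):
    """Scale the BN gate and shift by alpha and the following weights by 1/alpha (alpha > 0)."""
    return gamma * alpha, beta * alpha, W_next / alpha

rng = np.random.default_rng(1)
x_hat = rng.normal(size=(5, 4))        # normalized activations: batch of 5, 4 channels
gamma, beta = rng.normal(size=4), rng.normal(size=4)
W_next = rng.normal(size=(3, 4))

before = relu(x_hat * gamma + beta) @ W_next.T
g2, b2, W2 = rescale(gamma, beta, W_next, alpha=0.01)
after = relu(x_hat * g2 + b2) @ W2.T   # same function, different parameter scales
```

The outputs `before` and `after` match up to floating-point error; only the optimization dynamics on the rescaled parameters differ.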


This digest was generated by AI and reviewed by human editors.