QUICK REVIEW

[Paper Review] Learning Efficient Convolutional Networks through Network Slimming

Zhuang Liu, Jianguo Li|arXiv (Cornell University)|Aug 22, 2017

Advanced Neural Network Applications | References: 32 | Cited by: 269

One-Sentence Summary

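This paper proposes Network Slimming, a training scheme that learns channel-level sparsity by applying L1 regularization to the scaling factors of batch normalization (BN) layers. Channels with insignificant scaling factors are pruned automatically, yielding compact CNNs that maintain or even improve accuracy without requiring special hardware or software accelerators. Experiments cover VGGNet, ResNet, and DenseNet on CIFAR, SVHN, and ImageNet; for VGGNet, a multi-pass version reduces model size by up to 20x and computing operations by about 5x.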

ABSTRACT

The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20x reduction in model size and a 5x reduction in computing operations.

Motivation and Objectives

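Deploying large CNNs in resource-constrained environments is difficult because of model size, run-time memory footprint, and computational cost.
Propose a simple, architecture-agnostic training scheme that slims networks automatically through channel-level pruning.
Show that channel-level sparsity can substantially reduce parameters and FLOPs across multiple architectures and datasets while maintaining or improving accuracy.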

Proposed Method

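Associate a channel-wise scaling factor gamma with the output of each BN layer, and jointly train the network weights W and the scaling factors gamma while applying an L1 sparsity penalty to gamma.
The L1 regularization pushes gamma values toward zero, which makes insignificant channels easy to identify automatically.
Prune channels whose gamma is near zero, using a single global percentile threshold across all layers, then fine-tune the resulting narrower model.
Use the BN scaling factors directly as the pruning signal, so no changes to the network architecture and no sparse computation libraries are needed.
Optionally repeat the procedure over multiple passes for further compression.
For networks with cross-layer connections and pre-activation structure, adapt the pruning by pruning the incoming channels of each layer and applying channel selection at inference time.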
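
The first three steps above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: it assumes the objective has the form task loss plus lam * sum(|gamma|), so the penalty contributes lam * sign(gamma) to each gradient, and it assumes channels are kept when |gamma| is at or above a single global threshold. The helper names and the toy gamma values are hypothetical.

```python
# Illustrative sketch only: plain Python lists stand in for BN scaling factors.
# Helper names (l1_subgradient_step, global_threshold, channel_masks) are
# hypothetical and are not taken from the paper or any library.

def sign(x):
    """Return -1, 0, or 1; 0 is used as the subgradient of |x| at x = 0."""
    return (x > 0) - (x < 0)


def l1_subgradient_step(gammas, task_grads, lr, lam):
    """One gradient step on the scaling factors for the objective
    task_loss + lam * sum(|gamma|). The penalty adds lam * sign(gamma)."""
    return [g - lr * (dg + lam * sign(g)) for g, dg in zip(gammas, task_grads)]


def global_threshold(layers, prune_ratio):
    """Choose one threshold over |gamma| from all layers so that about
    prune_ratio of all channels fall strictly below it."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    magnitudes = sorted(abs(g) for gammas in layers.values() for g in gammas)
    return magnitudes[int(len(magnitudes) * prune_ratio)]


def channel_masks(layers, threshold):
    """Per layer, mark each channel True (keep) or False (prune)."""
    return {name: [abs(g) >= threshold for g in gammas]
            for name, gammas in layers.items()}


# Toy scaling factors after sparsity training (made-up values).
layers = {
    "bn1": [0.90, 0.02, 0.45, 0.001],
    "bn2": [0.003, 0.70, 0.01, 0.30, 0.60, 0.004],
}
threshold = global_threshold(layers, prune_ratio=0.5)
masks = channel_masks(layers, threshold)
kept = {name: sum(mask) for name, mask in masks.items()}
```

Because the threshold is global rather than per layer, each layer can end up keeping a different number of channels; with these toy values `bn1` keeps 2 of 4 channels and `bn2` keeps 3 of 6. In a real pipeline the masks would then be used to copy the surviving weights into a narrower network before fine-tuning.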

Experimental Results

Research Questions

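RQ1: Can channel-level sparsity learned during training significantly reduce model size, memory footprint, and FLOPs without sacrificing accuracy?
RQ2: How effective is network slimming across CNN architectures (VGGNet, ResNet, DenseNet) and datasets (CIFAR-10/100, SVHN, ImageNet)?
RQ3: How do the sparsity regularization strength and the pruning percentage affect final accuracy and resource savings in practice?
RQ4: How does multi-pass pruning compare with the single-pass approach in compression and accuracy?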

Key Findings

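Network slimming achieves substantial resource reductions across multiple architectures and datasets with little or no loss in accuracy.
With 60-70% of channels pruned, test accuracy after fine-tuning is maintained or even improved in many cases.
On CIFAR-10/SVHN, parameters are reduced by up to ~10x and FLOPs by ~50%, with accuracy preserved.
On ImageNet, pruning 50% of the channels of VGG-A yields more than 5x parameter savings and a FLOP reduction of about 30%, with no loss in accuracy.
The regularization effect of L1 sparsity can improve generalization, sometimes lowering test error after pruning and fine-tuning.
Multi-pass slimming can bring further compression and accuracy gains for some models and datasets.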
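
Parameter savings and FLOP savings need not move together, as the ImageNet figures above suggest. The toy calculation below is a sketch under invented layer shapes (none of them come from the paper): it counts parameters and multiply-accumulates for one convolutional layer and one fully connected layer, then prunes the fully connected layer heavily and the convolutional layer only lightly. It is meant to show how such a gap can arise, not to reproduce the reported numbers.

```python
# Toy cost model; the layer shapes below are invented for illustration and
# do not come from the paper.

def conv_cost(c_in, c_out, kernel, h_out, w_out):
    """(parameters, multiply-accumulates) of a bias-free kernel x kernel conv."""
    params = c_in * c_out * kernel * kernel
    return params, params * h_out * w_out


def fc_cost(n_in, n_out):
    """(parameters, multiply-accumulates) of a bias-free fully connected layer."""
    return n_in * n_out, n_in * n_out


def total(costs):
    """Sum a list of (params, macs) pairs."""
    return sum(p for p, _ in costs), sum(m for _, m in costs)


# Before pruning: one 3x3 conv on a 32x32 feature map and one wide FC layer.
before = total([conv_cost(64, 64, 3, 32, 32), fc_cost(4096, 4096)])
# After pruning: few conv channels removed, many FC units removed.
after = total([conv_cost(64, 56, 3, 32, 32), fc_cost(1536, 1536)])

param_saving = before[0] / after[0]        # how many times fewer parameters
flop_reduction = 1 - after[1] / before[1]  # fraction of operations removed
```

In this toy setting the parameter count shrinks by roughly 7x while the operation count falls by only about 35%, because the fully connected layer holds most of the parameters and the convolutional layer accounts for most of the computation.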


This review was generated by AI and reviewed by a human editor.