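[Paper Summary] Channel Pruning for Accelerating Very Deep Neural Networks
This paper proposes an inference-time channel pruning method that uses LASSO-based channel selection and least-squares reconstruction to accelerate very deep CNNs, achieving significant speed-ups on the VGG-16, ResNet, and Xception architectures with minimal accuracy loss.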
In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances the compatibility with various architectures. Our pruned VGG-16 achieves the state-of-the-art results by 5x speed-up along with only 0.3% increase of error. More importantly, our method is able to accelerate modern networks like ResNet, Xception and suffers only 1.4%, 1.0% accuracy loss under 2x speed-up respectively, which is significant. Code has been made publicly available.
Research Motivation and Objectives
- Motivate and develop a practical channel pruning method to accelerate very deep CNNs at inference time.
- Reduce feature map channels layer-by-layer while controlling reconstruction error.
- Ensure compatibility with multi-branch architectures (e.g., ResNet, Xception).
- Demonstrate effectiveness on modern networks (VGG-16, ResNet-50, Xception-50) across ImageNet and other datasets.
- Show that pruning can be combined with other efficiency techniques for greater speed-ups.
Proposed Method
- Formulate channel pruning as minimizing reconstruction error of the output feature maps after pruning input channels (Eq. 1).
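- Relax the NP-hard ℓ0 sparsity constraint to an ℓ1 penalty, then alternate between two steps: select channels by solving a LASSO problem with the filters fixed (Eq. 3), and re-fit the filters on the selected channels by least squares so that the original outputs are reconstructed (Eq. 4).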
- Apply the layer-wise pruning sequentially to account for accumulated error (Eq. 5).
- Extend the approach to multi-branch networks by sampling/pruning input channels in shared paths and addressing the last layer of residual blocks to recover Y1+Y2 (Sec. 3.3).
- Introduce variants like multi-branch enhancement to better handle residual connections (Sec. 3 Last layer and Sec. 3 First layer).
- Fine-tune pruned models briefly to recover accuracy (10-20 epochs as reported) and compare against training-from-scratch baselines.
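The two-step procedure above can be sketched for a single layer as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function names (`lasso_ista`, `prune_layer`), the tensor layout (input patches `X` of shape `(N, c, k)` and filters `W` of shape `(n, c, k)`, with `k` the flattened kernel size), and the schedule that multiplies the penalty `lam` by a fixed factor until at most `keep` channels survive are all illustrative choices. A plain ISTA (proximal gradient) loop stands in for whatever LASSO solver is used in practice.

```python
import numpy as np

def lasso_ista(Z, y, lam, n_iter=500):
    """Minimise 0.5/m * ||y - Z b||^2 + lam * ||b||_1 with ISTA."""
    m, c = Z.shape
    b = np.zeros(c)
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = m / (np.linalg.norm(Z, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = Z.T @ (Z @ b - y) / m
        v = b - step * grad
        # Soft-thresholding is the proximal operator of the l1 penalty.
        b = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return b

def prune_layer(X, W, keep, lam=1e-4, growth=2.0, max_rounds=30):
    """Hypothetical helper: pick at most `keep` input channels, then re-fit filters.

    X: (N, c, k) sampled input patches, W: (n, c, k) filters.
    Returns a boolean channel mask and the re-fitted filters (n, c', k).
    """
    N, c, k = X.shape
    n = W.shape[0]
    Y = np.einsum("nck,ock->no", X, W)                      # original outputs (N, n)
    # Column i of Z holds the contribution of input channel i to every output.
    Z = np.einsum("nck,ock->noc", X, W).reshape(N * n, c)
    y = Y.reshape(-1)

    # Step 1: channel selection. Raise lam until few enough channels survive
    # (this schedule is an assumption made for the sketch).
    beta = np.ones(c)
    last_nonzero = np.ones(c)
    for _ in range(max_rounds):
        beta = lasso_ista(Z, y, lam)
        nnz = np.count_nonzero(beta)
        if nnz > 0:
            last_nonzero = beta
        if nnz <= keep:
            break
        lam *= growth
    if 0 < np.count_nonzero(beta) <= keep:
        mask = beta != 0
    else:
        # Fallback: keep the `keep` channels with the largest |beta|.
        mask = np.zeros(c, dtype=bool)
        mask[np.argsort(-np.abs(last_nonzero))[:keep]] = True

    # Step 2: least-squares reconstruction of Y from the kept channels only.
    Xs = X[:, mask, :].reshape(N, -1)                       # (N, c' * k)
    W_flat, *_ = np.linalg.lstsq(Xs, Y, rcond=None)         # (c' * k, n)
    W_new = W_flat.T.reshape(n, int(mask.sum()), k)
    return mask, W_new
```

Calling `prune_layer(X, W, keep=c // 2)` on patches sampled from a trained layer would return the surviving channel mask and filters re-fitted to those channels. Running the same routine layer by layer on the outputs of the already pruned network is one way to account for accumulated error, as the sequential bullet above describes.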
Experimental Results
Research Questions
- RQ1: Can channels be pruned at inference time without retraining from scratch while preserving accuracy?
- RQ2: How can inter-channel redundancy be exploited to select representative channels and accurately reconstruct outputs?
- RQ3: How does the method perform across single-branch networks (VGG-16) and multi-branch architectures (ResNet, Xception)?
Key Findings
- Achieves up to 5x acceleration on VGG-16 with only 0.3% increase in top-5 error when combined with tensor factorization (and beyond 4x with small accuracy loss).
- For ResNet-50 and Xception-50, the method achieves about 2x speed-up with 1.4% and 1.0% accuracy loss, respectively.
- In the single-layer pruning experiments, the proposed approach consistently achieves lower reconstruction error than naive channel selection baselines (first k channels, max response).
- Sequential, layer-wise pruning that accounts for accumulated error yields competitive absolute GPU speed-ups without specialized libraries (Table 3).
- Multi-branch enhancement improves pruning effectiveness in residual blocks by better handling shortcut connections (a 4.0% improvement in top-5 accuracy on ResNet-50 with enhancement).
- Combining channel pruning with spatial and channel factorization (3C) yields the best reported speed-ups (e.g., 4x or 5x on VGG-16) with relatively small accuracy losses.
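To make the naive baselines from the single-layer comparison concrete, the sketch below scores a channel subset by the relative reconstruction error of the layer output. The helper names and tensor shapes are hypothetical, and the definition of "max response" used here (rank input channels by the summed absolute weight of the filter slices that read them) is an assumption for illustration rather than a statement of the paper's exact protocol.

```python
import numpy as np

def first_k(W, keep):
    """Baseline: keep the first `keep` input channels."""
    return np.arange(keep)

def max_response(W, keep):
    """Baseline (assumed definition): keep the channels whose filter slices
    have the largest summed absolute weight. W has shape (n, c, k)."""
    score = np.abs(W).sum(axis=(0, 2))
    return np.sort(np.argsort(-score)[:keep])

def reconstruction_error(X, W, idx):
    """Relative error of the output when only channels `idx` are kept.

    X: (N, c, k) input patches, W: (n, c, k) filters. No re-fitting is done,
    so this measures the effect of channel selection alone.
    """
    Y = np.einsum("nck,ock->no", X, W)
    Y_hat = np.einsum("nck,ock->no", X[:, idx], W[:, idx])
    return np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
```

A learned selection, such as the LASSO-based one, can be scored with the same `reconstruction_error` function by passing the indices it keeps, which gives a like-for-like comparison against `first_k` and `max_response`.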
This summary was generated by AI and reviewed by human editors.