QUICK REVIEW

[Paper Review] An Effective Information Theoretic Framework for Channel Pruning

Yihao Chen, Zefang Wang | arXiv (Cornell University) | Aug 14, 2024

Artificial Immune Systems Applications | Cited by: 2

One-Sentence Summary

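This paper proposes an information-theoretic channel pruning framework that fuses rank and entropy into an "information concentration" measure to guide layer-wise pruning ratios and uses Shapley values to identify and remove the least important channels, sharply reducing FLOPs and parameters while reporting state-of-the-art accuracy: a 0.21% accuracy gain at a 45.5% FLOPs reduction for ResNet-56 on CIFAR-10, and only a 0.43% Top-1 accuracy loss at a 41.6% FLOPs reduction for ResNet-50 on ImageNet.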

ABSTRACT

Channel pruning is a promising method for accelerating and compressing convolutional neural networks. However, current pruning algorithms still leave two problems unsolved: how to assign layer-wise pruning ratios properly, and how to discard the least important channels with a convincing criterion. In this paper, we present a novel channel pruning approach based on information theory and the interpretability of neural networks. Specifically, we regard information entropy as the expected amount of information for convolutional layers. In addition, if we regard a matrix as a system of linear equations, a higher-rank matrix means that more solutions exist, which indicates more uncertainty. From the point of view of information theory, the rank can therefore also describe the amount of information. In a neural network, considering the rank and entropy as two information indicators of convolutional layers, we propose a fusion function to reach a compromise between them, where the fusion result is defined as "information concentration". When pre-defining layer-wise pruning ratios, we use the information concentration as a reference instead of heuristic and engineering tuning, which provides a more interpretable solution. Moreover, we leverage Shapley values, a potent tool in the interpretability of neural networks, to evaluate channel contributions and discard the least important channels, compressing the model while maintaining its performance. Extensive experiments demonstrate the effectiveness and promising performance of our method. For example, our method improves accuracy by 0.21% while reducing FLOPs by 45.5% and removing 40.3% of the parameters for ResNet-56 on CIFAR-10. Moreover, our method incurs losses in Top-1/Top-5 accuracy of only 0.43%/0.11% while reducing FLOPs by 41.6% and removing 35.0% of the parameters for ResNet-50 on ImageNet.
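
For reference, the two standard quantities the abstract builds on can be written out as below. These are the textbook definitions of Shannon entropy and the Shapley value, not formulas quoted from the paper. The symbols are assumptions made here for illustration: X is a discretized activation value with probabilities p(x_i), N is the set of channels in a layer, and v(S) is the model's utility when only the channels in S are kept. The paper's own fusion function for rank and entropy is not given in this review, so it is not shown.

```latex
% Shannon entropy of a discrete random variable X
H(X) = -\sum_{i} p(x_i) \log p(x_i)

% Shapley value of channel i under the utility function v
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|! \, (|N| - |S| - 1)!}{|N|!}
    \bigl( v(S \cup \{i\}) - v(S) \bigr)
```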

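Research Motivation and Objectives

To address the lack of interpretability in current channel pruning methods, particularly in how layer-wise pruning ratios are assigned.
To provide a systematic, information-theoretic criterion for identifying unimportant channels, rather than relying on heuristics or engineering tuning.
To improve the measurement of feature importance, and thereby compression efficiency, by combining rank and entropy as complementary indicators.
To evaluate channel contributions with Shapley values so that high model accuracy is maintained during pruning.
To develop a generalizable, interpretable framework applicable across architectures and tasks, including image classification and object detection.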

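Proposed Method

The method introduces "information concentration", a fusion of rank and entropy, to quantify the information content of each convolutional layer.
Rank serves as a proxy for the number of independent solutions of a linear system and represents information capacity.
Entropy measures the uncertainty, or information content, in a layer's feature-map activations.
Fusing rank and entropy yields a more robust and interpretable metric for assigning layer-wise pruning ratios.
Shapley values are computed to assess each channel's contribution to the final prediction, pinpointing the least important channels.
Within each layer, the channels with the lowest Shapley values are pruned, followed by fine-tuning to recover accuracy.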
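
The rank-and-entropy steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not give the paper's fusion function, so `information_concentration` uses a hypothetical weighted sum of min-max-normalized indicators, and `pruning_ratios` assumes that layers with lower concentration are pruned more aggressively. The function names, the `alpha`, `low`, and `high` parameters, and the histogram-based entropy estimate are all choices made here for illustration.

```python
import numpy as np


def feature_map_entropy(fmap, bins=32):
    """Shannon entropy (in bits) of a histogram of activation values."""
    hist, _ = np.histogram(fmap, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # empty bins contribute nothing to the entropy
    return float(-(p * np.log2(p)).sum())


def layer_indicators(activations, bins=32):
    """Mean rank and mean entropy over the channels of one layer.

    activations: array of shape (channels, height, width).
    """
    ranks = [np.linalg.matrix_rank(ch) for ch in activations]
    entropies = [feature_map_entropy(ch, bins) for ch in activations]
    return float(np.mean(ranks)), float(np.mean(entropies))


def information_concentration(ranks, entropies, alpha=0.5):
    """Hypothetical fusion: weighted sum of normalized rank and entropy."""
    def normalize(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span

    return alpha * normalize(ranks) + (1 - alpha) * normalize(entropies)


def pruning_ratios(concentration, low=0.2, high=0.7):
    """Assumption: lower concentration maps to a higher pruning ratio."""
    c = np.asarray(concentration, dtype=float)
    return high - (high - low) * c


# Toy usage: one layer with generically full-rank maps, one with rank-1 maps.
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 8, 8))
low_rank = np.stack(
    [np.outer(rng.normal(size=8), rng.normal(size=8)) for _ in range(4)]
)
layer_ranks, layer_entropies = zip(*(layer_indicators(a) for a in (full, low_rank)))
ratios = pruning_ratios(information_concentration(layer_ranks, layer_entropies))
print(ratios)
```

In practice the activations would come from forward passes of a trained network over a batch of inputs rather than from random arrays, and the resulting ratios would feed the per-layer channel selection step.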

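Experimental Results

Research Questions

RQ1: How can layer-wise pruning ratios be assigned in a systematic, interpretable way rather than through heuristic tuning?
RQ2: Can the fusion of rank and entropy serve as a reliable proxy for feature importance in channel pruning?
RQ3: Compared with conventional importance criteria, how much do Shapley values improve the accuracy of the pruned model?
RQ4: To what extent does the information-theoretic framework preserve model performance while reducing FLOPs and parameters?
RQ5: Does the method generalize well across architectures and tasks, such as image classification and object detection?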

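Key Findings

For ResNet-56 on CIFAR-10, the method improves accuracy by 0.21% while reducing FLOPs by 45.5% and parameters by 40.3%.
For ResNet-50 on ImageNet, the method loses only 0.43% Top-1 accuracy while reducing FLOPs by 41.6% and parameters by 35.0%.
For object detection, the pruned RetinaNet achieves 37.6% mAP on COCO2017 with 25.55M parameters, a favorable efficiency-accuracy trade-off.
The information concentration measure effectively guides layer-wise pruning ratios and markedly reduces reliance on manual tuning.
Shapley values give channel importance scoring a sound game-theoretic basis and markedly improve accuracy retention after pruning.
The framework generalizes well, performing strongly across multiple models and across both classification and detection tasks.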
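
To make the Shapley-based channel scoring above more concrete, here is a minimal sketch that estimates Shapley values by permutation sampling, a standard Monte Carlo approximation. It is not the authors' procedure: the review does not say how the paper computes or approximates Shapley values. The names `shapley_channel_scores`, `channels_to_prune`, and `value_fn` are hypothetical, and the toy utility function stands in for something like validation accuracy with a subset of channels enabled.

```python
import numpy as np


def shapley_channel_scores(value_fn, n_channels, n_permutations=200, seed=0):
    """Estimate per-channel Shapley values by sampling random channel orderings.

    value_fn(mask) -> float is the utility when only channels with
    mask[i] == True are kept (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    scores = np.zeros(n_channels)
    for _ in range(n_permutations):
        order = rng.permutation(n_channels)
        mask = np.zeros(n_channels, dtype=bool)
        prev = value_fn(mask)
        for ch in order:
            mask[ch] = True
            cur = value_fn(mask)
            scores[ch] += cur - prev  # marginal contribution of this channel
            prev = cur
    return scores / n_permutations


def channels_to_prune(scores, ratio):
    """Indices of the lowest-scoring channels for a given pruning ratio."""
    k = int(round(ratio * len(scores)))
    return np.argsort(scores)[:k]


# Toy usage: an additive utility, for which the Shapley values equal the weights.
weights = np.array([0.5, 0.1, 0.3, 0.0])


def toy_value(mask):
    return float(weights[mask].sum())


scores = shapley_channel_scores(toy_value, n_channels=4, n_permutations=50)
pruned = channels_to_prune(scores, ratio=0.5)
print(scores, pruned)
```

Each sampled permutation costs one utility evaluation per channel, so with a real network the number of permutations and the size of the evaluation set dominate the runtime. That cost is the usual reason to sample rather than enumerate all subsets.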

This review was generated by AI and reviewed by a human editor.