QUICK REVIEW

[Paper Review] SNIP: Single-shot Network Pruning based on Connection Sensitivity

Namhoon Lee, Thalaiyasingam Ajanthan|arXiv (Cornell University)|Oct 4, 2018

Advanced Neural Network Applications|References: 29|Cited by: 363

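One-sentence summary

SNIP identifies important connections before training by measuring how much removing each connection affects the loss, prunes the network to a target sparsity, and then trains the resulting sparse network. Across a range of architectures, accuracy stays close to that of the original network even at extreme sparsity.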

ABSTRACT

Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.

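Motivation and objectives

Prune large networks to cut memory and compute requirements without significantly degrading performance.
Propose a data-dependent saliency criterion that identifies structurally important connections before training.
Prune once at initialization, removing the need for pretraining and for iterative prune-and-train cycles.
Demonstrate that the method is robust across architectures and datasets.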

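Proposed method

Introduce binary connection indicators c alongside the weight vector w, so that pruning is modeled as controlling the sparsity of c.
Compute the connection sensitivity s_j as the normalized magnitude of the loss derivative with respect to c_j: s_j = |g_j(w; D)| / ∑_k |g_k(w; D)|, where g_j = ∂L(c ⊙ w; D)/∂c_j evaluated at c = 1.
Keep the top kappa connections: set c_j = 1 for the kappa largest s_j and c_j = 0 for the rest.
Prune only once, at initialization, then train the sparse network in the standard way by solving min_w L(c ⊙ w; D) with the pruning mask c held fixed.
Initialize the weights with a variance-scaling scheme so that the gradient signal is consistent across architectures.
Compute saliency on a single mini-batch; saliency can also be accumulated over several mini-batches, or computed on a validation set or the full dataset when memory allows.
The SNIP algorithm has four steps: compute s_j on a mini-batch, derive the pruning mask from s_j, optimize w under that mask, and finally apply the mask to the trained weights.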
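
The four steps can be traced on a toy problem. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it uses a four-weight linear model with squared loss, a hand-written gradient, a fixed hand-picked initialization instead of variance scaling, and plain gradient descent. The helper names (connection_gradients, connection_sensitivity, top_kappa_mask, train_masked) and the data are hypothetical. It relies on the chain-rule identity that, at c = 1, ∂L(c ⊙ w; D)/∂c_j equals w_j · ∂L/∂w_j.

```python
# Toy illustration of single-shot pruning on a linear model (not the paper's code).
# Model: y_hat = sum_j c_j * w_j * x_j, loss L = batch mean of 0.5 * (y_hat - y)^2.

def connection_gradients(w, batch):
    """g_j = dL/dc_j at c = 1; for this model that is mean(residual * w_j * x_j)."""
    g = [0.0] * len(w)
    for x, y in batch:
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, (wj, xj) in enumerate(zip(w, x)):
            g[j] += residual * wj * xj / len(batch)
    return g

def connection_sensitivity(w, batch):
    """s_j = |g_j| / sum_k |g_k|."""
    abs_g = [abs(gj) for gj in connection_gradients(w, batch)]
    total = sum(abs_g)
    return [a / total for a in abs_g]

def top_kappa_mask(s, kappa):
    """c_j = 1 for the kappa largest sensitivities, 0 otherwise."""
    keep = set(sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:kappa])
    return [1 if j in keep else 0 for j in range(len(s))]

def train_masked(w, c, batch, lr=0.5, steps=100):
    """Gradient descent on L(c * w); pruned weights receive zero gradient."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in batch:
            residual = sum(cj * wj * xj for cj, wj, xj in zip(c, w, x)) - y
            for j in range(len(w)):
                grad[j] += residual * c[j] * x[j] / len(batch)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return [cj * wj for cj, wj in zip(c, w)]  # step 4: apply the mask

# Hypothetical data: the target depends only on features 0 and 2 (y = 2*x0 - 3*x2).
batch = [
    ([1.0, 0.5, 0.0, -0.5], 2.0),
    ([0.0, -0.5, 1.0, 0.5], -3.0),
    ([1.0, -0.5, 1.0, 0.5], -1.0),
    ([-1.0, 0.5, 1.0, -0.5], -5.0),
]
w_init = [0.5, 0.5, -0.5, 0.5]  # fixed for reproducibility; an assumption of this sketch

s = connection_sensitivity(w_init, batch)   # step 1
mask = top_kappa_mask(s, kappa=2)           # step 2 -> [1, 0, 1, 0]
w_sparse = train_masked(w_init, mask, batch)  # steps 3 and 4
print(mask, w_sparse)
```

In this toy case the two connections that actually carry the signal receive the highest sensitivity before any training, and the masked model then recovers their true coefficients while the pruned weights stay at zero.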

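Experimental results

Research questions

RQ1: Can a data-dependent saliency criterion identify important connections before training?
RQ2: How much sparsity can be reached on different architectures and datasets without a significant loss in accuracy?
RQ3: Is pruning at initialization robust to the architecture type (convolutional, residual, recurrent) and to the initialization scheme?
RQ4: When inspected together with the input data, does the method show that the retained connections are indeed relevant to the task?
RQ5: How does computing saliency on a mini-batch affect the pruning result and the final performance?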

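Key findings

Across multiple architectures, SNIP produces extremely sparse models on MNIST, CIFAR-10, and Tiny-ImageNet with virtually the same accuracy as the reference network.
LeNet-300-100 pruned to 98% sparsity and LeNet-5-Caffe pruned to 99% sparsity still reach accuracy comparable to or better than the dense baselines.
The method generalizes to convolutional, residual, and recurrent networks without architecture-specific pruning schedules or pretraining.
The retained connections align with discriminative input features, indicating that they are indeed relevant to the task.
Performance is competitive with, and in some cases better than, many existing pruning methods, while requiring no additional hyperparameters or pretraining.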
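
To make these sparsity levels concrete, the arithmetic below estimates how many connections kappa survive pruning. It is a rough sketch: the layer shapes are the commonly used definitions of LeNet-300-100 and LeNet-5-Caffe on 28x28 MNIST inputs, assumed here rather than taken from this summary, and biases are not counted.

```python
# Weight counts per layer (biases excluded). The shapes are assumptions of this sketch.
LAYER_WEIGHTS = {
    "LeNet-300-100": [784 * 300, 300 * 100, 100 * 10],
    "LeNet-5-Caffe": [20 * 1 * 5 * 5, 50 * 20 * 5 * 5, 800 * 500, 500 * 10],
}
SPARSITY_PERCENT = {"LeNet-300-100": 98, "LeNet-5-Caffe": 99}

def kept_connections(total, sparsity_percent):
    """Number of connections kept (kappa) at the given sparsity, in integer arithmetic."""
    return total * (100 - sparsity_percent) // 100

kept = {
    name: kept_connections(sum(layers), SPARSITY_PERCENT[name])
    for name, layers in LAYER_WEIGHTS.items()
}
for name, kappa in kept.items():
    print(f"{name}: {sum(LAYER_WEIGHTS[name])} weights, {kappa} kept")
```

Under these assumptions, each pruned network retains only a few thousand weights out of several hundred thousand.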

This review was generated by AI and reviewed by a human editor.