QUICK REVIEW

[Paper Review] Pruning neural networks without any data by iteratively conserving synaptic flow

Hidenori Tanaka, Daniel Kunin|arXiv (Cornell University)|Jun 9, 2020

Advanced Memory and Neural Computing|Cited by 256

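ONE-SENTENCE SUMMARY

The paper proposes SynFlow, a data-agnostic pruning method that conserves synaptic flow in order to avoid layer-collapse and reach Maximal Critical Compression without any training or data.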

ABSTRACT

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.

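RESEARCH MOTIVATION AND OBJECTIVES

Explain and formalize why pruning at initialization can lead to layer-collapse, the premature pruning of an entire layer that renders a network untrainable.
Prove that, for common activation functions, synaptic saliency is conserved across neurons and layers.
Explain why larger layers have smaller average scores, which causes layer-collapse in gradient-based methods.
Develop a data-agnostic pruning algorithm that achieves Maximal Critical Compression.
Show experimentally that SynFlow matches or outperforms state-of-the-art pruning methods without using any training data.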
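The quantities named in these objectives can be written out compactly. The block below is a sketch in notation chosen for illustration: $\mathcal{R}$ is any scalar function of the network output, $\theta^{\text{in}}$ and $\theta^{\text{out}}$ are the incoming and outgoing parameters of one hidden neuron, and $\theta^{[l]}$ is the weight matrix of layer $l$ in an $L$-layer network. The symbols are assumptions of this sketch and may differ from the paper's.

```latex
% Synaptic saliency: gradient times parameter, elementwise
\mathcal{S}(\theta) = \frac{\partial \mathcal{R}}{\partial \theta} \odot \theta

% Neuron-wise conservation: incoming saliency equals outgoing saliency
\left\langle \frac{\partial \mathcal{R}}{\partial \theta^{\text{in}}},\, \theta^{\text{in}} \right\rangle
= \left\langle \frac{\partial \mathcal{R}}{\partial \theta^{\text{out}}},\, \theta^{\text{out}} \right\rangle

% Data-agnostic SynFlow loss: total flow through the absolute-valued weights
\mathcal{R}_{\text{SF}} = \mathbf{1}^{\top} \left( \prod_{l=1}^{L} \left| \theta^{[l]} \right| \right) \mathbf{1}
```

Because every layer carries the same total saliency, a layer with more parameters has a smaller average score per parameter, which is the layer-size dependence listed above.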

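PROPOSED METHOD

Define synaptic saliency as the Hadamard product of the gradient and the parameters, and establish neuron-wise and network-wise conservation laws.
Show that gradient-based scores obey these conservation laws, which explains their dependence on layer size.
Introduce Iterative Synaptic Flow Pruning (SynFlow), built on a data-agnostic loss that yields positive, conservative scores.
Prove that iterative, positive, conservative scoring with a global mask achieves Maximal Critical Compression.
Provide pseudocode for the SynFlow algorithm and discuss its computational cost (100 pruning iterations).
Empirically compare SynFlow against SNIP, GraSP, and magnitude and random pruning across a range of models and datasets.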
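The method steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a bias-free fully connected network whose weight matrices have shape (out, in), the function names `synflow_scores` and `synflow_prune` are hypothetical, and the exponential schedule (keep a fraction `compression ** (-k / n_iters)` of the parameters at iteration `k`) is an assumption of this sketch. The score of each weight is its absolute value times the gradient of the loss R = 1^T (product of absolute weight matrices) 1, computed with one forward and one backward pass on an all-ones input.

```python
import numpy as np

def synflow_scores(weights, masks):
    """Positive scores |W| * dR/d|W| for R = 1^T (prod_l |W_l * M_l|) 1."""
    abs_w = [np.abs(w) * m for w, m in zip(weights, masks)]
    # Forward pass: push an all-ones input through the absolute-valued linear network.
    fwd = [np.ones(abs_w[0].shape[1])]
    for a in abs_w:
        fwd.append(a @ fwd[-1])
    # Backward pass: pull an all-ones output gradient back to every layer.
    bwd = [np.ones(abs_w[-1].shape[0])]
    for a in reversed(abs_w):
        bwd.append(a.T @ bwd[-1])
    bwd.reverse()  # bwd[i + 1] is now the gradient w.r.t. the output of layer i
    return [np.outer(bwd[i + 1], fwd[i]) * a for i, a in enumerate(abs_w)]

def synflow_prune(weights, compression, n_iters=100):
    """Iteratively prune with a global mask; no data is used at any point."""
    masks = [np.ones_like(w) for w in weights]
    sizes = [w.size for w in weights]
    total = sum(sizes)
    for k in range(1, n_iters + 1):
        # Re-score after every pruning step (the iterative part).
        scores = synflow_scores(weights, masks)
        flat = np.concatenate([s.ravel() for s in scores])
        # Assumed exponential schedule for the number of parameters to keep.
        keep = max(1, int(round(total * compression ** (-k / n_iters))))
        # Global mask: keep the `keep` highest scores across all layers.
        flat_mask = np.zeros(total)
        flat_mask[np.argsort(flat)[total - keep:]] = 1.0
        chunks = np.split(flat_mask, np.cumsum(sizes)[:-1])
        masks = [m * c.reshape(m.shape) for m, c in zip(masks, chunks)]
    return masks
```

Calling `synflow_prune(weights, compression=100)` returns one 0/1 mask per layer. Re-scoring after each small pruning step is what lets the conservation property protect small layers; scoring only once would turn this into a single-shot method.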

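EXPERIMENTAL RESULTS

Research Questions

RQ1: Can highly sparse trainable subnetworks be identified at initialization without any training or data?
RQ2: Why are gradient-based single-shot pruning methods prone to layer-collapse, and how can it be mitigated?
RQ3: Can a data-agnostic pruning method achieve Maximal Critical Compression while avoiding layer-collapse?
RQ4: How does iterative evaluation of pruning scores affect how well trainability is preserved at initialization?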

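Key Findings

At high compression ratios, SynFlow consistently competes with or outperforms the baselines and data-dependent pruning methods without using any data.
At initialization, the conservation laws for synaptic saliency hold at both the neuron and the network level, which explains why some scores lead to layer-collapse.
The inverse relationship between layer size and average layer score explains why gradient-based methods prune the largest layers first.
Iterative, positive, conservative scoring with a global mask guarantees Maximal Critical Compression (no layer-collapse).
Across 12 model/dataset combinations, SynFlow achieves state-of-the-art pruning performance while remaining data-agnostic.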
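The conservation and layer-size findings above can be checked numerically on a toy network. The sketch below is an illustration under stated assumptions rather than a reproduction of the paper's experiments: it assumes a bias-free ReLU multilayer perceptron, a single input vector, the loss R = 0.5 * ||y||^2, and a hypothetical helper name `layer_saliency_totals`. For this setup the summed saliency (gradient times weight) of every layer should equal ||y||^2, so the average score per parameter shrinks as the layer grows.

```python
import numpy as np

def layer_saliency_totals(weights, x):
    """Per-layer sum of dR/dW * W for a bias-free ReLU MLP with R = 0.5 * ||y||^2."""
    # Forward pass, storing each layer's input and pre-activation.
    inputs, pre_acts = [x], []
    for i, w in enumerate(weights):
        z = w @ inputs[-1]
        pre_acts.append(z)
        is_last = i == len(weights) - 1
        inputs.append(z if is_last else np.maximum(z, 0.0))
    y = inputs[-1]
    # Backward pass written out by hand; dR/dy = y for this loss.
    grad_z = y.copy()
    totals = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad_w = np.outer(grad_z, inputs[i])
        totals[i] = float(np.sum(grad_w * weights[i]))
        if i > 0:
            grad_h = weights[i].T @ grad_z
            grad_z = grad_h * (pre_acts[i - 1] > 0)  # ReLU derivative
    return totals, float(y @ y)

rng = np.random.default_rng(0)
weights = [rng.standard_normal(s) for s in [(16, 8), (64, 16), (4, 64)]]
totals, y_norm_sq = layer_saliency_totals(weights, rng.standard_normal(8))
for w, t in zip(weights, totals):
    # Same total per layer, so the mean score scales like 1 / layer size.
    print(w.size, t, t / w.size)
```

The printed totals should agree with each other up to floating-point error, while the per-parameter averages differ by the ratio of layer sizes. A single-shot global threshold on such scores therefore removes parameters from the largest layer first.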


This review was generated by AI and reviewed by human editors.