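[Paper Summary] Stabilizing the Lottery Ticket Hypothesis
The paper shows that rewinding to an iteration a few percent of the way into training, rather than to initialization, yields highly sparse subnetworks that match or exceed the accuracy of the original network on CIFAR-10 and ImageNet. It proposes stability as the key explanation.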
Pruning is a well-established technique for removing unnecessary structure from neural networks after training to improve the performance of inference. Several recent results have explored the possibility of pruning at initialization time to provide similar benefits during training. In particular, the "lottery ticket hypothesis" conjectures that typical neural networks contain small subnetworks that can train to similar accuracy in a commensurate number of steps. The evidence for this claim is that a procedure based on iterative magnitude pruning (IMP) reliably finds such subnetworks retroactively on small vision tasks. However, IMP fails on deeper networks, and proposed methods to prune before training or train pruned networks encounter similar scaling limitations. In this paper, we argue that these efforts have struggled on deeper networks because they have focused on pruning precisely at initialization. We modify IMP to search for subnetworks that could have been obtained by pruning early in training (0.1% to 7% through) rather than at iteration 0. With this change, it finds small subnetworks of deeper networks (e.g., 80% sparsity on Resnet-50) that can complete the training process to match the accuracy of the original network on more challenging tasks (e.g., ImageNet). In situations where IMP fails at iteration 0, the accuracy benefits of delaying pruning accrue rapidly over the earliest iterations of training. To explain these behaviors, we study subnetwork "stability," finding that - as accuracy improves in this fashion - IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise. These results offer new insights into the opportunity to prune large-scale networks early in training and the behaviors underlying the lottery ticket hypothesis.
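Research motivation and goals
- Investigate why pruning at initialization fails on deeper networks, and whether pruning early in training can produce trainable subnetworks.
- Evaluate how rewinding to an early training iteration affects subnetwork accuracy and stability.
- Introduce and analyze stability to pruning and stability to data order as mechanisms behind lottery tickets.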
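Proposed method
- Modify iterative magnitude pruning (IMP) to rewind the surviving weights to their values early in training (k% of the way through) rather than at iteration 0.
- Evaluate IMP with and without rewinding using LeNet, ResNet-18, and VGG-19 on CIFAR-10, and compare against random pruning.
- Measure two kinds of stability, stability to pruning and stability to data order, using the L2 distance between the masked weights after training.
- Extend the experiments, with rewinding, to large-scale ImageNet models (ResNet-50, Inception-v3, SqueezeNet).
- Analyze how rewinding to later iterations improves subnetwork stability and accuracy, and how this relates to the lottery ticket hypothesis.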
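The rewinding procedure described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear model, the `train`, `prune_by_magnitude`, and `imp_with_rewinding` helpers, the per-round pruning fraction, and the global (rather than per-layer) magnitude criterion are all hypothetical choices made for brevity.

```python
import random

def train(weights, mask, data, total_steps, start_step=0, lr=0.05, seed=0):
    """Toy SGD for a linear model y ~ w . x; pruned weights are held at zero.

    The sampler is replayed from step 0 so that resuming at `start_step`
    sees the same data order a full run would see from that step onward.
    """
    rng = random.Random(seed)
    w = [wi * mi for wi, mi in zip(weights, mask)]
    for step in range(total_steps):
        x, y = rng.choice(data)
        if step < start_step:
            continue  # these steps were already taken before the rewind point
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [(wi - lr * err * xi) * mi for wi, xi, mi in zip(w, x, mask)]
    return w

def prune_by_magnitude(weights, mask, fraction):
    """Remove the smallest-magnitude `fraction` of the surviving weights."""
    alive = [i for i, m in enumerate(mask) if m == 1]
    n_prune = int(len(alive) * fraction)
    new_mask = list(mask)
    for i in sorted(alive, key=lambda i: abs(weights[i]))[:n_prune]:
        new_mask[i] = 0
    return new_mask

def imp_with_rewinding(init_weights, data, total_steps, rewind_step, rounds, fraction):
    """Iterative magnitude pruning that resets survivors to step `rewind_step`.

    With rewind_step == 0 this reduces to resetting to the initialization.
    """
    mask = [1] * len(init_weights)
    # Train the dense model up to the rewind point and remember those weights.
    rewind_weights = train(init_weights, mask, data, total_steps=rewind_step)
    for _ in range(rounds):
        # Finish training from the rewind point, then prune by final magnitude.
        final = train(rewind_weights, mask, data, total_steps, start_step=rewind_step)
        mask = prune_by_magnitude(final, mask, fraction)
    # The candidate subnetwork: the mask plus the rewound weights it keeps.
    return mask, [w * m for w, m in zip(rewind_weights, mask)]
```

Setting `rewind_step` to a small fraction of `total_steps` corresponds to rewinding to k% of training. Because each round retrains from the same rewound weights, only the mask changes between rounds.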
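Experimental results
Research questions
- RQ1: Can the subnetworks that IMP identifies at initialization also train to similar accuracy in deeper networks?
- RQ2: Does pruning somewhat later in early training (rewinding) yield smaller, trainable subnetworks that match or exceed the performance of the original network?
- RQ3: Is subnetwork stability (to pruning and to data order) a predictor of finding winning tickets?
- RQ4: How does rewinding affect highly sparse subnetworks on large-scale tasks such as ImageNet?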
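Key findings
- At initialization, IMP cannot find winning tickets in deeper networks such as ResNet-18 and VGG-19 unless the learning rate is adjusted.
- With rewinding to an early training iteration (0.1% to 7% of the way through), subnetworks on CIFAR-10 reach 50% to 99% sparsity while matching the accuracy of the full network.
- On ImageNet, rewinding to 4.4%, 3.5%, and 6.6% of the way through training yields subnetworks of ResNet-50, Inception-v3, and SqueezeNet that are 70%, 70%, and 50% smaller than the original networks, respectively, and still reach the original accuracy.
- Subnetworks found by IMP are significantly more stable, both to pruning and to data order, than randomly pruned subnetworks, and this stability correlates with higher accuracy.
- Rewinding to later iterations steadily improves stability and accuracy for subnetworks that did not initially yield a winning ticket.
- The results motivate a revised lottery ticket hypothesis with rewinding, pointing to the opportunity to prune early in training rather than only at initialization.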
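The two stability measurements behind these findings can be sketched as L2 distances restricted to the weights a mask keeps. The function names below are hypothetical. The sketch assumes that stability to pruning compares a subnetwork trained in isolation with the same weights trained inside the full network, and that stability to data order compares two copies of the subnetwork trained with different data orders. A smaller distance means a more stable subnetwork.

```python
import math

def masked_l2_distance(weights_a, weights_b, mask):
    """L2 distance between two weight vectors, counting only unpruned entries."""
    return math.sqrt(sum(m * (a - b) ** 2 for a, b, m in zip(weights_a, weights_b, mask)))

def stability_to_pruning(full_trained, subnetwork_trained, mask):
    """Distance between the subnetwork trained alone and trained within the full network."""
    return masked_l2_distance(full_trained, subnetwork_trained, mask)

def stability_to_data_order(subnetwork_run_a, subnetwork_run_b, mask):
    """Distance between two trainings of the same subnetwork with different data orders."""
    return masked_l2_distance(subnetwork_run_a, subnetwork_run_b, mask)

# Illustrative values only: three weights, the middle one pruned.
mask = [1, 0, 1]
full_trained = [3.0, 0.5, -4.0]
subnetwork_trained = [0.0, 0.0, 0.0]
print(stability_to_pruning(full_trained, subnetwork_trained, mask))
```

The pruned entry is ignored, so a large difference on a removed weight does not affect either measurement.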
This summary was generated by AI and reviewed by a human editor.