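[Paper Summary] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
The paper shows that dense neural networks contain sparse subnetworks ("winning tickets") that, when trained from scratch starting from their original initial values, reach the original network's accuracy in a similar number of iterations, often with only 10–20% of the original network's parameters.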
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
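The hypothesis in the abstract can be stated more precisely. The block below follows the paper's own formalization as we understand it: a dense network f(x; θ) whose initial parameters θ₀ are drawn from a distribution D_θ, a binary mask m over the parameters, and a subnetwork f(x; m ⊙ θ) that is trained starting from m ⊙ θ₀.

```latex
\begin{aligned}
&f(x;\theta),\ \theta = \theta_0 \sim \mathcal{D}_\theta
  &&\text{dense net: min.\ validation loss } \ell \text{ at iteration } j,\ \text{test accuracy } a\\
&f(x;\, m \odot \theta),\ m \in \{0,1\}^{|\theta|},\ \text{init. } m \odot \theta_0
  &&\text{subnetwork: min.\ validation loss } \ell' \text{ at iteration } j',\ \text{test accuracy } a'\\
&\exists\, m:\quad j' \le j,\quad a' \ge a,\quad \lVert m \rVert_0 \ll |\theta|
\end{aligned}
```

The three conditions read as: no more training time, no lower accuracy, and far fewer parameters than the dense network.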
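Research Motivation and Goals
- Motivate why sparse, pruning-derived architectures matter: if they could be trained from the start, they would improve training efficiency as well as inference.
- Test whether there exist sparse subnetworks that can be trained from their original initialization to comparable performance.
- Empirically identify winning tickets on MNIST and CIFAR-10 via iterative pruning.
- Evaluate how initialization, sparsity, and architecture affect the trainability and generalization of winning tickets.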
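Proposed Method
- Train a dense network, prune the smallest-magnitude weights, and reset the surviving weights to their initial values to form a winning ticket.
- To find smaller winning tickets, prune iteratively over n rounds, removing p^(1/n)% of the weights that survived the previous round in each round.
- Compare the training dynamics and test accuracy of winning tickets against the original network under SGD, momentum, and Adam.
- Evaluate fully connected and convolutional architectures on MNIST and CIFAR-10.
- As a control, randomly reinitialize winning tickets (same mask, fresh random initial weights) and compare their performance, to test how much the original initialization matters.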
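The train, prune, and reset loop above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' code: the weights are a single flat list, `train` is a hypothetical stand-in for SGD (gradient descent on a quadratic toward a fixed target), and a single pruning rate is applied to all weights, whereas the paper prunes different layers at different rates. With a per-round rate q, roughly (1 − q)^n of the weights survive n rounds (0.8^5 ≈ 0.33 in the usage below). Because the toy objective is essentially insensitive to the starting point, `random_reinit` only shows the bookkeeping of the control experiment; it cannot reproduce the finding that reinitialization hurts.

```python
import random


def train(weights, mask, target, steps=100, lr=0.1):
    """Hypothetical stand-in for SGD: gradient descent on
    0.5 * sum((w - t)^2) over unpruned weights only.
    Pruned weights (mask == 0) are held at zero throughout."""
    w = [wi * mi for wi, mi in zip(weights, mask)]
    for _ in range(steps):
        w = [(wi - lr * (wi - ti)) * mi for wi, ti, mi in zip(w, target, mask)]
    return w


def prune_by_magnitude(trained, mask, rate):
    """Remove the `rate` fraction of the *surviving* weights with the smallest |w|."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = int(round(rate * len(alive)))
    alive.sort(key=lambda i: abs(trained[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = 0
    return new_mask


def iterative_magnitude_pruning(init, target, rate=0.2, rounds=5):
    """Train, prune `rate` of the remaining weights, then reset the survivors
    to their ORIGINAL initial values and repeat for `rounds` rounds."""
    mask = [1] * len(init)
    for _ in range(rounds):
        # Each round restarts training from init * mask (the reset step).
        trained = train(init, mask, target)
        mask = prune_by_magnitude(trained, mask, rate)
    winning_ticket = [w * m for w, m in zip(init, mask)]
    return mask, winning_ticket


def random_reinit(mask, seed=1):
    """Control: keep the same mask but draw fresh random initial weights."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * m for m in mask]


# Usage on hypothetical random data.
rng = random.Random(0)
init = [rng.gauss(0.0, 1.0) for _ in range(100)]
target = [rng.gauss(0.0, 1.0) for _ in range(100)]
mask, ticket = iterative_magnitude_pruning(init, target, rate=0.2, rounds=5)
reinit = random_reinit(mask)
print(f"density after 5 rounds: {sum(mask) / len(mask):.2f}")  # → 0.33
```

Restarting every round from the original initialization, rather than continuing from the trained weights, is the step that distinguishes a winning ticket from ordinary iterative pruning with fine-tuning.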
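Experimental Results
Research Questions
- RQ1: Do randomly initialized dense networks contain sparse subnetworks that can be trained to accuracy comparable to the full network?
- RQ2: How do iterative and one-shot pruning differ in the size and performance of the winning tickets they find?
- RQ3: What roles do initialization and network structure play in a winning ticket's success?
- RQ4: Do winning tickets generalize better than the original network, and at what sparsity levels?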
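Main Findings
- Winning tickets exist at 10–20% of the original parameter count or less, and they match or exceed the original network's test accuracy within a similar number of training iterations.
- Iterative pruning finds smaller winning tickets than one-shot pruning; these learn faster and often reach higher test accuracy than the original network.
- Randomly reinitializing a winning ticket's weights degrades its performance, highlighting the importance of the original initialization.
- At certain sparsity levels, winning tickets generalize better, with a smaller gap between training and test accuracy.
- The effect holds across several architectures (Lenet, Conv-2/4/6, VGG-19, ResNet-18) and optimizers, although the learning rate and warmup affect whether winning tickets are found, especially in deeper networks.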
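Since the last finding ties success in deeper networks partly to learning-rate warmup, a minimal sketch of a warmup schedule may help. It assumes the simplest form, a linear ramp from 0 to the peak rate followed by a constant rate; `linear_warmup_lr` is a hypothetical helper, and the peak rate and warmup length in the usage loop are placeholders, not the paper's settings.

```python
def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Return the learning rate for a 0-based training `step`.

    Ramps linearly from 0 to `peak_lr` over `warmup_steps` steps, then holds
    `peak_lr`. Any later decay schedule is deliberately omitted.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps


# Placeholder values for illustration only.
for step in (0, 250, 500, 1000, 5000):
    print(step, linear_warmup_lr(step, peak_lr=0.1, warmup_steps=1000))
```

Starting from a small rate keeps the earliest updates small, after which training proceeds at the full rate.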
This summary was generated by AI and reviewed by human editors.