QUICK REVIEW

[Paper Review] One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

Ari S. Morcos, Haonan Yu|arXiv (Cornell University)|Jun 6, 2019

Generative Adversarial Networks and Image Synthesis · References: 33 · Cited by: 107

One-Sentence Summary

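This paper shows that, in natural image tasks, winning ticket initializations found on one dataset or with one optimizer often transfer to other datasets and optimizers, especially when they are generated on larger datasets.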

ABSTRACT

The success of lottery ticket initializations (Frankle and Carbin, 2019) suggests that small, sparsified networks can be trained so long as the network is initialized appropriately. Unfortunately, finding these "winning ticket" initializations is computationally expensive. One potential solution is to reuse the same winning tickets across a variety of datasets and optimizers. However, the generality of winning ticket initializations remains unclear. Here, we attempt to answer this question by generating winning tickets for one training configuration (optimizer and dataset) and evaluating their performance on another configuration. Perhaps surprisingly, we found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset. Moreover, winning tickets generated using larger datasets consistently transferred better than those generated using smaller datasets. We also found that winning ticket initializations generalize across optimizers with high performance. These results suggest that winning ticket initializations generated by sufficiently large datasets contain inductive biases generic to neural networks more broadly which improve training across many settings and provide hope for the development of better initialization methods.

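Motivation and Objectives

Determine whether winning ticket initializations generalize across datasets within the natural image domain.
Evaluate how well winning tickets transfer across optimizers.
Examine how the size and number of classes of the source dataset affect the cross-task generalization of winning tickets.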

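Method

Iterative magnitude pruning, removing 20% of the remaining weights per iteration, combined with late resetting (surviving weights are rewound to their values from early in training rather than to their initial values).
Global pruning was compared with local (layer-wise) pruning; global pruning performed better.
Winning tickets generated on a source dataset/optimizer are transferred to a target dataset/optimizer and evaluated there.
The final classification layer is excluded from transfer because the number of output classes differs; it is randomly reinitialized instead.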
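The pipeline above can be sketched in plain Python as follows. This is a minimal, hypothetical sketch, not the authors' code: each layer is a flat list of weights, `train_fn` is a stand-in for real training that must return an early-training snapshot plus the final weights, and keeping the re-drawn classifier dense with a He-style initializer is an assumption not taken from the paper. The rewind point for late resetting is fixed by the first (dense) training run.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: layer name -> flat list of weights / 0-1 mask.
Weights = Dict[str, List[float]]
Masks = Dict[str, List[int]]


def global_magnitude_prune(weights: Weights, masks: Masks, rate: float = 0.2) -> Masks:
    """Prune `rate` of the surviving weights, ranked by |w| pooled over ALL layers."""
    survivors = sorted((abs(w), name, i)
                       for name, layer in weights.items()
                       for i, w in enumerate(layer) if masks[name][i])
    new_masks = {name: list(m) for name, m in masks.items()}
    for _, name, i in survivors[:int(rate * len(survivors))]:
        new_masks[name][i] = 0
    return new_masks


def late_reset(early: Weights, masks: Masks) -> Weights:
    """Late resetting: rewind survivors to an early-training snapshot, zero the rest."""
    return {name: [w * m for w, m in zip(early[name], masks[name])] for name in early}


def iterative_magnitude_pruning(
    init: Weights,
    train_fn: Callable[[Weights, Masks], Tuple[Weights, Weights]],
    rounds: int,
    rate: float = 0.2,
) -> Tuple[Weights, Masks]:
    """Repeat: train -> globally prune `rate` of survivors -> late-reset."""
    masks = {name: [1] * len(layer) for name, layer in init.items()}
    weights, early = init, None
    for _ in range(rounds):
        snapshot, final = train_fn(weights, masks)
        if early is None:
            early = snapshot  # rewind point is fixed by the first (dense) run
        masks = global_magnitude_prune(final, masks, rate)
        weights = late_reset(early, masks)
    return weights, masks


def transfer_ticket(ticket: Weights, masks: Masks, output_layer: str,
                    n_out: int, fan_in: int, seed: int = 0) -> Tuple[Weights, Masks]:
    """Reuse a ticket on a new dataset: keep every layer except the classifier,
    which is randomly re-initialized because the number of classes differs."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)  # assumed He-style init for the new classifier
    new_w = {k: list(v) for k, v in ticket.items() if k != output_layer}
    new_m = {k: list(v) for k, v in masks.items() if k != output_layer}
    new_w[output_layer] = [rng.gauss(0.0, std) for _ in range(n_out * fan_in)]
    new_m[output_layer] = [1] * (n_out * fan_in)  # assumption: classifier kept dense
    return new_w, new_m


def remaining_fraction(rounds: int, rate: float = 0.2) -> float:
    """Fraction of weights left after `rounds` rounds of pruning `rate` each."""
    return (1.0 - rate) ** rounds


if __name__ == "__main__":
    w0 = {"conv1": [0.5, -0.05, 0.3],
          "fc": [0.01, -0.4, 0.02, 0.9, -0.03, 0.2, 0.6]}

    def toy_train(w: Weights, m: Masks) -> Tuple[Weights, Weights]:
        # Stand-in for real training: "trained" weights are just the masked inputs.
        return w, {k: [x * b for x, b in zip(v, m[k])] for k, v in w.items()}

    ticket, mask = iterative_magnitude_pruning(w0, toy_train, rounds=3)
    print(mask)  # {'conv1': [1, 0, 1], 'fc': [0, 1, 0, 1, 0, 1, 1]}
    print(round(remaining_fraction(10), 3))  # 0.107
```

Because each round removes 20% of what is left, the surviving fraction after n rounds is 0.8^n, so roughly 10.7% of the weights remain after 10 rounds. Ranking all layers together in `global_magnitude_prune` is what makes this global rather than layer-wise pruning.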

Results

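Research Questions

RQ1: Do winning tickets transfer between datasets in natural image classification tasks?
RQ2: Do winning tickets transfer between optimizers (SGD with momentum vs. Adam)?
RQ3: Do the size and number of classes of the source dataset affect transfer?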

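Key Findings

Across several target datasets, transferred winning tickets often performed close to winning tickets generated on the target dataset itself.
Winning tickets generated on larger datasets generalized better than those generated on smaller datasets.
Transferred tickets generalized across optimizers, suggesting that they contain inductive biases that are not specific to any one optimizer.
Global magnitude pruning outperformed layer-wise pruning; it tended to prune deeper layers more aggressively while sparing the first few layers.
Transferred tickets can mitigate overfitting in overparameterized networks, especially on very small datasets.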
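The difference between global and layer-wise pruning can be illustrated with a toy example. This is a hypothetical sketch, not the paper's experiment: the two-layer network, its sizes, the He-style initializer, and pruning at initialization (rather than after training) are all simplifying assumptions. One plausible contributor to the pattern above, offered here as an assumption rather than a claim from the paper, is that a layer with a larger fan-in starts with smaller weights, so a single global magnitude threshold removes proportionally more of it, whereas layer-wise pruning forces the same sparsity on every layer.

```python
import math
import random


def init_layer(n_weights, fan_in, rng):
    """He-style Gaussian init (std = sqrt(2 / fan_in)); an assumed initializer."""
    std = math.sqrt(2.0 / fan_in)
    return [rng.gauss(0.0, std) for _ in range(n_weights)]


def layerwise_prune(weights, rate):
    """Prune the same fraction `rate` of each layer independently."""
    masks = {}
    for name, layer in weights.items():
        k = int(rate * len(layer))
        smallest = set(sorted(range(len(layer)), key=lambda i: abs(layer[i]))[:k])
        masks[name] = [0 if i in smallest else 1 for i in range(len(layer))]
    return masks


def global_prune(weights, rate):
    """Prune the smallest `rate` fraction of weights pooled over all layers."""
    pool = sorted((abs(w), name, i)
                  for name, layer in weights.items() for i, w in enumerate(layer))
    masks = {name: [1] * len(layer) for name, layer in weights.items()}
    for _, name, i in pool[:int(rate * len(pool))]:
        masks[name][i] = 0
    return masks


def sparsity(masks):
    """Fraction of pruned weights per layer."""
    return {name: 1.0 - sum(m) / len(m) for name, m in masks.items()}


rng = random.Random(0)
# Hypothetical toy net: a small early conv (3x3x3 fan-in), a large deep conv (3x3x256).
net = {"early": init_layer(100, fan_in=27, rng=rng),
       "deep": init_layer(1000, fan_in=2304, rng=rng)}
print("layer-wise:", sparsity(layerwise_prune(net, 0.5)))  # both layers at 0.5
print("global:    ", sparsity(global_prune(net, 0.5)))
```

With these settings, layer-wise pruning reports 0.5 sparsity for both layers, while global pruning concentrates most of the pruning in `deep` and leaves `early` largely intact.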

This review was generated by AI and checked by human editors.