QUICK REVIEW

[Paper Review] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Jonathan Frankle, Michael Carbin|arXiv (Cornell University)|Mar 9, 2018

Adversarial Robustness in Machine Learning | References: 64 | Citations: 1,319

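One-Sentence Summary

This paper shows that dense neural networks contain sparse subnetworks ("winning tickets") that, when reset to their original initial values and trained in isolation, match the accuracy of the original network in a similar number of iterations; the winning tickets found are often only 10–20% of the original parameter count or smaller.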

ABSTRACT

Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.

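Motivation and Objectives

Motivate why sparse architectures, such as those produced by pruning, are desirable for both training efficiency and inference.
Test whether there exist sparse subnetworks that reach comparable performance when trained starting from their original initial values.
Empirically identify winning tickets on MNIST and CIFAR-10 using iterative pruning.
Evaluate how initialization, sparsity, and architecture affect the trainability and generalization of winning tickets.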

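Proposed Method

Train a dense network, prune the smallest-magnitude weights, and reset the surviving weights to their initial values to form a winning ticket.
Over n rounds, prune a fraction p^(1/n) of the surviving weights in each round to find progressively smaller winning tickets.
Compare the training dynamics and test accuracy of winning tickets against the original network under SGD, momentum, and Adam.
Evaluate both fully-connected and convolutional architectures on MNIST and CIFAR-10.
As a control, keep the winning ticket's structure but reset its weights to new random values, then observe performance to test the importance of the original initialization.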
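The train, prune, and reset loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it treats the network as a flat list of weights, prunes globally rather than per layer, and uses a fixed per-round rate. The names `magnitude_prune`, `find_winning_ticket`, and `toy_train` are hypothetical, and `toy_train` is only a stand-in for a real training run.

```python
import random


def magnitude_prune(trained, mask, rate):
    """Zero out the `rate` fraction of surviving weights with the smallest |w|."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = int(round(rate * len(alive)))
    # Rank surviving indices by trained magnitude, smallest first.
    to_prune = sorted(alive, key=lambda i: abs(trained[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = 0
    return new_mask


def find_winning_ticket(init_weights, train_fn, rate, rounds):
    """Iteratively train, prune by magnitude, and reset survivors to init."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Every round starts from the ORIGINAL initial values of the survivors.
        start = [w * m for w, m in zip(init_weights, mask)]
        trained = train_fn(start, mask)
        mask = magnitude_prune(trained, mask, rate)
    ticket = [w * m for w, m in zip(init_weights, mask)]
    return ticket, mask


def toy_train(weights, mask):
    """Hypothetical stand-in for training: a fixed transform of the weights."""
    return [m * (1.5 * w + 0.05) for w, m in zip(weights, mask)]


rng = random.Random(0)
init = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ticket, mask = find_winning_ticket(init, toy_train, rate=0.2, rounds=3)
print(sum(mask), "of", len(mask), "weights survive")
```

In a real experiment, `train_fn` would run the optimizer while keeping pruned positions at zero. The random-reinitialization control would reuse `mask` but draw fresh values in place of `init`.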

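Experimental Results

Research Questions

RQ1: Do randomly initialized dense networks contain sparse subnetworks that can be trained to the same accuracy as the full network?
RQ2: How do iterative pruning and one-shot pruning affect the size and performance of winning tickets?
RQ3: What roles do initialization and network structure play in the success of winning tickets?
RQ4: Do winning tickets generalize better than the original network, and at what sparsity levels?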

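Key Findings

Winning tickets exist at 10–20% of the original parameter count and can match or exceed the original test accuracy within a comparable number of training iterations.
Iterative pruning finds small winning tickets that learn faster and often reach higher test accuracy than the original network.
Randomly reinitializing the weights of a winning ticket degrades its performance, which highlights the importance of the original initialization.
At certain sparsity levels, the gap between training and test accuracy narrows, indicating improved generalization.
The effect is observed across multiple architectures (LeNet, Conv-2/4/6, VGG-19, ResNet-18) and optimizers; learning rate and warmup can affect whether winning tickets are found, especially for deeper networks.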
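To relate the 10–20% sizes above to the number of pruning rounds, a quick calculation helps. The sketch below assumes a fixed per-round pruning rate of 20%, an illustrative value chosen here rather than one taken from this review; `remaining_fraction` and `rounds_to_reach` are hypothetical helper names.

```python
def remaining_fraction(rate, rounds):
    """Fraction of weights left after pruning `rate` of the survivors each round."""
    return (1.0 - rate) ** rounds


def rounds_to_reach(target, rate):
    """Smallest number of rounds after which at most `target` of the weights remain."""
    rounds = 0
    frac = 1.0
    while frac > target:
        frac *= 1.0 - rate
        rounds += 1
    return rounds


for target in (0.2, 0.1):
    n = rounds_to_reach(target, rate=0.2)
    print(f"<= {target:.0%} remaining after {n} rounds "
          f"({remaining_fraction(0.2, n):.1%} left)")
```

Under this assumption, reaching the 10–20% range takes roughly 8 to 11 train-prune-reset cycles, which is why iterative pruning costs far more compute than one-shot pruning.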


This review was generated by AI and checked by a human editor.