QUICK REVIEW

[Paper Review] Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

Hesham Mostafa, Xin Wang|arXiv (Cornell University)|Feb 15, 2019

Machine Learning and Data Classification|Cited by 123

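One-Line Summary

This paper proposes a new dynamic sparse reparameterization method for training deep CNNs under a fixed parameter budget. In CIFAR-10 and ImageNet experiments it outperforms static and dynamic baselines and reaches accuracy on par with, or better than, post-training compression.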

ABSTRACT

Modern deep neural networks are typically highly overparameterized. Pruning techniques are able to remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic reallocation of non-zero parameters have emerged, allowing direct training of sparse networks without having to pre-train a large dense model. Here we present a novel dynamic sparse reparameterization method that addresses the limitations of previous techniques such as high computational cost and the need for manual configuration of the number of free parameters allocated to each layer. We evaluate the performance of dynamic reallocation methods in training deep convolutional networks and show that our method outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget, on par with accuracies obtained by iteratively pruning a pre-trained dense model. We further investigated the mechanisms underlying the superior generalization performance of the resultant sparse networks. We found that neither the structure, nor the initialization of the non-zero parameters were sufficient to explain the superior performance. Rather, effective learning crucially depended on the continuous exploration of the sparse network structure space during training. Our work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network.

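Motivation and Objectives

Motivate parameter-efficient training of deep CNNs under a fixed memory budget.
Develop a dynamic sparse reparameterization method that reallocates non-zero parameters during training.
Evaluate the method against static sparse, dynamic reparameterization, and compression baselines across several CNNs and datasets.
Investigate the mechanisms behind the generalization gains that come from dynamic structure exploration during training.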

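Proposed Method

The network is represented by sparse parameter tensors: the non-zero values are optimized by gradient descent, and their positions are reassigned during training.
A two-step cycle of magnitude-based pruning followed by random growth moves free parameters within and across layers.
A single global pruning threshold H is adjusted adaptively, and the total number of non-zero parameters is held fixed.
Newly freed parameters are redistributed across layers by a heuristic that favors layers with larger loss gradients and sparser structure.
On CIFAR-10 and ImageNet, dynamic sparse reparameterization is compared against these baselines: full dense, thin dense, static sparse, compressed sparse, DeepR, SET, and HashedNet.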
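
The prune-and-grow cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the dict-based layer layout, the zero initialization of regrown weights, and the doubling/halving threshold update are all assumptions made for illustration. In particular, growth positions are sampled uniformly over all inactive positions, a placeholder for the layer-wise redistribution heuristic, which this review states only qualitatively.

```python
import random


def prune_and_regrow(layers, threshold, rng):
    """Run one prune/grow step in place and return how many weights moved.

    Each layer is a dict with two equal-length lists: "w" (weights) and
    "mask" (True where the position holds an active, trainable parameter).
    This layout is hypothetical and chosen only to keep the sketch short.
    """
    # Step 1: magnitude-based pruning with one threshold shared by all layers.
    pruned = 0
    for layer in layers:
        for i, active in enumerate(layer["mask"]):
            if active and abs(layer["w"][i]) < threshold:
                layer["mask"][i] = False
                layer["w"][i] = 0.0
                pruned += 1

    # Step 2: random growth. Activate exactly `pruned` inactive positions so
    # the total number of active parameters is unchanged.
    # ASSUMPTION: uniform sampling over all inactive positions stands in for
    # the cross-layer redistribution heuristic.
    inactive = [
        (layer_idx, i)
        for layer_idx, layer in enumerate(layers)
        for i, active in enumerate(layer["mask"])
        if not active
    ]
    for layer_idx, i in rng.sample(inactive, pruned):
        layers[layer_idx]["mask"][i] = True
        layers[layer_idx]["w"][i] = 0.0  # ASSUMPTION: regrown weights start at zero
    return pruned


def adapt_threshold(threshold, pruned, target, tolerance=0.1):
    """Nudge the global threshold so roughly `target` weights are pruned per step.

    ASSUMPTION: a simple doubling/halving feedback rule, used for illustration.
    """
    if pruned < (1 - tolerance) * target:
        return threshold * 2.0
    if pruned > (1 + tolerance) * target:
        return threshold / 2.0
    return threshold


def count_active(layers):
    """Total number of active (non-zero-capable) parameters across layers."""
    return sum(sum(layer["mask"]) for layer in layers)


# Usage: two toy layers; the active-parameter budget is the same before and after.
rng = random.Random(0)
layers = [
    {"w": [0.5, -0.01, 0.0, 0.0, 0.3, 0.02],
     "mask": [True, True, False, False, True, True]},
    {"w": [0.0, 0.9, -0.005, 0.0],
     "mask": [False, True, True, False]},
]
before = count_active(layers)
moved = prune_and_regrow(layers, threshold=0.05, rng=rng)
after = count_active(layers)
new_threshold = adapt_threshold(0.05, moved, target=3)
```

In a real training loop, gradient updates would run between calls, so that regrown weights can move away from zero before the next pruning step.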

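Experimental Results

Research Questions

RQ1: Can deep CNNs be trained effectively under a fixed parameter budget using dynamic sparse reparameterization?
RQ2: Does adaptive cross-layer reallocation of non-zero weights during training improve generalization over static sparse training and post-training pruning?
RQ3: Is dynamic exploration of the network structure during training necessary for high generalization, beyond the final sparse structure and the initialization?
RQ4: What emergent sparsity patterns appear across layers and blocks under dynamic sparse training?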

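Key Findings

For the same parameter budget, dynamic sparse training generalizes better than static reparameterization and often matches or exceeds post-training compression baselines.
In the final sparsity patterns, larger parameter tensors tend to be sparser, and deeper layers tend to be sparser.
Compared with competing dynamic methods, the approach adds almost no computational overhead and reallocates parameters across layers automatically.
The superior performance stems from continuous structure exploration during training, not from the final sparse structure or the initialization alone.
Convergence can still be reached when dynamic reallocation is stopped after the early epochs, which suggests that early structure exploration is especially important.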

This review was generated by AI and checked by a human editor.