QUICK REVIEW

[Paper Review] SNIP: Single-shot Network Pruning based on Connection Sensitivity

Namhoon Lee, Thalaiyasingam Ajanthan|arXiv (Cornell University)|Oct 4, 2018

Advanced Neural Network Applications|References: 29|Cited by: 363

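One-line summary

SNIP measures the importance of each connection before training to decide which connections to prune, prunes the network once to the target sparsity, and then trains the resulting sparse network. Across a range of architectures it reaches accuracy close to the original network even at extreme sparsity levels.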

ABSTRACT

Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.

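Motivation and objectives

Prune large networks to reduce memory and computation while avoiding a significant drop in performance.
Propose a data-dependent saliency criterion that identifies structurally important connections before training.
Prune in a single shot at initialization, removing the need for pretraining and for iterative prune-train cycles.
Show that the method is robust across diverse architectures and datasets.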

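Proposed method

Introduce binary connection indicators c alongside the weight vector w, so that pruning becomes a sparsity constraint on c.
Compute the connection sensitivity s_j as the normalized magnitude of the derivative of the loss with respect to c_j: s_j = |g_j(w; D)| / sum_k |g_k(w; D)|, where g_j = ∂L(c ⊙ w; D)/∂c_j |_{c=1}.
Keep the kappa connections with the largest s_j by setting c_j = 1 for them and c_j = 0 for all others.
Fix the pruning mask c once at initialization, then train the sparse network in the standard way by solving min_w L(c ⊙ w; D) under that mask.
Initialize the weights with variance scaling so that the gradient signal is consistent across architectures.
Compute the saliency on a mini-batch of data; the method is flexible enough to accumulate the saliency over several batches or, memory permitting, to use validation data or the full dataset.
The SNIP algorithm proceeds in four steps: compute s_j on a mini-batch, derive the pruning mask from s_j, optimize w under the mask, and finally apply the mask to the trained weights.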
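
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a toy linear regressor with a hand-derived squared-error gradient (a real network would use automatic differentiation), the initializer is a simple fan-in-scaled Gaussian, the toy data is synthetic, and all function names (loss_grad, connection_sensitivity, snip_mask, train_masked) are hypothetical. The sketch relies on the chain-rule identity that, at c = 1, ∂L(c ⊙ w)/∂c_j = w_j · ∂L/∂θ_j with θ = c ⊙ w.

```python
import random

def loss_grad(theta, batch):
    """Gradient of the mean of 0.5 * (x . theta - y)^2 with respect to theta."""
    n = len(batch)
    grad = [0.0] * len(theta)
    for x, y in batch:
        err = sum(xi * ti for xi, ti in zip(x, theta)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / n
    return grad

def connection_sensitivity(w, batch):
    """s_j = |g_j| / sum_k |g_k|, where g_j = dL(c * w)/dc_j evaluated at c = 1."""
    dl_dtheta = loss_grad(w, batch)  # at c = 1, theta equals w
    g = [wj * dj for wj, dj in zip(w, dl_dtheta)]  # chain rule
    total = sum(abs(v) for v in g)
    if total == 0.0:
        raise ValueError("all sensitivities are zero; cannot normalize")
    return [abs(v) / total for v in g]

def snip_mask(s, kappa):
    """Binary mask that keeps the kappa connections with the largest s_j."""
    keep = sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:kappa]
    mask = [0] * len(s)
    for j in keep:
        mask[j] = 1
    return mask

def train_masked(w, mask, batch, lr=0.1, steps=200):
    """Plain gradient descent on L(c * w); pruned weights stay at zero."""
    w = [wi * ci for wi, ci in zip(w, mask)]
    for _ in range(steps):
        grad = loss_grad(w, batch)
        w = [wi - lr * gi * ci for wi, gi, ci in zip(w, grad, mask)]
    return w

# Toy data (assumption): 6 input features, the target uses only features 0 and 2.
random.seed(0)
n_features, kappa = 6, 2
batch = []
for _ in range(64):
    x = [random.gauss(0.0, 1.0) for _ in range(n_features)]
    batch.append((x, 3.0 * x[0] - 2.0 * x[2]))

# Fan-in-scaled Gaussian initialization (assumed stand-in for variance scaling).
w0 = [random.gauss(0.0, (1.0 / n_features) ** 0.5) for _ in range(n_features)]

s = connection_sensitivity(w0, batch)                    # step 1: sensitivities
mask = snip_mask(s, kappa)                               # step 2: pruning mask
w_trained = train_masked(w0, mask, batch)                # step 3: train under the mask
w_final = [wi * ci for wi, ci in zip(w_trained, mask)]   # step 4: apply the mask

print("sensitivity:", [round(v, 3) for v in s])
print("mask:", mask)
```

Because the mask also multiplies the gradient in this sketch, pruned weights never move away from zero during training, so the final masking step changes nothing here and is kept only to mirror the fourth step.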

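Experimental results

Research questions

RQ1: Can a data-dependent saliency criterion identify important connections before training?
RQ2: How much sparsity can be achieved across different architectures and datasets without a noticeable drop in accuracy?
RQ3: Is pruning at initialization robust to the architecture type (CNN, residual, RNN) and to the initialization scheme?
RQ4: Can the retained connections be verified as genuinely task-relevant with respect to the input data?
RQ5: How does using mini-batches for the saliency computation affect the pruning result and final performance?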

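Key findings

SNIP produces very sparse models whose accuracy is nearly the same as the reference network on MNIST, CIFAR-10, and Tiny-ImageNet across a variety of architectures.
Even at pruning levels of up to 98% for LeNet-300-100 and up to 99% for LeNet-5-Caffe, accuracy stays comparable to, and in some cases better than, the dense baseline.
The method generalizes to convolutional, residual, and recurrent networks without architecture-specific pruning schedules or pretraining.
Saliency-based pruning reveals that the retained connections align with discriminative input features, indicating genuine task relevance.
Performance is competitive with or better than many existing pruning methods, while requiring no additional hyperparameters or pretraining.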
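
To give a feel for what these sparsity levels mean in absolute terms, the short calculation below counts the connections kept (kappa) for LeNet-300-100. It is a rough sketch under assumptions: the standard fully connected layout 784 -> 300 -> 100 -> 10 is used, only weights are counted (biases are ignored), rounding down is an arbitrary choice, and the helper name kept_connections is hypothetical.

```python
# Layer widths of LeNet-300-100 (fully connected, 784 -> 300 -> 100 -> 10).
layer_sizes = [784, 300, 100, 10]

# Weight count only; biases are ignored in this sketch (assumption).
total_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def kept_connections(total, sparsity_percent):
    """Number of connections kept (kappa) at an integer sparsity percentage."""
    return total * (100 - sparsity_percent) // 100

print("total weights:", total_weights)
for sparsity in (90, 95, 98):
    print(f"{sparsity}% sparsity keeps {kept_connections(total_weights, sparsity)} connections")
```

Integer arithmetic is used on purpose so that the counts are exact and do not depend on floating-point rounding.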

This review was generated by AI and checked by a human editor.