QUICK REVIEW

[Paper Review] Pruning neural networks without any data by iteratively conserving synaptic flow

Hidenori Tanaka, Daniel Kunin|arXiv (Cornell University)|Jun 9, 2020

Advanced Memory and Neural Computing|Cited by 256

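One-line summary

This paper proposes SynFlow, a data-agnostic pruning method that conserves synaptic flow to avoid layer-collapse, requires neither training nor access to data, and achieves Maximal Critical Compression.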

ABSTRACT

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.

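Research motivation and objectives

Motivate and formalize why pruning at initialization causes layer-collapse.
Show that, under common activation functions, synaptic saliency is conserved at every neuron and across every layer.
Explain why larger layers have smaller average scores, and why this leads to layer-collapse in gradient-based methods.
Develop a data-agnostic pruning algorithm that achieves Maximal Critical Compression.
Show experimentally that SynFlow matches or outperforms state-of-the-art pruning methods without using any training data.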

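Proposed method

Define synaptic saliency as the Hadamard product of the gradient and the parameters, and show conservation laws at the neuron level and at the network level.
Prove that gradient-based scores are conserved, and use this to explain their dependence on layer size.
Introduce Iterative Synaptic Flow Pruning (SynFlow), based on a data-agnostic loss that yields positive, conserved scores.
Prove that iterative, positive, conservative scoring satisfies Maximal Critical Compression under global masking.
Provide pseudocode for the SynFlow algorithm and discuss its computational cost (100 pruning iterations).
Empirically compare SynFlow against SNIP, GraSP, and magnitude/random pruning across multiple models and datasets.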
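The steps above can be sketched on a toy network. The following is a minimal illustration, not the authors' implementation. It assumes a bias-free fully connected network stored as nested lists, an exponential sparsity schedule over the 100 iterations, and a closed-form score (absolute weight times the path strength flowing in and out) in place of automatic differentiation. The names `synflow_scores` and `synflow_prune` are hypothetical.

```python
import random


def synflow_scores(layers):
    """Score every weight as |w_ij| * inflow_j * outflow_i.

    For a bias-free fully connected network this equals (dR/dw) * w with
    R = 1^T |W_L| ... |W_1| 1, evaluated on an all-ones input (no data).
    """
    # inflow[k][j]: path strength from the network input to input j of layer k.
    inflow = [[1.0] * len(layers[0][0])]
    for w in layers[:-1]:
        inflow.append([sum(abs(x) * v for x, v in zip(row, inflow[-1])) for row in w])
    # outflow[k][i]: path strength from output i of layer k to the network output.
    outflow = [[1.0] * len(layers[-1])]
    for w in reversed(layers[1:]):
        nxt = outflow[0]
        outflow.insert(0, [sum(abs(row[j]) * nxt[i] for i, row in enumerate(w))
                           for j in range(len(w[0]))])
    return [[[abs(x) * inflow[k][j] * outflow[k][i] for j, x in enumerate(row)]
             for i, row in enumerate(w)] for k, w in enumerate(layers)]


def synflow_prune(layers, compression, iterations=100):
    """Return 0/1 masks that keep roughly 1/compression of all weights."""
    masks = [[[1] * len(row) for row in w] for w in layers]
    total = sum(len(w) * len(w[0]) for w in layers)
    for step in range(1, iterations + 1):
        # Re-score the currently masked network at every iteration.
        masked = [[[x * m for x, m in zip(row, mrow)] for row, mrow in zip(w, mw)]
                  for w, mw in zip(layers, masks)]
        scores = synflow_scores(masked)
        # Assumed exponential schedule: keep total * compression^(-step/iterations).
        keep = max(1, round(total * compression ** (-step / iterations)))
        # Global masking: rank surviving weights from all layers together.
        alive = [(scores[k][i][j], k, i, j)
                 for k, mw in enumerate(masks)
                 for i, mrow in enumerate(mw)
                 for j, m in enumerate(mrow) if m]
        alive.sort(reverse=True)
        masks = [[[0] * len(row) for row in w] for w in layers]
        for _, k, i, j in alive[:keep]:
            masks[k][i][j] = 1
    return masks


# Toy usage: a 4-16-3 network with random Gaussian weights (arbitrary sizes).
random.seed(0)
sizes = [4, 16, 3]
layers = [[[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
          for n_in, n_out in zip(sizes, sizes[1:])]
masks = synflow_prune(layers, compression=10)
kept_per_layer = [sum(map(sum, m)) for m in masks]
print(kept_per_layer)
```

Re-scoring after every small pruning step is the point of the iteration: once a layer becomes small, its surviving weights carry a larger share of the total flow and are therefore protected in the next round.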

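Experimental results

Research questions

RQ1: Can highly sparse trainable subnetworks be identified at initialization, without training or data?
RQ2: Why do gradient-based one-shot pruning methods tend to cause layer-collapse, and how can this be mitigated?
RQ3: Can a data-agnostic pruning method reach Maximal Critical Compression while avoiding layer-collapse?
RQ4: How does iterative evaluation of pruning scores affect the preservation of a network's trainability at initialization?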

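Key findings

Without using any data, SynFlow consistently matches or outperforms the baselines and data-dependent pruning methods in the high-compression regime.
At initialization, synaptic saliency is conserved at the neuron level and at the network level, which explains why some scoring schemes produce layer-collapse.
The inverse relationship between layer size and average layer score explains why gradient-based methods prune large layers first.
Iterative, positive, conservative scoring guarantees Maximal Critical Compression under global masking (i.e., without layer-collapse).
Using data-agnostic pruning, SynFlow achieves state-of-the-art pruning performance across 12 model/dataset combinations.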
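The conservation and layer-size findings can be checked numerically. The sketch below is an illustration under stated assumptions, not code from the paper. It uses a SynFlow-style score on a bias-free fully connected network with arbitrary layer widths, and the helper names `abs_matvec` and `layer_score_totals` are hypothetical. If the score is conserved across layers, every layer's total should be the same, so the average score per weight shrinks as the layer grows.

```python
import random


def abs_matvec(w, v):
    """Return |W| v for a weight matrix stored as a list of rows."""
    return [sum(abs(x) * y for x, y in zip(row, v)) for row in w]


def layer_score_totals(layers):
    """Sum |w_ij| * inflow_j * outflow_i over each layer of the network."""
    totals = []
    for k, w in enumerate(layers):
        # Path strength from an all-ones input up to the inputs of layer k.
        inflow = [1.0] * len(layers[0][0])
        for prev in layers[:k]:
            inflow = abs_matvec(prev, inflow)
        # Path strength from the outputs of layer k to the network output.
        outflow = [1.0] * len(layers[-1])
        for nxt in reversed(layers[k + 1:]):
            transposed = [list(col) for col in zip(*nxt)]
            outflow = abs_matvec(transposed, outflow)
        totals.append(sum(abs(w[i][j]) * inflow[j] * outflow[i]
                          for i in range(len(w)) for j in range(len(w[0]))))
    return totals


# Toy network with deliberately unequal layer sizes: 512, 256, and 8 weights.
random.seed(1)
sizes = [8, 64, 4, 2]
layers = [[[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
          for n_in, n_out in zip(sizes, sizes[1:])]
totals = layer_score_totals(layers)
counts = [len(w) * len(w[0]) for w in layers]
averages = [t / n for t, n in zip(totals, counts)]
for n, t, a in zip(counts, totals, averages):
    print(f"weights={n:4d}  layer total={t:.6g}  average per weight={a:.6g}")
```

Because the three layer totals agree up to floating-point error, a single global threshold on such scores removes weights from the largest layer first, which is the mechanism the findings above attribute to one-shot gradient-based methods.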


This review was generated by AI and checked by a human editor.