QUICK REVIEW

[Paper Review] Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

Alireza Aghasi, Afshin Abdi|arXiv (Cornell University)|Nov 16, 2016

Stochastic Gradient Optimization Techniques | Cited by 114

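One-sentence summary

Net-Trim introduces a layer-wise convex pruning method that sparsifies a trained neural network while preserving each layer's input-output relationship within a controllable tolerance, and it provides performance guarantees together with two retraining schemes (parallel and cascade).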

ABSTRACT

We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affects the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive the general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length $N$ as inputs, we show that if the network response can be described using a maximum number of $s$ non-zero weights per node, these weights can be learned from $\mathcal{O}(s\log N)$ samples.

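Motivation and objectives

Motivate model reduction as a way to reduce overfitting and redundancy in trained deep networks.
Develop a framework that applies convex pruning layer by layer to produce sparse weight matrices.
Provide theoretical guarantees on the consistency between the original network and the retrained network.
Provide practical, computationally tractable parallel and cascade retraining schemes.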

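Proposed method

Formulate a convex surrogate that prunes each layer by minimizing the l1 norm of its weight matrix while enforcing layer-wise consistency through a convex relaxation of the ReLU constraint.
For a given layer, solve min ||U||_1 subject to constraints requiring the post-activation output to approximately match the original layer output.
Provide two retraining schemes: parallel Net-Trim (layers retrained independently) and cascade Net-Trim (each retrained layer's output is propagated to the next retraining step).
Derive theoretical bounds on how retraining error propagates across layers (Theorem 1 and Theorem 2).
In a special-case analysis, establish the sample complexity of learning a sparse weight matrix under Gaussian inputs (Theorem 3).
Demonstrate practical pruning at high sparsity (for example, more than 93% of links pruned) and compatibility with existing training-time regularization methods.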
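The layer-wise program in the list above can be made concrete with a small feasibility checker. This is a minimal sketch, assuming the convention that a layer maps inputs X (N x P, one column per sample) to Y = max(WᵀX, 0), and assuming the relaxed ReLU constraints take the form: the Frobenius error between UᵀX and Y is at most eps on entries where Y > 0, and UᵀX is non-positive on entries where Y = 0. The function names (`matmul_t`, `relu_layer`, `l1_norm`, `is_feasible`) and the toy data are illustrative and do not come from the paper. The code only verifies a candidate U; it does not solve the convex program.

```python
import math

def matmul_t(U, X):
    """Return U^T X for U (N x M) and X (N x P) as an M x P list of lists."""
    n, m, p = len(U), len(U[0]), len(X[0])
    return [[sum(U[k][i] * X[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def relu_layer(W, X):
    """Layer response Y = max(W^T X, 0)."""
    return [[max(v, 0.0) for v in row] for row in matmul_t(W, X)]

def l1_norm(U):
    """Entrywise l1 norm, the objective minimized by the pruning program."""
    return sum(abs(v) for row in U for v in row)

def is_feasible(U, X, Y, eps, tol=1e-12):
    """Check the two (assumed) relaxed-ReLU constraints for a candidate U."""
    Z = matmul_t(U, X)
    sq_err = 0.0
    for z_row, y_row in zip(Z, Y):
        for z, y in zip(z_row, y_row):
            if y > 0:
                sq_err += (z - y) ** 2   # active entry: must stay close to Y
            elif z > tol:
                return False             # inactive entry: pre-activation must be <= 0
    return math.sqrt(sq_err) <= eps + tol

# Toy layer (illustrative data): 3 inputs, 1 neuron, 2 samples.
# Rows 1 and 3 of X are identical, so the original weights are redundant.
X = [[1.0, -1.0],
     [0.0,  1.0],
     [1.0, -1.0]]
W = [[0.5], [0.0], [0.5]]   # original weights, two non-zeros
U = [[0.9], [0.0], [0.0]]   # sparser candidate, one non-zero
Y = relu_layer(W, X)

print("original feasible at eps=0:", is_feasible(W, X, Y, 0.0))
print("candidate feasible at eps=0.2:", is_feasible(U, X, Y, 0.2))
print("l1 norms (original, candidate):", l1_norm(W), l1_norm(U))
```

The original weights are always feasible at eps = 0, which is why the program has a solution; a looser eps admits candidates with smaller l1 norm. In practice the minimization itself would be handed to a convex solver rather than checked by hand.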

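Experimental results

Research questions

RQ1: Can the layer-wise convex program recover a sparse weight matrix while keeping the retrained layer outputs close to the original layer outputs?
RQ2: What theoretical guarantees hold for error propagation when layers are retrained sequentially versus in parallel?
RQ3: How many samples are needed to recover a sparse layer transform under the Gaussian input assumption?
RQ4: How do parallel Net-Trim and cascade Net-Trim compare in terms of sparsity, feasibility, and generalization performance?
RQ5: Can Net-Trim be combined with existing regularization methods after training, without retraining from scratch?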

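Key findings

Net-Trim achieves substantial sparsification while keeping the original network response within a controlled epsilon across layers.
Parallel Net-Trim retrains each layer independently via a convex program, which enables distributed computation and yields an upper bound on layer-wise error accumulation (the sum of the epsilons).
Cascade Net-Trim retrains layers sequentially, widening the tolerance to preserve feasibility, and can produce sparser models at the cost of a slightly different error growth.
For Gaussian inputs, a sparse weight matrix with at most s non-zeros per column can be learned from O(s log N) samples (Theorem 3).
Net-Trim can post-process an already trained network to reduce model complexity beyond what conventional regularizers such as Dropout or an l1 penalty achieve.
The framework offers a principled, convex pruning method that keeps the retrained network in close correspondence with the original and comes with consistency guarantees.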
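The practical difference between the parallel and cascade schemes is which activations feed each layer's program. The sketch below illustrates only that data flow. It rests on several assumptions: `solve_layer` is a hypothetical callable standing in for the convex solver, `forward` and `make_logging_solver` are helpers written for this example, and the stand-in solver simply zeroes small weights (it is not the Net-Trim program). The tolerance widening that the cascade scheme uses to stay feasible is left to the caller through the `eps` list.

```python
def forward(W, X):
    """ReLU layer: max(W^T X, 0) for W (N x M) and X (N x P)."""
    n, m, p = len(W), len(W[0]), len(X[0])
    return [[max(sum(W[k][i] * X[k][j] for k in range(n)), 0.0)
             for j in range(p)] for i in range(m)]

def original_activations(weights, X0):
    """Activations of the ORIGINAL network: [X0, Y1, Y2, ...]."""
    acts = [X0]
    for W in weights:
        acts.append(forward(W, acts[-1]))
    return acts

def parallel_trim(weights, X0, eps, solve_layer):
    """Each layer is retrained on the original input/output pair only."""
    acts = original_activations(weights, X0)
    # Calls are independent of one another, so they could be distributed.
    return [solve_layer(acts[l], acts[l + 1], weights[l], eps[l])
            for l in range(len(weights))]

def cascade_trim(weights, X0, eps, solve_layer):
    """Layers are retrained in order; inputs come from the trimmed network."""
    acts = original_activations(weights, X0)
    trimmed, X_hat = [], X0
    for l, W in enumerate(weights):
        # Target is still the original output; input is the retrained output so far.
        U = solve_layer(X_hat, acts[l + 1], W, eps[l])
        trimmed.append(U)
        X_hat = forward(U, X_hat)
    return trimmed

def make_logging_solver(log):
    """Hypothetical stand-in solver: records its input, zeroes weights below 0.5."""
    def solve_layer(X_in, Y_target, W, eps):
        log.append(X_in)
        return [[w if abs(w) >= 0.5 else 0.0 for w in row] for row in W]
    return solve_layer

# Illustrative two-layer network with a single input sample.
weights = [[[1.0, 0.25], [0.0, 1.0]],   # layer 1: 2 inputs -> 2 neurons
           [[1.0], [1.0]]]              # layer 2: 2 inputs -> 1 neuron
X0 = [[1.0], [1.0]]
eps = [0.1, 0.1]

log_parallel, log_cascade = [], []
parallel_trim(weights, X0, eps, make_logging_solver(log_parallel))
cascade_trim(weights, X0, eps, make_logging_solver(log_cascade))
print("layer-2 input, parallel:", log_parallel[1])
print("layer-2 input, cascade: ", log_cascade[1])
```

In the parallel run, layer 2 sees the original network's layer-1 output; in the cascade run, it sees the output of the already trimmed layer 1. That dependency is what forces the cascade scheme to run sequentially and to loosen its tolerance.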

This review was generated by AI and checked by a human editor.