QUICK REVIEW

[論文レビュー] Rethinking the Value of Network Pruning

Zhuang Liu, Mingjie Sun|arXiv (Cornell University)|Oct 11, 2018

Anomaly Detection Techniques and Applications被引用数 742

ひとこと要約

本論文は、構造的プルーニングにおいて、剪定されたモデルを最初から訓練することが、継承ウェイトを用いたファインチューニングと同等かそれを上回ることが多く、剪定されたアーキテクチャ自体が効率性の主要因であることを示唆しており、剪定はアーキテクチャ探索として機能し得ることを示している。

ABSTRACT

Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.

研究の動機と目的

剪定前に大規模で過剰なパラメータを持つモデルを訓練する必要性を疑問視する。
継承ウェイトを用いた剪定モデルのファインチューニングが、スクラッチから訓練した剪定モデルよりも優れているかを評価する。
事前定義されたターゲットと自動的に発見される（アーキテクチャ発見型）プルーニングターゲットの効果を区別する。
剪定が主に重量選択ではなくアーキテクチャ探索として機能するかを評価する。
構造的プルーニングと非構造的プルーニングを比較し、ロトニー宝くじ仮説（Lottery Ticket Hypothesis）との関連を示す。

提案手法

プルーニングを事前定義されたターゲットアーキテクチャと自動的に発見されたターゲットアーキテクチャに分類する。
剪定モデルをスクラッチから訓練（Scratch-E、Scratch-B）する vs 継承ウェイトからファインチューニングする。
複数の剪定手法（L1-norm フィルター剪定、ThiNet、回帰ベースの再構成、Network Slimming、Sparse Structure Selection）と非構造的マグニチュードベース剪定を適用する。
CIFAR-10、CIFAR-100、ImageNet上でVGG、ResNet、DenseNetの variants を評価する。
剪定アーキテクチャのパラメータ効率とスパーシティパターンを分析する。
Lottery Ticket Hypothesisと比較し、アーキテクチャ探索への示唆を論じる。

Figure 1: A typical three-stage network pruning pipeline.

実験結果

リサーチクエスチョン

RQ1定義済みおよび自動的 pruning ターゲット全体で、継承ウェイトを用いてファインチューニングした剪定モデルは、スクラッチから訓練した同じ剪定アーキテクチャを上回るか。
RQ2剪定アーキテクチャが、保存されたウェイトよりも最終的な効率と精度を決定する程度はどの程度か。
RQ3剪定は大規模モデルの事前訓練なしに、パラメータ効率の良いアーキテクチャを生み出す効果的なアーキテクチャ探索手法になり得るか。
RQ4構造的プルーニングと非構造的プルーニングは、ImageNetのような大規模データセット上でスクラッチ訓練された剪定モデルを訓練する能力においてどのように比較されるか。

主な発見

事前定義された構造的プルーニングでは、スクラッチ訓練済みのモデルがファインチューニングした counterparts の精度に達するかそれを上回り、Scratch-Bは多くの場合Scratch-Eより優れ、Imagenet ではファインチューニングを上回ることもある。
自動的な構造的プルーニングでは、スクラッチ訓練済みの剪定モデルは概してファインチューニング済みモデルに匹敵するかそれを上回り、Scratch-Bが頻繁に優れている。
非構造的プルーニングはImageNetでスクラッチ訓練がファインチューニングより劣ることを示しており、構造的プルーニングとの違いを強調する。
自動剪定法で得られた剪定アーキテクチャは、一様に剪定されたアーキテクチャよりもパラメータ効率が高く、アーキテクチャ探索の価値を示している。
ガイド付き/剪定アーキテクチャは他のモデルやデータセットへ設計パターンを転用でき、特定の剪定モデルを超える実用的な設計原則を示唆している。

Figure 2: Difference between predefined and automatically discovered target architectures, in channel pruning as an example. The pruning ratio $x$ is user-specified, while $a,b,c,d$ are determined by the pruning algorithm. Unstructured sparse pruning can also be viewed as automatic.

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。