QUICK REVIEW

[Paper Review] A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM

Shaokai Ye, Tianyun Zhang|arXiv (Cornell University)|Nov 5, 2018

Advanced Neural Network Applications|References: 24|Citations: 43

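One-Sentence Summary

This paper presents a unified ADMM-based framework that handles DNN weight pruning and weight clustering/quantization together, achieving large compression without accuracy loss (e.g., 167× pruning on LeNet-5 and 24.7× on AlexNet, and up to 1,910× storage reduction when pruning is combined with clustering).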

ABSTRACT

Many model compression techniques of Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering and quantization, etc. Weight pruning leverages the redundancy in the number of weights in DNNs, while weight clustering/quantization leverages the redundancy in the number of bit representations of weights. They can be effectively combined in order to exploit the maximum degree of redundancy. However, there lacks a systematic investigation in literature towards this direction. In this paper, we fill this void and develop a unified, systematic framework of DNN weight pruning and clustering/quantization using Alternating Direction Method of Multipliers (ADMM), a powerful technique in optimization theory to deal with non-convex optimization problems. Both DNN weight pruning and clustering/quantization, as well as their combinations, can be solved in a unified manner. For further performance improvement in this framework, we adopt multiple techniques including iterative weight quantization and retraining, joint weight clustering training and centroid updating, weight clustering retraining, etc. The proposed framework achieves significant improvements both in individual weight pruning and clustering/quantization problems, as well as their combinations. For weight pruning alone, we achieve 167x weight reduction in LeNet-5, 24.7x in AlexNet, and 23.4x in VGGNet, without any accuracy loss. For the combination of DNN weight pruning and clustering/quantization, we achieve 1,910x and 210x storage reduction of weight data on LeNet-5 and AlexNet, respectively, without accuracy loss. Our codes and models are released at the link http://bit.ly/2D3F0np

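Motivation and Objectives

Address the lack of systematic research on combining weight pruning with weight clustering/quantization.
Develop a unified ADMM-based optimization framework that handles pruning, clustering/quantization, and their combination in a single formulation.
Demonstrate large reductions in model size while preserving accuracy across standard networks.
Provide practical training procedures, including iterative quantization/retraining and centroid updating, to improve performance.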

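Proposed Method

Formulate DNN compression as a joint optimization problem in which the weight pruning and clustering/quantization constraints are expressed through indicator functions.
Apply ADMM to decompose the problem into subproblems: (i) DNN training with a quadratic penalty, (ii) projection onto the pruning set, and (iii) projection onto the clustering/quantization set.
Use Euclidean projections to enforce sparsity (keeping the top α weights by magnitude) and to assign each weight either to a fixed quantization level or to a cluster centroid.
Update the dual variables iteratively and retrain to recover accuracy. The two steps can optionally be applied in sequence: pruning first, then clustering/quantization.
Provide iterative weight quantization with retraining, as well as dynamic centroid updating for clustering-based compression.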
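
As a rough illustration of the decomposition described above, the following Python sketch runs ADMM on a toy least-squares loss rather than on a DNN. The function names (`project_sparse`, `project_levels`, `admm_prune`), the penalty parameter `rho`, the iteration count, and the closed-form solve used for subproblem (i) are assumptions made for this example and are not taken from the paper's released code. In the framework itself, subproblem (i) is DNN training with a quadratic penalty, so no closed form would be available there.

```python
import numpy as np


def project_sparse(w, keep):
    """Euclidean projection of a 1-D vector onto {w : at most `keep` nonzeros}.

    Keeps the `keep` largest-magnitude entries and zeroes the rest.
    """
    z = np.zeros_like(w)
    if keep > 0:
        idx = np.argsort(np.abs(w))[-keep:]  # indices of the largest magnitudes
        z[idx] = w[idx]
    return z


def project_levels(w, levels):
    """Euclidean projection of a 1-D vector onto a fixed set of levels.

    Each weight is snapped to its nearest quantization level (or centroid).
    """
    levels = np.asarray(levels, dtype=w.dtype)
    nearest = np.argmin(np.abs(w[:, None] - levels[None, :]), axis=1)
    return levels[nearest]


def admm_prune(X, y, keep, rho=1.0, iters=50):
    """Toy ADMM: minimize 0.5 * ||X w - y||^2 with at most `keep` nonzeros in w.

    Uses the scaled dual variable u, with the constraint w = z.
    """
    n = X.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    A = X.T @ X + rho * np.eye(n)  # fixed system matrix for the w-update
    b = X.T @ y
    for _ in range(iters):
        # (i) loss plus quadratic penalty (rho / 2) * ||w - z + u||^2
        w = np.linalg.solve(A, b + rho * (z - u))
        # (ii) projection onto the pruning (sparsity) set
        z = project_sparse(w + u, keep)
        # dual variable update
        u = u + w - z
    return z


# Hypothetical usage: prune a toy model, then snap surviving weights to levels.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = X @ np.array([2.0, 0.0, 0.0, -3.0, 0.0, 0.0, 0.0, 0.0])
pruned = admm_prune(X, y, keep=2)
quantized = project_levels(pruned, [-3.0, -2.0, 0.0, 2.0, 3.0])
```

Swapping `project_sparse` for `project_levels` in step (ii) would give the quantization variant of the same loop. Because both constraint sets are non-convex, this kind of iteration is a heuristic, which is consistent with the retraining step the review lists for recovering accuracy.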

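Experimental Results

Research Questions

RQ1: Can pruning and weight clustering/quantization be optimized together in a unified ADMM framework?
RQ2: What compression ratios can be achieved on common DNNs, without loss of accuracy, when pruning is combined with clustering/quantization?
RQ3: Does pruning first and then applying clustering/quantization perform better than handling both simultaneously?
RQ4: How do iterative quantization and retraining affect final accuracy and storage efficiency?
RQ5: What are practical guidelines for layer-wise pruning and clustering/quantization settings that maximize compression?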

Key Findings

Model	Accuracy loss	Number of weights	CONV weight bits	FC weight bits	Total data size / compression ratio	Total model size (including index) / compression ratio
LeNet-5 Baseline	0.0%	430.5K	32	32	1.7MB	1.7MB
Iterative pruning (?)	0.1%	35.8K	8	5	24.2KB / 70.2×	52.1KB / 33×
Our Method (Clustering)	0.1%	2.57K	3	2 (3 for output layer)	0.89KB / 1,910×	2.73KB / 623×
Our Method (Quantization)	0.2%	2.57K	3	2 (3 for output layer)	0.89KB / 1,910×	2.73KB / 623×
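
The compression ratios in the table can be checked with simple arithmetic. The sketch below is only a sanity check on the reported sizes; the helper name `compression_ratio` and the convention 1 MB = 1000 KB are assumptions for this example. The reported ratios appear to be computed from the rounded 1.7 MB baseline rather than from the unrounded 430.5K × 4 bytes.

```python
def compression_ratio(baseline_kb, compressed_kb):
    """Ratio of baseline size to compressed size (both in KB)."""
    return baseline_kb / compressed_kb


# Baseline LeNet-5: 430.5K weights at 32 bits each is about 1.7 MB.
# Assumption: 1 MB = 1000 KB, and the rounded 1.7 MB figure is used.
BASELINE_KB = 1700.0

# Compressed sizes in KB, as reported in the table above.
sizes_kb = {
    "Iterative pruning, data only": 24.2,
    "Iterative pruning, with index": 52.1,
    "Our method, data only": 0.89,
    "Our method, with index": 2.73,
}

for name, size in sizes_kb.items():
    print(f"{name}: {compression_ratio(BASELINE_KB, size):.1f}x")
```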

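167× weight reduction on LeNet-5 without accuracy loss (pruning only).
24.7× weight reduction on AlexNet without accuracy loss (pruning only).
23.4× weight reduction on VGGNet without accuracy loss (pruning only).
Combining pruning with clustering/quantization achieves 1,910× storage reduction on LeNet-5 and 210× on AlexNet without accuracy loss (not counting the pruning indices).
When the indices are included, the total model size reduction is 623× on LeNet-5 and 90× on AlexNet.
On LeNet-5, the combined method achieves 88× pruning with an average quantization of about 2.4 bits across layers.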

This review was generated by AI and checked by a human editor.