QUICK REVIEW

[Paper Review] A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM

Shaokai Ye, Tianyun Zhang | arXiv (Cornell University) | 2018-11-05

Advanced Neural Network Applications | References: 24 | Citations: 43

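One-Line Summary

This paper presents a unified ADMM-based framework that performs DNN weight pruning together with weight clustering/quantization, achieving substantial compression without accuracy loss (e.g., 167× pruning on LeNet-5, 24.7× on AlexNet, and up to 1,910× storage reduction when pruning and clustering are combined).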

ABSTRACT

Many model compression techniques of Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering and quantization, etc. Weight pruning leverages the redundancy in the number of weights in DNNs, while weight clustering/quantization leverages the redundancy in the number of bit representations of weights. They can be effectively combined in order to exploit the maximum degree of redundancy. However, there lacks a systematic investigation in literature towards this direction. In this paper, we fill this void and develop a unified, systematic framework of DNN weight pruning and clustering/quantization using Alternating Direction Method of Multipliers (ADMM), a powerful technique in optimization theory to deal with non-convex optimization problems. Both DNN weight pruning and clustering/quantization, as well as their combinations, can be solved in a unified manner. For further performance improvement in this framework, we adopt multiple techniques including iterative weight quantization and retraining, joint weight clustering training and centroid updating, weight clustering retraining, etc. The proposed framework achieves significant improvements both in individual weight pruning and clustering/quantization problems, as well as their combinations. For weight pruning alone, we achieve 167x weight reduction in LeNet-5, 24.7x in AlexNet, and 23.4x in VGGNet, without any accuracy loss. For the combination of DNN weight pruning and clustering/quantization, we achieve 1,910x and 210x storage reduction of weight data on LeNet-5 and AlexNet, respectively, without accuracy loss. Our codes and models are released at the link http://bit.ly/2D3F0np

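Research Motivation and Goals

Addresses the lack of systematic research on combining weight pruning with weight clustering/quantization.
Develops a unified ADMM-based optimization framework that handles pruning and clustering/quantization in a single formulation.
Demonstrates model size reduction across standard networks while preserving accuracy.
Presents practical training procedures, including iterative quantization with retraining and centroid updating, to improve performance.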

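Proposed Method

Formulates DNN compression as a joint optimization problem in which the pruning and clustering/quantization constraints are expressed as indicator functions.
Applies ADMM to decompose the problem into subproblems: (i) DNN training with a quadratic penalty, (ii) projection onto the pruning set, and (iii) projection onto the clustering/quantization set.
Uses a Euclidean projection to enforce sparsity (keeping the α largest-magnitude weights) and a projection that assigns each weight to a fixed quantization level or cluster centroid.
Performs retraining to recover accuracy and iteratively updates the dual variables; an optional ordering is to prune first and then apply clustering/quantization.
Provides iterative weight quantization with retraining, and dynamically updates centroids for clustering-based compression.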
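
The three ADMM subproblems listed above can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function names (`project_sparse`, `project_levels`, `admm_prune`), the toy quadratic loss, and the values of `rho`, `lr`, and the iteration counts are all assumptions chosen for illustration, and plain gradient descent stands in for DNN training.

```python
import numpy as np

def project_sparse(w, alpha):
    """Euclidean projection onto {w : nnz(w) <= alpha}:
    keep the alpha largest-magnitude entries and zero the rest."""
    flat = w.ravel()
    out = np.zeros_like(flat)
    if alpha > 0:
        keep = np.argpartition(np.abs(flat), -alpha)[-alpha:]
        out[keep] = flat[keep]
    return out.reshape(w.shape)

def project_levels(w, levels):
    """Projection onto a fixed set of quantization levels (or centroids):
    each weight is mapped to its nearest level."""
    levels = np.asarray(levels, dtype=w.dtype)
    nearest = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[nearest]

def admm_prune(w0, grad_loss, alpha, rho=1.0, lr=0.1, outer=30, inner=20):
    """Toy ADMM loop for the pruning constraint (hypothetical hyperparameters)."""
    w = w0.copy()
    z = project_sparse(w, alpha)   # auxiliary variable living in the sparse set
    u = np.zeros_like(w)           # scaled dual variable
    for _ in range(outer):
        # (i) "training" with the quadratic penalty (rho/2) * ||w - z + u||^2
        for _ in range(inner):
            w -= lr * (grad_loss(w) + rho * (w - z + u))
        # (ii) projection onto the pruning set
        z = project_sparse(w + u, alpha)
        # dual variable update
        u += w - z
    # final hard pruning; the paper follows this with retraining
    return project_sparse(w, alpha)

# Toy problem: loss(w) = 0.5 * ||w - target||^2, so grad = w - target.
target = np.array([3.0, -0.1, 0.2, -2.0, 0.05])
pruned = admm_prune(target.copy(), lambda w: w - target, alpha=2)
quantized = project_levels(np.array([0.1, -0.7, 0.4]), [-0.5, 0.0, 0.5])
```

In this sketch the quantization case would reuse the same loop with `project_levels` in place of `project_sparse`; the centroid updating described above is not modeled here.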

Experimental Results

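Research Questions

RQ1: Can pruning and weight clustering/quantization be optimized together in a unified ADMM framework?
RQ2: What compression ratios can be achieved without accuracy loss when pruning and clustering/quantization are used together on common DNNs?
RQ3: Does pruning first and then applying clustering/quantization outperform performing both simultaneously?
RQ4: How do iterative quantization and retraining affect final accuracy and storage efficiency?
RQ5: What are practical guidelines for setting per-layer pruning and clustering/quantization parameters to maximize compression?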

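Key Results

167× weight reduction on LeNet-5 with pruning alone, without accuracy loss.
24.7× weight reduction on AlexNet with pruning alone, without accuracy loss.
23.4× weight reduction on VGGNet with pruning alone, without accuracy loss.
Combining pruning with clustering/quantization reduces weight storage by 1,910× on LeNet-5 and 210× on AlexNet, not counting the pruning indices (no accuracy loss).
With indices included, the total model size reduction is 623× on LeNet-5 and 90× on AlexNet.
For LeNet-5, the combined method yields 88× pruning with layer-wise quantization averaging about 2.4 bits.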
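
The gap between the ratios with and without indices can be turned into a rough estimate of how much of the compressed model is index overhead. The sketch below assumes that both ratios are measured against the same uncompressed baseline and that the compressed model consists only of weight data plus indices; the helper `index_share` is hypothetical and does not come from the paper.

```python
def index_share(ratio_without_index, ratio_with_index):
    """Fraction of the compressed model taken up by pruning indices,
    assuming compressed size = weight data + indices (an assumption)."""
    data = 1.0 / ratio_without_index   # weight data, relative to baseline
    total = 1.0 / ratio_with_index     # weight data + indices
    return (total - data) / total

lenet_share = index_share(1910, 623)    # LeNet-5 figures from the review
alexnet_share = index_share(210, 90)    # AlexNet figures from the review
print(f"LeNet-5 index share: {lenet_share:.1%}")
print(f"AlexNet index share: {alexnet_share:.1%}")
```

Under these assumptions, indices would account for roughly two thirds of the compressed LeNet-5 model and a bit over half of the compressed AlexNet model, which suggests why the review reports both sets of ratios.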

This review was generated by AI and reviewed by a human editor.