QUICK REVIEW

[Paper Review] Model compression via distillation and quantization

Antonio Polino, Razvan Pascanu | arXiv (Cornell University) | February 15, 2018

Advanced Neural Network Applications | References: 26 | Citations: 262

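One-Line Summary

This paper introduces two methods, quantized distillation and differentiable quantization, that compress deep networks by distilling a full-precision teacher into shallower, quantized students, achieving strong accuracy retention and substantial compression across vision and language tasks.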

ABSTRACT

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem, and proposes two new compression methods, which jointly leverage weight quantization and distillation of larger teacher networks into smaller student networks. The first method we propose is called quantized distillation and leverages distillation during the training process, by incorporating distillation loss, expressed with respect to the teacher, into the training of a student network whose weights are quantized to a limited set of levels. The second method, differentiable quantization, optimizes the location of quantization points through stochastic gradient descent, to better fit the behavior of the teacher model. We validate both methods through experiments on convolutional and recurrent architectures. We show that quantized shallow students can reach similar accuracy levels to full-precision teacher models, while providing order of magnitude compression, and inference speedup that is linear in the depth reduction. In sum, our results enable DNNs for resource-constrained environments to leverage architecture and accuracy advances developed on more powerful devices.

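Motivation and Goals

Improve compressed student models by drawing on the high accuracy of a full-precision teacher.
Combine distillation and weight quantization to reduce both depth and width at the same time.
Demonstrate the generality and practical gains of the methods on CNNs, RNNs, and translation tasks.
Quantify compression and speedup on standard benchmarks while preserving accuracy.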

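Proposed Method

Define weight quantization, covering scaling, bucketing, and both uniform and non-uniform schemes.
Introduce quantized distillation, which trains the student with a distillation loss while using quantized weights.
Develop differentiable quantization, which learns the quantization points p via SGD by backpropagating through the quantization function.
Apply the methods to ResNet-family CNNs, Wide ResNets, LSTMs in OpenNMT, and a WMT translation setup.
Analyze compression benefits, storage, and inference speedup, including bucketed and Huffman-encoded representations.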
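The scaling, bucketing, and uniform quantization steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `quantize_uniform`, the min/max scaling per bucket, the deterministic rounding, and the default bucket size of 256 are all assumptions made for the example.

```python
import numpy as np

def quantize_uniform(weights, bits=4, bucket_size=256):
    """Illustrative sketch: scale each bucket to [0, 1], round to a uniform grid, rescale."""
    levels = 2 ** bits - 1  # number of intervals between the 2**bits grid points
    flat = np.asarray(weights, dtype=np.float64).ravel()

    # Pad with the last value so every bucket is full; the padding is dropped at the end.
    pad = (-flat.size) % bucket_size
    buckets = np.pad(flat, (0, pad), mode="edge").reshape(-1, bucket_size)

    # Assumed scaling: per-bucket offset (min) and range (max - min).
    beta = buckets.min(axis=1, keepdims=True)
    alpha = buckets.max(axis=1, keepdims=True) - beta
    alpha = np.where(alpha == 0, 1.0, alpha)  # constant buckets: avoid division by zero

    normalized = (buckets - beta) / alpha            # values now lie in [0, 1]
    rounded = np.round(normalized * levels) / levels  # snap to the nearest grid point
    restored = rounded * alpha + beta                 # map back to the original range

    return restored.ravel()[:flat.size].reshape(np.shape(weights))


rng = np.random.default_rng(0)
w = rng.normal(size=(10, 30))
w_q = quantize_uniform(w, bits=2, bucket_size=64)
print(np.unique(w_q.ravel()[:64]).size)  # distinct values in the first bucket
```

Bucketing keeps a single outlier from stretching the grid for the whole tensor, since each bucket gets its own scale; the price is storing two scaling values per bucket.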

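Experimental Results

Research Questions

RQ1: Can combining distillation and quantization yield high-accuracy compressed models suited to resource-constrained environments?
RQ2: How do quantized distillation and differentiable quantization compare in accuracy, convergence, and efficiency across vision and language tasks?
RQ3: How do bit width, bucket size, and architecture affect the compression-accuracy trade-off?
RQ4: Does the distillation loss outperform the standard loss when training quantized models?
RQ5: Do the methods scale to large datasets and architectures (e.g., ImageNet, WMT)?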
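To make the bit-width and bucket-size trade-off in RQ3 concrete, the storage arithmetic can be sketched as below. The accounting is an assumption for illustration: each bucket is taken to store `bucket_size` quantized weights plus two full-precision scaling values, and Huffman coding and depth reduction are ignored. The helper name `compression_ratio` is hypothetical.

```python
def compression_ratio(bits, bucket_size, full_precision_bits=32):
    """Illustrative size ratio of a full-precision bucket to its quantized form."""
    # Original cost: every weight in the bucket stored at full precision.
    original = bucket_size * full_precision_bits
    # Assumed quantized cost: low-bit weights plus two full-precision scaling values.
    quantized = bucket_size * bits + 2 * full_precision_bits
    return original / quantized


for bits in (2, 4, 8):
    for bucket_size in (16, 256):
        ratio = compression_ratio(bits, bucket_size)
        print(f"bits={bits} bucket_size={bucket_size} ratio={ratio:.2f}")
```

Under these assumptions, smaller buckets cost more because the per-bucket scaling overhead is spread over fewer weights, which is why bucket size shows up alongside bit width in the trade-off.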

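Key Results

Quantized shallow students approach the accuracy of full-precision teachers while achieving up to an order of magnitude of compression.
Quantized distillation often outperforms quantization applied after training (post-mortem quantization) and differentiable quantization in the 2-bit and 4-bit settings across tasks.
On ImageNet, a 4-bit quantized, distilled 2xResNet18 reaches accuracy similar to its ResNet34 teacher while being smaller and faster.
On CIFAR-10, differentiable quantization and quantized distillation reach near-teacher accuracy at 4 bits, with larger gains when the distillation loss is used.
In the OpenNMT and WMT experiments, distillation helps keep BLEU and perplexity close to teacher levels even at reduced model size.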
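The distillation loss mentioned in these results can be sketched in its generic form: a weighted sum of the usual cross-entropy on the true labels and a cross-entropy against the teacher's temperature-softened outputs. This is an assumed, textbook-style formulation rather than the paper's exact loss; the names `distillation_loss`, `temperature`, and `alpha` and their default values are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with optional temperature."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Illustrative blend of a hard-label loss and a soft teacher-matching loss."""
    labels = np.asarray(labels)
    rows = np.arange(labels.shape[0])
    eps = 1e-12  # guards against log(0)

    # Hard term: cross-entropy between student predictions and true labels.
    hard = -np.log(softmax(student_logits)[rows, labels] + eps).mean()

    # Soft term: cross-entropy between softened teacher and student distributions.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft = -(teacher_soft * np.log(student_soft + eps)).sum(axis=-1).mean()

    return alpha * hard + (1.0 - alpha) * soft


teacher = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
labels = np.array([0, 1])
agreeing_student = teacher.copy()
disagreeing_student = teacher[::-1].copy()
print(distillation_loss(agreeing_student, teacher, labels)
      < distillation_loss(disagreeing_student, teacher, labels))
```

In a quantized-distillation setting, the student logits would come from a forward pass with quantized weights; that training loop is not shown here.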

This review was generated by AI and reviewed by a human editor.