QUICK REVIEW

[Paper Review] Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

Philipp Gysel | arXiv (Cornell University) | May 20, 2016

Advanced Neural Network Applications | References: 38 | Citations: 101

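One-Line Summary

Ristretto is a fast, GPU-accelerated framework for CNN compression that simulates hardware arithmetic while reducing the bit-width of weights and activations, enabling low-bit-width or adder-only implementations, and fine-tunes the network to preserve accuracy.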

ABSTRACT

Convolutional neural networks (CNN) have achieved major breakthroughs in recent years. Their performance in computer vision have matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however this ability comes at the cost of high computational and memory requirements. State-of-art networks require billions of arithmetic operations and millions of parameters. To enable embedded devices such as smartphones, Google glasses and monitoring cameras with the astonishing power of deep learning, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. A first important step of accelerator design is hardware-oriented approximation of deep networks, which enables energy-efficient inference. We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and outputs of resource-intense layers, which reduces the chip area for multiplication units significantly. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.

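Motivation and Goals

Motivates energy-efficient neural network inference on embedded devices through compression that adds no decompression complexity.
Introduces the Ristretto framework, which simulates hardware arithmetic and explores bit-width reduction for weights and activations.
Shows that CNNs such as CaffeNet and SqueezeNet can be reduced to 8 bits with an accuracy loss within a 1% tolerance.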

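Proposed Method

Simulates the hardware arithmetic of a custom accelerator by quantizing layer inputs, weights, and outputs to reduced precision.
Supports both fixed-point and adder-only arithmetic scenarios by adjusting bit-widths and using an adder tree for accumulation.
Uses round-nearest-even for inference and stochastic rounding for fine-tuning to manage quantization error.
Fine-tunes the quantized network in the discrete parameter space, using stochastic rounding during updates together with full-precision shadow weights.
Compresses networks quickly with GPU-optimized routines, without changing the network structure or introducing decompression overhead.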
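
The quantization and rounding steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Ristretto's actual code: the function names `quantize_fixed_point` and `fine_tune_step`, the signed fixed-point layout (`bit_width` total bits, `frac_bits` fractional bits), and the plain gradient-descent update are all hypothetical choices made for the example.

```python
import math
import random


def quantize_fixed_point(x, bit_width=8, frac_bits=4, stochastic=False, rng=random):
    """Quantize x to a signed fixed-point value (hypothetical layout, for illustration)."""
    scale = 1 << frac_bits  # 2**frac_bits quantization steps per unit
    scaled = x * scale
    if stochastic:
        # Round up with probability equal to the fractional remainder,
        # so the result equals `scaled` in expectation.
        floor = math.floor(scaled)
        q = floor + (1 if rng.random() < scaled - floor else 0)
    else:
        # Python's built-in round() sends ties to the nearest even integer.
        q = round(scaled)
    # Saturate to the representable signed range.
    q_min = -(1 << (bit_width - 1))
    q_max = (1 << (bit_width - 1)) - 1
    q = max(q_min, min(q_max, q))
    return q / scale


def fine_tune_step(shadow_weights, grads, lr=0.01, **quant_kwargs):
    """One assumed fine-tuning step: update full-precision shadow weights,
    then derive the discrete weights from them by stochastic rounding."""
    new_shadow = [w - lr * g for w, g in zip(shadow_weights, grads)]
    quantized = [
        quantize_fixed_point(w, stochastic=True, **quant_kwargs) for w in new_shadow
    ]
    return new_shadow, quantized
```

For example, `quantize_fixed_point(0.15625, frac_bits=4)` returns `0.125`, because the scaled value 2.5 is a tie and rounds to the even integer 2. With `stochastic=True` the same input yields `0.125` or `0.1875` with equal probability, so small updates to the shadow weights are not systematically rounded away.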

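Experimental Results

Research Questions

RQ1: How does reducing the numerical precision of CNN parameters and activations affect classification accuracy under a given error tolerance?
RQ2: Can CNNs such as CaffeNet and SqueezeNet be compressed to 8-bit representations with no more than 1% accuracy loss?
RQ3: Which rounding strategies best preserve accuracy during inference and during fine-tuning in the discrete parameter space?
RQ4: What are the practical implications of hardware-oriented approximation for memory footprint and multiplier usage in CNN accelerators?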

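Key Findings

Ristretto can reduce CaffeNet and SqueezeNet to 8-bit representations within a 1% error tolerance.
Quantization followed by fine-tuning in the discrete parameter space helps recover accuracy after aggressive bit-width reduction.
Round-nearest-even is used for deterministic inference quantization, while stochastic rounding helps fine-tuning in the discrete space.
By simulating hardware arithmetic without introducing decompression overhead, the framework aims to reduce memory footprint and multiplier area.
Adder-only arithmetic is possible by adjusting the bit-width and accumulation precision in the hardware path.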
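
To make the adder-only idea concrete, here is a hedged sketch. The review does not spell out how multipliers are removed; this example assumes one common route, constraining each weight to a signed power of two so that every multiplication becomes a bit shift followed by an addition. The helper names `to_power_of_two` and `shift_add_dot`, the exponent range, and the integer activation format are hypothetical.

```python
import math


def to_power_of_two(w, min_exp=-8, max_exp=0):
    """Approximate w by sign * 2**exp and return (sign, exp).
    Zero is returned with sign 0. Assumed scheme, for illustration only."""
    if w == 0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))          # nearest exponent in the log domain
    exp = max(min_exp, min(max_exp, exp))   # clamp to the allowed exponent range
    return sign, exp


def shift_add_dot(activations, weights_p2, frac_bits=8):
    """Dot product of integer activations with power-of-two weights,
    using only shifts and additions. The result is a fixed-point integer
    with `frac_bits` fractional bits. Requires frac_bits + exp >= 0."""
    acc = 0
    for a, (sign, exp) in zip(activations, weights_p2):
        if sign == 0:
            continue
        shifted = a << (frac_bits + exp)    # a * 2**exp in fixed point
        acc += shifted if sign > 0 else -shifted
    return acc
```

With weights `(0.5, -0.25, 1.0)` and activations `[4, 8, 3]`, `shift_add_dot` returns `768` at `frac_bits=8`, which is `3.0` after dividing by `2**8`, matching 4·0.5 − 8·0.25 + 3·1.0.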

This review was generated by AI and reviewed by a human editor.