QUICK REVIEW

[Paper Review] PACT: Parameterized Clipping Activation for Quantized Neural Networks

Jungwook Choi, Zhuo Wang | arXiv (Cornell University) | 2018-05-16

Model Reduction and Neural Networks | References: 19 | Citations: 719

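One-line summary

PACT introduces a learnable clipping parameter α for quantizing activations during training, enabling 4-bit weights and activations with near full-precision accuracy while improving hardware efficiency.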

ABSTRACT

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training - that enables neural networks to work well with ultra low precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $α$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inferencing performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.

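Research motivation and goals

Reduce the computation and storage cost of CNNs by quantizing activations during training.
Introduce a learnable activation clipping parameter, α, to optimize the quantization scale.
Demonstrate that 4-bit quantized networks can approach full-precision accuracy across multiple models and datasets.
Analyze the hardware implications of reduced precision and the resulting system-level performance benefits.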

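Proposed method

ReLU is replaced with PACT, a parameterized clipping activation whose clipping value is α.
After clipping, the clipped activation y is linearly quantized to k bits.
α is learned by backpropagation, with gradients through the quantizer computed using the Straight-Through Estimator (STE).
An L2 regularization term on α keeps the activation range small, which reduces quantization error.
A single α is shared per layer to reduce hardware complexity and simplify scaling of the final output.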
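
The method steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`pact_forward`, `pact_backward`, `sgd_update_alpha`), the learning rate `lr`, the regularization strength `lam`, and the use of plain SGD are all assumptions made for the example. The forward pass clips to [0, α] and quantizes linearly to k bits; the backward pass treats rounding as the identity (STE), so the gradient with respect to α is 1 where the input was clipped at α and 0 elsewhere.

```python
import numpy as np

def pact_forward(x, alpha, k):
    """Clip x to [0, alpha], then quantize linearly to k bits."""
    y = np.clip(x, 0.0, alpha)           # PACT: 0 below zero, alpha above alpha
    scale = (2 ** k - 1) / alpha         # number of quantization steps per unit
    return np.round(y * scale) / scale   # snap to one of 2^k uniform levels

def pact_backward(x, alpha, grad_out):
    """STE gradients: rounding is treated as the identity function."""
    # Inputs inside [0, alpha) pass the gradient through unchanged.
    grad_x = grad_out * ((x >= 0) & (x < alpha))
    # Inputs clipped at alpha contribute to the gradient of alpha.
    # One alpha is shared by the whole layer, so the contributions are summed.
    grad_alpha = np.sum(grad_out * (x >= alpha))
    return grad_x, grad_alpha

def sgd_update_alpha(alpha, grad_alpha, lr=0.01, lam=1e-4):
    """Hypothetical update: plain SGD with an L2 penalty 0.5 * lam * alpha**2."""
    return alpha - lr * (grad_alpha + lam * alpha)

if __name__ == "__main__":
    x = np.array([-1.0, 0.5, 1.2, 3.0])
    alpha, k = 2.0, 2                    # 2 bits: levels 0, 2/3, 4/3, 2
    print(pact_forward(x, alpha, k))
    print(pact_backward(x, alpha, np.ones_like(x)))
```

With a small α the quantization levels are packed more densely but more inputs are clipped; with a large α fewer inputs are clipped but each step is coarser. Learning α lets training pick the balance for each layer, and the L2 term pulls α toward the smaller end.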

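Experimental results

Research questions

RQ1: Can activations quantized with a learnable clipping parameter maintain accuracy at very low bit widths?
RQ2: Does optimizing α during training yield a better quantization scale than fixed clipping of activations?
RQ3: What are the accuracy and hardware trade-offs of using PACT across different CNN architectures and datasets?
RQ4: Can both weights and activations be quantized to 4 bits without significant accuracy loss?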

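Key results

PACT preserves accuracy under activation quantization by learning the clipping parameter.
For 4-bit quantized CNNs, PACT achieves accuracy comparable to full-precision networks across multiple architectures and datasets.
On AlexNet, ResNet18, and ResNet50, PACT shows less accuracy degradation at low bit widths than prior quantization schemes.
With joint 4-bit quantization of weights and activations, PACT reaches near full-precision performance across the tested networks.
System-level analysis indicates a large reduction in hardware area at reduced precision and a potential super-linear performance improvement under bandwidth constraints.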


This review was generated by AI and reviewed by a human editor.