QUICK REVIEW

[Paper Review] Ternary Neural Networks with Fine-Grained Quantization

Naveen Mellempudi, Abhisek Kundu | arXiv (Cornell University) | 2017-05-02

Advanced Neural Network Applications | References: 11 | Citations: 61

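One-Line Summary

FGQ converts pre-trained full-precision models to ternary weights with 8/4-bit activations without retraining, uses weight groups to balance accuracy against computation savings, and reaches near-FP32 accuracy on ImageNet with a substantial potential speedup.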

ABSTRACT

We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on state-of-the-art topologies without additional training. We provide an improved theoretical formulation that forms the basis for a higher quality solution using FGQ. Our method involves ternarizing the original weight tensor in groups of $N$ weights. Using $N=4$, we achieve Top-1 accuracy within $3.7\%$ and $4.2\%$ of the baseline full precision result for Resnet-101 and Resnet-50 respectively, while eliminating $75\%$ of all multiplications. These results enable a full 8/4-bit inference pipeline, with best-reported accuracy using ternary weights on ImageNet dataset, with a potential of $9\times$ improvement in performance. Also, for smaller networks like AlexNet, FGQ achieves state-of-the-art results. We further study the impact of group size on both performance and accuracy. With a group size of $N=64$, we eliminate $\approx99\%$ of the multiplications; however, this introduces a noticeable drop in accuracy, which necessitates fine tuning the parameters at lower precision. We address this by fine-tuning Resnet-50 with 8-bit activations and ternary weights at $N=64$, improving the Top-1 accuracy to within $4\%$ of the full precision result with $<30\%$ additional training overhead. Our final quantized model can run on a full 8-bit compute pipeline using 2-bit weights and has the potential of up to $15\times$ improvement in performance compared to baseline full-precision models.

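Motivation and Goals

Enable near state-of-the-art inference with extremely low-precision weights and activations, with no retraining or only minimal retraining.
Introduce fine-grained quantization (FGQ), which ternarizes weights in groups to preserve information.
Show that FGQ achieves high Top-1 accuracy on ResNet-101, ResNet-50, and AlexNet using 2w-8a and 2w-4a (2-bit weights with 8-bit or 4-bit activations).
Analyze how group size governs the trade-off between accuracy and computation savings, and the hardware implications for an 8-bit compute pipeline.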

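Proposed Method

Partition the full-precision weight tensor into non-overlapping groups of size $N$ and ternarize each group, which yields an independent subproblem per group.
For each group, find a scaling factor $\alpha$ and a ternary weight vector $\hat{W}^{(i)}$ that minimize $\|W^{(i)} - \alpha \hat{W}^{(i)}\|_F^2$ (Eq. 2).
Use separate thresholds $\Delta_p$ and $\Delta_n$ for positive and negative weights together with a single $\alpha$, and obtain $\alpha^*$, $\Delta_p^*$, and $\Delta_n^*$ by a closed-form solution or a brute-force search (Eq. 3–5).
Adopt a static grouping strategy along the input-channel dimension to minimize the dynamic range within each group and to simplify memory layout and vectorization (Fig. 2).
Quantize activations to 8 or 4 bits, use 32-bit accumulators during computation to prevent overflow, and recompute batch-normalization statistics to compensate for the variance shift at inference.
Vary the group size $N$ to explore the trade-off between accuracy and the share of multiplications replaced by ternary accumulations (e.g., 75% for $N=4$ and about 99% for $N=64$).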
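The per-group subproblem described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: it assumes a 2-D weight matrix of shape (out_channels, in_channels), groups weights along the input-channel axis, and simplifies the threshold search to a single symmetric magnitude threshold found by brute force instead of the separate $\Delta_p$ and $\Delta_n$ of Eq. 3–5. The function names `ternarize_group` and `fgq_ternarize` are hypothetical.

```python
import numpy as np


def ternarize_group(w):
    """Best (alpha, t) for min ||w - alpha * t||^2 with t in {-1, 0, +1}^N.

    Illustrative simplification: one symmetric magnitude threshold, found by
    brute force over the N possible support sizes.
    """
    w = np.asarray(w, dtype=np.float64)
    mags = np.abs(w)
    order = np.argsort(-mags)            # indices sorted by descending |w|
    csum = np.cumsum(mags[order])        # sum of the k largest magnitudes
    k = np.arange(1, w.size + 1)
    gain = csum ** 2 / k                 # error reduction when keeping the top k
    best = int(np.argmax(gain))
    keep = order[: best + 1]
    t = np.zeros_like(w)
    t[keep] = np.sign(w[keep])
    alpha = csum[best] / (best + 1)      # mean magnitude of the kept weights
    return alpha, t


def fgq_ternarize(weight, n=4):
    """Ternarize an (out_channels, in_channels) matrix in groups of n inputs."""
    weight = np.asarray(weight, dtype=np.float64)
    out_ch, in_ch = weight.shape
    if in_ch % n != 0:
        raise ValueError("in_channels must be divisible by the group size")
    groups = weight.reshape(out_ch, in_ch // n, n)
    alphas = np.empty((out_ch, in_ch // n))
    ternary = np.empty_like(groups)
    for o in range(out_ch):
        for g in range(in_ch // n):
            alphas[o, g], ternary[o, g] = ternarize_group(groups[o, g])
    return alphas, ternary.reshape(out_ch, in_ch)
```

For a fixed ternary vector whose signs match the weights, the optimal scaling factor is the mean magnitude of the kept weights, so the search only has to decide how many of the largest-magnitude weights to keep. Smaller groups give each scaling factor fewer weights to represent, which is the intuition behind the accuracy advantage of small $N$.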

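Experimental Results

Research Questions

RQ1: Can a pre-trained full-precision network be converted to ternary weights with minimal accuracy loss and without retraining?
RQ2: How does the group size (N) affect accuracy and computation savings in a 2w-8a/2w-4a inference pipeline?
RQ3: Which grouping strategy best preserves the weight distribution across layers and maximizes accuracy?
RQ4: Can FGQ achieve state-of-the-art or near state-of-the-art accuracy on ResNet-101, ResNet-50, and AlexNet without retraining?
RQ5: What are the practical hardware implications and performance gains of FGQ for a full 8-bit compute pipeline?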
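To make the 2w-8a pipeline behind RQ2 and RQ5 concrete, the sketch below shows one way a single dot product could be evaluated with ternary weights, integer activations, and 32-bit accumulators. It is an assumption-laden illustration: the unsigned max-based activation scaling and the helper names `quantize_activations` and `ternary_group_dot` are hypothetical and are not taken from the paper, which only states that activations use 8 or 4 bits and that accumulation uses 32 bits.

```python
import numpy as np


def quantize_activations(x, bits=8):
    """Uniformly quantize non-negative activations to unsigned integers.

    Assumption: scale by the maximum value (e.g. post-ReLU activations).
    """
    x = np.asarray(x, dtype=np.float64)
    qmax = 2 ** bits - 1
    max_val = x.max()
    scale = max_val / qmax if max_val > 0 else 1.0
    q = np.clip(np.round(x / scale), 0, qmax).astype(np.int32)
    return q, scale


def ternary_group_dot(q_act, ternary, alphas, act_scale, n=4):
    """Dot product that needs only one scaling multiplication per group."""
    q_groups = np.asarray(q_act, dtype=np.int32).reshape(-1, n)
    t_groups = np.asarray(ternary, dtype=np.int32).reshape(-1, n)
    # Inside a group: +1 adds the activation, -1 subtracts it, 0 skips it.
    plus = np.where(t_groups > 0, q_groups, 0).sum(axis=1, dtype=np.int32)
    minus = np.where(t_groups < 0, q_groups, 0).sum(axis=1, dtype=np.int32)
    partial = plus - minus               # 32-bit integer accumulators
    # One multiplication per group by its scaling factor alpha.
    alphas = np.asarray(alphas, dtype=np.float64)
    return act_scale * float(np.dot(alphas, partial))
```

Passing `bits=4` gives the corresponding 2w-4a variant. The explicit `dtype=np.int32` matters because NumPy would otherwise widen the integer sum to the platform default integer.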

Key Results

FGQ with N=4 (FGQ-N4) achieves Top-1 accuracy of 73.85% on ResNet-101 with 2w-8a and 70.69% with 2w-4a on ImageNet, without re-training.
FGQ-N4 applied to ResNet-50 yields 70.76% Top-1 with 2w-8a and 68.38% with 2w-4a, near full-precision results.
FGQ-N4 applied to AlexNet yields 49.04% Top-1 with 2w-8a (no re-training), about 7.8 percentage points below the 56.83% full-precision baseline.
Larger group sizes (e.g., N=64) can eliminate ~99% of multiplications but cause noticeable accuracy loss, which can be mitigated by limited low-precision retraining.
The approach enables a full 8-bit compute pipeline with 2-bit weights and up to 15x theoretical performance improvement over full-precision baselines.
Compared to closely related works, FGQ achieves competitive or superior accuracy without low-precision training for many configurations.
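The multiplication savings quoted above follow from simple arithmetic if one assumes that the only multiplication left in a group of N weights is the single scaling by that group's factor. That assumption is a simplification made here, not a statement taken from the paper, and the function name `multiplications_eliminated` is hypothetical.

```python
def multiplications_eliminated(n):
    """Fraction of multiplications removed when n weights share one scaling factor."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    return 1.0 - 1.0 / n


for n in (4, 64):
    print(f"N={n}: {multiplications_eliminated(n):.1%} of multiplications eliminated")
```

Under this assumption N=4 gives 75.0% and N=64 gives 98.4%, which is consistent with the 75% and roughly 99% figures reported in the abstract.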


This review was generated by AI and checked by a human editor.