QUICK REVIEW

[Paper Review] FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

Yonggan Fu, Haoran You|arXiv (Cornell University)|2020. 12. 24.

Advanced Neural Network Applications|Citations: 30

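One-Line Summary

FracTrain combines progressive fractional quantization with input-adaptive dynamic fractional quantization to reduce DNN training cost while maintaining accuracy, as demonstrated across multiple models and datasets.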

ABSTRACT

Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs. As reducing precision is one of the most effective knobs for boosting training time/energy efficiency, there has been a growing interest in low-precision DNN training. In this paper, we explore from an orthogonal direction: how to fractionally squeeze out more training cost savings from the most redundant bit level, progressively along the training trajectory and dynamically per input. Specifically, we propose FracTrain that integrates (i) progressive fractional quantization which gradually increases the precision of activations, weights, and gradients that will not reach the precision of SOTA static quantized DNN training until the final training stage, and (ii) dynamic fractional quantization which assigns precisions to both the activations and gradients of each layer in an input-adaptive manner, for only "fractionally" updating layer parameters. Extensive simulations and ablation studies (six models, four datasets, and three training settings including standard, adaptation, and fine-tuning) validate the effectiveness of FracTrain in reducing computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%~+1.87%) accuracy. For example, when training ResNet-74 on CIFAR-10, FracTrain achieves 77.6% and 53.5% computational cost and training latency savings, respectively, compared with the best SOTA baseline, while achieving a comparable (-0.07%) accuracy. Our codes are available at: https://github.com/RICE-EIC/FracTrain.

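Motivation and Objectives

Enable on-device and edge DNN training under limited resources.
Develop a non-static, training-time quantization strategy that adapts along the training trajectory and per input.
Propose Progressive Fractional Quantization (PFQ), which gradually raises precision during training.
Propose Dynamic Fractional Quantization (DFQ), which uses lightweight gating to adapt per-layer precision to each input.
Integrate PFQ and DFQ into a single FracTrain framework and evaluate its training cost savings and accuracy.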

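Proposed Method

Introduces PFQ, which raises precision progressively using a four-stage precision schedule and an indicator based on the change in loss between epochs.
Introduces DFQ, in which a per-layer gating network selects among bit precisions through soft intermediate variants and is trained with a cost-aware objective.
Combines PFQ and DFQ to define the FracTrain objective: PFQ controls how precision progresses over time, while DFQ handles spatial, input-adaptive precision.
Models each layer's computation as a sum of gated low-bit convolutions plus a skip connection, which realizes fractional updates.
Uses a cost-aware loss term cp(W_base, W_G) and adjusts the sign of its weighting coefficient to steer training toward a target training cost cp.
Evaluates on six models (ResNet-18/34/38/74, MobileNetV2, Transformer-base) across CIFAR-10/100, ImageNet, and WikiText-103.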
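
The PFQ schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `ProgressivePrecisionSchedule`, the per-stage (forward, backward) bit-widths, and the plateau threshold are all hypothetical values chosen for the example. Only the four-stage structure and the epoch-to-epoch loss-change indicator come from the review.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical (forward_bits, backward_bits) for each of the four stages.
# These numbers are illustrative assumptions, not values taken from the review.
DEFAULT_STAGES: List[Tuple[int, int]] = [(3, 6), (4, 6), (6, 8), (8, 8)]


@dataclass
class ProgressivePrecisionSchedule:
    """Moves to the next precision stage once the epoch loss stops changing."""

    stages: List[Tuple[int, int]] = field(
        default_factory=lambda: list(DEFAULT_STAGES)
    )
    threshold: float = 0.05  # assumed: loss change below this counts as a plateau
    stage: int = 0
    prev_loss: Optional[float] = None

    def current_bits(self) -> Tuple[int, int]:
        """Return the (forward_bits, backward_bits) pair of the active stage."""
        return self.stages[self.stage]

    def step(self, epoch_loss: float) -> Tuple[int, int]:
        """Record one epoch's loss and return the bit-widths for the next epoch."""
        if self.prev_loss is not None:
            plateaued = abs(self.prev_loss - epoch_loss) < self.threshold
            # Advance only while a higher-precision stage is still available.
            if plateaued and self.stage < len(self.stages) - 1:
                self.stage += 1
        self.prev_loss = epoch_loss
        return self.current_bits()


# Usage: feed in a made-up sequence of per-epoch training losses.
schedule = ProgressivePrecisionSchedule()
losses = [2.30, 1.80, 1.78, 1.40, 1.39, 1.38, 1.37]
history = [schedule.step(loss) for loss in losses]
for loss, (fw_bits, bw_bits) in zip(losses, history):
    print(f"loss={loss:.2f} -> forward {fw_bits}-bit, backward {bw_bits}-bit")
```

In a real training loop, the returned bit-widths would be handed to whatever quantizer wraps the forward and backward passes; that wiring is omitted here. The schedule never lowers precision, so the highest stage is reached only late in training, which is the behavior the PFQ description calls for.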

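Experimental Results

Research Questions

RQ1: Can progressively increasing precision during training (PFQ) achieve lower training cost without sacrificing accuracy?
RQ2: Can input-adaptive, per-layer precision selection (DFQ) reduce training cost further, beyond PFQ?
RQ3: What is the combined benefit of temporal and spatial fractional quantization (FracTrain) across diverse models, datasets, and tasks?
RQ4: How does FracTrain compare with state-of-the-art static low-precision training baselines in accuracy and training cost?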

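Key Results

FracTrain achieves substantial training cost savings across multiple models and datasets, often with comparable or better accuracy.
PFQ consistently reduces training cost relative to SBM on ResNet-38/74 with CIFAR-10/100, while maintaining or slightly improving accuracy.
DFQ reduces computational cost relative to SBM while maintaining or improving accuracy, and outperforms selective layer-update methods.
FracTrain (PFQ+DFQ) substantially reduces MACs and also achieves large reductions in hardware metrics such as energy and latency at comparable accuracy.
On ImageNet and WikiText-103, PFQ achieves cost savings of about 21% and 44%, respectively, while maintaining or improving accuracy/perplexity.
In the adaptation and fine-tuning scenarios on CIFAR-100, FracTrain also greatly reduces MACs while maintaining or slightly improving accuracy.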

This review was generated by AI and reviewed by a human editor.