QUICK REVIEW

[Paper Review] VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer

Mengshu Sun, Haoyu Ma|arXiv (Cornell University)|2022. 01. 17.

Advanced Image and Video Retrieval Techniques | Citations: 28

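One-Line Summary

VAQF automatically designs FPGA-based accelerators for ViTs with binary weights and low-precision activations, and meets real-time FPS targets by letting its compilation step guide the quantization strategy.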

ABSTRACT

The transformer architectures with attention mechanisms have obtained success in Natural Language Processing (NLP), and Vision Transformers (ViTs) have recently extended the application domains to various vision tasks. While achieving high performance, ViTs suffer from large model size and high computation complexity that hinders the deployment of them on edge devices. To achieve high throughput on hardware and preserve the model accuracy simultaneously, we propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized ViTs with binary weights and low-precision activations. Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations as well as the optimized parameter settings of the accelerator that fulfill the hardware requirements. The implementations are developed with Vivado High-Level Synthesis (HLS) on the Xilinx ZCU102 FPGA board, and the evaluation results with the DeiT-base model indicate that a frame rate requirement of 24 frames per second (FPS) is satisfied with 8-bit activation quantization, and a target of 30 FPS is met with 6-bit activation quantization. To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework to guide the quantization strategy on the software side and the accelerator implementations on the hardware side given the target frame rate. Very small compilation time cost is incurred compared with quantization training, and the generated accelerators show the capability of achieving real-time execution for state-of-the-art ViT models on FPGAs.

Research Motivation and Goals

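Reduce model size and computation through quantization so that vision transformers can be deployed efficiently on edge devices.
Propose a fully automatic framework that outputs the activation precision and accelerator settings needed to meet a target frame rate.
Combine binary weights with low-precision activations to balance accuracy and throughput.
Demonstrate FPGA-based ViT acceleration on a Xilinx ZCU102 board using Vivado HLS.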

Proposed Method

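Introduces the VAQF flow, which takes the ViT structure and a target FPS and determines the activation precision at the compilation step.
Quantizes the ViT to binary weights and low-precision activations, with the activation precision chosen according to hardware feasibility.
Develops compute engines for FC layers and multi-head attention, using loop tiling, data packing, and LUT-based arithmetic for binary weights.
Implements layer-wise optimization and an FPGA datapath design to maximize throughput under BRAM, DSP, and LUT constraints.
Runs a binary search over activation precision to satisfy the FPS target and generates the corresponding accelerator parameters.
Evaluates the implementation with DeiT-base on the ZCU102 using Vivado HLS, reporting that a 24 FPS target is met with 8-bit activations and a 30 FPS target with 6-bit activations.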
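
The precision search described in this list can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `estimated_fps` is a hypothetical stand-in for the accelerator performance model, using a made-up curve that merely passes through the two operating points reported in this review (24 FPS at 8 bits, 30 FPS at 6 bits). Every other value it returns is illustrative, not a measurement. The function name `max_activation_bits` and the search range of 1 to 16 bits are likewise assumptions.

```python
from typing import Callable, Optional


def estimated_fps(act_bits: int) -> float:
    """Hypothetical throughput model (NOT from the paper).

    Assumes throughput only decreases as activation bit-width grows.
    The constants are chosen so that 8 bits -> 24.0 and 6 bits -> 30.0.
    """
    return 240.0 / (act_bits + 2)


def max_activation_bits(
    target_fps: float,
    fps_model: Callable[[int], float],
    lo: int = 1,
    hi: int = 16,
) -> Optional[int]:
    """Binary-search the largest bit-width in [lo, hi] that meets target_fps.

    Relies on fps_model being non-increasing in the bit-width.
    Returns None when even the lowest precision is too slow.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if fps_model(mid) >= target_fps:
            best = mid      # feasible: try to keep more precision
            lo = mid + 1
        else:
            hi = mid - 1    # too slow: reduce precision
    return best


if __name__ == "__main__":
    for target in (24, 30):
        print(target, "FPS ->", max_activation_bits(target, estimated_fps), "bits")
```

Searching for the largest feasible bit-width reflects the goal of keeping as much activation precision, and therefore accuracy, as the frame-rate target allows.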

Experimental Results

Research Questions

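RQ1: Can a fully automatic framework determine the activation precision a quantized ViT needs to reach a given frame rate on an FPGA?
RQ2: Which hardware-software co-design strategies enable real-time ViT inference with binary weights and low-precision activations on FPGA platforms?
RQ3: How does VAQF balance accuracy and throughput across different activation precisions of a ViT model?
RQ4: How do data packing, tiling, and LUT-based computation affect FPGA resource utilization for quantized ViTs?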

Key Results

A binary-weight ViT with full-precision activations achieves 79.5% top-1 accuracy on the ImageNet-1K validation set, a drop of 2.3 percentage points from the 81.8% full-precision model.
8-bit activations maintain 77.6% accuracy, enabling 24 FPS on the target FPGA board.
6-bit activations achieve 76.5% accuracy, enabling 30 FPS on the target FPGA board.
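VAQF's compilation step determines the activation precision and accelerator settings quickly, taking minutes to hours, which is far shorter than typical quantization training time.
The FPGA accelerator uses LUT-based operations for binary weights, and applies data packing and tiling to maximize throughput and reduce BRAM usage.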
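
To make the last point concrete, here is a minimal software sketch of why binary weights suit LUT-based logic and how low-bit activations can be packed. It illustrates the general idea only and is not the accelerator's actual datapath: the bit encoding (1 for +1, 0 for -1), the packing layout (first value in the lowest bits), and the helper names `binary_dot`, `pack`, and `unpack` are all assumptions.

```python
from typing import List


def binary_dot(activations: List[int], weight_bits: List[int]) -> int:
    """Dot product with weights in {-1, +1}, encoded as bits (1 -> +1, 0 -> -1).

    No multiplication is needed: each weight bit only selects between
    adding and subtracting the activation.
    """
    acc = 0
    for a, b in zip(activations, weight_bits):
        acc += a if b else -a
    return acc


def pack(values: List[int], bits: int) -> int:
    """Pack unsigned `bits`-wide values into one integer word."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} bits")
        word |= v << (i * bits)
    return word


def unpack(word: int, bits: int, count: int) -> List[int]:
    """Inverse of pack: extract `count` values of width `bits`."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]


if __name__ == "__main__":
    acts = [3, 5, 2, 7]       # illustrative 6-bit activations
    w_bits = [1, 0, 1, 1]     # encodes weights +1, -1, +1, +1
    print(binary_dot(acts, w_bits))        # 3 - 5 + 2 + 7
    word = pack(acts, bits=6)
    print(unpack(word, bits=6, count=4))   # round-trips to acts
```

Packing several narrow activations into one wide word means each memory access carries more values, which is the general intuition behind using data packing to reduce on-chip buffer pressure.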

This review was generated by AI and checked by a human editor.