QUICK REVIEW

[Paper Review] VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer

Mengshu Sun, Haoyu Ma|arXiv (Cornell University)|Jan 17, 2022

Advanced Image and Video Retrieval Techniques | Cited by 28

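One-Line Summary

VAQF automatically designs FPGA-based ViT accelerators for ViTs with binary weights and low-precision activations, using a compilation-guided quantization strategy to meet real-time FPS targets.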

ABSTRACT

The transformer architectures with attention mechanisms have obtained success in Natural Language Processing (NLP), and Vision Transformers (ViTs) have recently extended the application domains to various vision tasks. While achieving high performance, ViTs suffer from large model size and high computation complexity that hinder their deployment on edge devices. To achieve high throughput on hardware and preserve the model accuracy simultaneously, we propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized ViTs with binary weights and low-precision activations. Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations as well as the optimized parameter settings of the accelerator that fulfill the hardware requirements. The implementations are developed with Vivado High-Level Synthesis (HLS) on the Xilinx ZCU102 FPGA board, and the evaluation results with the DeiT-base model indicate that a frame rate requirement of 24 frames per second (FPS) is satisfied with 8-bit activation quantization, and a target of 30 FPS is met with 6-bit activation quantization. To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework to guide the quantization strategy on the software side and the accelerator implementations on the hardware side given the target frame rate. Very small compilation time cost is incurred compared with quantization training, and the generated accelerators show the capability of achieving real-time execution for state-of-the-art ViT models on FPGAs.

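Motivation and Objectives

Aims to deploy Vision Transformers efficiently on edge devices by using quantization to reduce model size and computation.
Proposes a fully automatic framework that outputs the activation precision and accelerator settings needed to meet a target frame rate.
Combines binary weights with low-precision activations to balance accuracy and throughput.
Demonstrates FPGA-based ViT acceleration on a Xilinx board using Vivado HLS.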

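Proposed Method

Introduces the VAQF flow, which takes the ViT structure and a target FPS as input and determines the activation precision in a compilation step.
Quantizes ViT weights to binary and activations to low precision, taking hardware feasibility into account.
Develops compute engines for FC layers and multi-head attention using loop tiling, data packing, and LUT-based operations for binary weights.
Implements layer-wise optimization and an FPGA datapath design that maximize throughput under BRAM/DSP/LUT constraints.
Uses a binary search over activation precision to meet the FPS target and generates the corresponding accelerator parameters.
Evaluates the Vivado HLS implementation on the ZCU102 with DeiT-base, and reports meeting frame rate requirements of 24 FPS (8-bit activations) and 30 FPS (6-bit activations).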
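
The precision search described in the steps above can be sketched in Python. This is a minimal illustration under assumptions, not the paper's compilation flow: `toy_fps_model`, its constant, the function name `max_precision_for_fps`, and the 1 to 16 bit search range are all hypothetical. The only property the search relies on is that estimated throughput does not increase as the activation bit-width grows, so the highest precision that still meets the target can be found by bisection.

```python
from typing import Callable, Optional


def toy_fps_model(bits: int) -> float:
    """Hypothetical throughput model: FPS falls as activation bit-width grows.

    The constant 200.0 is an arbitrary illustration value, not a measurement
    and not the performance model used by VAQF.
    """
    return 200.0 / bits


def max_precision_for_fps(
    target_fps: float,
    estimate_fps: Callable[[int], float],
    lo: int = 1,
    hi: int = 16,
) -> Optional[int]:
    """Return the largest bit-width in [lo, hi] whose estimated FPS meets the target.

    Assumes estimate_fps is non-increasing in the bit-width. Returns None when
    even the lowest precision misses the target.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if estimate_fps(mid) >= target_fps:
            best = mid      # feasible: try a higher precision
            lo = mid + 1
        else:
            hi = mid - 1    # infeasible: try a lower precision
    return best


if __name__ == "__main__":
    for target in (24, 30):
        print(target, max_precision_for_fps(target, toy_fps_model))
```

In a real flow, the estimator would be replaced by an analytical or measured model of the accelerator, and the accelerator parameters would be re-optimized for each candidate precision; neither step is modeled here.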

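Experimental Results

Research Questions

RQ1: For quantized ViTs on FPGAs, can the activation precision that satisfies a specified frame rate be determined automatically?
RQ2: What hardware-software co-design strategy enables real-time ViT inference with binary weights and low-precision activations on FPGA platforms?
RQ3: How does VAQF balance accuracy and throughput across different activation precisions for ViT models?
RQ4: How do data packing, tiling, and LUT-based computation affect FPGA resource utilization for quantized ViTs?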

Key Findings

A binary-weight ViT with full-precision activations achieves 79.5% top-1 accuracy on ImageNet-1K (validation), a 2.3-point drop from the 81.8% full-precision model.
8-bit activations maintain 77.6% accuracy, enabling 24 FPS on the target FPGA board.
6-bit activations achieve 76.5% accuracy, enabling 30 FPS on the target FPGA board.
VAQF's compilation step quickly determines activation precision and accelerator settings, requiring minutes to hours (much less than typical quantization training time).
The FPGA accelerator uses LUT-based computations for binary weights, with data packing and tiling to maximize throughput and reduce BRAM usage.
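
Two ideas from the last finding can be illustrated in a few lines of Python. This is a sketch under assumptions, not the paper's HLS code: the 64-bit word width, the function names, and the encoding of binary weights as bits (1 for +1, 0 for -1) are choices made here for illustration. It shows that a dot product with binary weights needs only additions and subtractions, and that several low-bit activations fit into one wide memory word.

```python
from typing import List


def pack(values: List[int], bits: int, word_bits: int = 64) -> List[int]:
    """Pack unsigned `bits`-wide values into `word_bits`-wide words, low bits first."""
    per_word = word_bits // bits
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, v in enumerate(values[start:start + per_word]):
            if not 0 <= v < (1 << bits):
                raise ValueError(f"{v} does not fit in {bits} bits")
            word |= v << (slot * bits)
        words.append(word)
    return words


def unpack(words: List[int], bits: int, count: int, word_bits: int = 64) -> List[int]:
    """Inverse of pack: recover the first `count` values from the packed words."""
    per_word = word_bits // bits
    mask = (1 << bits) - 1
    out = []
    for word in words:
        for slot in range(per_word):
            out.append((word >> (slot * bits)) & mask)
    return out[:count]


def binary_weight_dot(activations: List[int], weight_bits: List[int]) -> int:
    """Dot product with weights in {+1, -1}; weight bit 1 means +1, 0 means -1.

    No multiplication is needed: each activation is added or subtracted.
    """
    acc = 0
    for a, w in zip(activations, weight_bits):
        acc += a if w else -a
    return acc


if __name__ == "__main__":
    acts = [3, 0, 63, 17, 42, 5, 9, 1, 60, 33, 2]   # eleven 6-bit activations
    words = pack(acts, bits=6)
    print(len(words), unpack(words, bits=6, count=len(acts)) == acts)
    print(binary_weight_dot([3, 5, 2, 7], [1, 0, 1, 1]))
```

Under the assumed 64-bit word, ten 6-bit activations fit per word versus eight 8-bit activations, which hints at why lower activation precision eases memory bandwidth and on-chip storage pressure. The actual word widths and packing layout used on the ZCU102 are not given in this review.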

This review was generated by AI and checked by a human editor.