QUICK REVIEW

[Paper Review] Model Compression Methods for YOLOv5: A Review

Mohammad Jani, Jamil Fayyad|arXiv (Cornell University)|Jul 21, 2023

Advanced Neural Network Applications | Citations: 11

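One-Line Summary

This paper surveys pruning and quantization methods applied to YOLOv5, analyzes their practical results from an implementation standpoint, and identifies gaps and future directions for edge deployment.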

ABSTRACT

Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since its introduction, eight major versions of YOLO have been introduced with the purpose of improving its accuracy and efficiency. While the evident merits of YOLO have yielded to its extensive use in many areas, deploying it on resource-limited devices poses challenges. To address this issue, various neural network compression methods have been developed, which fall under three main categories, namely network pruning, quantization, and knowledge distillation. The fruitful outcomes of utilizing model compression methods, such as lowering memory usage and inference time, make them favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, our focus is on pruning and quantization due to their comparative modularity. We categorize them and analyze the practical results of applying those methods to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization for compressing YOLOv5, and provide future directions in this area for further exploration. Among several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in literature. This is the first specific review paper that surveys pruning and quantization methods from an implementation point of view on YOLOv5. Our study is also extendable to newer versions of YOLO as implementing them on resource-limited devices poses the same challenges that persist even today. This paper targets those interested in the practical deployment of model compression methods on YOLOv5, and in exploring different compression techniques that can be used for subsequent versions of YOLO.

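Motivation and Objectives

Motivate deploying YOLOv5 on resource-limited edge devices by reducing model size and inference time.
Categorize and analyze the pruning and quantization techniques that have been applied to YOLOv5 in real-world settings.
Compare practical results across memory footprint, FLOPs, speed (FPS), and accuracy metrics, and identify gaps.
Outline the remaining challenges and directions for applying pruning and quantization to newer YOLO versions.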

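Approach

A review of pruning and quantization methods with an emphasis on practical YOLOv5 implementations.
Discusses the saliency criteria used for pruning, including L1/L2 norms, feature-map activations, BN scaling factors, first-order derivatives, and mutual information.
Distinguishes unstructured, channel-based, filter-based, and kernel-based pruning granularities, explaining their effect on network structure and hardware.
Explains quantization concepts, including uniform vs. non-uniform quantization, static vs. dynamic range, quantization-aware training (QAT) vs. post-training quantization (PTQ), and deployment schemes (fake quantization vs. integer-only quantization).
Summarizes the experimental results of recent studies that applied these methods to YOLOv5, grouped by pruning granularity and quantization scheme.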
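
To make the saliency criteria and the structured (channel/filter-level) granularity above more concrete, here is a minimal sketch in plain Python. It is not code from the paper or from any YOLOv5 repository: the function names (`l1_saliency`, `bn_saliency`, `channels_to_keep`), the toy weights, and the 50% prune ratio are all assumptions chosen for illustration. It ranks the output channels of a single layer either by the L1 norm of each filter or by the magnitude of each channel's BN scaling factor, then keeps the highest-ranked ones.

```python
from typing import List, Sequence


def l1_saliency(filters: Sequence[Sequence[float]]) -> List[float]:
    """L1-norm saliency: sum of absolute weights of each (flattened) filter."""
    return [sum(abs(w) for w in f) for f in filters]


def bn_saliency(gammas: Sequence[float]) -> List[float]:
    """BN-scaling-factor saliency: magnitude of each channel's BN gamma."""
    return [abs(g) for g in gammas]


def channels_to_keep(saliency: Sequence[float], prune_ratio: float) -> List[int]:
    """Return the sorted indices of the channels that survive pruning.

    The lowest-saliency channels are dropped; at least one channel is kept.
    """
    n = len(saliency)
    n_prune = min(int(n * prune_ratio), n - 1)
    ranked = sorted(range(n), key=lambda i: saliency[i])  # ascending saliency
    return sorted(ranked[n_prune:])


# Toy layer (hypothetical values): 4 output filters, each flattened to 3 weights.
filters = [
    [0.9, -0.8, 0.7],
    [0.01, -0.02, 0.0],
    [0.5, 0.4, -0.6],
    [0.05, 0.0, -0.03],
]
gammas = [1.2, 0.01, 0.8, 0.02]  # hypothetical BN scaling factors, one per channel

keep_l1 = channels_to_keep(l1_saliency(filters), prune_ratio=0.5)
keep_bn = channels_to_keep(bn_saliency(gammas), prune_ratio=0.5)

# Structured pruning removes whole filters, so the layer simply gets smaller.
pruned_filters = [filters[i] for i in keep_l1]
print(keep_l1, keep_bn, len(pruned_filters))
```

Because entire filters are removed, the pruned layer stays dense and needs no sparse kernels, which is the hardware argument usually made for channel- or filter-level pruning over unstructured pruning. In a real network the matching input channels of the following layer would also have to be removed, which this sketch omits.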

Figure 1: YOLO release timeline. YOLOv5 and YOLOv6 have ten and six released variants, respectively.

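Experimental Results

Research Questions

RQ1: Which pruning strategies most effectively reduce YOLOv5's size and latency without sacrificing accuracy?
RQ2: How do BN-scaling-factor-based pruning and other saliency criteria compare in practice on YOLOv5?
RQ3: Which quantization schemes (QAT vs. PTQ, static vs. dynamic range) best preserve YOLOv5's accuracy while enabling deployment on edge hardware?
RQ4: What gaps and future directions have been identified for applying pruning and quantization to YOLOv5 and newer YOLO versions?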

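Key Findings

Channel-based pruning using the BN scaling factor (BNSF) dominates YOLOv5 pruning studies, with BNSF the prevailing saliency criterion (about 60%).
Most pruning studies use iterative pruning and fine-tuning to recover accuracy and report reductions in parameter count, model size, and FLOPs, in some cases accompanied by a drop in FPS.
Quantization studies show QAT reaching precisions as low as 3 bits on YOLOv5 with minimal accuracy loss, whereas PTQ often cannot go below 8 bits without a large accuracy drop.
Many studies rely on off-the-shelf deployment toolchains (TensorRT, PyTorch quantization, ONNX, etc.) and fall outside the main summary, which focuses on new quantization methods.
Several papers combine architectural changes (a MobileNetV3 backbone, attention mechanisms, etc.) with pruning to enable edge deployment on devices such as the NVIDIA Jetson Xavier NX and Raspberry Pi.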
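
The bit-width finding above can be illustrated with a small sketch of uniform (affine) quantization in plain Python. This is an assumption-laden toy, not the procedure of any surveyed paper: the helper names (`quant_params`, `quantize`, `dequantize`, `fake_quantize`) and the sample weights are hypothetical. It quantizes a list of weights to integers and maps them back to floats, which is the quantize-dequantize step commonly called fake quantization, and reports how the rounding error grows as the bit width shrinks from 8 to 4 to 3.

```python
from typing import Dict, List, Sequence, Tuple


def quant_params(values: Sequence[float], bits: int) -> Tuple[float, int]:
    """Scale and zero-point for asymmetric uniform quantization to `bits` bits."""
    lo = min(min(values), 0.0)  # the representable range must include 0
    hi = max(max(values), 0.0)
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values: Sequence[float], scale: float, zero_point: int, bits: int) -> List[int]:
    """Map floats to unsigned integers in [0, 2**bits - 1], clamping at the ends."""
    qmax = 2 ** bits - 1
    return [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]


def dequantize(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


def fake_quantize(values: Sequence[float], bits: int) -> List[float]:
    """Quantize then dequantize, so the result is float but carries rounding error."""
    scale, zp = quant_params(values, bits)
    return dequantize(quantize(values, scale, zp, bits), scale, zp)


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


weights = [-1.0, -0.37, -0.05, 0.0, 0.12, 0.48, 0.9, 1.0]  # hypothetical weights
errors: Dict[int, float] = {}
for bits in (8, 4, 3):
    errors[bits] = mean_abs_error(weights, fake_quantize(weights, bits))
    print(f"{bits}-bit mean absolute error: {errors[bits]:.4f}")
```

Post-training quantization applies this kind of rounding to an already trained model, so the error at low bit widths lands directly on the weights. Quantization-aware training instead inserts the quantize-dequantize step during training so the network can adapt to it, which is one plausible reading of why QAT tolerates lower precisions in the surveyed results. The training loop itself is outside the scope of this sketch.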

Figure 2: YOLOv5l architecture. SPPF represents a computation-efficient version of the Spatial Pyramid Pooling, which was originally implemented in YOLOv3; C3 uses the new CSP-combined module whose details are illustrated in Figure 3.

This review was generated by AI and checked by a human editor.