QUICK REVIEW

[Paper Review] YOLO5Face: Why Reinventing a Face Detector

Delong Qi, Weijun Tan|arXiv (Cornell University)|May 27, 2021

Face recognition and analysis | References: 47 | Citations: 39

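One-line summary

This paper treats face detection as a generic object detection task and adapts YOLOv5 into YOLO5Face, adding landmark regression and a range of backbones. The resulting family, which includes mobile-friendly models, achieves state-of-the-art performance on WiderFace.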

ABSTRACT

Tremendous progress has been made on face detection in recent years using convolutional neural networks. While many face detectors use designs designated for detecting faces, we treat face detection as a generic object detection task. We implement a face detector based on the YOLOv5 object detector and call it YOLO5Face. We make a few key modifications to the YOLOv5 and optimize it for face detection. These modifications include adding a five-point landmark regression head, using a stem block at the input of the backbone, using smaller-size kernels in the SPP, and adding a P6 output in the PAN block. We design detectors of different model sizes, from an extra-large model to achieve the best performance to a super small model for real-time detection on an embedded or mobile device. Experiment results on the WiderFace dataset show that on VGA images, our face detectors can achieve state-of-the-art performance in almost all the Easy, Medium, and Hard subsets, exceeding the more complex designated face detectors. The code is available at https://github.com/deepcam-cn/yolov5-face

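Motivation and objectives

Reframe face detection as a generic object detection task and build on a standard detector rather than reinventing the architecture.
Develop a family of YOLOv5-based face detectors with landmark regression and components tailored to different deployment needs.
Improve detection of both small and large faces through architectural changes and the training strategy.
Evaluate on the WiderFace benchmark and on cross-domain datasets, and establish state-of-the-art results across the subsets.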

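Proposed method

Redesign YOLOv5 into YOLO5Face by adding a five-point landmark regression head trained with Wing loss.
Replace the Focus layer with a Stem block to improve generalization and reduce computation.
Use an SPP block with smaller kernels (7x7, 5x5, 3x3) to strengthen small-face detection.
Add a P6 output block (stride 64) to strengthen large-face detection.
Introduce two lightweight backbones based on ShuffleNetV2 to build ultra-compact models for embedded devices.
Train on VGA-resolution input, with the longer side scaled to 640 and the shorter side aligned to the largest SPP stride; ablate the data augmentation (up-down flipping is excluded; Mosaic is varied) and assess the effect of landmark supervision.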
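
The Wing loss used for the landmark head can be sketched in a few lines. The version below follows the standard piecewise definition (logarithmic for small errors, linear for large ones, joined continuously at |x| = w). The function name `wing_loss` and the defaults `w=10.0` and `eps=2.0` are illustrative assumptions; this review does not state the values used in YOLO5Face.

```python
import math


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Sum of the Wing loss over paired landmark coordinates.

    pred, target: equal-length sequences of floats, e.g. the 10 values
    (x1, y1, ..., x5, y5) of a five-point landmark set.
    w, eps: hypothetical defaults chosen for illustration only.
    """
    # Constant that makes the two pieces meet at |x| == w.
    c = w - w * math.log(1.0 + w / eps)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        if x < w:
            # Log region: amplifies the gradient of small errors.
            total += w * math.log(1.0 + x / eps)
        else:
            # Linear region: behaves like L1 for large errors.
            total += x - c
    return total
```

Compared with a plain L2 loss, the logarithmic region gives small and medium localization errors more weight, which is the usual motivation for choosing it in landmark regression.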

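Experimental results

Research questions

RQ1: Can face detection be done effectively with a generic object detector framework, without a dedicated face-specific architecture?
RQ2: Do the modifications (landmark regression, Stem block, smaller SPP kernels, P6 head) improve WiderFace mAP across the Easy, Medium, and Hard subsets?
RQ3: Do the mobile and embedded backbones (ShuffleNetV2) deliver competitive accuracy while substantially reducing computation?
RQ4: How do data augmentation choices (excluding up-down flipping, Mosaic, and so on) affect face detector performance?
RQ5: Do landmark-based supervision and alignment improve downstream face recognition benchmarks?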

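Key findings

YOLO5Face achieves state-of-the-art mAP on the WiderFace Easy, Medium, and Hard subsets with its large models (e.g., YOLOv5x6).
Using smaller SPP kernels (7x7, 5x5, 3x3) yields notable mAP gains on Easy, Medium, and Hard (0.9%, 1.49%, and 1.41%, respectively).
Adding the P6 output block improves mAP by about 1% each on Easy and Medium, with a slight drop on Hard.
The Stem block improves mAP by up to 0.57% on Easy, 0.33% on Medium, and 0.23% on Hard.
The two ShuffleNetV2-based backbones yield ultra-compact detectors (YOLOv5n, YOLOv5n0.5) that deliver competitive performance on embedded devices.
On the WiderFace validation set, YOLOv5x6-Face reaches 96.9% (Easy), 96.0% (Medium), and 91.6% (Hard); on the test set it reaches 95.8%, 94.9%, and 90.5%.
YOLO5Face variants can outperform RetinaFace on face recognition tasks on Webface when landmark supervision is used.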
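
To make the P6 finding more concrete, the sketch below computes the network input shape for a VGA image and the resulting grid size at each output level. It assumes that "aligning" the shorter side means padding it up to a multiple of the largest stride, and that the output strides are 8, 16, 32, and 64 (only the stride of 64 for P6 is stated in this review). The function names `letterbox_shape` and `grid_sizes` are hypothetical.

```python
import math


def letterbox_shape(height, width, long_side=640, max_stride=64):
    """Scale so the longer side equals long_side, then pad each side
    up to the next multiple of max_stride (assumed interpretation)."""
    scale = long_side / max(height, width)
    new_h = round(height * scale)
    new_w = round(width * scale)
    pad_h = math.ceil(new_h / max_stride) * max_stride
    pad_w = math.ceil(new_w / max_stride) * max_stride
    return pad_h, pad_w


def grid_sizes(height, width, strides=(8, 16, 32, 64)):
    """Map each output stride to its (rows, cols) prediction grid."""
    return {s: (height // s, width // s) for s in strides}


h, w = letterbox_shape(480, 640)   # VGA input
grids = grid_sizes(h, w)
```

Under these assumptions a 480x640 image is padded to 512x640, and the stride-64 level has only an 8x10 grid. Each cell at that level covers a 64x64 pixel region, which is consistent with P6 mainly helping on large faces.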


This review was generated by AI and checked by a human editor.