QUICK REVIEW

[Paper Review] PyramidBox: A Context-assisted Single Shot Face Detector

Xu Tang, Daniel K. Du | arXiv (Cornell University) | Mar 21, 2018

Face recognition and analysis | References: 36 | Cited by: 48

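One-line summary

PyramidBox is a context-assisted single-shot face detector that combines PyramidAnchors, a Low-level Feature Pyramid Network (LFPN), and a context-sensitive prediction module to improve detection of hard faces. It achieves state-of-the-art performance on FDDB and WIDER FACE.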

ABSTRACT

Face detection has been well studied for many years and one of remaining challenges is to detect small, blurred and partially occluded faces in uncontrolled environment. This paper proposes a novel context-assisted single shot face detector, named PyramidBox to handle the hard face detection problem. Observing the importance of the context, we improve the utilization of contextual information in the following three aspects. First, we design a novel context anchor to supervise high-level contextual feature learning by a semi-supervised method, which we call it PyramidAnchors. Second, we propose the Low-level Feature Pyramid Network to combine adequate high-level context semantic feature and Low-level facial feature together, which also allows the PyramidBox to predict faces of all scales in a single shot. Third, we introduce a context-sensitive structure to increase the capacity of prediction network to improve the final accuracy of output. In addition, we use the method of Data-anchor-sampling to augment the training samples across different scales, which increases the diversity of training data for smaller faces. By exploiting the value of context, PyramidBox achieves superior performance among the state-of-the-art over the two common face detection benchmarks, FDDB and WIDER FACE. Our code is available in PaddlePaddle: https://github.com/PaddlePaddle/models/tree/develop/fluid/face_detection

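Motivation and objectives

Robustly detect small, blurred, and occluded faces in unconstrained environments.
Exploit contextual cues such as the head, shoulders, and body to aid face localization and classification.
Develop an architecture that fuses low-level, high-resolution features with high-level semantic features for multi-scale detection.
Introduce semi-supervised PyramidAnchors that supervise contextual feature learning without additional labeling.
Augment the training data with scale-aware data augmentation to increase the diversity of small faces.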

Proposed method

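Introduce PyramidAnchors, which supervise contextual feature learning for the face, head, and body at multiple scales.
Develop the Low-level Feature Pyramid Network (LFPN), which merges high-level context with low-level facial features and enables single-shot multi-scale detection.
Design the Context-sensitive Prediction Module (CPM), which uses a wider and deeper network together with a max-in-out layer to strengthen localization and classification.
Incorporate Data-anchor-sampling to reshape the training data distribution and increase the diversity of small faces.
Propose the PyramidBox Loss, which jointly supervises face, head, and body predictions across all PyramidAnchors.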
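
To make the PyramidAnchors idea more concrete, the following is a minimal sketch of how one face annotation might yield face, head, and body labels for anchors of different sizes without extra labeling. It is not the authors' implementation. The names Box, shrink_about_center, and pyramid_labels are hypothetical, and the pyramid stride of 2, the three levels, the IoU threshold of 0.35, and the choice to shrink an anchor about its own center are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image coordinates (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def shrink_about_center(box: Box, factor: float) -> Box:
    """Shrink a box by `factor` while keeping its center fixed (assumption)."""
    cx, cy = (box.x1 + box.x2) / 2, (box.y1 + box.y2) / 2
    half_w = (box.x2 - box.x1) / (2 * factor)
    half_h = (box.y2 - box.y1) / (2 * factor)
    return Box(cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def pyramid_labels(anchor: Box, face: Box, num_levels: int = 3,
                   stride: float = 2.0, threshold: float = 0.35) -> list:
    """Return one 0/1 label per pyramid level for a single anchor.

    Level 0 plays the role of "face", level 1 of "head", level 2 of "body":
    an anchor is positive at level k if, after shrinking it by stride**k,
    it overlaps the annotated face box by more than `threshold`.
    """
    return [
        int(iou(shrink_about_center(anchor, stride ** k), face) > threshold)
        for k in range(num_levels)
    ]


face = Box(100, 100, 120, 120)          # a 20x20 annotated face
small_anchor = Box(100, 100, 120, 120)  # same size as the face
mid_anchor = Box(90, 90, 130, 130)      # 2x larger, same center
large_anchor = Box(70, 70, 150, 150)    # 4x larger, same center

print(pyramid_labels(small_anchor, face))
print(pyramid_labels(mid_anchor, face))
print(pyramid_labels(large_anchor, face))
```

Under these assumptions the face-sized anchor is positive only at the face level, while the 2x and 4x anchors become positives for the head and body levels respectively, so larger anchors receive context supervision from the same face annotation.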

Experimental results

Research questions

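RQ1: How can contextual information around the face (head, shoulders, body) be used to improve detection of hard, small, or occluded faces?
RQ2: Does integrating the Low-level Feature Pyramid Network (LFPN) improve small-face performance compared with using only top-down high-level features?
RQ3: How do PyramidAnchors and semi-supervised context labeling affect detection accuracy across the easy, medium, and hard subsets?
RQ4: Can the context-sensitive prediction module with max-in-out improve both localization and classification accuracy?
RQ5: Does data-anchor-sampling diversify the training data effectively enough to improve small-face detection?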
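
As a rough illustration of the max-in-out idea raised in RQ4, the sketch below scores one anchor from several negative and several positive logits by keeping only the largest of each group before a two-way softmax. This is a simplified reading, not the paper's code. The function name max_in_out, the channel layout (negative logits first), and the channel counts in the usage lines are assumptions.

```python
import math


def max_in_out(logits, c_n, c_p):
    """Return the face probability for one anchor.

    `logits` holds c_n negative (background) scores followed by c_p
    positive (face) scores; this ordering is an assumption of the sketch.
    The largest negative and the largest positive score are kept, then a
    numerically stable two-class softmax is applied.
    """
    if c_n < 1 or c_p < 1 or len(logits) != c_n + c_p:
        raise ValueError("expected c_n + c_p logits with c_n, c_p >= 1")
    neg = max(logits[:c_n])
    pos = max(logits[c_n:])
    shift = max(neg, pos)  # subtract the max for numerical stability
    e_neg = math.exp(neg - shift)
    e_pos = math.exp(pos - shift)
    return e_pos / (e_neg + e_pos)


# Three background channels and one face channel: a single strong
# background response is enough to pull the face probability down.
p_background_heavy = max_in_out([0.0, 2.0, -1.0, 2.0], c_n=3, c_p=1)

# One background channel and three face channels: any strong face
# response is enough to push the probability up.
p_face_heavy = max_in_out([0.1, -3.0, 1.0, 5.0], c_n=1, c_p=3)

print(p_background_heavy, p_face_heavy)
```

Giving more channels to the background side makes it easier to reject false positives, while giving more channels to the face side makes it easier to recover faces; which split suits which pyramid level is a design decision the sketch leaves open.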

Key findings

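Starting the LFPN top-down path from a middle layer (conv7) yields a higher mAP on the hard subset (86.1) than the baseline, which indicates that LFPN is effective for small faces.
Data-anchor-sampling improves mAP by roughly 0.4–0.6 points across the easy, medium, and hard subsets.
PyramidAnchors with multiple pyramid levels (face, head, body) give a clear improvement over the baseline (hard mAP rises from 84.2 to 85.1).
The Context-sensitive Prediction Module (CPM) outperforms DSSD-style and SSH-style modules in easy/medium/hard mAP; in one comparison CPM reaches 95.6/94.5/88.5.
Max-in-out brings a further gain of about 0.1–0.3 mAP points across the subsets.
With all proposed components combined, PyramidBox reaches mAP of 95.5–96.1 (easy), 94.7–95.0 (medium), and 88.8–88.9 (hard) on the WIDER FACE validation and test sets, which the paper reports as state-of-the-art performance.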
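
The data-anchor-sampling result is easier to interpret with the resampling step written out. The sketch below picks a resize factor for a training image from one randomly chosen face, so that the face is often pushed toward a smaller anchor scale. It follows our reading of the method rather than the released code: the function names, the anchor scales of 16 to 512 pixels, and the sampling range of half to twice the target scale are assumptions, and the subsequent crop step is omitted.

```python
import math
import random

# Assumed anchor scales in pixels, one per detection layer.
ANCHOR_SCALES = [16, 32, 64, 128, 256, 512]


def nearest_anchor_index(face_size):
    """Index of the anchor scale closest to the given face size."""
    return min(range(len(ANCHOR_SCALES)),
               key=lambda i: abs(ANCHOR_SCALES[i] - face_size))


def sample_resize_factor(face_w, face_h, rng):
    """Sample an image resize factor from one selected face.

    1. Measure the face size as sqrt(width * height).
    2. Find the nearest anchor scale.
    3. Pick a target scale index uniformly from 0 up to one step above
       the nearest index, so smaller targets are chosen more often.
    4. Sample the target face size around that scale and return the
       ratio between target size and current size.
    """
    face_size = math.sqrt(face_w * face_h)
    i_anchor = nearest_anchor_index(face_size)
    i_target = rng.randint(0, min(len(ANCHOR_SCALES) - 1, i_anchor + 1))
    scale = ANCHOR_SCALES[i_target]
    target_size = rng.uniform(scale / 2, scale * 2)
    return target_size / face_size


rng = random.Random(0)  # fixed seed so the sketch is reproducible
factors = [sample_resize_factor(140, 140, rng) for _ in range(1000)]
shrunk = sum(1 for f in factors if f < 1.0)
print(f"{shrunk} of {len(factors)} samples shrink the 140-pixel face")
```

For a 140-pixel face the nearest scale is 128, so the target index is drawn from the five scales 16 to 256, and most draws resize the image so that the face becomes smaller. That is one plausible way the method increases the share of small faces seen during training.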

This review was generated by AI and checked by a human editor.