QUICK REVIEW

[Paper Review] BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

Valentin Bazarevsky, Yury Kartynnik | arXiv (Cornell University) | Jul 11, 2019

Face recognition and analysis | References: 7 | Citations: 248

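One-Line Summary

BlazeFace is a lightweight face detector optimized for mobile GPU inference. With a GPU-friendly SSD-style anchor scheme and a new tie-resolution strategy, it reaches 200–1000+ FPS on flagship devices. It also outputs six facial keypoints to support rotation-aware crops in AR pipelines.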

ABSTRACT

We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.

Motivation and Objectives

Develop a compact, GPU-friendly face detector for mobile devices optimized for AR pipelines.
Increase inference speed while maintaining high face detection accuracy.
Introduce architectural tweaks (anchors, tie resolution) to suit mobile GPUs and reduce jitter in video streams.
Enable rotation-aware facial crops via 6 keypoints to improve downstream tasks.
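The rotation-aware crop in the last objective can be illustrated with a minimal sketch. It assumes the roll angle is taken from the line through the two eye-center keypoints; the review does not say which keypoints are used, and the function names `roll_angle_degrees` and `rotate_point` are hypothetical helpers, not part of any BlazeFace release.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Return the in-plane rotation (roll) of the eye line, in degrees.

    Points are (x, y) in image coordinates with y growing downward.
    0 means the eyes are level. Assumption: roll is estimated from the
    two eye centers only.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, angle_degrees):
    """Rotate `point` around `center` by `angle_degrees`."""
    theta = math.radians(angle_degrees)
    x = point[0] - center[0]
    y = point[1] - center[1]
    return (
        center[0] + x * math.cos(theta) - y * math.sin(theta),
        center[1] + x * math.sin(theta) + y * math.cos(theta),
    )


# Example: a tilted face whose second eye sits lower than the first.
left, right = (0.0, 0.0), (1.0, 1.0)
angle = roll_angle_degrees(left, right)
# Rotating by the negative angle brings both eyes onto the same row.
levelled_right = rotate_point(right, left, -angle)
```

A downstream model would then receive a crop aligned to the levelled eye line, so it does not need to handle large in-plane rotations itself.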

Proposed Method

Design a lightweight feature extractor inspired by MobileNetV1/V2 but tailored for fast detection.
Introduce a GPU-friendly anchor scheme that stops downsampling at the 8x8 feature map and places 6 anchors per pixel there.
Propose a tie-resolution strategy alternative to non-maximum suppression to stabilize overlapping predictions.
Produce six facial keypoints (eye centers, ear tragions, mouth center, nose tip) for rotation estimation.
Keep 8x8 as the coarsest feature map. Because this increases the number of anchors overlapping a given face, pair it with the tie-resolution strategy to reduce jitter and give smoother temporal predictions.
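The anchor scheme and the tie-resolution step above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The 8x8 grid with 6 anchors comes from the review; the 16x16 grid with 2 anchors is an assumption taken from the paper's description. Weighting by confidence score and the `iou_threshold` of 0.3 are also assumptions, since the review only says that overlapping predictions are combined rather than suppressed.

```python
def anchor_centers(layers=((16, 2), (8, 6))):
    """Return normalized (cx, cy) anchor centers for square feature maps.

    `layers` holds (grid_size, anchors_per_cell) pairs. All anchors in a
    cell share the same center.
    """
    centers = []
    for grid, per_cell in layers:
        for row in range(grid):
            for col in range(grid):
                center = ((col + 0.5) / grid, (row + 0.5) / grid)
                centers.extend([center] * per_cell)
    return centers


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def blend_detections(boxes, scores, iou_threshold=0.3):
    """Merge overlapping boxes into score-weighted means.

    Unlike NMS, no box "wins": every box overlapping the current top-scoring
    box contributes to the output. Scores are assumed to be positive.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    merged = []
    while order:
        best = order[0]
        # The top box always belongs to its own cluster.
        cluster = [best] + [
            i for i in order[1:] if iou(boxes[best], boxes[i]) >= iou_threshold
        ]
        order = [i for i in order if i not in cluster]
        total = sum(scores[i] for i in cluster)
        box = tuple(
            sum(scores[i] * boxes[i][k] for i in cluster) / total for k in range(4)
        )
        merged.append((box, scores[best]))
    return merged


anchors = anchor_centers()
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.75, 0.25, 0.5]
detections = blend_detections(boxes, scores)
```

With these inputs the first two boxes merge into (0.5, 0.0, 10.5, 10.0) and the third stays unchanged. Because the output moves smoothly as the scores of neighboring anchors change between frames, a blended box tends to jitter less than a box that switches between competing anchors.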

Experimental Results

Research Questions

RQ1: Can a compact CNN backbone and GPU-friendly anchors deliver real-time face detection on mobile GPUs?
RQ2: Does a novel tie-resolution method improve stability over traditional NMS in dense anchor scenarios?
RQ3: How do BlazeFace's accuracy and latency compare to those of MobileNetV2-SSD on mobile GPUs?
RQ4: Can additional facial keypoints enable rotation-aware crops to improve downstream AR tasks?

Key Findings

BlazeFace achieves 98.61% average precision on the frontal-camera face detection task with 0.6 ms inference time on iPhone XS using TensorFlow Lite GPU in FP16.
MobileNetV2-SSD achieves 97.95% AP with 2.1 ms inference time in the same setup.
Across devices, BlazeFace significantly outperforms MobileNetV2-SSD in inference speed (e.g., iPhone XS: 0.6 ms vs 2.1 ms).
The proposed tie-resolution strategy reduces temporal jitter by 40% on the frontal-camera dataset and by 30% on the rear-camera dataset.
Regression parameter error for BlazeFace is 10.4% of inter-ocular distance (versus 7.4% for MobileNetV2-SSD), and its jitter metric is 5.3%.
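To relate the reported latencies to the FPS range in the summary, the arithmetic can be worked through in a short sketch. The helper name `fps_from_latency_ms` is hypothetical, and the result is an upper bound for the detector alone: it ignores camera capture, pre-processing, and any downstream models.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# iPhone XS inference times quoted in the findings above.
blazeface_ms = 0.6
mobilenet_v2_ssd_ms = 2.1

blazeface_fps = fps_from_latency_ms(blazeface_ms)
mobilenet_v2_ssd_fps = fps_from_latency_ms(mobilenet_v2_ssd_ms)
speedup = mobilenet_v2_ssd_ms / blazeface_ms
```

On these numbers BlazeFace comes out at roughly 1,667 FPS against roughly 476 FPS for MobileNetV2-SSD, a speedup of about 3.5x on this device.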

This review was generated by AI and checked by a human editor.