QUICK REVIEW

[Paper Review] BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

Valentin Bazarevsky, Yury Kartynnik | arXiv (Cornell University) | Jul 11, 2019
Face recognition and analysis | References: 7 | Citations: 248
One-line summary

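BlazeFace is a lightweight face detector optimized for mobile GPU inference. A GPU-friendly, SSD-style anchor scheme and a new tie resolution strategy let it reach 200–1000+ FPS on flagship devices. It also outputs six facial keypoints so that AR pipelines can take rotation-aware crops.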

ABSTRACT

We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.

Motivation and objectives

  • Develop a compact, GPU-friendly face detector for mobile devices optimized for AR pipelines.
  • Increase inference speed while maintaining high face detection accuracy.
  • Introduce architectural tweaks (anchors, tie resolution) to suit mobile GPUs and reduce jitter in video streams.
  • Enable rotation-aware facial crops via 6 keypoints to improve downstream tasks.
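To make the last objective concrete, here is a minimal sketch of how a roll angle could be read off two of the six keypoints. Using the line between the eye centers is an assumption made for illustration, and the function name and sample coordinates are hypothetical; the review does not specify which keypoints the authors combine.

```python
import math

def roll_angle_degrees(right_eye, left_eye):
    """Angle of the line joining the two eye centers, in degrees.

    Points are (x, y) in image coordinates, where y grows downward.
    A result of 0 means the eyes are level in the image.
    """
    dx = left_eye[0] - right_eye[0]
    dy = left_eye[1] - right_eye[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical keypoints; the subject's right eye appears on the image's left.
level = roll_angle_degrees((40.0, 50.0), (80.0, 50.0))
tilted = roll_angle_degrees((40.0, 50.0), (80.0, 90.0))
print(level, tilted)
```

A downstream model could then rotate the crop by the negative of this angle so that the face arrives upright.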

Proposed method

  • Design a lightweight feature extractor inspired by MobileNetV1/V2 but tailored for fast detection.
  • Introduce a GPU-friendly anchor scheme that stops downsampling at the 8x8 feature map and places 6 anchors per pixel at that resolution.
  • Propose a tie-resolution strategy alternative to non-maximum suppression to stabilize overlapping predictions.
  • Produce six facial keypoints (eye centers, ear tragions, mouth center, nose tip) for rotation estimation.
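  • Keep the 8x8 feature map as the coarsest resolution; because this increases the number of anchors overlapping each face, blend the overlapping predictions rather than picking a single winner, which reduces jitter and smooths predictions over time.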
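The tie resolution idea can be sketched as a small variation on greedy NMS: overlapping boxes are grouped as usual, but each group is averaged instead of being reduced to its top-scoring box. This is an illustrative sketch, not the authors' implementation. Weighting by confidence score, the 0.3 IoU threshold, and the names `iou` and `blend_detections` are all assumptions.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def blend_detections(boxes, scores, iou_threshold=0.3):
    """Group boxes like greedy NMS, then return a score-weighted mean per group.

    Assumes all scores are positive. The threshold value is an assumption.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while order:
        top = order[0]
        # The top box always belongs to its own group.
        group = [top] + [
            i for i in order[1:] if iou(boxes[top], boxes[i]) >= iou_threshold
        ]
        order = [i for i in order if i not in group]
        total = sum(scores[i] for i in group)
        blended = tuple(
            sum(scores[i] * boxes[i][k] for i in group) / total for k in range(4)
        )
        results.append((blended, scores[top]))
    return results

# Hypothetical detections: two overlapping boxes on one face, one box elsewhere.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.6, 0.8]
detections = blend_detections(boxes, scores)
print(detections)
```

Because every overlapping anchor contributes to the output, a small change in which anchor scores highest from one frame to the next moves the blended box only slightly, whereas plain NMS would switch between boxes.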

Experimental results

Research questions

  • RQ1: Can a compact CNN backbone and GPU-friendly anchors deliver real-time face detection on mobile GPUs?
  • RQ2: Does a novel tie-resolution method improve stability over traditional NMS in dense anchor scenarios?
  • RQ3: How do BlazeFace's accuracy and latency compare to those of MobileNetV2-SSD on mobile GPUs?
  • RQ4: Can additional facial keypoints enable rotation-aware crops to improve downstream AR tasks?
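RQ1 and RQ2 both depend on how dense the anchor layout is, so it helps to count the anchors. The sketch below enumerates anchor centers for a two-layer layout. The 8x8 layer with 6 anchors per cell comes from the review; the additional 16x16 layer with 2 anchors per cell is an assumption based on the original paper's description, and `anchor_centers` is a hypothetical helper.

```python
def anchor_centers(layers):
    """Return normalized (cx, cy) anchor centers.

    layers: list of (grid_size, anchors_per_cell) pairs.
    Each grid cell contributes anchors_per_cell anchors at its center.
    """
    centers = []
    for grid, per_cell in layers:
        for row in range(grid):
            for col in range(grid):
                cx = (col + 0.5) / grid
                cy = (row + 0.5) / grid
                centers.extend([(cx, cy)] * per_cell)
    return centers

# Assumed layout: 16x16 grid with 2 anchors, 8x8 grid with 6 anchors.
LAYERS = [(16, 2), (8, 6)]
anchors = anchor_centers(LAYERS)
print(len(anchors))  # 16*16*2 + 8*8*6 = 512 + 384 = 896
```

With six anchors sharing each 8x8 cell, a large face overlaps many anchors at once, which is the dense-anchor situation RQ2 refers to.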

Key findings

  • BlazeFace achieves 98.61% average precision on frontal faces with 0.6 ms inference time on iPhone XS using TensorFlow Lite GPU in FP16.
  • MobileNetV2-SSD achieves 97.95% AP with 2.1 ms inference time under the same framework.
  • Across devices, BlazeFace significantly outperforms MobileNetV2-SSD in inference speed (e.g., iPhone XS: 0.6 ms vs 2.1 ms).
  • The proposed tie-resolution strategy reduces temporal jitter by 40% on the frontal-camera dataset and by 30% on the rear-camera dataset.
  • BlazeFace's regression parameter error is 10.4% of the inter-ocular distance, higher than MobileNetV2-SSD's 7.4%, and its jitter metric is 5.3%.
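The review does not define the jitter metric. The sketch below assumes the definition given in the original paper: the root mean squared difference between predictions on an image and on slightly shifted copies of it, after undoing the known shift. Normalizing by inter-ocular distance, the function name, and the sample values are assumptions made for illustration.

```python
import math

def jitter_metric(original, displaced, offsets, inter_ocular_distance):
    """RMS difference between original and shift-corrected displaced predictions.

    original: list of (x, y) points predicted on the unshifted image.
    displaced: one list of (x, y) points per shifted input.
    offsets: the (dx, dy) shift applied to each displaced input.
    Returns the RMS difference as a fraction of inter_ocular_distance.
    """
    squared = []
    for points, (dx, dy) in zip(displaced, offsets):
        for (x0, y0), (x1, y1) in zip(original, points):
            # Undo the known shift before comparing with the original.
            squared.append((x1 - dx - x0) ** 2 + (y1 - dy - y0) ** 2)
    rms = math.sqrt(sum(squared) / len(squared))
    return rms / inter_ocular_distance

# Hypothetical predictions for two keypoints under two small shifts.
original = [(40.0, 50.0), (80.0, 50.0)]
displaced = [
    [(42.0, 50.0), (82.0, 51.0)],  # input shifted by (2, 0)
    [(40.0, 47.0), (79.0, 47.0)],  # input shifted by (0, -3)
]
offsets = [(2.0, 0.0), (0.0, -3.0)]
jitter = jitter_metric(original, displaced, offsets, inter_ocular_distance=40.0)
print(f"{jitter:.4f}")
```

A perfectly translation-consistent detector would score 0 on this metric; larger values mean the output moves more than the input did.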

This review was generated by AI and checked by a human editor.