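[Paper Review] BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs
BlazeFace is a lightweight face detector optimized for mobile GPU inference. A GPU-friendly, SSD-style anchor scheme and a new tie resolution strategy let it reach 200–1000+ FPS on flagship devices. It also outputs six facial keypoints so that AR pipelines can take rotation-aware crops.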
We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.
Research Motivation and Objectives
- Develop a compact, GPU-friendly face detector for mobile devices optimized for AR pipelines.
- Increase inference speed while maintaining high face detection accuracy.
- Introduce architectural tweaks (anchors, tie resolution) to suit mobile GPUs and reduce jitter in video streams.
- Enable rotation-aware facial crops via 6 keypoints to improve downstream tasks.
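As a rough illustration of the last objective, the sketch below estimates a roll angle from two eye keypoints and rotates keypoints so the eye line becomes horizontal. Using the eye-to-eye line as the rotation cue is an assumption made here for illustration, and the function names (`roll_degrees`, `rotate_point`, `level_keypoints`) are hypothetical rather than taken from the paper.

```python
import math

def roll_degrees(eye_a, eye_b):
    """Angle of the eye-to-eye line in degrees (0 when the eyes are level).

    eye_a is the eye nearer the left edge of the image, eye_b the other one.
    Both are (x, y) pixel coordinates with y growing downward.
    """
    return math.degrees(math.atan2(eye_b[1] - eye_a[1], eye_b[0] - eye_a[0]))

def rotate_point(point, center, degrees):
    """Rotate point about center by the given angle (same x/y convention)."""
    t = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))

def level_keypoints(keypoints, eye_a, eye_b):
    """Rotate all keypoints about the eye midpoint so the eye line is horizontal."""
    angle = roll_degrees(eye_a, eye_b)
    center = ((eye_a[0] + eye_b[0]) / 2, (eye_a[1] + eye_b[1]) / 2)
    return [rotate_point(p, center, -angle) for p in keypoints]

# Made-up coordinates: a face tilted by 45 degrees.
eyes = [(40.0, 50.0), (80.0, 90.0)]
leveled = level_keypoints(eyes, eyes[0], eyes[1])
```

A downstream model would then receive a crop aligned to the leveled keypoints instead of an axis-aligned box around a tilted face.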
Proposed Method
- Design a lightweight feature extractor inspired by MobileNetV1/V2 but tailored for fast detection.
- Introduce a GPU-friendly anchor scheme that stops downsampling at the 8x8 feature map and places 6 anchors per pixel at that resolution.
- Propose a tie-resolution strategy alternative to non-maximum suppression to stabilize overlapping predictions.
- Produce six facial keypoints (eye centers, ear tragions, mouth center, nose tip) for rotation estimation.
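- Keep the 8x8 feature map as the coarsest resolution (no further downsampling). Many anchors then overlap each face, so blending their predictions instead of picking a single winner reduces jitter and gives smoother temporal predictions.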
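The anchor layout and the tie-resolution idea in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 16x16 layer with 2 anchors per pixel comes from the paper rather than from this summary, weighting by confidence score and the `iou_threshold` value are assumptions, and all function names are hypothetical.

```python
def anchor_count(layers):
    """Total anchors for a list of (grid_size, anchors_per_pixel) layers."""
    return sum(grid * grid * per_pixel for grid, per_pixel in layers)

def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def blend_detections(boxes, scores, iou_threshold=0.3):
    """Replace NMS suppression with a score-weighted mean of overlapping boxes.

    Returns a list of (blended_box, top_score), highest score first.
    The threshold value is an assumption, not a number from the paper.
    """
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while remaining:
        top = remaining[0]
        # Everything that overlaps the current best box joins its cluster.
        cluster = [i for i in remaining
                   if i == top or iou(boxes[top], boxes[i]) >= iou_threshold]
        total = sum(scores[i] for i in cluster)
        if total > 0:
            blended = tuple(sum(scores[i] * boxes[i][k] for i in cluster) / total
                            for k in range(4))
        else:
            blended = tuple(boxes[top])  # degenerate scores: keep the top box
        results.append((blended, scores[top]))
        remaining = [i for i in remaining if i not in cluster]
    return results

# Layout assumed from the paper: 2 anchors at 16x16 plus 6 anchors at 8x8.
total_anchors = anchor_count([(16, 2), (8, 6)])

# Two overlapping predictions of one face plus one separate face (made-up data).
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 0.0, 11.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
scores = [0.75, 0.25, 0.5]
detections = blend_detections(boxes, scores)
```

Because the output box is an average over all overlapping predictions, a small frame-to-frame change in which anchor scores highest moves the result only slightly, which is the intuition behind the reported jitter reduction.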
Experimental Results
Research Questions
- RQ1: Can a compact CNN backbone and GPU-friendly anchors deliver real-time face detection on mobile GPUs?
- RQ2: Does a novel tie-resolution method improve stability over traditional NMS in dense anchor scenarios?
- RQ3: How do BlazeFace's accuracy and latency compare to those of MobileNetV2-SSD on mobile GPUs?
- RQ4: Can additional facial keypoints enable rotation-aware crops to improve downstream AR tasks?
Key Findings
- BlazeFace achieves 98.61% average precision (AP) on the frontal camera dataset with 0.6 ms inference time on iPhone XS using TensorFlow Lite GPU in FP16.
- MobileNetV2-SSD achieves 97.95% AP with 2.1 ms inference time under the same framework.
- Across devices, BlazeFace runs faster than MobileNetV2-SSD (e.g., 0.6 ms vs 2.1 ms on iPhone XS, about 3.5x).
- The proposed tie-resolution strategy reduces temporal jitter by 40% on the frontal camera dataset and by 30% on the rear camera dataset.
- BlazeFace's regression parameter error is 10.4% of inter-ocular distance, higher than the 7.4% of MobileNetV2-SSD, and its jitter metric is 5.3%.
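To relate the per-frame latencies above to frame rates, the short sketch below converts milliseconds to an upper bound on frames per second and computes the speedup ratio. It counts model inference only and ignores the rest of the pipeline; the function names are hypothetical.

```python
def fps_from_latency_ms(latency_ms):
    """Upper bound on frames per second if each frame takes latency_ms."""
    return 1000.0 / latency_ms

def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

# iPhone XS latencies quoted in the findings above.
blazeface_fps = fps_from_latency_ms(0.6)
mobilenet_fps = fps_from_latency_ms(2.1)
ratio = speedup(2.1, 0.6)
```

At 0.6 ms per frame the inference-only bound is well above the 1000 FPS end of the quoted range, while 2.1 ms corresponds to a bound below 500 FPS.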
This review was generated by AI and checked by a human editor.