QUICK REVIEW

[Paper Review] FaceBoxes: A CPU Real-time Face Detector with High Accuracy

Shifeng Zhang, Xiangyu Zhu|arXiv (Cornell University)|Aug 17, 2017

Face recognition and analysis|References: 45|Cited by: 40

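One-Sentence Summary

FaceBoxes is a real-time, high-accuracy face detector designed for CPU inference: its lightweight network combines Rapidly Digested Convolutional Layers (RDCL) for fast inference with Multiple Scale Convolutional Layers (MSCL) for multi-scale face detection; it runs at 20 FPS on a single CPU core for VGA-resolution images, achieves state-of-the-art results on the AFW, PASCAL Face, and FDDB benchmarks, and uses a new anchor densification strategy that significantly improves recall for small faces.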

ABSTRACT

Although tremendous strides have been made in face detection, one of the remaining open challenges is to achieve real-time speed on the CPU as well as maintain high performance, since effective models for face detection tend to be computationally prohibitive. To address this challenge, we propose a novel face detector, named FaceBoxes, with superior performance on both speed and accuracy. Specifically, our method has a lightweight yet powerful network structure that consists of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL). The RDCL is designed to enable FaceBoxes to achieve real-time speed on the CPU. The MSCL aims at enriching the receptive fields and discretizing anchors over different layers to handle faces of various scales. Besides, we propose a new anchor densification strategy to make different types of anchors have the same density on the image, which significantly improves the recall rate of small faces. As a consequence, the proposed detector runs at 20 FPS on a single CPU core and 125 FPS using a GPU for VGA-resolution images. Moreover, the speed of FaceBoxes is invariant to the number of faces. We comprehensively evaluate this method and present state-of-the-art detection performance on several face detection benchmark datasets, including the AFW, PASCAL face, and FDDB. Code is available at https://github.com/sfzhang15/FaceBoxes

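Motivation and Objectives

Achieve real-time face detection on CPU devices without sacrificing accuracy.
Overcome a limitation of cascaded CNN methods, whose speed drops as the number of faces in the image grows.
Design a lightweight, end-to-end trainable network that maintains high performance across face scales and appearances.
Improve the recall of small faces through a new anchor densification strategy.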

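Proposed Method

Introduces the Rapidly Digested Convolutional Layers (RDCL) to accelerate inference and reach real-time speed on the CPU.
Proposes the Multiple Scale Convolutional Layers (MSCL) to enrich the receptive fields and discretize anchors over different feature layers, so that faces of various scales are handled.
Designs a new anchor densification strategy that equalizes anchor density across face scales, which particularly strengthens small-face detection.
Adopts a fully convolutional, single-stage architecture that is trained end to end for efficient and accurate face detection.
Tiles multi-scale anchors over the feature maps to cover a wide range of face sizes.
Trains the model end to end with a cross-entropy loss for face/background classification and a smooth L1 loss for bounding-box regression.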
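
The anchor densification idea in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the anchor scale/stride pairs are assumed from the FaceBoxes paper rather than stated in this review. The sketch treats tiling density as anchor scale divided by tiling interval (stride), and densifies an anchor n times by placing n x n centers uniformly inside one stride-sized cell.

```python
# Assumed (scale, stride) pairs in pixels; these values are not given in this review.
ANCHORS = [(32, 32), (64, 32), (128, 32), (256, 64), (512, 128)]


def tiling_density(scale, stride):
    """Tiling density of an anchor: its scale divided by its tiling interval."""
    return scale / stride


def densify_factor(scale, stride, target_density=4.0):
    """How many times (per axis) an anchor must be densified to reach the target density."""
    return max(1, round(target_density / tiling_density(scale, stride)))


def densified_centers(cx, cy, stride, n):
    """Return n*n anchor centers tiled uniformly inside the stride x stride cell centered at (cx, cy)."""
    offsets = [(-0.5 + (2 * k + 1) / (2 * n)) * stride for k in range(n)]
    return [(cx + dx, cy + dy) for dy in offsets for dx in offsets]


if __name__ == "__main__":
    for scale, stride in ANCHORS:
        n = densify_factor(scale, stride)
        print(f"scale={scale:3d} stride={stride:3d} "
              f"density={tiling_density(scale, stride):.0f} densify x{n}")
    # Centers for a 32-pixel anchor in the cell centered at (16, 16)
    print(densified_centers(16, 16, 32, 4))
```

Under these assumed values, only the two smallest anchors need densification (4 times and 2 times per axis), after which every anchor type has the same density on the image.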

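Experimental Results

Research Questions

RQ1: Can a single-stage, fully convolutional face detector run in real time on a CPU while maintaining high accuracy?
RQ2: How can the anchor distribution be optimized to improve small-face recall without increasing computational cost?
RQ3: Which architectural components enable high-speed CPU inference while preserving detection performance?
RQ4: How much does anchor densification improve small-face detection across benchmarks?
RQ5: How do the proposed MSCL and RDCL designs compare with existing face detectors in the speed-accuracy trade-off?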

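Key Findings

FaceBoxes runs at 20 FPS on a single CPU core for VGA-resolution images, and its speed does not depend on the number of faces in the image.
The model reaches 125 FPS on a GPU, showing that it scales well with hardware.
On the FDDB benchmark, FaceBoxes achieves state-of-the-art performance, reaching 96.0% mAP on the continuous ROC curve and outperforming all prior methods.
Ablation experiments show that anchor densification raises mAP on FDDB by 1.1%, confirming its key role in small-face detection.
The MSCL contributes a 1.0% mAP gain on FDDB by diversifying receptive fields and tiling anchors across scales.
The RDCL cuts inference time by about 19.3 ms at a cost of only a 0.1% drop in mAP, showing that the design is efficient while preserving accuracy.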
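
To put the speed figures above on a common scale, the short sketch below converts FPS to per-frame latency and estimates what the CPU frame rate would be without the roughly 19.3 ms saved by the RDCL. The helper names are hypothetical, and the estimate assumes the 20 FPS figure corresponds to exactly 50 ms per frame, so it is a back-of-the-envelope calculation rather than a reported result.

```python
def ms_per_frame(fps: float) -> float:
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


def fps_from_ms(ms: float) -> float:
    """Frame rate for a given per-frame latency in milliseconds."""
    return 1000.0 / ms


cpu_ms = ms_per_frame(20)    # 50.0 ms per frame on a single CPU core
gpu_ms = ms_per_frame(125)   # 8.0 ms per frame on a GPU

# Assumption: add the ~19.3 ms attributed to the RDCL back onto the 50 ms budget.
without_rdcl_ms = cpu_ms + 19.3
without_rdcl_fps = fps_from_ms(without_rdcl_ms)

if __name__ == "__main__":
    print(f"CPU: {cpu_ms:.1f} ms/frame, GPU: {gpu_ms:.1f} ms/frame")
    print(f"Estimated CPU speed without RDCL savings: {without_rdcl_fps:.1f} FPS")
```

Under that assumption the detector would fall to roughly 14 FPS on the CPU, which shows why a 19.3 ms saving matters at this latency budget.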


This review was generated by AI and reviewed by human editors.