QUICK REVIEW

[Paper Review] PyramidBox: A Context-assisted Single Shot Face Detector

Xu Tang, Daniel K. Du|arXiv (Cornell University)|2018. 03. 21.

Face recognition and analysis|References: 36|Citations: 48

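One-line summary

PyramidBox introduces context-assisted single-shot face detection built on PyramidAnchors, LFPN, and context-sensitive prediction, improving detection of hard faces and achieving state-of-the-art performance on FDDB and WIDER FACE.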

ABSTRACT

Face detection has been well studied for many years and one of remaining challenges is to detect small, blurred and partially occluded faces in uncontrolled environment. This paper proposes a novel context-assisted single shot face detector, named PyramidBox to handle the hard face detection problem. Observing the importance of the context, we improve the utilization of contextual information in the following three aspects. First, we design a novel context anchor to supervise high-level contextual feature learning by a semi-supervised method, which we call it PyramidAnchors. Second, we propose the Low-level Feature Pyramid Network to combine adequate high-level context semantic feature and Low-level facial feature together, which also allows the PyramidBox to predict faces of all scales in a single shot. Third, we introduce a context-sensitive structure to increase the capacity of prediction network to improve the final accuracy of output. In addition, we use the method of Data-anchor-sampling to augment the training samples across different scales, which increases the diversity of training data for smaller faces. By exploiting the value of context, PyramidBox achieves superior performance among the state-of-the-art over the two common face detection benchmarks, FDDB and WIDER FACE. Our code is available in PaddlePaddle: https://github.com/PaddlePaddle/models/tree/develop/fluid/face_detection.

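Research motivation and goals

Motivated by the need for robust detection of small, blurred, and occluded faces in uncontrolled environments.
Use contextual information (head, shoulders, body) to aid face localization and classification.
Develop an architecture that fuses low-level, high-resolution features with high-level semantic features for multi-scale detection.
Introduce semi-supervised PyramidAnchors to supervise contextual feature learning without additional annotation.
Augment the training data with scale-aware data augmentation to improve the diversity of small faces.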

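Proposed method

Introduce PyramidAnchors to supervise contextual feature learning for faces, heads, and bodies at multiple scales.
Develop the Low-level Feature Pyramid Network (LFPN), which fuses high-level context with low-level facial features and enables single-shot, multi-scale detection.
Design a Context-sensitive Prediction Module (CPM) with a wider and deeper network and a max-in-out layer to strengthen localization and classification.
Introduce data-anchor-sampling to reshape the training data distribution and increase the diversity of small faces.
Propose the PyramidBox loss, which jointly supervises face, head, and body predictions across the pyramid anchors.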
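
The PyramidAnchors idea listed above (one face annotation supervising face, head, and body anchors) can be illustrated with a small sketch. This is a simplified reading, not the authors' implementation: the center-based box expansion, the scale step of 2, the three levels, the IoU threshold of 0.35, and the names `expand_box` and `pyramid_anchor_labels` are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def expand_box(box: Box, factor: float) -> Box:
    """Scale a box about its center by `factor` (assumed context-region rule)."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) / 2.0 * factor
    half_h = (y1 - y0) / 2.0 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pyramid_anchor_labels(
    face: Box,
    anchors: List[Box],
    scale_step: float = 2.0,      # assumed growth factor between levels
    num_levels: int = 3,          # assumed: 0 = face, 1 = head, 2 = body
    iou_threshold: float = 0.35,  # assumed matching threshold
) -> List[List[int]]:
    """For each anchor, return one 0/1 label per pyramid level.

    Level k is positive when the anchor overlaps the face box enlarged by
    scale_step ** k, so larger anchors pick up head/body context labels
    without any extra annotation.
    """
    labels = []
    for anchor in anchors:
        row = []
        for k in range(num_levels):
            region = expand_box(face, scale_step ** k)
            row.append(1 if iou(anchor, region) > iou_threshold else 0)
        labels.append(row)
    return labels
```

For a 20x20 face, an anchor of the same size is labeled as a face, a concentric 40x40 anchor as a head, and a concentric 80x80 anchor as a body, which is the sense in which the context labels come for free from the face annotation.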

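Experimental results

Research questions

RQ1: How can contextual information around the face (head, shoulders, body) be used to improve detection of hard, small, and occluded faces?
RQ2: Does integrating LFPN improve performance on small faces compared with using only bottom-up, high-level features?
RQ3: How do PyramidAnchors and semi-supervised context labeling affect detection accuracy on the easy/medium/hard subsets?
RQ4: Does the context-sensitive prediction module with max-in-out improve both localization and classification accuracy?
RQ5: Does data-anchor-sampling improve small-face detection by diversifying the training data?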
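
To make the data-anchor-sampling question concrete, the sketch below shows one plausible way to pick a resize factor that pushes a sampled face toward a smaller (or neighboring) anchor scale. The review does not spell out the procedure, so the anchor scales, the rule for choosing the target anchor, the sampling range, and the name `data_anchor_sampling_scale` are assumptions for illustration only.

```python
import math
import random

# Assumed anchor scales in pixels; not stated in this review.
ANCHOR_SCALES = [16, 32, 64, 128, 256, 512]


def data_anchor_sampling_scale(face_w: float, face_h: float, rng: random.Random) -> float:
    """Return an image resize factor for one randomly selected face.

    Steps (all assumed):
      1. Measure the face size as sqrt(width * height).
      2. Find the anchor scale nearest to that size.
      3. Pick a target anchor uniformly from the smallest anchor up to one
         level above the nearest anchor, which favors shrinking the face.
      4. Draw the target face size from [target / 2, target * 2].
    """
    face_size = math.sqrt(face_w * face_h)
    nearest = min(
        range(len(ANCHOR_SCALES)),
        key=lambda i: abs(ANCHOR_SCALES[i] - face_size),
    )
    # randint is inclusive on both ends.
    target_idx = rng.randint(0, min(len(ANCHOR_SCALES) - 1, nearest + 1))
    target_anchor = ANCHOR_SCALES[target_idx]
    target_size = rng.uniform(target_anchor / 2.0, target_anchor * 2.0)
    return target_size / face_size
```

Calling `data_anchor_sampling_scale(140, 140, random.Random(0))` returns a factor by which the whole image would be resized before cropping. Because most target anchors are at or below the nearest one, large faces are frequently shrunk into small training faces.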

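Key results

Starting LFPN from a middle layer (conv7) yields a higher mAP on the hard subset than the baselines (86.1), demonstrating the usefulness of LFPN for small faces.
Data-anchor-sampling improves mAP by 0.4–0.6 points across the easy/medium/hard subsets.
PyramidAnchors with multiple pyramid levels (face, head, body) provide a notable gain over the baselines (hard mAP rises from 84.2 to 85.1).
The context-sensitive prediction module (CPM) outperforms DSSD- and SSH-style modules in easy/medium/hard mAP; in one comparison, CPM reaches 95.6/94.5/88.5, respectively.
Max-in-out contributes a further gain (about 0.1–0.3 mAP points) on all subsets.
Combining all proposed components, PyramidBox achieves substantial mAP improvements on the WIDER FACE validation/test sets for the easy (95.5–96.1), medium (94.7–95.0), and hard (88.8–88.9) subsets, in line with the state-of-the-art performance claimed in the abstract.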
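
The max-in-out layer mentioned in the results can be sketched as follows: the classifier emits several candidate scores for "face" and several for "background", and only the maximum of each group is kept before the softmax. This is a minimal illustration, assuming the positive scores come first in the score vector; the function names and the channel layout are hypothetical and not taken from the paper's code.

```python
import math
from typing import List, Tuple


def max_in_out(scores: List[float], num_pos: int, num_neg: int) -> Tuple[float, float]:
    """Reduce num_pos face scores and num_neg background scores to one of each.

    Assumed layout: the first num_pos entries are face scores and the
    remaining num_neg entries are background scores.
    """
    if num_pos < 1 or num_neg < 1:
        raise ValueError("need at least one positive and one negative score")
    if len(scores) != num_pos + num_neg:
        raise ValueError("scores must contain num_pos + num_neg values")
    return max(scores[:num_pos]), max(scores[num_pos:])


def face_probability(scores: List[float], num_pos: int, num_neg: int) -> float:
    """Two-way softmax over the reduced (face, background) scores."""
    pos, neg = max_in_out(scores, num_pos, num_neg)
    shift = max(pos, neg)  # subtract the max for numerical stability
    e_pos = math.exp(pos - shift)
    e_neg = math.exp(neg - shift)
    return e_pos / (e_pos + e_neg)
```

With `num_pos=1, num_neg=3`, the strongest of three background scores competes against a single face score, which tends to suppress false positives; swapping the counts gives the face class more candidates, which tends to help recall.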

This review was generated by AI and reviewed by a human editor.