QUICK REVIEW

[Paper Review] Style Aggregated Network for Facial Landmark Detection

Xuanyi Dong, Yan Yan | arXiv (Cornell University) | 2018-03-12

Face recognition and analysis | References: 65 | Citations: 54

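One-line summary

This paper proposes the Style-Aggregated Network (SAN), which takes a dual input (the original face and a GAN-generated style-aggregated face) to detect facial landmarks robustly under large style variation, and reports state-of-the-art results on 300-W and AFLW.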

ABSTRACT

Recent advances in facial landmark detection achieve success by learning discriminative features from rich deformation of face shapes and poses. Besides the variance of faces themselves, the intrinsic variance of image styles, e.g., grayscale vs. color images, light vs. dark, intense vs. dull, and so on, has constantly been overlooked. This issue becomes inevitable as increasing web images are collected from various sources for training neural networks. In this work, we propose a style-aggregated approach to deal with the large intrinsic variance of image styles for facial landmark detection. Our method transforms original face images to style-aggregated images by a generative adversarial module. The proposed scheme uses the style-aggregated image to maintain face images that are more robust to environmental changes. Then the original face images accompanying with style-aggregated ones play a duet to train a landmark detector which is complementary to each other. In this way, for each face, our method takes two images as input, i.e., one in its original style and the other in the aggregated style. In experiments, we observe that the large variance of image styles would degenerate the performance of facial landmark detectors. Moreover, we show the robustness of our method to the large variance of image styles by comparing to a variant of our approach, in which the generative adversarial module is removed, and no style-aggregated images are used. Our approach is demonstrated to perform well when compared with state-of-the-art algorithms on benchmark datasets AFLW and 300-W. Code is publicly available on GitHub: https://github.com/D-X-Y/SAN

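Motivation and objectives

Examines how the large intrinsic variance of image styles (e.g., grayscale vs. color, lighting) affects landmark accuracy.
Proposes a style-aggregated approach that normalizes style variation through GAN-based transformation.
Trains a robust landmark detector by exploiting the complementary information in the original and style-aggregated images.
Demonstrates state-of-the-art performance on the standard benchmarks (300-W and AFLW) and releases style-varied datasets for analysis.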

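Proposed method

Introduces a two-branch Style-Aggregated Network (SAN) architecture consisting of a style-aggregated face generation module and a landmark prediction module.
The style-aggregated module uses CycleGAN-based transfer to generate several style variants and combines them into a style-aggregated image that captures the stationary environment around the face.
The landmark prediction module takes the original and style-aggregated images as input, extracts features with a VGG-16-based backbone, and applies cascaded heatmap regression similar to CPM (Convolutional Pose Machines).
The two streams produce complementary belief maps, which are fused over three stages to regress the final landmark locations.
The style-aggregation process clusters high-level style-discriminative features and trains CycleGAN for style transfer, so it needs no style supervision.
Training details include fine-tuning a ResNet-152 classifier to obtain style-discriminative features, k-means clustering to discover hidden styles, and a CycleGAN with an identity loss to generate the style variants.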
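The hidden-style discovery step (k-means over style-discriminative features) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the random 8-dimensional vectors stand in for features that the paper takes from a fine-tuned ResNet-152, k = 3 is an arbitrary choice, and the deterministic farthest-point initialization is a convenience for reproducibility, not something the review attributes to the authors.

```python
import numpy as np


def farthest_point_init(features, k):
    """Pick k initial centroids: sample 0, then repeatedly the sample
    farthest from all centroids chosen so far (illustrative choice)."""
    chosen = [0]
    min_d = ((features - features[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        d_new = ((features - features[nxt]) ** 2).sum(axis=1)
        min_d = np.minimum(min_d, d_new)
    return features[chosen].copy()


def kmeans(features, k, n_iter=50):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(features, k)
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        new_labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


# Hypothetical stand-in for style features: three well-separated blobs,
# 40 samples each, in an 8-dimensional feature space.
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0] * 8, [10.0] * 8, [-10.0] * 8])
features = np.concatenate(
    [c + rng.normal(scale=0.5, size=(40, 8)) for c in blob_centers]
)

centroids, labels = kmeans(features, k=3)
# Each cluster id plays the role of one discovered "hidden style".
print(np.bincount(labels))
```

In the pipeline the review describes, each discovered cluster would then serve as a style domain for CycleGAN training; that part is not sketched here.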

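Experimental results

Research questions

RQ1: How much does image style variation in real data hurt facial landmark detectors?
RQ2: Can a style-aggregated representation improve robustness to style variation without labeled style annotations?
RQ3: Does combining the original and style-aggregated inputs give more accurate landmark predictions than using either input alone?
RQ4: How does SAN perform against state-of-the-art methods on the AFLW and 300-W datasets?
RQ5: How do different combinations of training and test styles affect SAN's performance?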

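Key findings

SAN achieves competitive, state-of-the-art performance on 300-W and AFLW in both the common and challenging settings.
On 300-W, SAN improves notably with OD bounding boxes and can gain further with GT (ground-truth) boxes, indicating strong robustness to style variation.
On AFLW, SAN achieves a lower NME than previous methods on both AFLW-Full and AFLW-Front.
Removing either the original-image stream or the style-aggregated stream degrades performance, confirming the benefit of the two-stream approach.
The style-aggregation module can be trained without supervision and discovers hidden styles automatically, providing robust style normalization across different datasets.
SAN remains robust even when training and test styles differ substantially, showing an average improvement of about 7% on the 300-W-Style full test set under large style shifts.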
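For readers unfamiliar with the NME (normalized mean error) metric cited above, here is a small sketch of how it is commonly computed: the mean Euclidean distance between predicted and ground-truth landmarks, divided by a normalizing length. The landmark coordinates are made up, and the inter-ocular distance is used as the normalizer only as one common convention; the review does not state which normalizer underlies the reported numbers.

```python
import numpy as np


def nme(pred, gt, norm_dist):
    """Mean per-landmark Euclidean error divided by a normalizing distance."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    per_point = np.linalg.norm(pred - gt, axis=1)
    return per_point.mean() / norm_dist


# Hypothetical 4-landmark face: two eyes, nose tip, mouth centre (pixels).
gt = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 60.0], [50.0, 80.0]])
# Made-up prediction errors of 5 px, 0 px, 5 px, and 0 px.
offsets = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 5.0], [0.0, 0.0]])
pred = gt + offsets

# Assumed normalizer: distance between the two eye landmarks (40 px here).
inter_ocular = np.linalg.norm(gt[1] - gt[0])
score = nme(pred, gt, inter_ocular)
print(f"NME = {score:.4f}")  # prints "NME = 0.0625"
```

Lower is better: a mean error of 2.5 px against a 40 px normalizer gives 0.0625, usually reported as 6.25%.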


This review was generated by AI and checked by a human editor.