QUICK REVIEW

[Paper Review] MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

Cheng-Han Lee, Ziwei Liu | arXiv (Cornell University) | 2019-07-27

Face recognition and analysis | References: 47 | Citations: 83

One-line summary

MaskGAN uses semantic masks as an intermediate representation to enable diverse and interactive facial image manipulation, and introduces the Dense Mapping Network, Editing Behavior Simulated Training, and a new dataset, CelebAMask-HQ.

ABSTRACT

Facial image manipulation has achieved great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation with fidelity preservation. MaskGAN has two main components: 1) Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns style mapping between a free-form user modified mask and a target image, enabling diverse generation results. EBST models the user editing behavior on the source mask, making the overall framework more robust to various manipulated inputs. Specifically, it introduces dual-editing consistency as the auxiliary supervision signal. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. MaskGAN is comprehensively evaluated on two challenging tasks: attribute transfer and style copy, demonstrating superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ.

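Motivation and objectives

Enable diverse, interactive face manipulation by using semantic masks as the manipulation interface.
Learn a style mapping from the target image and mask that is robust to user-modified masks.
Model user editing behavior to improve robustness to mask changes at inference time.
Provide a large-scale, high-resolution dataset with mask annotations for face editing research.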

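Proposed method

The Dense Mapping Network (DMN) and its Spatial-Aware Style Encoder use AdaIN to transfer spatially aware style from the target image and mask to the generated output.
MaskVAE models the manifold of facial structure priors and enables smooth mask interpolation.
Alpha Blender learns alpha blending to maintain manipulation consistency across multiple edited masks.
Editing Behavior Simulated Training (EBST) simulates user edits by generating inter/outer masks and optimizing the DMN and Blender for dual-editing consistency.
Multi-objective training combines adversarial, feature-matching, and perceptual losses to ensure realism and fidelity.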
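
The AdaIN step named in the method list can be illustrated with a minimal NumPy sketch. It only shows the general AdaIN operation (normalize each channel of a feature map, then rescale it with style statistics); it is not the authors' implementation. The function name `adain`, the toy shapes, and the `style_mean` / `style_std` values standing in for the style encoder's output are all assumptions made for illustration.

```python
import numpy as np


def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization for one feature map.

    content:    array of shape (C, H, W)
    style_mean: array of shape (C,), target per-channel mean
    style_std:  array of shape (C,), target per-channel standard deviation
    """
    content = np.asarray(content, dtype=np.float64)
    # Per-channel statistics over the spatial dimensions.
    mu = content.mean(axis=(1, 2), keepdims=True)
    sigma = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mu) / (sigma + eps)
    # Reshape style statistics to (C, 1, 1) so they broadcast over H and W.
    gamma = np.asarray(style_std, dtype=np.float64).reshape(-1, 1, 1)
    beta = np.asarray(style_mean, dtype=np.float64).reshape(-1, 1, 1)
    return gamma * normalized + beta


rng = np.random.default_rng(0)
content = rng.normal(size=(3, 8, 8))      # toy generator feature map (assumed shape)
style_mean = np.array([0.5, -1.0, 2.0])   # hypothetical style-encoder output
style_std = np.array([1.5, 0.7, 3.0])     # hypothetical style-encoder output

stylized = adain(content, style_mean, style_std)
print(stylized.mean(axis=(1, 2)))  # per-channel means, close to style_mean
print(stylized.std(axis=(1, 2)))   # per-channel stds, close to style_std
```

After the call, each channel of `stylized` carries approximately the requested mean and standard deviation, which is how style statistics from one source can be imposed on features from another.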

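Experimental results

Research questions

RQ1: Can semantic masks serve as a flexible intermediate representation for diverse face manipulation while preserving identity?
RQ2: How can robust style transfer between a target image and a user-modified mask be learned to support interactive editing?
RQ3: Does simulating user editing behavior during training improve robustness to mask changes at inference time?
RQ4: What impact does the proposed CelebAMask-HQ dataset have on research into high-resolution mask-based face editing?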

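Key results

MaskGAN achieves plausible attribute transfer and style copy, with segmentation accuracy and attribute preservation that are competitive with or better than baseline methods.
The Spatial-Aware Style Encoder conditions on the target mask structure, which improves style transfer and reduces the bias introduced by user-modified masks.
EBST improves robustness to mask variations and strengthens identity preservation during interactive editing.
MaskGAN shows strong performance on high-resolution (512x512) face editing tasks and benefits from the CelebAMask-HQ dataset.
Editing behavior simulation and the dual-editing consistency loss contribute to more stable mask-to-image manipulation under interactive inputs.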
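
The idea behind editing behavior simulation and dual-editing consistency can be sketched as a toy example: perturb a source latent in two opposite directions, generate an output for each, blend the two, and penalize the distance between the blend and the original. Everything below is an assumption made for illustration and not the paper's training code: the linear `toy_generator`, the fixed `alpha=0.5` (the review describes a learned Alpha Blender), the `step` size, and the use of a plain L1 distance as the consistency term.

```python
import numpy as np


def edit_latents(z_src, z_ref, step=0.25):
    """Return two simulated edits of z_src: one toward z_ref, one away from it."""
    delta = z_ref - z_src
    z_inter = z_src + step * delta  # interpolated edit (toward the reference)
    z_outer = z_src - step * delta  # extrapolated edit (away from the reference)
    return z_inter, z_outer


def toy_generator(z, weights):
    """Hypothetical stand-in for a mask-conditioned generator: a linear map."""
    return weights @ z


def alpha_blend(img_a, img_b, alpha):
    """Convex combination of two outputs with blending weight alpha."""
    return alpha * img_a + (1.0 - alpha) * img_b


def consistency_loss(blended, target):
    """Mean absolute error between the blended output and the original."""
    return float(np.mean(np.abs(blended - target)))


rng = np.random.default_rng(0)
z_src = rng.normal(size=8)          # latent code of the source mask (toy)
z_ref = rng.normal(size=8)          # latent code of a reference mask (toy)
weights = rng.normal(size=(16, 8))  # parameters of the toy generator

z_inter, z_outer = edit_latents(z_src, z_ref)
img_inter = toy_generator(z_inter, weights)
img_outer = toy_generator(z_outer, weights)
target = toy_generator(z_src, weights)

blended = alpha_blend(img_inter, img_outer, alpha=0.5)
loss = consistency_loss(blended, target)
print(loss)  # near zero here only because the toy generator is linear
```

With a real, nonlinear generator the blend would not reproduce the original exactly, so a term of this kind would act as a training signal that pushes the two edited outputs to stay consistent with the unedited image.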


This review was generated by AI and reviewed by a human editor.