QUICK REVIEW

[Paper Review] High Resolution Face Editing with Masked GAN Latent Code Optimization

Martin Pernuš, Vitomir Štruc|arXiv (Cornell University)|2021. 03. 20.

Face recognition and analysis | Citations: 8

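One-Line Summary

MaskFaceGAN is a high-resolution face editing method that optimizes the latent code of StyleGAN2 under spatial and semantic constraints supplied by a face parser and an attribute classifier, enabling photorealistic edits at 1024×1024 resolution that are largely free of visual artifacts. It shows less attribute entanglement than prior GAN-based methods.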

ABSTRACT

Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control and alter multiple (entangled) attributes at once, when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel attribute editing approach called MaskFaceGAN that focuses on local attribute editing. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially-selective treatment of local image areas. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the CelebA-HQ, Helen and SiblingsDB-HQf datasets and in comparison with several state-of-the-art techniques from the literature, i.e., StarGAN, AttGAN, STGAN, and two versions of InterFaceGAN. Our experimental results show that the proposed approach is able to edit face images with respect to several local facial attributes with unprecedented image quality and at high resolutions (1024x1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is made freely available from: https://github.com/MartinPernus/MaskFaceGAN.

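Research Motivation and Goals

Address the limitations of existing GAN-based face editing methods, whose results suffer from visual artifacts, low resolution, or attribute entanglement.
Enable fine-grained, localized editing of specific facial attributes, such as hair color, makeup, and facial structure, at high resolution.
Perform constrained latent-space optimization so that specific facial attributes are modified while the overall image structure and identity are preserved.
Provide a method that supports both local and global attribute edits while maintaining perceptual fidelity and minimal semantic drift.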

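Proposed Method

Perform gradient-based optimization directly on the latent code of a pre-trained StyleGAN2 generator.
Enforce the presence or absence of the target attribute through a differentiable attribute classifier, which supplies the semantic constraint.
Apply spatial constraints from a pre-trained face parser that define region-specific editing behavior (e.g., modifying only the eyebrows or lips).
Use a blending strategy based on the union of facial regions to preserve the content of the original image.
Incorporate an LPIPS loss and multi-layer feature matching to maintain perceptual similarity to the input image.
Combine attribute classification, spatial parsing, and perceptual reconstruction into a multi-objective loss for robust optimization.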
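
The interplay of these steps can be illustrated with a toy sketch. This is not the authors' implementation: the linear "generator", the mean-intensity "classifier", the fixed binary mask, the loss weights, and every function name below are assumptions made purely for illustration, and finite-difference gradients stand in for backpropagation through StyleGAN2. The sketch only shows the general pattern of optimizing a latent code under an attribute term and a content-preservation term, then blending the result with the original through a mask.

```python
import math

def generate(w, basis):
    """Toy linear stand-in for a generator: pixel i = sum_j basis[i][j] * w[j]."""
    return [sum(b * wj for b, wj in zip(row, w)) for row in basis]

def blend(generated, original, mask):
    """Keep generated pixels inside the mask and original pixels outside it."""
    return [m * g + (1 - m) * o for g, o, m in zip(generated, original, mask)]

def attribute_score(image, mask):
    """Toy stand-in for an attribute classifier: sigmoid of mean masked intensity."""
    inside = [p for p, m in zip(image, mask) if m]
    return 1.0 / (1.0 + math.exp(-sum(inside) / len(inside)))

def loss(w, basis, original, mask, target, content_weight=1.0):
    """Attribute term (binary cross-entropy) plus content term outside the mask."""
    generated = generate(w, basis)
    p = attribute_score(blend(generated, original, mask), mask)
    eps = 1e-9  # avoids log(0)
    attr = -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))
    content = sum((g - o) ** 2
                  for g, o, m in zip(generated, original, mask) if not m)
    return attr + content_weight * content

def optimize(w, basis, original, mask, target, steps=200, lr=0.1, h=1e-5):
    """Plain gradient descent on the latent code, using central differences."""
    w = list(w)
    for _ in range(steps):
        grad = []
        for j in range(len(w)):
            w_hi = w[:j] + [w[j] + h] + w[j + 1:]
            w_lo = w[:j] + [w[j] - h] + w[j + 1:]
            grad.append((loss(w_hi, basis, original, mask, target)
                         - loss(w_lo, basis, original, mask, target)) / (2 * h))
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical setup: 4 pixels, 2 latent dimensions. The mask marks
# pixels 0 and 1 as the editable region (the role a face parser would play).
basis = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mask = [1, 1, 0, 0]
w_start = [-1.0, 0.5]
original = generate(w_start, basis)

w_edit = optimize(w_start, basis, original, mask, target=1)
edited = blend(generate(w_edit, basis), original, mask)
score_before = attribute_score(original, mask)
score_after = attribute_score(edited, mask)
```

In this toy setting the attribute score inside the masked region rises, while the blending step leaves the pixels outside the mask identical to the original. That is the kind of spatial selectivity the constraints above are meant to provide, here in a deliberately simplified form.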

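Experimental Results

Research Questions

RQ1: Can optimizing the StyleGAN2 latent code achieve high-resolution (1024×1024) face editing with minimal visual artifacts?
RQ2: To what extent can spatial constraints derived from a face parser reduce attribute entanglement during local edits?
RQ3: How does integrating a differentiable attribute classifier contribute to semantic control over the target attribute?
RQ4: Does the proposed method preserve identity and background details better than existing GAN-inversion-based approaches?
RQ5: Can the method support both local and global attribute edits while maintaining consistent perceptual quality?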

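Key Findings

MaskFaceGAN produces high-resolution (1024×1024) edits with better visual quality and fewer artifacts than state-of-the-art methods.
Attribute entanglement is substantially reduced, especially for local attributes such as eyebrows, hair color, and lipstick.
In a user study, MaskFaceGAN outperforms competing techniques in perceptual quality and attribute control, although for 'narrow eyes' it tends to close the eyes and fails to produce the intended effect.
Compared with similar methods such as InterFaceGAN, the optimization converges faster and needs fewer steps per image.
Even for global attributes (e.g., 'young', 'male'), the edited result stays closely aligned with the facial appearance and background of the input image.
A limitation is that unexpected edits can occur when the attribute classifier or face parser makes incorrect predictions.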

This review was generated by AI and reviewed by a human editor.