QUICK REVIEW

[Paper Review] Unpaired Image-to-Image Translation with Domain Supervision

Jianxin Lin, Sen Liu|arXiv (Cornell University)|2019. 02. 11.

Generative Adversarial Networks and Image Synthesis | References: 23 | Citations: 2

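One-Line Summary

This paper proposes Domain-supervised GAN (DosGAN), a new unpaired image-to-image translation framework that uses domain information as explicit supervision by pre-training a classifier to extract domain-specific features. Unlike earlier methods that rely on separate domain codes or separate generators, DosGAN combines domain-specific and domain-independent features to improve translation quality. It achieves state-of-the-art performance on facial attribute, identity, and season translation, and it enables zero-shot domain transfer and conditional translation between arbitrary image pairs.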

ABSTRACT

Image-to-image translation tasks have been widely investigated with Generative Adversarial Networks (GANs). However, existing approaches are mostly designed in an unsupervised manner while little attention has been paid to domain information within unpaired data. In this paper, we treat domain information as explicit supervision and design an unpaired image-to-image translation framework, Domain-supervised GAN (DosGAN), which takes the first step towards the exploration of explicit domain supervision. In contrast to representing domain characteristics using different generators or domain codes, we pre-train a classification network to explicitly classify the domain of an image. After pre-training, this network is used to extract the domain-specific features of each image. Such features, together with the domain-independent features extracted by another encoder (shared across different domains), are used to generate image in target domain. Extensive experiments on multiple facial attribute translation, multiple identity translation, multiple season translation and conditional edges-to-shoes/handbags demonstrate the effectiveness of our method. In addition, we can transfer the domain-specific feature extractor obtained on the Facescrub dataset with domain supervision information to unseen domains, such as faces in the CelebA dataset. We also succeed in achieving conditional translation with any two images in CelebA, while previous models like StarGAN cannot handle this task.

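Motivation and Objectives

To address a limitation of existing unpaired image-to-image translation methods, which ignore the explicit domain information available in unpaired data.
To explore whether using domain classification as explicit supervision improves translation quality and disentanglement.
To enable zero-shot domain transfer by applying a pre-trained domain feature extractor to previously unseen domains.
To support conditional translation between any two images in a dataset, which models such as StarGAN do not support.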

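Proposed Method

Pre-train a classification network on the unpaired data, using the dataset's domain labels, to predict the domain of each image.
Extract the domain-specific features of each input image from the final layer of the pre-trained classifier.
Use a shared encoder to extract domain-independent features from the same image.
Combine the domain-specific and domain-independent features and feed them to the generator network.
Train the generator with an adversarial loss and a cycle consistency loss so that translations are realistic and consistent.
Condition the generator on both the domain-specific features and the domain-independent features of the base image, which makes conditional translation possible.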
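
The steps above can be sketched as a toy pipeline. This is a minimal illustration, not the authors' implementation: images are flat lists of floats, each network is replaced by a single random linear map, and every name here (`W_cls`, `W_enc`, `W_gen`, `translate`, `cycle_loss`) is hypothetical. It assumes that, for conditional translation, the domain-specific features come from a reference image and the domain-independent features come from the content image, and it omits the adversarial loss entirely.

```python
import random

random.seed(0)

IMG_DIM, SPEC_DIM, INDEP_DIM = 8, 3, 4  # toy sizes, chosen arbitrarily


def rand_matrix(rows, cols):
    """Random weights standing in for a trained network."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply one dense layer (no bias, no nonlinearity)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Stand-in for the feature layer of the pre-trained, frozen domain classifier.
W_cls = rand_matrix(SPEC_DIM, IMG_DIM)
# Stand-in for the encoder shared across domains.
W_enc = rand_matrix(INDEP_DIM, IMG_DIM)
# Stand-in for the generator, which consumes both feature vectors.
W_gen = rand_matrix(IMG_DIM, SPEC_DIM + INDEP_DIM)


def domain_specific(img):
    return linear(W_cls, img)


def domain_independent(img):
    return linear(W_enc, img)


def generate(spec, indep):
    # Concatenate the two feature vectors before decoding.
    return linear(W_gen, spec + indep)


def translate(content_img, reference_img):
    """Keep the content of one image, take the domain of another (assumed pairing)."""
    return generate(domain_specific(reference_img), domain_independent(content_img))


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(x_a, x_b):
    """Translate x_a toward x_b's domain, translate back, compare with x_a."""
    fake_b = translate(x_a, x_b)
    rec_a = translate(fake_b, x_a)
    return l1(rec_a, x_a)


x_a = [random.uniform(0, 1) for _ in range(IMG_DIM)]
x_b = [random.uniform(0, 1) for _ in range(IMG_DIM)]
fake_b = translate(x_a, x_b)
print(len(fake_b), cycle_loss(x_a, x_b))
```

Because `translate` takes its domain features from an arbitrary reference image rather than from a fixed label, the same function covers translation between any two images, which is the capability the review contrasts with StarGAN.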

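Experimental Results

Research Questions

RQ1: Does explicit domain supervision improve unpaired image-to-image translation compared with unsupervised approaches?
RQ2: Can a pre-trained domain classifier transfer effectively to unseen domains, enabling zero-shot domain adaptation?
RQ3: Unlike StarGAN, can the proposed framework perform conditional translation between any two images in a dataset?
RQ4: Does separating domain-specific from domain-independent features lead to better disentanglement and translation quality?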

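Key Findings

DosGAN outperforms existing unpaired GANs on facial attribute, identity, and season translation benchmarks.
The pre-trained domain feature extractor generalizes to unseen domains; for example, it transfers from Facescrub to CelebA without fine-tuning.
DosGAN enables conditional translation between any two images in CelebA, a capability StarGAN does not support.
Explicit supervision is shown to improve the disentanglement of domain-specific and domain-independent features.
Quantitatively, FID and user-study scores improve clearly across several translation tasks.
Ablation studies confirm that explicit domain supervision contributes to higher translation fidelity and better disentanglement.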
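
For reference, the FID metric mentioned above is conventionally defined as the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images. The symbols below are the standard ones, not notation taken from this review: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of the real and generated features. Lower values indicate closer distributions. The review does not state the feature extractor or sample sizes used.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```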


This review was generated by AI and checked by a human editor.