QUICK REVIEW

[Paper Review] Multi-mapping Image-to-Image Translation via Learning Disentanglement

Xiaoming Yu, Yuanqi Chen | arXiv (Cornell University) | 2019-09-17

Multimodal Machine Learning Applications | Citations: 45

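One-line summary

This paper proposes DMIT, an unsupervised unified framework that learns disentangled content and style representations so that a single model can perform both multi-domain and multi-modal image-to-image translation.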

ABSTRACT

Recent advances of image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation. However, the existing methods only consider one of the two perspectives, which makes them unable to solve each other's problem. To address this issue, we propose a novel unified model, which bridges these two objectives. First, we disentangle the input images into the latent representations by an encoder-decoder architecture with a conditional adversarial training in the feature space. Then, we encourage the generator to learn multi-mappings by a random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods.

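Research motivation and goals

Connect multi-domain and multi-modal image-to-image (I2I) translation in one unified framework.
Learn disentangled content and style representations that are shared across domains.
Enable cross-domain translation and diverse outputs through random domain sampling and latent regression.
Improve translation quality and diversity by aligning latent representations across domains.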

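Proposed method

Encoders E_c and E_s separate the input image into a content (C) space and a style (S) space.
A unified style-based generator G, conditioned on a domain label d and a style s, reconstructs the input as x̂ = G(E_c(x), E_s(x), d).
The disentanglement path (D-Path) is trained with a cVAE-like objective and a conditional adversarial loss in the latent space.
Random cross-domain translation and latent regression (L_reg) promote output diversity and coverage of the output distribution.
A unified conditional discriminator D_c and a pixel-space GAN discriminator D_x are used to match the real and generated distributions across domains.
Joint optimization: min_{G,E_c,E_s} max_{D_c,D_x} (L_D-Path + L_T-Path), where the two path losses are built from the components L_cVAE, L^c_GAN, L_reg, and L^x_GAN.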
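
The data flow of the two training paths can be illustrated with a toy sketch. This is not the authors' implementation: the "image" is a flat list of numbers, the encoders and generator are hypothetical slicing and concatenation functions rather than learned networks, the style prior is assumed to be a standard normal, and the adversarial terms (D_c, D_x) are left out entirely. It only shows which quantities feed the reconstruction loss and the latent regression loss.

```python
import random

# Toy stand-ins (assumptions): an "image" is content + style + one-hot domain.
CONTENT_DIM, STYLE_DIM, NUM_DOMAINS = 4, 2, 3


def encode_content(x):
    """Hypothetical E_c: return the content part of the toy image."""
    return x[:CONTENT_DIM]


def encode_style(x):
    """Hypothetical E_s: return the style part of the toy image."""
    return x[CONTENT_DIM:CONTENT_DIM + STYLE_DIM]


def generate(c, s, d):
    """Hypothetical G: combine content, style, and a one-hot domain label."""
    one_hot = [1.0 if i == d else 0.0 for i in range(NUM_DOMAINS)]
    return list(c) + list(s) + one_hot


def l1(a, b):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def d_path(x, d):
    """Disentanglement path: encode, then reconstruct in the source domain."""
    c, s = encode_content(x), encode_style(x)
    x_rec = generate(c, s, d)
    return x_rec, l1(x, x_rec)


def t_path(x, rng):
    """Translation path: random target domain and random style,
    followed by latent regression of the style from the output."""
    c = encode_content(x)
    d_new = rng.randrange(NUM_DOMAINS)
    s_new = [rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)]  # assumed N(0, 1) prior
    x_new = generate(c, s_new, d_new)
    reg_loss = l1(encode_style(x_new), s_new)
    return x_new, d_new, reg_loss


rng = random.Random(0)
content, style, domain = [0.1, 0.2, 0.3, 0.4], [0.5, -0.5], 1
x = generate(content, style, domain)
x_rec, rec_loss = d_path(x, domain)
x_new, d_new, reg_loss = t_path(x, rng)
```

Because the toy functions are exact inverses of each other, both losses come out as zero here; in a trained model they would be minimized jointly with the adversarial terms rather than vanish by construction.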

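Experimental results

Research questions

RQ1: How can multi-domain and multi-modal image-to-image translation be unified in a single unsupervised framework?
RQ2: Do content-style disentanglement and cross-domain latent space alignment enable high-quality, diverse translation across varied domains?
RQ3: Can random cross-domain sampling and latent regression improve coverage of the output distribution and generation diversity?
RQ4: Can a single unified model handle semantic image synthesis, where the number of domains is uncountable?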

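Key results

DMIT achieves better FID scores than the baseline models on the season transfer task.
DMIT achieves higher LPIPS diversity scores, indicating more varied outputs per input within a domain.
Ablation studies confirm that both the translation path (T-Path) and the disentanglement path (D-Path) are essential for quality and diversity.
Latent regression (L_reg) and L^x_GAN improve output diversity and the accuracy with which style and content are used.
DMIT also performs strongly on semantic image synthesis, comparing favorably with SISGAN, Paired-D GAN, and TAGAN on FID, human perceptual score, PSNR, and SSIM.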
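
Two of the metrics named above have simple definitions that can be sketched in code. The example below is illustrative only: a diversity score is taken as the mean pairwise distance among several outputs for one input, with a mean absolute difference standing in for LPIPS (an assumption, since real LPIPS compares deep network features), and PSNR follows its standard formula on flat pixel lists. The function names are hypothetical and the review does not describe the paper's exact evaluation protocol.

```python
import math
from itertools import combinations


def mean_pairwise_distance(samples, distance):
    """Average distance over all unordered pairs of outputs for one input."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def mean_abs_diff(a, b):
    """Placeholder distance (assumption); LPIPS would use learned features."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for pixel lists in [0, max_value]."""
    mse = sum((p - q) ** 2 for p, q in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Three toy "outputs" generated for the same input.
outputs = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
diversity = mean_pairwise_distance(outputs, mean_abs_diff)
quality = psnr([0.0, 0.0], [0.1, 0.1])
```

A higher diversity value means the outputs for one input differ more from each other, and a higher PSNR means the estimate is closer to the reference.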

This review was generated by AI and checked by a human editor.