QUICK REVIEW

[Paper Review] Collaborative Learning for Faster StyleGAN Embedding

Shanyan Guan, Ying Tai | arXiv (Cornell University) | 2020-07-03

Generative Adversarial Networks and Image Synthesis · References: 40 · Citations: 68

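One-Line Summary

This paper presents a collaborative learning framework that jointly trains an embedding network and an optimization-based iterator. The framework embeds real images into the StyleGAN latent space efficiently, running inference in real time while keeping inversion quality competitive.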

ABSTRACT

The latent code of the recently popular StyleGAN model has learned disentangled representations thanks to its multi-layer style-based generator. Embedding a given image back into the latent space of StyleGAN enables a wide range of interesting semantic image editing applications. Although previous works yield impressive inversion results with an optimization framework, such a framework suffers from low efficiency. In this work, we propose a novel collaborative learning framework that consists of an efficient embedding network and an optimization-based iterator. On one hand, as training progresses, the embedding network gives a reasonable latent code initialization for the iterator. On the other hand, the updated latent code from the iterator in turn supervises the embedding network. In the end, a high-quality latent code can be obtained efficiently with a single forward pass through our embedding network. Extensive experiments demonstrate the effectiveness and efficiency of our work.

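Motivation and Goals

Motivated by the need to invert real images into the StyleGAN latent space efficiently enough for real-time editing.
Develops an embedding network that maps an image to a W+ latent code and represents identity and attributes separately.
Exploits a collaborative loop in which the iterator's refined codes supervise the embedding network.
Achieves fast, high-quality inversion without requiring paired latent codes or offline optimization.
Demonstrates a wide range of semantic editing applications enabled by fast embedding.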

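Proposed Method

Proposes a collaborative framework for latent code inversion that couples an embedding network with an optimization-based iterator.
Uses two encoders, one for identity and one for attributes, and fuses their features through denormalization to predict a latent code w_e in W+.
Initializes the iterator with w_e and optimizes it into w_o using a loss L_opt that combines MSE and LPIPS.
Supervises the embedding network with three losses: on the latent code (L_w), on the image (L_mse), and on perceptual features (L_per).
Runs the loop online and uses a cache mechanism to preserve supervision signals that have already proven good.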
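The two objectives described above can be written out as follows. This is a sketch under stated assumptions, not the paper's exact formulation. The review does not define E (the embedding network), G (the StyleGAN generator), or x (the input image). The weights λ, the step size η, the step count K, the plain gradient-descent update, the squared-L2 form of L_w, and the stop-gradient sg(·) on w_o are all hypothetical choices. L_per is left abstract because the review does not say which features it compares.

$$
\begin{aligned}
w_e &= E(x) \\
\mathcal{L}_{opt}(w) &= \mathrm{MSE}\big(G(w), x\big) + \lambda_{lpips}\,\mathrm{LPIPS}\big(G(w), x\big) \\
w^{(0)} &= w_e, \qquad w^{(k+1)} = w^{(k)} - \eta\,\nabla_{w}\mathcal{L}_{opt}\big(w^{(k)}\big), \qquad w_o = w^{(K)} \\
\mathcal{L}_{E} &= \lambda_{w}\,\mathcal{L}_{w} + \lambda_{mse}\,\mathcal{L}_{mse} + \lambda_{per}\,\mathcal{L}_{per}, \qquad
\mathcal{L}_{w} = \big\lVert w_e - \mathrm{sg}(w_o) \big\rVert_2^2, \qquad
\mathcal{L}_{mse} = \mathrm{MSE}\big(G(w_e), x\big)
\end{aligned}
$$

Under this reading, the iterator only ever starts from w_e, so a better embedding network directly shortens the iterator's work. The stop-gradient keeps the iterator's output acting as a fixed target for E instead of being pulled toward w_e.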

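Experimental Results

Research Questions

RQ1: Can combining an embedding network with an optimization-based iterator produce high-quality StyleGAN inversions faster than offline optimization?
RQ2: Does representing identity and attribute information separately in the embedding network improve latent code accuracy and editing quality?
RQ3: How does collaborative learning affect convergence speed and inversion metrics (PSNR, SSIM, LPIPS) compared with state-of-the-art methods?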
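PSNR is one of the metrics named in RQ3, and it has a standard closed-form definition, which may help when reading the reported numbers. The sketch uses the textbook formula PSNR = 10 · log10(MAX² / MSE), measured in dB. The flat-list image representation and the `max_val` default of 255 (8-bit pixels) are assumptions for illustration, since the review does not say how the authors computed the metric. SSIM and LPIPS are not shown here: SSIM needs windowed local statistics, and LPIPS needs a pretrained network.

```python
import math


def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel values (an illustrative assumption);
    max_val is the largest possible pixel value, e.g. 255 for 8-bit images
    or 1.0 for images scaled to [0, 1].
    """
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images: no noise, PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)


# An MSE of 1 on 8-bit images gives 20 * log10(255), roughly 48.13 dB.
print(psnr([0, 0, 0, 0], [1, 1, 1, 1]))
```

Higher PSNR means a closer pixel-level match. Because PSNR depends only on MSE, it can disagree with perceptual metrics such as LPIPS.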

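Key Findings

The proposed method achieves competitive inversion quality while running about 500× faster than the fastest previous approach.
On CelebA-HQ and CACD respectively, the method reaches LPIPS of 0.16 and 0.11 and PSNR of 31.47 dB and 32.05 dB. SSIM is 0.83 on both datasets.
Thanks to the better initialization provided by the embedding network, the iterator converges faster and reaches a higher performance ceiling.
Separating the identity and attribute encoders improves inversion quality over a single ResNet-based encoder.
The cache mechanism ensures that the embedding network still receives strong supervision even when the iterator's most recent results are poor.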
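A toy sketch can illustrate the collaborative loop together with the warm-start and cache findings above. Everything in it is a hypothetical stand-in, not the authors' implementation:

- The "generator" is an elementwise scaling `G(w) = 2w`.
- The "embedding network" is a single scalar `theta` with `w_e = theta * x`.
- The iterator runs a few gradient-descent steps on the pixel MSE only (LPIPS is omitted).
- The embedding network is trained only with a latent loss toward the cached code (the image and perceptual losses are omitted).
- The cache keeps, for each training image, the lowest-loss code found so far and uses it as the supervision target.

```python
import random

# Toy stand-ins (hypothetical, for illustration only):
#   generator:         G(w) = SCALE * w (elementwise); exact inverse is x / SCALE
#   embedding network: w_e = theta * x, with one learnable scalar theta
#   iterator:          gradient descent on the pixel-MSE part of L_opt (no LPIPS)
SCALE = 2.0


def generate(w):
    """Toy 'generator': maps a latent code to an 'image'."""
    return [SCALE * wi for wi in w]


def opt_loss(w, x):
    """Pixel-wise MSE between G(w) and the target image x."""
    g = generate(w)
    return sum((gi - xi) ** 2 for gi, xi in zip(g, x)) / len(x)


def iterate(w_init, x, steps, lr=0.05):
    """Optimization-based iterator: `steps` gradient-descent updates from w_init."""
    w = list(w_init)
    n = len(x)
    for _ in range(steps):
        g = generate(w)
        # d/dw_i of mean((SCALE * w_i - x_i)^2) = 2 * SCALE * (SCALE * w_i - x_i) / n
        w = [wi - lr * 2 * SCALE * (gi - xi) / n for wi, gi, xi in zip(w, g, x)]
    return w


def train(images, epochs, steps=5, lr_embed=0.5):
    """Collaborative loop: embed, refine with the iterator, supervise from the cache."""
    theta = 0.0  # embedding-network parameter
    cache = {}   # image index -> (best latent code so far, its opt_loss)
    for _ in range(epochs):
        for idx, x in enumerate(images):
            w_e = [theta * xi for xi in x]  # single forward pass of the embedding net
            w_o = iterate(w_e, x, steps)    # iterator starts from w_e
            loss_o = opt_loss(w_o, x)
            # Cache: replace the stored code only if the new one reconstructs x better.
            if idx not in cache or loss_o < cache[idx][1]:
                cache[idx] = (w_o, loss_o)
            target = cache[idx][0]
            # Latent loss L_w = mean((w_e - target)^2); target is held constant.
            grad = sum(2 * (we - t) * xi for we, t, xi in zip(w_e, target, x)) / len(x)
            theta -= lr_embed * grad
    return theta, cache


random.seed(0)
images = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
theta, cache = train(images, epochs=50)

x = images[0]
cold = opt_loss(iterate([0.0] * len(x), x, steps=5), x)             # start from zero
warm = opt_loss(iterate([theta * xi for xi in x], x, steps=5), x)  # start from embedding
print(f"theta={theta:.3f}  cold-start loss={cold:.4f}  warm-start loss={warm:.6f}")
```

The iterator runs only five steps, so a cold start from zero stays far from the exact inverse (`theta = 0.5` here). A warm start from the trained `theta` lands close to it, which mirrors the faster-convergence finding in miniature. In this toy the newest iterate is never worse than the cached code, so the cache never falls back to an older target. The cache only matters when the iterator's output gets worse, which is the case the last finding describes.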


This review was generated by AI and reviewed by a human editor.