QUICK REVIEW

[Paper Review] Collaborative Learning for Faster StyleGAN Embedding

Shanyan Guan, Ying Tai|arXiv (Cornell University)|Jul 3, 2020

Generative Adversarial Networks and Image Synthesis | References: 40 | Cited by: 68

One-Sentence Summary

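The paper proposes a collaborative learning framework that jointly trains an embedding network and an optimization-based iterator to efficiently embed real images into StyleGAN's latent space, enabling real-time inference with competitive inversion quality.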

ABSTRACT

The latent code of the recently popular StyleGAN model has learned disentangled representations thanks to its multi-layer style-based generator. Embedding a given image back into the latent space of StyleGAN enables a wide range of interesting semantic image editing applications. Although previous works yield impressive inversion results with an optimization framework, they suffer from low efficiency. In this work, we propose a novel collaborative learning framework that consists of an efficient embedding network and an optimization-based iterator. On the one hand, as training progresses, the embedding network provides a reasonable latent code initialization for the iterator. On the other hand, the updated latent code from the iterator in turn supervises the embedding network. In the end, a high-quality latent code can be obtained efficiently with a single forward pass through our embedding network. Extensive experiments demonstrate the effectiveness and efficiency of our work.

Research Motivation and Goals

Enable efficient inversion of real images into StyleGAN's latent space for real-time editing.
Develop an embedding network that disentangles identity and attributes to map images to W+ latent codes.
Leverage a collaborative loop where iterator refinements supervise the embedding network.
Achieve fast, high-quality inversion without requiring paired latent codes or offline optimization.
Demonstrate broad semantic editing applications enabled by fast embedding.

Proposed Method

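A collaborative framework for latent code inversion that pairs an embedding network with an optimization-based iterator.
Two encoders (identity and attribute) whose features are merged through denormalization to predict a latent code w_e in W+.
w_e initializes the iterator, which optimizes it into w_o under the loss L_opt, a combination of MSE and LPIPS.
The embedding network is supervised by a latent code loss (L_w) between w_e and the iterator's w_o, an image loss (L_mse), and a perceptual loss (L_per).
Training iterates online, with a caching mechanism that retains the best supervision signal and speeds up convergence.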
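The collaborative loop described above can be sketched in plain Python on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: a fixed linear map stands in for the StyleGAN generator, a single linear layer stands in for the two-encoder embedding network, L_opt is reduced to its squared-error term (LPIPS needs a pretrained network), the embedder is trained with L_w only, and the cache is assumed to keep, per training image, the iterator output with the lowest reconstruction loss seen so far. All names (`generate`, `iterate`, `Embedder`, `collaborative_training`) are hypothetical.

```python
import random

# Hypothetical toy generator: a fixed linear map from a 2-D latent w to a 3-D "image" x.
# It stands in for StyleGAN's synthesis network and is NOT part of the paper.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def generate(w):
    """Toy G(w) = A @ w."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]


def recon_loss(w, x):
    """Assumed L_opt with only its squared-error term; LPIPS is omitted in this toy."""
    return sum((g - xi) ** 2 for g, xi in zip(generate(w), x))


def iterate(w_init, x, steps=5, lr=0.1):
    """Optimization-based iterator: a few gradient-descent steps on recon_loss from w_init."""
    w = list(w_init)
    for _ in range(steps):
        r = [g - xi for g, xi in zip(generate(w), x)]  # residual G(w) - x
        grad = [2 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(w))]  # 2 A^T r
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


class Embedder:
    """Hypothetical linear embedding network E(x) = B @ x (stands in for the two encoders)."""

    def __init__(self, latent_dim=2, image_dim=3):
        self.B = [[0.0] * image_dim for _ in range(latent_dim)]

    def __call__(self, x):
        return [sum(b * xi for b, xi in zip(row, x)) for row in self.B]

    def step(self, x, w_target, lr=0.05):
        """One SGD step on L_w = ||E(x) - w_target||^2."""
        err = [we - wt for we, wt in zip(self(x), w_target)]
        for j, row in enumerate(self.B):
            for i in range(len(row)):
                row[i] -= lr * 2 * err[j] * x[i]


def collaborative_training(images, epochs=30):
    E = Embedder()
    cache = {}  # image index -> (lowest reconstruction loss so far, matching latent)
    for _ in range(epochs):
        for k, x in enumerate(images):
            w_e = E(x)              # 1) embedder proposes an initialization
            w_o = iterate(w_e, x)   # 2) iterator refines it
            loss = recon_loss(w_o, x)
            if k not in cache or loss < cache[k][0]:
                cache[k] = (loss, w_o)  # 3) cache keeps the best target seen for this image
            E.step(x, cache[k][1])  # 4) cached latent supervises the embedder (L_w only)
    return E, cache


rng = random.Random(0)
true_w = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(8)]
images = [generate(w) for w in true_w]
E, cache = collaborative_training(images)

init_loss = sum(recon_loss([0.0, 0.0], x) for x in images) / len(images)
embed_loss = sum(recon_loss(E(x), x) for x in images) / len(images)
print(f"zero-init loss {init_loss:.4f} -> single-forward-pass loss {embed_loss:.6f}")
```

Because the iterator always starts from the embedder's current guess, its output improves as the embedder improves, and the cache ensures each image's supervision target never gets worse. At inference only `E(x)` is needed, which mirrors the single-forward-pass claim in the abstract.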

Experimental Results

Research Questions

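RQ1: Can an optimization-based iterator paired with an embedding network produce high-quality StyleGAN inversions faster than offline optimization?
RQ2: Does disentangling identity and attribute information in the embedding network improve latent code accuracy and editing quality?
RQ3: Compared with state-of-the-art methods, how does collaborative learning affect convergence speed and inversion metrics (PSNR, SSIM, LPIPS)?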

Key Findings

Method	PSNR (CelebA-HQ)	SSIM (CelebA-HQ)	LPIPS (CelebA-HQ)	PSNR (CACD)	SSIM (CACD)	LPIPS (CACD)
Image2StyleGAN	29.72	0.75	0.18	31.39	0.80	0.12
StyleGAN-Encoder	32.08	0.85	0.18	33.10	0.85	0.11
Image2StyleGAN++	32.46	0.90	0.22	34.40	0.90	0.15
Ours	31.47	0.83	0.16	32.05	0.83	0.11
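PSNR, the first metric in each dataset group, is a log-scaled mean squared error (higher is better). The review does not say how pixels were normalized or whether PSNR was averaged per image or per channel, so the minimal sketch below simply assumes flattened pixel values in [0, max_val]; the helper name `psnr` is hypothetical. SSIM and LPIPS are not reproduced here (LPIPS requires a pretrained network).

```python
import math


def psnr(reference, reconstruction, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length sequences of pixel values."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)


print(psnr([0.0, 0.5, 1.0], [0.0, 0.5, 0.9]))
```

Because PSNR is logarithmic, a gap of d dB corresponds to an MSE ratio of 10^(d/10). For example, the 0.99 dB gap between Image2StyleGAN++ (32.46) and this method (31.47) on CelebA-HQ means this method's MSE is roughly 1.26 times larger, assuming both were computed the same way.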

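The method achieves competitive inversion quality while running about 500 times faster than the fastest prior method.
Our method achieves an LPIPS of 0.16 (CelebA-HQ) and 0.11 (CACD), a PSNR of 31.47 (CelebA-HQ) and 32.05 (CACD), and an SSIM of 0.83 on both datasets.
The iterator benefits from the better initialization provided by the embedding network, which yields faster convergence and a higher performance upper bound.
The disentangled identity and attribute encoders improve inversion quality over a single ResNet-based encoder.
The caching mechanism ensures that the embedding network still receives a strong supervision signal even when the iterator's most recent result is poor.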
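The speed half of the initialization finding can be illustrated with a deliberately simple convex toy: plain gradient descent on f(w) = (w - 3)^2, counting steps until the iterate is within a tolerance of the optimum from a far and a near starting point. The objective, learning rate, tolerance, and the helper `steps_to_converge` are all hypothetical. The toy shows only the faster-convergence effect; the higher upper bound depends on the non-convex StyleGAN inversion landscape, which a single-minimum quadratic cannot reproduce.

```python
def steps_to_converge(w0, target=3.0, lr=0.1, tol=1e-3, max_steps=10_000):
    """Gradient-descent steps on the toy f(w) = (w - target)^2 until |w - target| < tol."""
    w = w0
    for step in range(max_steps):
        if abs(w - target) < tol:
            return step
        w -= lr * 2 * (w - target)  # f'(w) = 2 (w - target)
    return max_steps


cold = steps_to_converge(0.0)  # hypothetical fixed initialization far from the optimum
warm = steps_to_converge(2.9)  # hypothetical initialization supplied by an embedding network
print(f"cold start: {cold} steps, warm start: {warm} steps")
```

With a fixed learning rate, the error shrinks by a constant factor per step, so the step count grows with the logarithm of the initial error; a closer start removes the early steps entirely.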

This review was generated by AI and reviewed by human editors.