QUICK REVIEW

[Paper Review] Multi-mapping Image-to-Image Translation via Learning Disentanglement

Xiaoming Yu, Yuanqi Chen|arXiv (Cornell University)|Sep 17, 2019

Multimodal Machine Learning Applications | Cited by 45

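ONE-SENTENCE SUMMARY

This paper proposes DMIT, an unsupervised unified framework that learns disentangled content and style representations so that a single model can perform both multi-domain and multi-modal image-to-image translation.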

ABSTRACT

Recent advances of image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation. However, the existing methods only consider one of the two perspectives, which makes them unable to solve each other's problem. To address this issue, we propose a novel unified model, which bridges these two objectives. First, we disentangle the input images into the latent representations by an encoder-decoder architecture with a conditional adversarial training in the feature space. Then, we encourage the generator to learn multi-mappings by a random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods.

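RESEARCH MOTIVATION AND GOALS

Bridge multi-domain and multi-modal image-to-image (I2I) translation in one unified framework.
Learn disentangled content and style representations that are shared across domains.
Achieve cross-domain translation and diverse outputs through random domain/style sampling and latent regression.
Align latent representations across domains to improve translation quality and diversity.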

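PROPOSED METHOD

Use the encoders E_c and E_s to disentangle an input image into a content space (C) and a style space (S).
Use a single style-based generator G, conditioned on a domain label d and a style s, to generate x = G(C(x), S(x), d).
Train the disentanglement path with a cVAE-like objective and a conditional adversarial loss in the latent space.
Encourage diversity and coverage of the full output distribution through random cross-domain translation and latent regression (L_reg).
Use a unified conditional discriminator D_c and a pixel-space GAN discriminator D_x to match the real and generated distributions across domains.
Optimize jointly: min_{G,E_c,E_s} max_{D_c,D_x} (L_D-Path + L_T-Path), which comprises L_cVAE, L^c_GAN, L_reg, and L^x_GAN.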
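The data flow of the two training paths described above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the linear maps `W_c`, `W_s`, and `W_g`, the dimensions, the one-hot domain encoding, and the use of L1 distances are all assumptions made for the example, and the KL term of the cVAE-like objective and both adversarial terms (L^c_GAN, L^x_GAN) are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): 3 domains, 8-dim "images", 4-dim content, 2-dim style.
N_DOMAINS, X_DIM, C_DIM, S_DIM = 3, 8, 4, 2

# Hypothetical linear stand-ins for the networks E_c, E_s, and G.
W_c = rng.normal(size=(C_DIM, X_DIM))
W_s = rng.normal(size=(S_DIM, X_DIM))
W_g = rng.normal(size=(X_DIM, C_DIM + S_DIM + N_DOMAINS))


def one_hot(d):
    """Encode a domain index as a one-hot vector (an assumed label format)."""
    v = np.zeros(N_DOMAINS)
    v[d] = 1.0
    return v


def E_c(x):
    """Content encoder stand-in."""
    return W_c @ x


def E_s(x):
    """Style encoder stand-in."""
    return W_s @ x


def G(c, s, d):
    """Generator stand-in, conditioned on content c, style s, and domain d."""
    return W_g @ np.concatenate([c, s, one_hot(d)])


def d_path(x, d):
    """Disentanglement path: encode x, then reconstruct it in its own domain."""
    c, s = E_c(x), E_s(x)
    x_rec = G(c, s, d)
    l_rec = np.mean(np.abs(x_rec - x))  # reconstruction part of the cVAE-like term
    return c, l_rec


def t_path(c):
    """Translation path: random target domain and style, then latent regression."""
    d_rand = int(rng.integers(N_DOMAINS))  # random target domain
    s_rand = rng.normal(size=S_DIM)        # random style drawn from a Gaussian prior
    x_trans = G(c, s_rand, d_rand)
    # Latent regression: the style recovered from the translated output
    # should match the style that was sampled to produce it.
    l_reg = np.mean(np.abs(E_s(x_trans) - s_rand))
    return x_trans, d_rand, l_reg


x = rng.normal(size=X_DIM)
c, l_rec = d_path(x, d=0)
x_trans, d_rand, l_reg = t_path(c)
print(f"L_rec={l_rec:.3f}  target domain={d_rand}  L_reg={l_reg:.3f}")
```

In an actual training loop these losses would be combined with the adversarial terms and minimized with respect to G, E_c, and E_s while the discriminators are trained to maximize their own objectives.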

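EXPERIMENTAL RESULTS

Research Questions

RQ1: How can multi-domain and multi-modal I2I translation be unified in a single unsupervised framework?
RQ2: Does disentangling content from style, and aligning the latent space across domains, enable diverse, high-quality translation across a large number of domains?
RQ3: Do random cross-domain sampling and latent regression improve coverage of the output distribution and the diversity of generated results?
RQ4: Can a single unified model handle semantic image synthesis, where the domains are uncountable?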

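Key Findings

DMIT achieves better FID scores than the baselines on the season-transfer task.
DMIT achieves higher LPIPS diversity scores, indicating more varied outputs for the same input within a single domain.
Ablation studies show that both the Translation Path (T-Path) and the Disentanglement Path (D-Path) are necessary for quality and diversity.
Latent regression (L_reg) and L^x_GAN improve the diversity of outputs and the correct use of style and content.
Compared with SISGAN, Paired-D GAN, and TAGAN, DMIT performs well on semantic image synthesis, with better FID, subjective perception scores, PSNR, and SSIM.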
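As a rough illustration of how a diversity score of this kind can be computed, the sketch below averages a pairwise distance over several outputs generated from the same input. Real LPIPS uses a pretrained network to measure perceptual distance; the `l1_distance` function here is only a stand-in, and the function names and toy data are assumptions made for the example rather than the paper's evaluation code.

```python
from itertools import combinations


def l1_distance(a, b):
    """Stand-in distance: mean absolute difference (NOT the real LPIPS metric)."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def diversity_score(outputs, distance=l1_distance):
    """Average pairwise distance among outputs sampled for one input."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two outputs to measure diversity")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


# Toy "images" flattened to short vectors.
identical = [[0.2, 0.4], [0.2, 0.4], [0.2, 0.4]]  # mode collapse: no variation
varied = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]     # visibly different samples

print(diversity_score(identical))
print(diversity_score(varied))
```

A model that ignores the sampled style produces near-identical outputs and scores close to zero, while a model that uses the style code scores higher.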


This review was generated by AI and checked by a human editor.