QUICK REVIEW

[Paper Review] Few-Shot Unsupervised Image-to-Image Translation

Ming-Yu Liu, Xun Huang|arXiv (Cornell University)|May 5, 2019

Generative Adversarial Networks and Image Synthesis | References: 53 | Cited by: 80

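ONE-SENTENCE SUMMARY

Introduces FUNIT, a few-shot, unsupervised image-to-image translation framework that translates an image from a source class into an analogous image of a previously unseen target class, using only a few example images of that target class at test time.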

ABSTRACT

Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design. Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework. Our implementation and datasets are available at https://github.com/NVlabs/FUNIT .

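MOTIVATION AND GOALS

Motivate and enable few-shot generalization in unsupervised image-to-image translation.
Learn a model that, given only a few target-class examples at test time, translates a content image into an analogous image of an unseen target class.
Study how the diversity of training classes affects few-shot translation ability.
Demonstrate translation quality and distribution matching on multiple datasets and under several evaluation metrics.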

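PROPOSED METHOD

Use a conditional generator G that takes a content image x and a set of K target-class images {y1,...,yK} and produces the translated image x̄.
Decompose G into a content encoder Ex, a class encoder Ey, and a decoder Fx built from AdaIN residual blocks.
Ey computes the class latent code zy by averaging the latent representations of the K target-class images.
AdaIN layers in the decoder inject zy to control global appearance, while the output of Ex preserves the content structure.
Train a multi-task adversarial discriminator D with one output per source class; each output distinguishes real images from translated images for its class.
Jointly optimize a GAN loss, a content reconstruction loss Lr, and a feature matching loss Lf.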
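
The class-code averaging and AdaIN steps above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names `class_code` and `adain` are hypothetical, the latents are toy numbers, and the AdaIN scale and shift are read directly from zy as a simplification (a real model would compute them with learned layers).

```python
from statistics import mean, pstdev

def class_code(latents):
    """Average K per-image latent vectors into one class code z_y."""
    k = len(latents)
    dim = len(latents[0])
    return [sum(vec[i] for vec in latents) / k for i in range(dim)]

def adain(channel, scale, shift, eps=1e-5):
    """Normalize one feature channel to zero mean and unit variance,
    then apply an affine transform (scale, shift)."""
    mu = mean(channel)
    sigma = (pstdev(channel) ** 2 + eps) ** 0.5
    return [scale * (v - mu) / sigma + shift for v in channel]

# Toy data: K = 3 target-class images, each already encoded to a 2-d latent.
latents = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
z_y = class_code(latents)

# Simplifying assumption: use the two entries of z_y directly as the
# AdaIN scale and shift instead of passing z_y through learned layers.
scale, shift = z_y

# One flattened feature channel, standing in for content-encoder output.
content_channel = [0.0, 1.0, 2.0, 3.0]
styled = adain(content_channel, scale, shift)
```

After the call, `styled` keeps the relative ordering of the content activations (the structure), while its mean and spread are set by the class code (the appearance). This is the division of labor the method assigns to Ex and Ey.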

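EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a single translation model generalize to unseen target classes given only a few examples at test time?
RQ2: How does increasing the number of source classes seen during training affect few-shot translation performance?
RQ3: How does changing the number of target-class examples K affect translation quality and distribution matching?
RQ4: Do the proposed losses (GAN, content reconstruction, feature matching) each contribute significantly to performance in the few-shot setting?
RQ5: Can the framework improve few-shot classification by generating images?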

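Key Findings

FUNIT outperforms the baselines on translation accuracy, content preservation, photorealism, and distribution matching across the 1-shot to 20-shot settings.
Translation accuracy (Top-5, test classifier) rises as K increases from 1 toward 15–20, reaching 73.69–83.57% on the animal faces dataset and 49.01–55.63% on the birds dataset.
As K grows, the mean inception score rises and the mean FID falls, indicating better photorealism and closer distribution alignment.
Performance improves when more source classes are seen during training, indicating better generalization to unseen target classes.
Human evaluation shows that, at several shot counts, FUNIT outputs are judged more faithful to the target class than those of the fair and unfair baselines.
FUNIT can also improve few-shot classification by generating additional labeled samples for novel classes.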
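
The Top-k translation accuracy cited above can be read as follows: a classifier scores each translated image, and the translation counts as correct if the intended target class is among the k highest-scoring classes. The sketch below shows that computation under stated assumptions. The function name `top_k_accuracy` and all scores are made up for illustration and are not taken from the paper or its code.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Made-up classifier scores for 3 translated images over 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
# The class each translation was supposed to land in.
target_classes = [1, 2, 3]

top1 = top_k_accuracy(scores, target_classes, k=1)  # 1 of 3 correct
top2 = top_k_accuracy(scores, target_classes, k=2)  # 2 of 3 correct
```

Raising k can only keep or increase the score, which is why Top-5 figures are always at least as high as Top-1 figures for the same model.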

This review was generated by AI and checked by a human editor.