QUICK REVIEW

[Paper Review] GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data

Shuchang Zhou, Taihong Xiao | arXiv (Cornell University) | May 14, 2017

Generative Adversarial Networks and Image Synthesis | References: 22 | Cited by: 49

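One-Sentence Summary

GeneGAN proposes a deterministic generative model that learns disentangled object attribute subspaces from unpaired, weakly labeled data, using only 0/1 labels that indicate whether an attribute (such as eyeglasses or smiling) is present or absent. By combining adversarial training with cycle reconstruction, the method achieves precise object transfiguration, for example swapping eyeglasses between faces without paired images or explicit object segmentation.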

ABSTRACT

Object Transfiguration replaces an object in an image with another object from a second image. For example, it can perform tasks like "putting exactly those eyeglasses from image A on the nose of the person in image B". Usage of exemplar images allows more precise specification of desired modifications and improves the diversity of conditional image generation. However, previous methods that rely on feature space operations require paired data and/or appearance models for training or disentangling objects from background. In this work, we propose a model that can learn object transfiguration from two unpaired sets of images: one set containing images that "have" that kind of object, and the other set being the opposite, with the mild constraint that the objects be located approximately at the same place. For example, the training data can be one set of reference face images that have eyeglasses, and another set of images that have not, both of which are spatially aligned by face landmarks. Despite the weak 0/1 labels, our model can learn an "eyeglasses" subspace that contains multiple representatives of different types of glasses. Consequently, we can perform fine-grained control of generated images, like swapping the glasses in two images by swapping the projected components in the "eyeglasses" subspace, to create novel images of people wearing eyeglasses. Overall, our deterministic generative model learns disentangled attribute subspaces from weakly labeled data by adversarial training. Experiments on CelebA and Multi-PIE datasets validate the effectiveness of the proposed model on real world data, in generating images with specified eyeglasses, smiling, hair styles, lighting conditions, etc. The code is available online.

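Research Motivation and Objectives

Address the challenge of object transfiguration without paired training data or explicit object segmentation.
Enable fine-grained control over image generation by using exemplar images to specify the desired attributes, such as eyeglasses or facial expressions.
Learn disentangled attribute subspaces from weakly supervised data (0/1 labels) to support diverse and realistic image manipulation.
Develop a symmetric, stable training framework that uses a cycle reconstruction loss to improve generative model training without requiring an invertible mapping.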

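Proposed Method

Train a conditional generative model with an encoder-decoder architecture that disentangles background features from object-specific features.
Use adversarial training to keep reconstructed and generated images realistic, and a cycle-consistency loss to stabilize training.
Spatially align the images in the two unpaired datasets, one containing the attribute (such as eyeglasses) and one without it, using landmarks (for example, facial landmarks).
Project object features into a learned attribute subspace that supports interpolating, scaling, and swapping attributes across images.
Perform object transfiguration by replacing the object feature vector fed to the decoder while keeping the background features fixed.
Assume the latent space is linear, so that operations such as adding or swapping features in the latent space yield natural-looking edits.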
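
The feature-swapping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `encode` and `decode` are hypothetical stand-ins for the trained encoder and decoder networks (here they simply split and rejoin a plain list of numbers), `OBJECT_DIM` is an arbitrary illustrative size, and representing an absent attribute as an all-zero object code is an assumption made for this example.

```python
from typing import List, Tuple

Vector = List[float]

# Hypothetical size of the object (attribute) part of the latent code.
OBJECT_DIM = 2


def encode(code: Vector) -> Tuple[Vector, Vector]:
    """Toy stand-in for the encoder: split a code into (background, object)."""
    return code[:-OBJECT_DIM], code[-OBJECT_DIM:]


def decode(background: Vector, obj: Vector) -> Vector:
    """Toy stand-in for the decoder: rejoin background and object parts."""
    return background + obj


def swap_objects(code_a: Vector, code_b: Vector) -> Tuple[Vector, Vector]:
    """Exchange the object parts of two codes, keeping each background fixed."""
    bg_a, obj_a = encode(code_a)
    bg_b, obj_b = encode(code_b)
    return decode(bg_a, obj_b), decode(bg_b, obj_a)


def remove_object(code: Vector) -> Vector:
    """Replace the object part with zeros (assumed to mean 'attribute absent')."""
    background, obj = encode(code)
    return decode(background, [0.0] * len(obj))


if __name__ == "__main__":
    with_glasses = [0.1, 0.2, 0.3, 0.9, 0.8]     # last two values: object code
    without_glasses = [0.5, 0.6, 0.7, 0.0, 0.0]  # zero object code
    print(swap_objects(with_glasses, without_glasses))
    print(remove_object(with_glasses))
```

In this toy setting, decoding an unmodified pair from `encode` returns the input unchanged, which loosely mirrors the role of a reconstruction objective. In a real model both functions would be learned networks and the reconstruction would only be approximate.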

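Experimental Results

Research Questions

RQ1: Can a generative model learn disentangled attribute subspaces from unpaired, weakly labeled data alone, without paired images or explicit segmentation?
RQ2: Can object transfiguration be achieved through a symmetric training objective using only 0/1 labels that indicate attribute presence?
RQ3: Does the model generalize to unseen data and produce realistic attribute swaps (such as eyeglasses, hairstyle, and lighting) across different identities?
RQ4: Does the learned attribute subspace support meaningful, artifact-free interpolation and attribute manipulation?
RQ5: How does the model compare with GANs that use a cycle loss in terms of reconstruction quality and attribute consistency?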

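Key Findings

GeneGAN successfully learned an "eyeglasses" attribute subspace from unpaired data, enabling precise swapping of eyeglasses between faces without paired samples.
The model generalizes well to unseen data: experiments on the Wider Face dataset show that it produces realistic edits even in less constrained settings.
Interpolating within the learned attribute subspace yields natural transitions between different hairstyles and facial attributes, confirming that the features are disentangled.
GeneGAN outperforms DiscoGAN in reconstruction consistency and attribute fidelity, with very few artifacts and better preservation of identity and background information.
The cycle reconstruction loss improves training stability and enables symmetric learning, and it remains applicable even when the source and target domains have different intrinsic dimensions.
Object transfiguration via feature swapping produces high-quality, realistic images whose results align closely with the style of the source attribute (for example, hair direction and smile intensity).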
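
Interpolation within an attribute subspace, as reported above, amounts to blending two object codes before decoding. The following is a minimal sketch, assuming object codes are plain lists of floats and the latent space is treated as linear. The function names are hypothetical and the decoder itself is omitted.

```python
from typing import List

Vector = List[float]


def interpolate(obj_a: Vector, obj_b: Vector, t: float) -> Vector:
    """Linear blend of two object codes: t=0 gives obj_a, t=1 gives obj_b."""
    if len(obj_a) != len(obj_b):
        raise ValueError("object codes must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(obj_a, obj_b)]


def interpolation_path(obj_a: Vector, obj_b: Vector, steps: int) -> List[Vector]:
    """Evenly spaced object codes from obj_a to obj_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [interpolate(obj_a, obj_b, i / (steps - 1)) for i in range(steps)]


def scale(obj: Vector, factor: float) -> Vector:
    """Strengthen (factor > 1) or weaken (factor < 1) an object code."""
    return [factor * x for x in obj]


if __name__ == "__main__":
    style_a = [0.0, 1.0]  # illustrative object code, not real model output
    style_b = [1.0, 0.0]
    for code in interpolation_path(style_a, style_b, 3):
        print(code)
    print(scale([0.5, 1.0], 2.0))
```

Each code along the path would then be passed to the decoder together with a fixed background code to render one frame of the transition.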

This review was generated by AI and reviewed by human editors.