QUICK REVIEW

[Paper Review] Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

Bowen Li, Xiaojuan Qi | arXiv (Cornell University) | Jan 1, 2020

Image Processing Techniques and Applications | Cited by 33

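ONE-SENTENCE SUMMARY

Introduces a lightweight GAN with a word-level discriminator and word-level supervision that edits images from natural language descriptions, achieving competitive manipulation performance with far fewer parameters.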

ABSTRACT

We propose a novel lightweight generative adversarial network for efficient image manipulation using natural language descriptions. To achieve this, a new word-level discriminator is proposed, which provides the generator with fine-grained training feedback at word-level, to facilitate training a lightweight generator that has a small number of parameters, but can still correctly focus on specific visual attributes of an image, and then edit them without affecting other contents that are not described in the text. Furthermore, thanks to the explicit training signal related to each word, the discriminator can also be simplified to have a lightweight structure. Compared with the state of the art, our method has a much smaller number of parameters, but still achieves a competitive manipulation performance. Extensive experimental results demonstrate that our method can better disentangle different visual attributes, then correctly map them to corresponding semantic words, and thus achieve a more accurate image modification using natural language descriptions.

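MOTIVATION AND OBJECTIVES

Enable efficient, natural-language-driven image editing on memory-limited devices.
Develop a word-level discriminator that gives the generator fine-grained, word-level feedback.
Promote disentangled attribute manipulation by mapping words to visual attributes.
Reduce model complexity without sacrificing manipulation quality, staying competitive with the state of the art.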

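PROPOSED METHOD

Introduce a word-level discriminator that uses word-region correlations to provide per-word feedback.
Tag words with part-of-speech labels and use nouns and adjectives as supervision targets.
Compute the word-region correlation m = w^T v, then normalize it to obtain attention-like weights α and β, from which the word-aware features n and the per-word correlation δ are derived.
Train a lightweight generator built from a text encoder, two image encoders (Inception-v3 and VGG-16), upsampling and residual blocks, and attention mechanisms.
Combine unconditional and conditional adversarial losses, a perceptual loss, a word-level loss, and the DAMSM text-image matching loss in the generator objective; the discriminator optimizes unconditional and conditional adversarial losses together with word-level supervision.
Use the two image encoders at different generation stages to balance semantic representation (Inception-v3) and detail refinement (VGG-16).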
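The part-of-speech filtering and word-region correlation steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the feature shapes (w as D x L word features, v as D x N region features), the use of Penn Treebank tags, and the choice of softmax axes for α and β are all assumptions, since the summary does not specify them. The per-word correlation δ is left out because its definition is not given here.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_supervised_words(tagged_words):
    """Indices of nouns and adjectives, given (word, tag) pairs.

    Assumes Penn Treebank style tags (NN*, JJ*); the tagger itself
    is outside this sketch.
    """
    return [i for i, (_, tag) in enumerate(tagged_words)
            if tag.startswith(("NN", "JJ"))]

def word_region_attention(w, v):
    """Word-region correlation and attention-like weights.

    w: (D, L) word features, v: (D, N) image region features.
    The normalization axes below are an assumption.
    """
    m = w.T @ v                  # (L, N) word-region correlation, m = w^T v
    alpha = softmax(m, axis=1)   # each word distributes weight over regions
    beta = softmax(m, axis=0)    # each region distributes weight over words
    n = alpha @ v.T              # (L, D) word-aware image features
    return m, alpha, beta, n

# Toy inputs: random features stand in for real encoder outputs.
rng = np.random.default_rng(0)
tagged = [("this", "DT"), ("bird", "NN"), ("has", "VBZ"),
          ("a", "DT"), ("red", "JJ"), ("crown", "NN")]
D, N = 8, 16
w = rng.standard_normal((D, len(tagged)))
v = rng.standard_normal((D, N))

m, alpha, beta, n = word_region_attention(w, v)
keep = select_supervised_words(tagged)   # positions of "bird", "red", "crown"
supervised_features = n[keep]            # only these rows receive word-level supervision
```

In this sketch only the rows of n that correspond to nouns and adjectives would feed a word-level loss, which is one way to read the "nouns and adjectives as supervision targets" step.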

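EXPERIMENTAL RESULTS

Research Questions

RQ1: Can the word-level discriminator provide supervision fine-grained enough for a lightweight generator to manipulate images accurately from text?
RQ2: Does the proposed word-level supervision improve on existing word-level discriminators at disentangling visual attributes and mapping them to semantic words?
RQ3: On standard datasets, how does the lightweight model compare with the state-of-the-art ManiGAN in FID, accuracy, realism, and efficiency?
RQ4: Is the method robust across datasets of differing complexity (CUB and COCO) while remaining memory-efficient?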

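Key Findings

The proposed method achieves a better (lower) FID than ManiGAN on both CUB (8.02 vs. 9.75) and COCO (12.39 vs. 25.08).
On CUB (accuracy 65.94, realism 57.82) and COCO (accuracy 77.97, realism 67.53), the proposed method scores higher than ManiGAN (CUB: accuracy 34.06, realism 42.18; COCO: accuracy 22.03, realism 32.47).
The lightweight model uses far fewer parameters (NoP-G 18.5M; NoP-D 71.8M) than ManiGAN (NoP-G 41.1M; NoP-D 169.4M) and has a shorter runtime per epoch (RPE) and inference time (IT).
Ablations show that removing the word-level discriminator degrades performance and breaks the attribute-word mapping; replacing it with other word-level discriminators yields less accurate attention and attribute mapping.
Qualitative results show clearer, more accurate attribute modifications than ManiGAN and better preservation of text-irrelevant content.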
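To put the reported numbers on a common scale, the short calculation below converts the parameter counts and FID scores listed above into relative reductions against ManiGAN. It only reuses figures stated in this review; the dictionary keys and the helper name are illustrative choices.

```python
# Figures copied from the findings above; parameter counts are in millions.
ours = {"NoP-G": 18.5, "NoP-D": 71.8, "FID-CUB": 8.02, "FID-COCO": 12.39}
manigan = {"NoP-G": 41.1, "NoP-D": 169.4, "FID-CUB": 9.75, "FID-COCO": 25.08}

def relative_reduction(new, old):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old

reductions = {key: relative_reduction(ours[key], manigan[key]) for key in ours}

for key, pct in reductions.items():
    print(f"{key}: {pct:.1f}% lower than ManiGAN")
```

By this calculation the generator and discriminator have roughly 55% and 58% fewer parameters than ManiGAN's, and the FID gap is much larger on COCO than on CUB.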

This review was generated by AI and reviewed by human editors.