QUICK REVIEW

[Paper Review] DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis

Minfeng Zhu, Pingbo Pan|arXiv (Cornell University)|Apr 2, 2019

Generative Adversarial Networks and Image Synthesis · References: 33 · Cited by: 45

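One-Sentence Summary

DM-GAN introduces a dynamic memory module with a memory writing gate and a response gate that refines a low-quality initial image into a high-resolution, text-conditioned image, outperforming prior methods on the CUB and COCO datasets.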

ABSTRACT

In this paper, we focus on generating realistic images from text descriptions. Current methods first generate an initial image with rough shape and color, and then refine the initial image to a high-resolution one. Most existing text-to-image synthesis methods have two main problems. (1) These methods depend heavily on the quality of the initial images. If the initial image is not well initialized, the following processes can hardly refine the image to a satisfactory quality. (2) Each word contributes a different level of importance when depicting different image contents, however, unchanged text representation is used in existing image refinement processes. In this paper, we propose the Dynamic Memory Generative Adversarial Network (DM-GAN) to generate high-quality images. The proposed method introduces a dynamic memory module to refine fuzzy image contents, when the initial images are not well generated. A memory writing gate is designed to select the important text information based on the initial image content, which enables our method to accurately generate images from the text description. We also utilize a response gate to adaptively fuse the information read from the memories and the image features. We evaluate the DM-GAN model on the Caltech-UCSD Birds 200 dataset and the Microsoft Common Objects in Context dataset. Experimental results demonstrate that our DM-GAN model performs favorably against the state-of-the-art approaches.

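Motivation and Goals

Reduce the dependence of multi-stage text-to-image synthesis on the quality of the initial image.
Account for the differing importance of individual words in the caption during the refinement stage.
Propose a dynamic memory module that writes relevant text information into memory and reads it back to refine the image.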

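Proposed Method

Generate an initial image from the text with a standard generator.
Apply a dynamic memory refinement stage built on a key-value memory: words are written into memory through the memory writing gate.
Read from memory via key addressing and value reading to obtain the memory output.
Fuse the memory output with the image features through the response gate for adaptive refinement.
Train with an adversarial loss, a conditioning augmentation loss, and the DAMSM loss.
Upsample the 64×64 initial image to 128×128 and then 256×256; the number of refinement iterations is limited because of memory constraints.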
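
The write, read, and fuse steps above can be illustrated with a small pure-Python sketch. It is a minimal illustration, not the paper's code: it assumes scalar sigmoid gates, a mean-pooled image feature as the context for the writing gate, linear key/value projections, and one shared feature dimension D for words and image regions. The parameter names (A, B, Mw, Mr, K, V, W, b) and all helper functions are hypothetical stand-ins.

```python
import math
import random

# Minimal sketch of one gated key-value memory refinement step.
# ASSUMPTION: scalar sigmoid gates, linear projections, and a shared feature
# dimension D for words and image regions; all names here are hypothetical.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def write_memory(words, regions, A, B, Mw, Mr):
    """Writing gate: mix each projected word with the projected mean image feature."""
    r_bar = mean(regions)
    mr = matvec(Mr, r_bar)
    slots = []
    for w in words:
        g = sigmoid(dot(A, w) + dot(B, r_bar))  # how much of the word to keep
        mw = matvec(Mw, w)
        slots.append([g * a + (1.0 - g) * b for a, b in zip(mw, mr)])
    return slots


def read_memory(slots, regions, K, V):
    """Key addressing (softmax over memory slots per region), then value reading."""
    keys = [matvec(K, m) for m in slots]
    values = [matvec(V, m) for m in slots]
    outputs = []
    for r in regions:
        alpha = softmax([dot(k, r) for k in keys])
        outputs.append([sum(a * v[d] for a, v in zip(alpha, values))
                        for d in range(len(values[0]))])
    return outputs


def response_gate(outputs, regions, W, b):
    """Response gate: convex blend of memory output o_j and region feature r_j."""
    fused = []
    for o, r in zip(outputs, regions):
        g = sigmoid(dot(W, o + r) + b)  # o + r concatenates the two lists
        fused.append([g * x + (1.0 - g) * y for x, y in zip(o, r)])
    return fused


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy example: 3 words, 5 image regions, D = 4, random (untrained) parameters.
rng = random.Random(0)
D = 4
words = rand_matrix(3, D, rng)
regions = rand_matrix(5, D, rng)
A, B = rand_matrix(2, D, rng)
W = rand_matrix(1, 2 * D, rng)[0]
slots = write_memory(words, regions, A, B, rand_matrix(D, D, rng), rand_matrix(D, D, rng))
outputs = read_memory(slots, regions, rand_matrix(D, D, rng), rand_matrix(D, D, rng))
refined = response_gate(outputs, regions, W, 0.0)
print(len(refined), len(refined[0]))  # 5 4
```

Because each gate outputs a value in (0, 1), every refined feature in this sketch is a convex combination of the memory output and the original region feature. Where the response gate is close to 0, the initial image feature passes through largely unchanged, which is one way to read the "adaptive fusion" described above.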

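Experimental Results

Research Questions

RQ1: How does dynamic memory improve the fidelity of text-conditioned image refinement when the initial image is of low quality?
RQ2: Can word-level importance be selected adaptively during refinement so that the image better matches the text description?
RQ3: Does integrating a memory-based refinement stage improve the standard evaluation metrics for text-to-image synthesis?
RQ4: What effect do the memory writing gate and the response gate have on final image quality?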

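Key Findings

DM-GAN achieves higher Inception Scores than prior methods on both CUB and COCO (CUB: 4.75 vs. 4.36; COCO: 30.49 vs. 25.89).
Compared with AttnGAN, DM-GAN lowers the Fréchet Inception Distance (FID; lower is better) (CUB: 16.09 vs. 23.98; COCO: 32.64 vs. 35.49).
DM-GAN improves R-precision over AttnGAN (CUB: 72.31 vs. 67.82; COCO: 88.56 vs. 85.47).
Ablation studies show that the dynamic memory, the memory writing gate, and the response gate each contribute to the performance gain.
Qualitative results show more vivid details and better global structure, especially for descriptions involving multiple subjects.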
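
To put the reported numbers in perspective, the short script below recomputes the absolute and relative differences from the scores listed above. It uses only those figures; the dictionary layout and helper names are illustrative, and the "baseline" value is the comparison score given in each finding (AttnGAN for FID and R-precision).

```python
# Scores copied from the findings above: (DM-GAN, baseline).
REPORTED = {
    ("CUB", "IS"): (4.75, 4.36),
    ("COCO", "IS"): (30.49, 25.89),
    ("CUB", "FID"): (16.09, 23.98),
    ("COCO", "FID"): (32.64, 35.49),
    ("CUB", "R-precision"): (72.31, 67.82),
    ("COCO", "R-precision"): (88.56, 85.47),
}

# Inception Score and R-precision: higher is better; FID: lower is better.
HIGHER_IS_BETTER = {"IS": True, "FID": False, "R-precision": True}


def relative_change(new, old):
    """Percentage change from old to new (negative means a decrease)."""
    return 100.0 * (new - old) / old


def improved(metric, new, old):
    """True if the new score moves in the favorable direction for this metric."""
    return new > old if HIGHER_IS_BETTER[metric] else new < old


for (dataset, metric), (dm_gan, baseline) in REPORTED.items():
    print(f"{dataset:<4} {metric:<11} {dm_gan:6.2f} vs {baseline:6.2f} "
          f"abs {dm_gan - baseline:+6.2f} rel {relative_change(dm_gan, baseline):+6.1f}% "
          f"improved={improved(metric, dm_gan, baseline)}")
```

By this arithmetic every metric moves in the favorable direction, and the largest relative change is the roughly 33% FID reduction on CUB.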


This review was generated by AI and reviewed by human editors.