QUICK REVIEW

[Paper Review] Self-Attention Generative Adversarial Networks

Han Zhang, Ian Goodfellow|arXiv (Cornell University)|May 21, 2018

Generative Adversarial Networks and Image Synthesis|Cited by 2,202

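One-sentence summary

SAGAN introduces self-attention into GANs to model long-range dependencies in images, stabilizes training with spectral normalization and the two time-scale update rule (TTUR), and achieves state-of-the-art results on class-conditional ImageNet generation.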

ABSTRACT

In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN achieves the state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

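Motivation and objectives

Motivated by the difficulty convolutional GANs have in capturing long-range dependencies and global structure in complex images.
Propose a self-attention mechanism that lets feature maps in both the generator and the discriminator interact across all spatial positions.
Stabilize training by applying spectral normalization to both the generator and the discriminator and by adopting the two time-scale update rule (TTUR).
Evaluate SAGAN on ImageNet to demonstrate improvements over prior GANs in sample quality and in similarity to the data distribution.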

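Proposed method

Introduce a self-attention module that computes the response at each position as a weighted sum of features over all spatial positions, in order to model long-range dependencies.
Apply 1x1 convolutions f, g, and h to the input features x, compute scores s_ij = f(x_i)^T g(x_j), and normalize them with a softmax to obtain the attention weights; the output is y_i = gamma o_i + x_i, where o_i aggregates the attended h features and gamma is a learnable scalar.
Place attention modules in both the generator and the discriminator, so that generated images are globally consistent and the real/fake decision can check that consistency.
Apply spectral normalization to both the generator and the discriminator to constrain their Lipschitz constants and stabilize training dynamics.
Use TTUR, with separate (imbalanced) learning rates for the generator and the discriminator, to improve convergence when the discriminator is regularized.
Train with the hinge version of the adversarial loss, using conditional batch normalization in the generator and projection in the discriminator as the conditioning mechanisms.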
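
The attention computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name, the argument names `w_f`, `w_g`, `w_h`, and their shapes are assumptions made for the example. Each 1x1 convolution is written as a plain matrix, since a 1x1 convolution is a per-position linear map, and the softmax is taken over the source positions i.

```python
import numpy as np


def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_f, w_g, w_h, gamma):
    """Self-attention over a (C, H, W) feature map.

    Assumed shapes (illustrative only):
      w_f, w_g: (C_bar, C)  project features into the attention space
      w_h:      (C, C)      projects features into the value space
      gamma:    scalar      scales the attention output before the residual add
    """
    c, height, width = x.shape
    n = height * width
    flat = x.reshape(c, n)        # column k holds the feature vector x_k

    f = w_f @ flat                # (C_bar, N)
    g = w_g @ flat                # (C_bar, N)
    h = w_h @ flat                # (C, N)

    s = f.T @ g                   # s[i, j] = f(x_i)^T g(x_j)
    beta = softmax(s, axis=0)     # column j: weights over source positions i
    o = h @ beta                  # o[:, j] = sum_i beta[i, j] * h(x_i)

    y = gamma * o + flat          # y = gamma * o + x (residual connection)
    return y.reshape(c, height, width), beta
```

With `gamma = 0` the function returns its input unchanged. The paper initializes gamma to 0, so the network starts from purely local cues and learns how much weight to give non-local evidence.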

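Experimental results

Research questions

RQ1: Does integrating self-attention into the GAN architecture improve modeling of long-range dependencies and global image structure?
RQ2: How does applying spectral normalization to the generator, in addition to the discriminator, affect GAN training stability and sample quality?
RQ3: Does TTUR benefit GAN training when the discriminator is regularized?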
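
RQ2 turns on what spectral normalization does to a weight matrix: it divides the matrix by an estimate of its largest singular value, so the layer's spectral norm is approximately 1. The sketch below is a standalone illustration under stated assumptions. The function name and the `n_iters`, `eps`, and `seed` parameters are hypothetical, and it runs many power-iteration steps from a random start, whereas training implementations typically keep the vector `u` between steps and run a single iteration per update.

```python
import numpy as np


def spectral_normalize(w, n_iters=50, eps=1e-12, seed=0):
    """Return (w / sigma, sigma), with sigma estimated by power iteration.

    w is a 2-D weight matrix; a convolution kernel would first be reshaped
    to (out_channels, -1). All parameter names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=w.shape[0])       # random start for the left singular vector
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps      # estimate of the right singular vector
        u = w @ v
        u /= np.linalg.norm(u) + eps      # estimate of the left singular vector
    sigma = u @ w @ v                     # estimate of the largest singular value
    return w / sigma, sigma
```

How quickly the estimate converges depends on the gap between the two largest singular values, which is one reason the iteration count above should be read as a placeholder rather than a recommendation.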

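Key findings

Self-attention improves image synthesis quality: SAGAN obtains a higher Inception Score and a lower FID than the baseline.
Attention applied at middle-to-high-level feature maps (e.g. 32x32 and 64x64) performs better than attention applied at the lowest resolutions.
Self-attention blocks outperform equivalent residual blocks at modeling long-range dependencies, especially for complex geometric structure.
On ImageNet, SAGAN, which combines self-attention with the stabilization techniques, reaches an Inception Score of 52.52 and an FID of 18.65, surpassing prior work.
Visualizations show that attention follows semantically coherent object parts rather than mere spatial proximity, which makes it possible to model structures such as a dog's legs.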
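
For readers unfamiliar with the two metrics quoted above, their standard definitions are given here; they are not restated in this review, and the symbols are assumptions chosen for illustration. The means and covariances of Inception features are written as (\mu_r, \Sigma_r) for real images and (\mu_g, \Sigma_g) for generated images, and p(y | x) is the Inception classifier's label distribution for image x.

```latex
\mathrm{IS} = \exp\!\Big( \mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \Big)

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\Big( \Sigma_r + \Sigma_g - 2 \big( \Sigma_r \Sigma_g \big)^{1/2} \Big)
```

A higher Inception Score and a lower FID both indicate better samples, which is why the findings report the two numbers moving in opposite directions.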

This review was generated by AI and reviewed by a human editor.