QUICK REVIEW

[Paper Review] Generative Image Inpainting with Contextual Attention

Jiahui Yu, Zhe Lin|arXiv (Cornell University)|Jan 24, 2018

Generative Adversarial Networks and Image Synthesis | References: 37 | Cited by: 214

One-Sentence Summary

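Introduces a two-stage generative inpainting network with a novel contextual attention layer that explicitly borrows distant background patches to fill missing regions, producing higher-quality results on faces, textures, and natural images.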

ABSTRACT

Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feed-forward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting.

Research Motivation and Goals

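Address the need for better long-range context modeling in image inpainting, since convolutional networks are ineffective at borrowing information from distant spatial locations.
Propose a unified feed-forward network with a contextual attention module that borrows information from distant regions.
Improve training stability and speed with modified loss functions and a two-stage coarse-to-fine architecture.
Demonstrate applicability across diverse datasets, including CelebA, CelebA-HQ, DTD, ImageNet, and Places2.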

Proposed Method

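A two-stage coarse-to-fine network: the coarse stage makes an initial reconstruction of the missing content, and the refinement stage improves it.
A novel contextual attention layer that matches foreground patches to background patches using cosine similarity, weights the matches with a softmax, and reconstructs each foreground patch from the background patches via deconvolution.
Two Wasserstein GAN losses (WGAN-GP, one global and one local), combined with a reconstruction loss, to stabilize training and enforce both global and local fidelity.
A spatially discounted reconstruction loss that down-weights pixels near the hole center, where the correct content is most ambiguous, so the model is not over-penalized there and learns more easily.
Memory-efficient strategies for contextual attention, including strided patch sampling and optional downsampling of the inputs.
End-to-end training that combines the reconstruction loss with both GAN objectives, giving faster convergence and better visual quality.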
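The coarse-to-fine data flow described above can be sketched as follows. This is a minimal illustration, not the paper's networks: `toy_coarse_stage` and `toy_refine_stage` are hypothetical stand-ins (a mean fill and a neighbour average), and pasting the known pixels back after each stage with `composite` is an assumption of this sketch. Images are small grayscale grids, and mask value 1 marks a hole pixel.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values
Mask = List[List[int]]     # 1 = missing (hole) pixel, 0 = known pixel
Stage = Callable[[Image, Mask], Image]


def composite(pred: Image, original: Image, mask: Mask) -> Image:
    """Take hole pixels from the prediction and known pixels from the original."""
    return [
        [p if m else o for p, o, m in zip(prow, orow, mrow)]
        for prow, orow, mrow in zip(pred, original, mask)
    ]


def toy_coarse_stage(img: Image, mask: Mask) -> Image:
    """Hypothetical stand-in for the coarse network: predict the mean known value everywhere."""
    known = [v for row, mrow in zip(img, mask) for v, m in zip(row, mrow) if not m]
    mean = sum(known) / len(known)
    return [[mean for _ in row] for row in img]


def toy_refine_stage(img: Image, mask: Mask) -> Image:
    """Hypothetical stand-in for the refinement network: average each pixel with its 4-neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[i][j]]
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    vals.append(img[ni][nj])
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def two_stage_inpaint(img: Image, mask: Mask, coarse: Stage, refine: Stage) -> Image:
    # Stage 1: coarse prediction, with known pixels pasted back (an assumption of this sketch).
    coarse_out = composite(coarse(img, mask), img, mask)
    # Stage 2: the refinement stage sees the composited coarse result, not the raw input.
    refined = refine(coarse_out, mask)
    # The final output keeps the known pixels untouched.
    return composite(refined, img, mask)


if __name__ == "__main__":
    img = [[0.2, 0.4, 0.6], [0.8, 0.0, 0.3], [0.5, 0.7, 0.9]]
    mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(two_stage_inpaint(img, mask, toy_coarse_stage, toy_refine_stage))
```

Passing the stages in as functions keeps the data flow visible: any real coarse or refinement network with the same signature could replace the toy stand-ins.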
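The matching and reconstruction steps of the contextual attention layer can be illustrated on flattened patches. In this sketch, each foreground patch is compared with every background patch by cosine similarity, the scores pass through a scaled softmax, and the patch is rebuilt as the attention-weighted sum of background patches. The layer itself performs these steps with convolutions and a deconvolution over feature maps, which also blends overlapping patches; here they are written as explicit loops over non-overlapping vectors for readability. The softmax scale of 10 is an assumed value.

```python
import math
from typing import List, Sequence, Tuple

Patch = List[float]  # a flattened feature patch


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def scaled_softmax(scores: Sequence[float], scale: float) -> List[float]:
    scaled = [scale * s for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_attention(
    foreground: Sequence[Patch],
    background: Sequence[Patch],
    scale: float = 10.0,  # assumed softmax scale
) -> List[Tuple[List[float], Patch]]:
    """For each foreground patch, return (attention weights over background, reconstruction)."""
    results = []
    dim = len(background[0])
    for f in foreground:
        # 1. Match: cosine similarity against every background patch.
        sims = [cosine_similarity(f, b) for b in background]
        # 2. Weight: scaled softmax over the background patches.
        weights = scaled_softmax(sims, scale)
        # 3. Reconstruct: attention-weighted sum of background patches
        #    (the layer does this with a deconvolution over feature maps).
        recon = [sum(w * b[k] for w, b in zip(weights, background)) for k in range(dim)]
        results.append((weights, recon))
    return results


if __name__ == "__main__":
    fg = [[1.0, 0.0]]              # features of one hole patch
    bg = [[2.0, 0.0], [0.0, 3.0]]  # two candidate background patches
    weights, recon = contextual_attention(fg, bg)[0]
    print(weights, recon)
```

Because cosine similarity ignores magnitude, the foreground patch above matches the first background patch almost exclusively, and its reconstruction takes on that patch's values.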
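The combined objective can be sketched using plain critic scores. A global critic scores the whole completed image and a local critic scores the completed hole region, and both use the Wasserstein form. This sketch leaves out the WGAN-GP gradient penalty term, and the weights `l1_weight` and `adv_weight` are hypothetical values chosen for illustration, not the paper's settings.

```python
from statistics import fmean
from typing import Sequence


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Wasserstein critic loss: minimizing this maximizes D(real) - D(fake).
    The WGAN-GP gradient penalty is omitted in this sketch."""
    return fmean(fake_scores) - fmean(real_scores)


def dual_critic_loss(
    global_real: Sequence[float],
    global_fake: Sequence[float],
    local_real: Sequence[float],
    local_fake: Sequence[float],
) -> float:
    """Sum of the global critic loss (whole image) and the local critic loss (hole region)."""
    return critic_loss(global_real, global_fake) + critic_loss(local_real, local_fake)


def generator_loss(
    recon_l1: float,
    global_fake: Sequence[float],
    local_fake: Sequence[float],
    l1_weight: float = 1.0,     # hypothetical weight
    adv_weight: float = 0.001,  # hypothetical weight
) -> float:
    """Reconstruction loss plus the generator side of both Wasserstein losses."""
    adversarial = -fmean(global_fake) - fmean(local_fake)
    return l1_weight * recon_l1 + adv_weight * adversarial


if __name__ == "__main__":
    print(dual_critic_loss([1.0], [0.0], [2.0], [1.0]))
    print(generator_loss(0.2, [1.0, 3.0], [2.0]))
```

Because the generator is penalized by both critics, a completion that looks plausible globally but has an inconsistent hole region, or the reverse, still incurs an adversarial cost.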
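The spatially discounted reconstruction loss weights each hole pixel by γ^l, where l is the pixel's distance to the nearest known pixel and γ is a constant slightly below 1, so pixels deep inside the hole count less. In this sketch, measuring l as the number of 4-connected steps (computed by breadth-first search), applying the loss only inside the hole, and the default γ = 0.99 are all assumptions.

```python
from collections import deque
from typing import List, Optional

Grid = List[List[float]]
Mask = List[List[int]]  # 1 = hole pixel, 0 = known pixel


def spatial_discount_weights(mask: Mask, gamma: float = 0.99) -> Grid:
    """gamma ** l for hole pixels (l = BFS steps to the nearest known pixel), 0.0 elsewhere."""
    h, w = len(mask), len(mask[0])
    dist: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    queue = deque()
    for i in range(h):
        for j in range(w):
            if mask[i][j] == 0:  # known pixels are the BFS sources
                dist[i][j] = 0
                queue.append((i, j))
    if not queue:
        raise ValueError("mask has no known pixels")
    while queue:
        i, j = queue.popleft()
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and dist[ni][nj] is None:
                dist[ni][nj] = dist[i][j] + 1
                queue.append((ni, nj))
    return [[gamma ** dist[i][j] if mask[i][j] else 0.0 for j in range(w)] for i in range(h)]


def discounted_l1(pred: Grid, target: Grid, mask: Mask, gamma: float = 0.99) -> float:
    """Mean spatially discounted L1 error over hole pixels."""
    weights = spatial_discount_weights(mask, gamma)
    total = sum(
        wt * abs(p - t)
        for wrow, prow, trow in zip(weights, pred, target)
        for wt, p, t in zip(wrow, prow, trow)
    )
    n_hole = sum(v for row in mask for v in row)
    return total / n_hole if n_hole else 0.0


if __name__ == "__main__":
    mask = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(3)] + [[0] * 5]
    for row in spatial_discount_weights(mask, gamma=0.5):
        print(row)
```

With γ = 0.5 in the usage example, the eight boundary hole pixels get weight 0.5 and the center pixel gets 0.25, which makes the discounting easy to see.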
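Strided sampling and downsampling save memory because the attention layer compares every foreground location with every background patch, so its cost grows with the number of background patches. The sketch below counts patches extracted from a feature map under different settings. The 3x3 patch size, the 32x32 map, and 2x2 average pooling as the downsampling method are assumptions of this sketch, not details taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def extract_patches(fmap: Grid, k: int = 3, stride: int = 1) -> List[List[float]]:
    """Flattened k x k patches taken every `stride` pixels (valid positions only)."""
    h, w = len(fmap), len(fmap[0])
    patches = []
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            patches.append([fmap[i + di][j + dj] for di in range(k) for dj in range(k)])
    return patches


def downsample2x(fmap: Grid) -> Grid:
    """Assumed 2x downsampling by averaging 2x2 blocks (an odd last row/column is dropped)."""
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]


if __name__ == "__main__":
    fmap = [[float(i * 32 + j) for j in range(32)] for i in range(32)]
    print(len(extract_patches(fmap, k=3, stride=1)))            # 900 patches
    print(len(extract_patches(fmap, k=3, stride=2)))            # 225 patches
    print(len(extract_patches(downsample2x(fmap), k=3, stride=1)))  # 196 patches
```

A stride of 2 cuts the background patch count roughly fourfold, and downsampling the map first has a similar effect, which shrinks the similarity table the layer must hold in memory.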

Experimental Results

Research Questions

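RQ1: Can the contextual attention mechanism explicitly borrow distant background patches to improve inpainting quality?
RQ2: Does the two-stage coarse-to-fine generative framework with global and local adversarial supervision outperform prior inpainting models?
RQ3: How do the spatially discounted reconstruction loss and attention-based fusion affect training stability and final image fidelity?
RQ4: Is the proposed method effective across diverse data domains such as faces, textures, and natural scenes?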

Key Findings

Quantitative comparison on Places2:
Method	ℓ1 loss	ℓ2 loss	PSNR (dB)	TV loss
PatchMatch [3]	16.1%	3.9%	16.62	25.0%
Baseline model	9.4%	2.4%	18.15	25.7%
Our method	8.6%	2.1%	18.91	25.3%

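The full model with contextual attention produces more realistic inpainting results with fewer artifacts than the baseline model across multiple datasets.
Attention maps visualize which background patches are most relevant for filling each foreground pixel, showing that contextual borrowing succeeds.
On Places2, the full model has the lowest ℓ1 (8.6%) and ℓ2 (2.1%) errors and the highest PSNR (18.91 dB) of the three methods; its TV loss (25.3%) falls between PatchMatch (25.0%) and the baseline (25.7%).
The two-stage network and contextual attention enable faster training and reduce the need for post-processing such as image blending.
The model generalizes well across the CelebA, CelebA-HQ, DTD, ImageNet, and Places2 datasets.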


This review was generated by AI and checked by human editors.