QUICK REVIEW

[Paper Review] Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

Hongyu Liu, Bin Jiang|arXiv (Cornell University)|Jul 14, 2020

Generative Adversarial Networks and Image Synthesis | References: 39 | Cited by: 28

One-Sentence Summary

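The paper proposes a mutual encoder-decoder network with feature equalizations for image inpainting. It recovers textures from shallow-layer features and structures from deep-layer features, restoring both jointly. A channel reweighting mechanism and a bilateral propagation activation (BPA) function equalize the feature representations of the structure and texture branches. This substantially reduces blur and artifacts, and the method reaches state-of-the-art performance on the Paris StreetView, Places2, and CelebA benchmarks.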
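The summary names a channel reweighting mechanism but does not give its form. The sketch below is an assumed stand-in, not the paper's formulation: a squeeze-and-excitation-style gate, simplified to global average pooling, a single linear layer, and a sigmoid. It is applied to the concatenated structure and texture channels. The function names, the one-layer excitation, and all weights are hypothetical.

```python
import math
from typing import List

Channel = List[List[float]]  # one H x W feature map


def global_avg_pool(ch: Channel) -> float:
    """Squeeze: average one channel over all spatial positions."""
    values = [v for row in ch for v in row]
    return sum(values) / len(values)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_reweight(features: List[Channel], weights: List[List[float]],
                     bias: List[float]) -> List[Channel]:
    """Assumed SE-style channel reweighting (one linear layer instead of two).

    features: C channels, each H x W
    weights:  C x C matrix of a hypothetical learned excitation layer
    bias:     length-C bias vector
    """
    squeezed = [global_avg_pool(ch) for ch in features]
    # Excitation: linear layer + sigmoid gives one scale in (0, 1) per channel
    scales = [
        sigmoid(sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(weights, bias)
    ]
    # Scale: multiply every value of channel c by its attention weight
    return [[[v * a for v in row] for row in ch] for ch, a in zip(features, scales)]


# Usage: concatenating branches along the channel axis is list concatenation here
structure = [[[2.0, 2.0], [2.0, 2.0]]]  # one channel from the structure branch
texture = [[[4.0, 0.0], [0.0, 4.0]]]    # one channel from the texture branch
fused = structure + texture
balanced = channel_reweight(fused, weights=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
```

With zero weights and bias every gate is 0.5, so both branches are halved equally. A learned gate would instead shift emphasis toward whichever branch carries more useful signal.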

ABSTRACT

Deep encoder-decoder based CNNs have advanced image inpainting methods for hole filling. While existing methods recover structures and textures step-by-step in the hole regions, they typically use two encoder-decoders for separate recovery. The CNN features of each encoder are learned to capture either missing structures or textures without considering them as a whole. The insufficient utilization of these encoder features limits the performance of recovering both structures and textures. In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively. The deep layer features are sent to a structure branch and the shallow layer features are sent to a texture branch. In each branch, we fill holes in multiple scales of the CNN features. The filled CNN features from both branches are concatenated and then equalized. During feature equalization, we reweigh channel attentions first and propose a bilateral propagation activation function to enable spatial equalization. To this end, the filled CNN features of structure and texture mutually benefit each other to represent image content at all feature levels. We use the equalized feature to supplement decoder features for output image generation through skip connections. Experiments on the benchmark datasets show the proposed method is effective to recover structures and textures and performs favorably against state-of-the-art approaches.
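The abstract says each branch fills holes at multiple scales of the CNN features. The method summary attributes this to three partial-convolution streams with increasing kernel sizes. Below is a minimal single-channel sketch, assuming the standard partial convolution of Liu et al. (2018): known inputs are rescaled by window size over the count of valid inputs, and the mask is updated wherever any input was valid. The kernel sizes (3, 5, 7), the averaging kernels, and the helper names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def partial_conv(x: Grid, mask: Grid, kernel: Grid, bias: float = 0.0) -> Tuple[Grid, Grid]:
    """Single-channel partial convolution, stride 1, zero 'same' padding.

    x:      H x W feature map
    mask:   H x W binary mask (1 = known pixel, 0 = hole)
    kernel: k x k weights, k odd
    Returns (output, updated_mask).
    """
    h, w, k = len(x), len(x[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc, valid = 0.0, 0.0
            for di in range(k):
                for dj in range(k):
                    y, z = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= z < w:  # padded positions count as holes
                        acc += kernel[di][dj] * x[y][z] * mask[y][z]
                        valid += mask[y][z]
            if valid > 0:
                # Rescale by window size / number of known inputs, then add bias
                out[i][j] = acc * (k * k) / valid + bias
                new_mask[i][j] = 1.0  # the position is now considered filled
    return out, new_mask


def multi_scale_fill(x: Grid, mask: Grid,
                     sizes: Sequence[int] = (3, 5, 7)) -> List[Tuple[Grid, Grid]]:
    """Run one partial-conv stream per kernel size (sizes are an assumption)."""
    streams = []
    for k in sizes:
        kernel = [[1.0 / (k * k)] * k for _ in range(k)]  # hypothetical averaging kernel
        streams.append(partial_conv(x, mask, kernel))
    return streams
```

On a 5×5 map with a 3×3 hole, the 3×3 stream leaves the hole's centre masked after one layer, while the 5×5 and 7×7 streams already reach it. That gap is one intuition for combining streams with different receptive fields.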

Research Motivation and Objectives

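Address the inconsistency between recovered structures and textures in deep image inpainting methods.
Improve visual quality by modeling structure and texture features jointly rather than sequentially or independently.
Reduce blur and artifacts in hole regions caused by misaligned features in the CNN feature space.
Strengthen feature consistency at hole boundaries and inside holes through a novel feature equalization mechanism.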

Proposed Method

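The method uses a shared encoder with separate branches: shallow-layer features feed the texture branch and deep-layer features feed the structure branch.
Both branches fill holes at multiple scales using three partial-convolution streams with progressively larger kernel sizes.
The features of the two branches are concatenated, and their channels are reweighted with a self-attention mechanism to align the attention maps across branches.
A bilateral propagation activation (BPA) function enforces spatial consistency: global propagation keeps hole boundaries consistent, while local operations preserve feature similarity.
The equalized features are fused and passed to the decoder through skip connections, improving reconstruction at all feature levels.
The network is trained end to end with perceptual and adversarial losses to improve realism and structural consistency.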
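The method summary describes BPA only briefly. As an assumption, the sketch treats it like a bilateral filter applied to a feature map. Each output is a normalized weighted sum of inputs, where each weight multiplies a range term (feature similarity) by a spatial term (distance). Setting `radius=None` propagates over the whole map (global), while a small `radius` keeps the operation local. The function name, the Gaussian kernels, and the sigma parameters are illustrative and are not the paper's exact definition.

```python
import math
from typing import List, Optional

Grid = List[List[float]]


def bilateral_propagate(x: Grid, sigma_range: float = 1.0, sigma_space: float = 1.0,
                        radius: Optional[int] = None) -> Grid:
    """Bilateral-filter-style propagation on a single-channel map (assumed BPA analogue).

    radius=None: every position contributes (global propagation).
    radius=r:    only a (2r+1) x (2r+1) neighbourhood contributes (local).
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num, den = 0.0, 0.0
            for y in range(h):
                for z in range(w):
                    if radius is not None and (abs(y - i) > radius or abs(z - j) > radius):
                        continue
                    # Range term: similar feature values get larger weights
                    rng = math.exp(-((x[i][j] - x[y][z]) ** 2) / (2 * sigma_range ** 2))
                    # Spatial term: nearby positions get larger weights
                    spa = math.exp(-((y - i) ** 2 + (z - j) ** 2) / (2 * sigma_space ** 2))
                    num += rng * spa * x[y][z]
                    den += rng * spa
            out[i][j] = num / den  # den > 0: the centre position always has weight 1
    return out
```

With a small `sigma_range`, values on opposite sides of a sharp step barely mix, which is the edge-preserving behaviour bilateral filters are known for. With a very large `sigma_range`, the range term is nearly constant, so the operation degrades to plain Gaussian smoothing.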

Experimental Results

Research Questions

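RQ1: Does jointly modeling structure and texture features in the CNN feature space improve inpainting quality?
RQ2: How does feature equalization between the structure and texture branches affect visual consistency and artifact reduction?
RQ3: Does the proposed bilateral propagation activation function maintain local and global feature consistency better than non-local attention?
RQ4: How much do the separate structure and texture branches each contribute to the final inpainting performance?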

Key Findings

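On Paris StreetView, the proposed method reaches a Fréchet Inception Distance (FID, lower is better) of 25.10, clearly better than the baseline without equalization (29.11) and the state-of-the-art method CSA (29.8).
In a human subjective study on CelebA, 56.4% of votes judged the proposed method's results the most realistic, ahead of CSA (29.6%) and GC (5.3%).
Ablation experiments show that removing the texture branch loses fine details and removing the structure branch leaves structural elements missing, confirming that both branches are necessary.
Feature equalization, and the bilateral propagation activation function in particular, noticeably reduces visible artifacts and blur, as the qualitative comparisons on Paris StreetView and Places2 show.
On Places2, the method achieves an FID of 21.26, better than the baseline without equalization (29.11), demonstrating the effectiveness of feature equalization.
Combining non-local aggregation with feature equalization further improves performance, lowering FID from 24.07 to 21.26, which indicates that the two are complementary.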
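The findings compare methods by FID, where lower is better. For reference, the standard definition fits Gaussians (mean μ, covariance Σ) to Inception activations of real (r) and generated (g) images. Δ denotes the relative FID reduction. It is computed here only from the numbers reported above, as simple arithmetic, and is not a figure stated in the paper.

```latex
\begin{aligned}
\mathrm{FID} &= \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right) \\
\Delta_{\text{Paris, vs. no equalization}} &= \frac{29.11 - 25.10}{29.11} = \frac{4.01}{29.11} \approx 13.8\% \\
\Delta_{\text{non-local + equalization}} &= \frac{24.07 - 21.26}{24.07} = \frac{2.81}{24.07} \approx 11.7\%
\end{aligned}
```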

This review was generated by AI and checked by human editors.