QUICK REVIEW

[Paper Review] Shift-Net: Image Inpainting via Deep Feature Rearrangement

Zhaoyi Yan, Xiaoming Li | arXiv (Cornell University) | Jan 29, 2018

Generative Adversarial Networks and Image Synthesis | References: 43 | Cited by: 46

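One-Sentence Summary

Shift-Net adds a shift-connection layer to U-Net that rearranges deep encoder features to fill missing regions with sharp textures and plausible structures, and it is trained end to end with guidance, reconstruction, and adversarial losses.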

ABSTRACT

Deep convolutional networks (CNNs) have exhibited their potential in image inpainting for producing plausible results. However, in most existing methods, e.g., context encoder, the missing parts are predicted by propagating the surrounding convolutional features through a fully connected layer, which tends to produce semantically plausible but blurry results. In this paper, we introduce a special shift-connection layer to the U-Net architecture, namely Shift-Net, for filling in missing regions of any shape with sharp structures and fine-detailed textures. To this end, the encoder feature of the known region is shifted to serve as an estimation of the missing parts. A guidance loss is introduced on the decoder feature to minimize the distance between the decoder feature after the fully connected layer and the ground-truth encoder feature of the missing parts. With such a constraint, the decoder feature in the missing region can be used to guide the shift of the encoder feature in the known region. An end-to-end learning algorithm is further developed to train the Shift-Net. Experiments on the Paris StreetView and Places datasets demonstrate the efficiency and effectiveness of our Shift-Net in producing sharper, fine-detailed, and visually plausible results. The code and pre-trained models are available at https://github.com/Zhaoyi-Yan/Shift-Net.

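Motivation and Objectives

Improve inpainting so that it preserves both global structure and fine textures.
Propose a shift-connection layer that transfers information from the known region to the missing region.
Use a guidance loss to keep decoder and encoder features aligned in the missing region.
Train an end-to-end model that combines the strengths of exemplar-based and CNN-based inpainting.
Demonstrate efficiency and effectiveness on the Paris StreetView and Places datasets.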

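Proposed Method

Add a shift-connection layer to U-Net to perform deep feature rearrangement between encoder and decoder features.
Define a nearest-neighbor shift operation in which the decoder features in the missing region are updated with shifted encoder features from the known region, denoted $\Phi_{L-l}^{\mathrm{shift}}(I)$.
Introduce a guidance loss $L_g$ that keeps decoder features in the missing region consistent with the ground-truth encoder features.
Combine the L1 reconstruction loss, $L_g$, and an adversarial loss for end-to-end training.
Train with Adam, balancing the losses with the trade-off weights $\lambda_g$ and $\lambda_{adv}$.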
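
The shift operation and the losses listed above can be illustrated with a small dependency-free sketch. It is not the authors' implementation: feature maps are flattened to one vector per spatial location, cosine similarity is assumed as the nearest-neighbor criterion, the shifted map is assumed to be zero at known locations, the L1 term is assumed to carry weight 1, and all function names and the weight values in the usage lines are hypothetical placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def shift_features(encoder, decoder, missing):
    """Nearest-neighbor shift on flattened feature maps.

    encoder, decoder: lists of feature vectors, one per spatial location.
    missing: list of bools, True where the location lies inside the hole.
    For every missing location y, copy the encoder feature of the known
    location that best matches the decoder feature at y.
    Assumption: known locations of the shifted map stay zero.
    """
    known = [i for i, m in enumerate(missing) if not m]
    channels = len(encoder[0])
    shifted = [[0.0] * channels for _ in encoder]
    for y, is_missing in enumerate(missing):
        if is_missing:
            best = max(known, key=lambda x: cosine(decoder[y], encoder[x]))
            shifted[y] = list(encoder[best])
    return shifted


def guidance_loss(decoder, encoder_gt, missing):
    """Sum of squared distances between decoder features and ground-truth
    encoder features, taken over missing locations only."""
    return sum(
        sum((d - g) ** 2 for d, g in zip(decoder[y], encoder_gt[y]))
        for y, m in enumerate(missing)
        if m
    )


def total_loss(l1, l_g, l_adv, lambda_g, lambda_adv):
    """Weighted sum of reconstruction, guidance, and adversarial terms."""
    return l1 + lambda_g * l_g + lambda_adv * l_adv


# Toy example: four locations, two channels, the last two locations are missing.
encoder = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
decoder = [[0.0, 0.0], [0.0, 0.0], [0.9, 0.1], [0.1, 0.8]]
missing = [False, False, True, True]

shifted = shift_features(encoder, decoder, missing)
print(shifted[2], shifted[3])  # [1.0, 0.0] [0.0, 1.0]

encoder_gt = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
l_g = guidance_loss(decoder, encoder_gt, missing)
# lambda_g and lambda_adv below are placeholder values, not the paper's settings.
print(total_loss(l1=1.0, l_g=l_g, l_adv=0.5, lambda_g=0.1, lambda_adv=0.2))
```

In the toy example each missing location receives the encoder feature it resembles most, which is the sense in which the decoder feature guides the shift. A real network would run this search over convolutional feature maps on a GPU rather than over Python lists.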

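Experimental Results

Research Questions

RQ1: Can shift-based feature rearrangement outperform pure CNN methods at filling missing regions?
RQ2: Does the guidance loss improve the alignment between encoder and decoder features in the missing region?
RQ3: How does Shift-Net compare with state-of-the-art exemplar-based and CNN-based inpainting methods in texture detail and realism?
RQ4: What trade-offs exist between where the shift layer is placed in the network and inpainting performance?
RQ5: Is the method efficient enough for practical use on large-scale datasets and real-world images?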

Key Findings

Method	PSNR	SSIM	Mean L2 loss
Content-Aware Fill [1]	23.71	0.74	0.0617
Context Encoder [2] (l2 + adversarial loss)	24.16	0.87	0.0313
MNPS [4]	25.98	0.89	0.0258
Ours (Shift-Net)	26.51	0.90	0.0208
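
For readers unfamiliar with the metrics in the table, the following minimal sketch shows how PSNR and a mean squared (L2) error are commonly computed. It assumes pixel values scaled to [0, 1] and images given as flat lists; the function names are illustrative, and the exact normalization and averaging behind the reported numbers are not specified in this review, so the sketch should not be expected to reproduce the table. SSIM is omitted because it requires windowed local statistics.

```python
import math


def mean_l2(pred, target):
    """Mean squared error over all pixel values (assumed to lie in [0, 1])."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = mean_l2(pred, target)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.1, so the MSE is about 0.01.
pred = [0.5, 0.5, 0.5, 0.5]
target = [0.6, 0.4, 0.6, 0.4]
print(mean_l2(pred, target), psnr(pred, target))
```

Higher PSNR and SSIM and lower L2 error indicate a reconstruction closer to the ground truth.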

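Shift-Net produces sharper, finer textures than previous methods on the Paris StreetView and Places datasets.
On Paris StreetView, Shift-Net reaches a PSNR of 26.51, an SSIM of 0.90, and a mean L2 loss of 0.0208, outperforming Content-Aware Fill, Context Encoder, and MNPS.
Shift-Net is markedly faster than MNPS: about 80 ms versus about 40 s for a 256×256 image.
Ablation studies show that both the guidance loss and the shift-connection layer improve results and reduce artifacts.
The method generalizes to real-world images and to arbitrarily shaped regions, including object removal.
The nearest-neighbor shift operation is essential to the gains, as it outperforms a random shift-connection.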
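
The runtime comparison in the findings can be turned into a rough speedup factor. The two timings are the approximate figures quoted above for a 256×256 image, so the result is an order-of-magnitude estimate rather than a measured benchmark; the variable names are illustrative.

```python
# Approximate per-image runtimes for a 256x256 input, in milliseconds.
shift_net_ms = 80
mnps_ms = 40_000  # about 40 seconds

speedup = mnps_ms / shift_net_ms
print(f"Shift-Net is roughly {speedup:.0f}x faster than MNPS")
```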

This review was generated by AI and checked by human editors.