QUICK REVIEW

[Paper Review] Shift-Net: Image Inpainting via Deep Feature Rearrangement

Zhaoyi Yan, Xiaoming Li|arXiv (Cornell University)|Jan 29, 2018

Generative Adversarial Networks and Image Synthesis|References 43|Citations 46

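Quick Summary

Shift-Net adds a shift-connection layer to U-Net that rearranges deep encoder features to fill in missing regions, producing sharp textures and plausible structures. It is trained end-to-end with guidance, reconstruction, and adversarial losses.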

ABSTRACT

Deep convolutional networks (CNNs) have exhibited their potential in image inpainting for producing plausible results. However, in most existing methods, e.g., context encoder, the missing parts are predicted by propagating the surrounding convolutional features through a fully connected layer, which intends to produce semantically plausible but blurry result. In this paper, we introduce a special shift-connection layer to the U-Net architecture, namely Shift-Net, for filling in missing regions of any shape with sharp structures and fine-detailed textures. To this end, the encoder feature of the known region is shifted to serve as an estimation of the missing parts. A guidance loss is introduced on decoder feature to minimize the distance between the decoder feature after fully connected layer and the ground-truth encoder feature of the missing parts. With such constraint, the decoder feature in missing region can be used to guide the shift of encoder feature in known region. An end-to-end learning algorithm is further developed to train the Shift-Net. Experiments on the Paris StreetView and Places datasets demonstrate the efficiency and effectiveness of our Shift-Net in producing sharper, fine-detailed, and visually plausible results. The codes and pre-trained models are available at https://github.com/Zhaoyi-Yan/Shift-Net.

Motivation and Objectives

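Improve the inpainting of missing regions so that both global structure and fine textures are preserved.
Propose a shift-connection layer that carries information from the known region into the missing region.
Use a guidance loss to match the decoder features of the missing region to the corresponding encoder features.
Train an end-to-end model that combines the strengths of exemplar-based and CNN-based inpainting.
Demonstrate efficiency and effectiveness on the Paris StreetView and Places datasets.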

Proposed Method

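Add a shift-connection layer to U-Net to rearrange deep features between the encoder and the decoder.
Define a nearest-neighbor-based shift operation that updates the decoder's missing-region features through the shifted encoder features, Φ_{L-l}^{shift}(I).
Introduce a guidance loss L_g that constrains the decoder features in the missing region to match the ground-truth encoder features.
Train end-to-end with a combination of the L1 reconstruction loss, L_g, and an adversarial loss.
Train with Adam, using trade-off weights λ_g and λ_adv to balance the losses.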
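
The shift operation and guidance loss listed above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes feature maps are stored as dictionaries from (row, col) positions to feature vectors, that the nearest neighbor is chosen by cosine similarity, and that shifted features are zero outside the missing region. The function names shift_features, guidance_loss, and total_loss are hypothetical, and the weighted-sum form of the total loss is assumed from the description of λ_g and λ_adv as trade-off weights.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def shift_features(encoder, decoder, missing):
    """For each missing position y, copy the encoder feature of the known
    position whose encoder feature best matches the decoder feature at y.

    encoder, decoder: dict mapping (row, col) -> feature vector
    missing: set of (row, col) positions inside the hole
    Returns shifted features on the hole and zero vectors elsewhere
    (the zero fill is an assumption of this sketch).
    """
    known = [p for p in encoder if p not in missing]
    shifted = {}
    for pos, feat in encoder.items():
        if pos in missing:
            best = max(known, key=lambda x: cosine(decoder[pos], encoder[x]))
            shifted[pos] = list(encoder[best])
        else:
            shifted[pos] = [0.0] * len(feat)
    return shifted

def guidance_loss(decoder, encoder_gt, missing):
    """Sum of squared distances between decoder features and ground-truth
    encoder features, taken over the missing region only."""
    return sum(
        sum((d - e) ** 2 for d, e in zip(decoder[p], encoder_gt[p]))
        for p in missing
    )

def total_loss(l1, l_g, l_adv, lambda_g, lambda_adv):
    """Weighted sum of the three losses; the weights are placeholders."""
    return l1 + lambda_g * l_g + lambda_adv * l_adv

# Toy 1x3 feature map with one missing position at (0, 2).
encoder = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (0, 2): [0.5, 0.5]}
decoder = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (0, 2): [0.1, 0.9]}
missing = {(0, 2)}
shifted = shift_features(encoder, decoder, missing)
print(shifted[(0, 2)])  # [0.0, 1.0]: copied from the known position (0, 1)
```

In this toy case the decoder feature at the hole points mostly along the second channel, so the known position (0, 1) is selected and its encoder feature is copied into the hole. A practical implementation would operate on tensors and patches rather than on per-position dictionaries.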

Experimental Results

Research Questions

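RQ1: Can shift-based feature rearrangement restore missing regions better than purely CNN-based methods?
RQ2: Does the guidance loss improve the alignment between encoder and decoder features in the missing region?
RQ3: How does Shift-Net compare with state-of-the-art exemplar-based and CNN-based inpainting methods in texture detail and realism?
RQ4: What is the trade-off between where the shift layer is placed in the network and inpainting performance?
RQ5: Is the method efficient enough for practical use on large-scale datasets and real-world images?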

Key Findings

Method	PSNR	SSIM	Mean L2 loss
Content-Aware Fill [1]	23.71	0.74	0.0617
context encoder [2] (l2 + adversarial loss)	24.16	0.87	0.0313
MNPS [4]	25.98	0.89	0.0258
Ours	26.51	0.90	0.0208
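
For readers unfamiliar with the metrics in the table, the following sketch shows one common way to compute a mean L2 error and PSNR. It assumes pixel values scaled to [0, 1] and flattened into lists, and it treats mean L2 as the mean squared error. The exact normalization used in the paper is not stated in this review, so the table values need not be reproducible with these definitions. The function names mean_l2 and psnr are hypothetical, and SSIM is omitted because it needs a windowed computation.

```python
import math

def mean_l2(pred, target):
    """Mean squared error over flattened pixel values (assumed definition)."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = mean_l2(pred, target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy example with four pixels, two of which are off by 0.1.
pred = [0.5, 0.5, 0.5, 0.5]
target = [0.4, 0.6, 0.5, 0.5]
print(mean_l2(pred, target))
print(psnr(pred, target))
```

Higher PSNR and SSIM and lower mean L2 loss indicate a reconstruction closer to the ground truth.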

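Shift-Net produces sharper, more finely detailed textures than prior methods on the Paris StreetView and Places datasets.
On Paris StreetView, Shift-Net achieves a PSNR of 26.51, an SSIM of 0.90, and a mean L2 loss of 0.0208, outperforming Content-Aware Fill, Context Encoder, and MNPS.
Shift-Net is far faster than MNPS: it processes a 256×256 image in about 80 ms, versus about 40 s for MNPS (roughly 500 times faster).
Ablation experiments show that both the guidance loss and the shift-connection layer contribute to better results and fewer artifacts.
The method generalizes to real-world images and to arbitrary-region inpainting, including object removal.
The nearest-neighbor-based shift operation is essential to the performance gain, as shown by comparison with random shift connections.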

This review was generated by AI and checked by a human editor.