QUICK REVIEW

[Paper Review] Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

Zili Yi, Qiang Tang|arXiv (Cornell University)|May 19, 2020

Advanced Image Processing Techniques|References: 33|Cited by: 30

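ONE-SENTENCE SUMMARY

This paper introduces Contextual Residual Aggregation (CRA), which predicts a low-resolution fill and aggregates high-frequency residuals from contextual patches, enabling ultra-high-resolution image inpainting up to 8K with a lightweight, fast model.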

ABSTRACT

Recently data-driven image inpainting methods have made inspiring progress, impacting fundamental image editing tasks such as object removal and damaged image repairing. These methods are more effective than classic approaches, however, due to memory limitations they can only handle low-resolution inputs, typically smaller than 1K. Meanwhile, the resolution of photos captured with mobile devices increases up to 8K. Naive up-sampling of the low-resolution inpainted result can merely yield a large yet blurry result. Whereas, adding a high-frequency residual image onto the large blurry image can generate a sharp result, rich in details and textures. Motivated by this, we propose a Contextual Residual Aggregation (CRA) mechanism that can produce high-frequency residuals for missing contents by weighted aggregating residuals from contextual patches, thus only requiring a low-resolution prediction from the network. Since convolutional layers of the neural network only need to operate on low-resolution inputs and outputs, the cost of memory and computing power is thus well suppressed. Moreover, the need for high-resolution training datasets is alleviated. In our experiments, we train the proposed model on small images with resolutions 512x512 and perform inference on high-resolution images, achieving compelling inpainting quality. Our model can inpaint images as large as 8K with considerable hole sizes, which is intractable with previous learning-based approaches. We further elaborate on the light-weight design of the network architecture, achieving real-time performance on 2K images on a GTX 1080 Ti GPU. Codes are available at: Atlas200dk/sample-imageinpainting-HiFill.

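MOTIVATION AND OBJECTIVES

Enable inpainting of ultra-high-resolution images (up to 8K) under memory constraints.
Develop a pipeline that needs only a low-resolution prediction from the network yet produces sharp high-frequency residuals for the missing region.
Propose a lightweight network architecture with efficient gated convolutions and multi-scale attention transfer.
Use context-based residual aggregation to transfer high-frequency detail from the surrounding regions.
Show that training on low-resolution data generalizes to high-resolution inference while maintaining quality.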

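PROPOSED METHOD

Predict a low-resolution inpainted result from the downsampled input, then upsample it to the original size to obtain a large but blurry image.
Compute the high-frequency residual by subtracting the large blurry image from the original image, then aggregate residuals using contextual attention scores.
Use the Attention Computing Module (ACM) to compute the patch-wise cosine similarity between patches inside the hole and patches outside it.
Use the Attention Transfer Module (ATM) to fill in-hole patches at multiple feature levels as weighted averages of contextual patches, reusing the shared attention scores.
Aggregate the residuals over all contextual patches and add them to the upsampled blurry image to produce a sharp result inside the hole.
Adopt Light Weight Gated Convolutions (LWGC) to build a slim, fast generator, organized as a two-stage network with a coarse stage and a refinement stage.
Train with a WGAN-GP adversarial loss and a reconstruction loss, using irregular masks that simulate real-world holes.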
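
The residual-aggregation steps above can be sketched in NumPy. This is a minimal illustration under several assumptions, not the authors' implementation: the trained generator is replaced by a hypothetical `placeholder_generator` that fills the hole with the mean of the known pixels, attention is computed on patches of the low-resolution prediction rather than on learned feature maps, upsampling is nearest-neighbour, patches do not overlap, and the softmax `temperature` is an illustrative parameter. All function names are invented for this sketch.

```python
import numpy as np

def downsample(img, s):
    """Average-pool a (H, W) array by an integer factor s."""
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s):
    """Nearest-neighbour upsampling by an integer factor s."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def to_patches(img, p):
    """Split a (H, W) array into non-overlapping p x p patches, row-major."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def from_patches(patches, shape, p):
    """Inverse of to_patches."""
    h, w = shape
    return patches.reshape(h // p, w // p, p, p).transpose(0, 2, 1, 3).reshape(h, w)

def placeholder_generator(low_img, low_mask):
    """HYPOTHETICAL stand-in for the trained network: mean fill of the hole."""
    out = low_img.copy()
    out[low_mask] = low_img[~low_mask].mean()
    return out

def cra_inpaint(image, mask, s=4, p=2, temperature=10.0):
    """image: (H, W) floats; mask: (H, W) bools, True inside the hole.
    H and W must be divisible by s * p."""
    mask = mask.astype(bool)
    # 1. Low-resolution prediction from the downsampled input.
    low_mask = downsample(mask.astype(float), s) > 0
    low_img = downsample(np.where(mask, 0.0, image), s)
    low_pred = placeholder_generator(low_img, low_mask)
    # 2. Upsample to get the large blurry image.
    blurry = upsample(low_pred, s)
    # 3. High-frequency residual (only trusted in patches outside the hole).
    residual = image - blurry
    # 4. Attention: cosine similarity between hole and context patches.
    low_patches = to_patches(low_pred, p)
    is_hole = to_patches(low_mask.astype(float), p).max(axis=1) > 0
    unit = low_patches / (np.linalg.norm(low_patches, axis=1, keepdims=True) + 1e-8)
    logits = temperature * (unit[is_hole] @ unit[~is_hole].T)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # 5. Fill hole residual patches with weighted sums of context residuals.
    res_patches = to_patches(residual, p * s)
    res_patches[is_hole] = weights @ res_patches[~is_hole]
    out = blurry + from_patches(res_patches, image.shape, p * s)
    out[~mask] = image[~mask]  # known pixels are kept exactly
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    hole = np.zeros((32, 32), dtype=bool)
    hole[8:16, 8:16] = True
    print(cra_inpaint(img, hole).shape)
```

Only steps 1 and 4 touch low-resolution data, which is where a real network would run; steps 3 and 5 are plain array arithmetic on the full-resolution image, which is consistent with the summary's point that the convolutional layers never see the high-resolution input.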

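EXPERIMENTAL RESULTS

Research Questions

RQ1: Can low-resolution prediction combined with residual aggregation achieve ultra-high-resolution inpainting while reducing memory and compute requirements?
RQ2: At 2K–8K resolutions, how does context-based residual aggregation compare with prior attention-based and patch-based inpainting methods in quality and speed?
RQ3: For inpainting large holes, which architectural choices (LWGC, multi-scale attention transfer) most improve performance and efficiency?
RQ4: Is training on 512×512 data sufficient for high-quality inpainting of much larger images?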

Key Findings

Image Size	L1	MS-SSIM	FID	IS	Time (ms)
512×512	5.439	0.8840	4.898	17.72	25
1024×1024	5.439	0.8840	4.899	17.72	31
2048×2048	5.492	0.8840	4.893	17.85	37
4096×4096	5.503	0.8840	4.895	17.81	87.3
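
To read the Time column against image size, the short script below compares the growth in pixel count with the growth in inference time, relative to the 512×512 row. It uses only the numbers in the table above; the helper name `scaling_vs_baseline` is made up for this illustration.

```python
# Side length in pixels -> inference time in ms, copied from the table above.
timings_ms = {512: 25.0, 1024: 31.0, 2048: 37.0, 4096: 87.3}

def scaling_vs_baseline(timings, base=512):
    """Return {side: (pixel_ratio, time_ratio)} relative to the base size."""
    base_pixels, base_time = base * base, timings[base]
    return {
        side: ((side * side) / base_pixels, t / base_time)
        for side, t in timings.items()
    }

if __name__ == "__main__":
    for side, (px, tm) in scaling_vs_baseline(timings_ms).items():
        print(f"{side}x{side}: {px:.0f}x pixels, {tm:.2f}x time")
```

Going from 512×512 to 4096×4096 multiplies the pixel count by 64, while the reported time grows by a factor of about 3.5.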

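CRA enables inpainting of images up to 8K with large holes covering up to 25% of the image, under limited memory and compute.
The proposed model achieves real-time performance on 2K images on a GTX 1080 Ti.
On Places2 it shows strong quantitative results: the lowest L1, with competitive MS-SSIM and FID across test sizes.
With shared attention scores and multi-scale attention transfer, CRA reduces parameters and computation while maintaining quality.
The LWGC variants (LWGC sc and LWGC pw) deliver large efficiency gains with minimal loss of quality.
Compared with other learning-based methods, CRA is faster and gives better or comparable visual quality, especially on high-resolution inputs.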

This review was generated by AI and checked by a human editor.