QUICK REVIEW

[Paper Review] Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild

Hshmat Sahak, Daniel Watson|arXiv (Cornell University)|Feb 15, 2023

Advanced Image Processing Techniques | Cited by 23

One-Line Summary

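Introduces SR3+, a diffusion-based model for blind (out-of-distribution) image super-resolution. By combining higher-order degradations with noise conditioning augmentation and training on very large datasets, it achieves state-of-the-art results in zero-shot evaluation on RealSR and DRealSR.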

ABSTRACT

Diffusion models have shown promising results on single-image super-resolution and other image-to-image translation tasks. Despite this success, they have not outperformed state-of-the-art GAN models on the more challenging blind super-resolution task, where the input images are out of distribution, with unknown degradations. This paper introduces SR3+, a diffusion-based model for blind super-resolution, establishing a new state-of-the-art. To this end, we advocate self-supervised training with a combination of composite, parameterized degradations, and noise-conditioning augmentation during training and testing. With these innovations, a large-scale convolutional architecture, and large-scale datasets, SR3+ greatly outperforms SR3. It outperforms Real-ESRGAN when trained on the same data, with a DRealSR FID score of 36.82 vs. 37.22, which further improves to FID of 32.37 with larger models, and further still with larger training sets.

Motivation and Objectives

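Motivation: robust blind single-image super-resolution in the wild, where degradations are unknown.
Develop a diffusion-based model that remains effective on out-of-distribution inputs.
Improve generalization through self-supervised training that combines synthetic composite degradations with noise conditioning augmentation.
Show that scaling up model size and training data yields significant gains.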

Proposed Method

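Uses a convolutional U-Net-based diffusion model for conditional super-resolution, extending SR3.
Applies higher-order, parametric degradations during training to simulate real-world corruption.
Incorporates noise conditioning augmentation to improve robustness and to allow the conditioning noise level to be set at test time.
Trains on large-scale datasets (from DF2K+OST up to 61M images) to scale performance.
Evaluates zero-shot on RealSR and DRealSR using FID, PSNR, and SSIM.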
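The higher-order degradation step can be illustrated with a minimal NumPy sketch. It assumes grayscale images in [0, 1], repeats a blur-then-resize stage twice (second order, in the spirit of Real-ESRGAN) and, following the Figure 2 caption, adds no noise. The helper names `blur`, `resize_bilinear` and `degrade`, as well as the sampling ranges, are hypothetical, and JPEG compression is omitted for brevity, so this is not the authors' pipeline.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D normalized Gaussian kernel with a radius of about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge padding; the output keeps the input shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize of a 2-D array to (out_h, out_w)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def degrade(hr: np.ndarray, scale: int = 4, order: int = 2, rng=None) -> np.ndarray:
    """Hypothetical higher-order degradation: `order` rounds of blur + random resize, no added noise."""
    rng = np.random.default_rng() if rng is None else rng
    img = hr.astype(np.float64)
    for _ in range(order):
        img = blur(img, sigma=rng.uniform(0.2, 3.0))  # assumed blur-strength range
        f = rng.uniform(0.5, 1.5)  # assumed random resize factor
        h, w = img.shape
        img = resize_bilinear(img, max(2, round(h * f)), max(2, round(w * f)))
    # Final resize to the target low-resolution size.
    lr = resize_bilinear(img, hr.shape[0] // scale, hr.shape[1] // scale)
    return np.clip(lr, 0.0, 1.0)
```

Because every random parameter is drawn per call, each HR image yields a different LR version on every pass, which is the property that lets self-supervised training cover many unknown degradations.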

Figure 2 : The SR3+ data pipeline applies a sequence of degradations to HR training images (like Real-ESRGAN but without additive noise). To form the conditioning signal for the neural denoiser, we up-sample the LR image and apply noise conditioning augmentation.
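A minimal sketch of forming the conditioning signal described in the caption. It assumes a cosine noise schedule ᾱ(t) = cos²(πt/2) on continuous t in [0, 1], nearest-neighbour upsampling, and a hypothetical training range t ~ U(0, t_max) with t_max = 0.5; none of these choices is confirmed by the review. At test time a fixed level is passed instead, such as the t_eval ≈ 0.1 mentioned in the findings. `make_conditioning` and its parameters are illustrative names.

```python
import numpy as np


def alpha_bar(t: float) -> float:
    """Assumed cosine schedule: fraction of signal variance kept at level t in [0, 1]."""
    return float(np.cos(0.5 * np.pi * t) ** 2)


def upsample_nearest(lr: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor (a stand-in for the real upsampler)."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)


def make_conditioning(lr, scale, rng, t=None, t_max=0.5):
    """Upsample the LR image and apply noise conditioning augmentation.

    Training: t is None, so a level t ~ U(0, t_max) is sampled (t_max is hypothetical).
    Testing: pass a fixed level, e.g. t=0.1.
    Returns the noisy conditioning image and the level t that was used.
    """
    up = upsample_nearest(lr, scale)
    if t is None:
        t = rng.uniform(0.0, t_max)
    a = alpha_bar(t)
    eps = rng.standard_normal(up.shape)
    z = np.sqrt(a) * up + np.sqrt(1.0 - a) * eps
    return z, t


# Usage: random level during training, fixed level at test time.
rng = np.random.default_rng(0)
lr_img = rng.random((16, 16))
z_train, t_train = make_conditioning(lr_img, 4, rng)
z_test, _ = make_conditioning(lr_img, 4, rng, t=0.1)
```

In noise conditioning augmentation the level t is typically also given to the denoiser so it knows how corrupted its conditioning input is; that part of the model is not shown here.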

Experimental Results

Research Questions

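RQ1: Can a diffusion-based model achieve state-of-the-art performance in blind super-resolution under real-world degradations?
RQ2: Do higher-order degradations and noise conditioning augmentation jointly improve robustness to out-of-distribution inputs?
RQ3: How do model size and training-data scale affect SR3+ performance on the RealSR/DRealSR benchmarks?
RQ4: What is the trade-off between perceptual fidelity (FID) and reference-based metrics (PSNR/SSIM) in blind SR?
RQ5: Is SR3+ robust in zero-shot settings when the dataset or image content differs (e.g., text-heavy images)?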

Key Findings

モデル	FID(10k) RealSR ↓	FID(10k) DRealSR ↓	PSNR RealSR ↑	PSNR DRealSR ↑	SSIM RealSR ↑	SSIM DRealSR ↑
Real-ESRGAN	34.21	37.22	25.14	25.85	0.7279	0.7808
SR3+ (40M, DF2K + OST)	31.97	n/a	24.84	25.18	0.6827	0.7201
SR3+ (400M, DF2K + OST)	27.34	n/a	23.84	24.36	0.662	0.719
SR3+ (400M, 61M Dataset)	24.32	32.37	24.89	25.74	0.6922	0.7547
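To read the table more easily, a small helper can compute each SR3+ variant's FID(10k) gain over Real-ESRGAN, where a positive number means lower (better) FID. The values are copied from the table, entries it does not list are `None`, and `fid_gain` is a hypothetical helper name.

```python
# FID(10k), lower is better; values copied from the table (None = not listed).
FID = {
    "Real-ESRGAN": {"RealSR": 34.21, "DRealSR": 37.22},
    "SR3+ (40M, DF2K + OST)": {"RealSR": 31.97, "DRealSR": None},
    "SR3+ (400M, DF2K + OST)": {"RealSR": 27.34, "DRealSR": None},
    "SR3+ (400M, 61M Dataset)": {"RealSR": 24.32, "DRealSR": 32.37},
}


def fid_gain(model: str, dataset: str, baseline: str = "Real-ESRGAN"):
    """Baseline FID minus model FID, rounded to 2 decimals; None if either value is missing."""
    m = FID[model][dataset]
    b = FID[baseline][dataset]
    if m is None or b is None:
        return None
    return round(b - m, 2)


for name in FID:
    if name != "Real-ESRGAN":
        print(name, {d: fid_gain(name, d) for d in ("RealSR", "DRealSR")})
```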

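The 40M-parameter SR3+ achieves FID(10k) on par with Real-ESRGAN on RealSR and DRealSR.
A larger SR3+ model (400M) trained on the same data improves FID, outperforming Real-ESRGAN on RealSR and narrowing the gap on DRealSR.
Training with both higher-order degradations and noise conditioning augmentation improves FID substantially; removing either component worsens FID by more than 10 points.
Training the 400M-parameter model on the much larger 61M-image dataset improves FID further, to 32.37 on DRealSR, and produces more realistic and consistent textures.
Test-time noise conditioning augmentation with t_eval ≈ 0.1 improves textures while keeping reasonable consistency with the input; setting t_eval too high risks breaking that consistency and hallucinating content.
SR3+ generally produces sharper, more realistic textures than Real-ESRGAN, but it can score lower on PSNR/SSIM, for example on high-frequency text. These reference-based metrics can penalize plausible high-frequency detail in multimodal outputs.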
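Why PSNR can penalize plausible detail is visible in a toy NumPy example (not from the paper). For a one-pixel stripe texture, a sample with exactly the right texture but shifted by one pixel scores 0 dB, while a flat gray "average" prediction scores about 6 dB, so the metric rewards over-smoothing over a sharp but misaligned sample. The image sizes and values are arbitrary illustrations.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = float(np.mean((pred - target) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)


# Toy ground truth: 8x8 image of alternating 0/1 columns (a one-pixel stripe texture).
gt = np.tile(np.array([0.0, 1.0]), (8, 4))
# An equally plausible sample: the same texture shifted by one pixel.
shifted = np.roll(gt, 1, axis=1)
# An over-smoothed prediction: the per-pixel average of the two possibilities.
flat = np.full_like(gt, 0.5)

print(f"shifted texture: {psnr(shifted, gt):.2f} dB")  # 0.00 dB
print(f"flat gray:       {psnr(flat, gt):.2f} dB")  # 6.02 dB
```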

Figure 3 : Sample comparison between Real-ESRGAN and various SR3+ models (ours). We observe that Real-ESRGAN often suffers from oversmoothing and excessive contrast, while SR3+ is capable of generating high-fidelity, realistic textures.

This review was generated by AI and checked by a human editor.