QUICK REVIEW

[Paper Review] DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

Xinqi Lin, Jingwen He|arXiv (Cornell University)|Aug 29, 2023

Advanced Image Processing Techniques | Citations: 47

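One-Sentence Summary

DiffBIR uses a two-stage pipeline that combines a restoration module with a frozen Stable Diffusion prior to achieve realistic and faithful blind image restoration for both general images and faces. It introduces LAControlNet and latent image guidance to balance realness and fidelity.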

ABSTRACT

We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks in a unified framework. DiffBIR decouples blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. Each stage is developed independently but they work seamlessly in a cascaded manner. In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results. For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details. Specifically, IRControlNet is trained based on specially produced condition images without distracting noisy content for stable generation performance. Moreover, we design a region-adaptive restoration guidance that can modify the denoising process during inference without model re-training, allowing users to balance realness and fidelity through a tunable guidance scale. Extensive experiments have demonstrated DiffBIR's superiority over state-of-the-art approaches for blind image super-resolution, blind face restoration and blind image denoising tasks on both synthetic and real-world datasets. The code is available at https://github.com/XPixelGroup/DiffBIR.

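Motivation and Objectives

Extend blind image restoration to general images with unknown degradations.
Combine a degradation-removal stage with a diffusion-prior generation stage to obtain realistic reconstructions.
Let users adjust the trade-off between image fidelity and perceptual realism.
Use LAControlNet (an injective modulation network) to adapt Stable Diffusion to restoration without retraining the pretrained model itself.
Demonstrate strong performance on both blind image super-resolution and blind face restoration.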

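Proposed Method

Adopt a two-stage pipeline whose first stage is a Restoration Module (SwinIR-based) pretrained on diverse degradations to improve generalization.
Fine-tune a module that runs in parallel with Stable Diffusion and injects degradation and regeneration cues into the latent diffusion process.
Introduce latent image guidance to make the fidelity-realism trade-off controllable during diffusion sampling.
Adopt a degradation model comprising blur, resize, noise, and high-order degradations to simulate real-world low-quality images.
Train the Restoration Module with an L2 pixel loss and the diffusion stage with the latent diffusion objective.
At inference, a gradient scale parameter controls the transition between I_reg (the Restoration Module output) and I_diff (the diffusion-stage output).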
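
The degradation model listed above (blur, resize, noise, repeated for high-order degradation) can be illustrated with a small NumPy sketch. This is not the DiffBIR training pipeline: the function names (`blur`, `resize_down_up`, `degrade_once`, `degrade`), the Gaussian kernel, nearest-neighbour resizing, and all parameter values are assumptions chosen only to keep the example self-contained, and "high-order" is modelled here simply as applying the same chain more than once.

```python
import numpy as np

def gaussian_kernel(sigma, radius=3):
    # 1-D Gaussian kernel normalised to sum to 1 (illustrative choice).
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    # Separable blur: convolve every row, then every column.
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def resize_down_up(img, factor):
    # Drop resolution by striding, then return to the original size
    # with nearest-neighbour repetition (a stand-in for real resampling).
    small = img[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[: img.shape[0], : img.shape[1]]

def degrade_once(img, rng, sigma=1.5, factor=2, noise_std=0.05):
    # One blur -> resize -> noise pass; all parameter values are hypothetical.
    out = blur(img, sigma)
    out = resize_down_up(out, factor)
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)

def degrade(img, rng, order=2):
    # "High-order" degradation sketched as repeating the basic chain.
    out = img
    for _ in range(order):
        out = degrade_once(out, rng)
    return out

rng = np.random.default_rng(0)
hq = rng.random((32, 32))   # toy grayscale "high-quality" image in [0, 1]
lq = degrade(hq, rng)       # synthetic low-quality counterpart
```

A training set would pair each `hq` with its `lq`; a realistic pipeline would also randomise the kernel, scale factor, and noise level per sample.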

(a) Visual comparison of blind image super-resolution (BSR) methods on real-world low-quality images.

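Experimental Results

Research Questions

RQ1 Can DiffBIR achieve realistic restoration of general (non-face) images with unknown degradations?
RQ2 How does incorporating a pretrained Stable Diffusion prior affect the fidelity and realism of blind restoration?
RQ3 Does LAControlNet-based fine-tuning enable task-specific restoration while preserving generative ability?
RQ4 Can users control the fidelity-realism trade-off without retraining the model?
RQ5 How does DiffBIR perform against state-of-the-art methods on BSR and BFR benchmarks?

Key Findings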

Dataset	Metric	DDNM	GDP	Real-ESRGAN+	BSRGAN	SwinIR-GAN	FeMaSR	DiffBIR(Ours)	Notes
RealSRSet	MANIQA↑	0.4535	0.4581	0.5376	0.5640	0.5295	0.5247	0.5906	Higher is better; DiffBIR is best
RealSRSet	NIQE↓	6.8415	5.0626	5.7401	5.6074	5.6093	5.2353	6.0738	Lower is better; GDP is best
Real47	MANIQA↑	0.4813	0.5237	0.5900	0.5889	0.5721	0.5718	0.6293	Higher is better; DiffBIR is best
Real47	NIQE↓	6.4768	3.9866	3.9103	4.0338	3.9910	4.1731	3.9240	Lower is better; Real-ESRGAN+ is best
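
As a quick sanity check on how to read the table, the following snippet picks the best method per row using only the values listed above. The dictionary layout and variable names are illustrative.

```python
methods = ["DDNM", "GDP", "Real-ESRGAN+", "BSRGAN", "SwinIR-GAN", "FeMaSR", "DiffBIR"]

# (dataset, metric, higher_is_better) -> scores in the same order as `methods`
table = {
    ("RealSRSet", "MANIQA", True): [0.4535, 0.4581, 0.5376, 0.5640, 0.5295, 0.5247, 0.5906],
    ("RealSRSet", "NIQE", False): [6.8415, 5.0626, 5.7401, 5.6074, 5.6093, 5.2353, 6.0738],
    ("Real47", "MANIQA", True): [0.4813, 0.5237, 0.5900, 0.5889, 0.5721, 0.5718, 0.6293],
    ("Real47", "NIQE", False): [6.4768, 3.9866, 3.9103, 4.0338, 3.9910, 4.1731, 3.9240],
}

best = {}
for (dataset, metric, higher_is_better), scores in table.items():
    pick = max if higher_is_better else min
    # Index of the best score, respecting the metric's direction.
    idx = pick(range(len(scores)), key=lambda i: scores[i])
    best[(dataset, metric)] = methods[idx]

for key, name in best.items():
    print(key, "->", name)
```

On these numbers DiffBIR leads on MANIQA for both datasets, while the lowest NIQE belongs to GDP on RealSRSet and Real-ESRGAN+ on Real47.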

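DiffBIR sets a new baseline for real-world BSR and BFR on both synthetic and real datasets.
On RealSRSet and Real47, it achieves the best perceptual quality (MANIQA) among the compared baselines, although its NIQE is not the lowest on either dataset.
For BFR, it achieves high fidelity and realism on both synthetic and real data, with favorable IDS and FID scores.
The two-stage design of RM (Restoration Module) and LAControlNet avoids the over-smoothing and incorrect details that tend to affect single-stage methods.
Latent image guidance provides an adjustable spectrum from faithful restoration to highly realistic textures.
Ablation studies confirm that the Restoration Module is essential, that fine-tuning Stable Diffusion is necessary, and that LAControlNet is more effective than ControlNet.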
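
The adjustable fidelity-realism spectrum attributed to latent image guidance can be sketched as a single gradient step. The sketch assumes a mean-squared-error distance in latent space and a classifier-guidance-style shift of the sampling mean; the function names, tensor shapes, and scale values are hypothetical and not taken from the DiffBIR code, and a real sampler would repeat this at every denoising step.

```python
import numpy as np

def latent_mse_grad(z0_pred, z_reg):
    # Gradient of mean((z0_pred - z_reg)^2) with respect to z0_pred.
    return 2.0 * (z0_pred - z_reg) / z0_pred.size

def guided_mean(mean, z0_pred, z_reg, scale):
    # Shift the sampling mean against the gradient of the latent distance.
    # scale = 0 leaves the diffusion prediction untouched; a larger scale
    # pulls the sample toward the latent of the first-stage restored image.
    return mean - scale * latent_mse_grad(z0_pred, z_reg)

def latent_distance(z, z_reg):
    return float(np.mean((z - z_reg) ** 2))

rng = np.random.default_rng(0)
z_reg = rng.normal(size=(4, 8, 8))                 # toy latent of the stage-one output
z0_pred = z_reg + rng.normal(size=z_reg.shape)     # toy clean-latent estimate from the sampler
mean = z0_pred.copy()                              # toy stand-in for the denoiser's mean

unguided = guided_mean(mean, z0_pred, z_reg, scale=0.0)
guided = guided_mean(mean, z0_pred, z_reg, scale=50.0)

print(latent_distance(unguided, z_reg), latent_distance(guided, z_reg))
```

In this toy setting the guided latent ends up closer to `z_reg` than the unguided one, mirroring how the gradient scale trades generated detail for fidelity without any retraining.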

(b) Visual comparison of blind face restoration (BFR) methods on real-world low-quality face images.

This review was generated by AI and checked by a human editor.