QUICK REVIEW

[Paper Review] Exploiting Diffusion Prior for Real-World Image Super-Resolution

Jianyi Wang, Zongsheng Yue|arXiv (Cornell University)|May 11, 2023

Advanced Image Processing Techniques | Citations: 10

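One-sentence summary

This paper uses a pre-trained text-to-image diffusion model as a fixed prior and fine-tunes a lightweight time-aware encoder; combined with a controllable feature wrapping module and progressive aggregation sampling, it achieves real-world blind super-resolution without retraining the diffusion model.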

ABSTRACT

We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution (SR). Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we employ a controllable feature wrapping module that allows users to balance quality and fidelity by simply adjusting a scalar value during the inference process. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraints of pre-trained diffusion models, enabling adaptation to resolutions of any size. A comprehensive evaluation of our method using both synthetic and real-world benchmarks demonstrates its superiority over current state-of-the-art approaches. Code and models are available at https://github.com/IceClear/StableSR.

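Motivation and objectives

Motivate and develop a blind SR method that preserves the generative prior of a pre-trained diffusion model without retraining it.
Propose a low-cost time-aware encoder that conditions the frozen diffusion model on the low-resolution (LR) input.
Introduce a controllable feature wrapping module to balance fidelity and realism during diffusion-based reconstruction.
Develop a progressive aggregation sampling strategy that handles arbitrarily large outputs and avoids tiling artifacts.
Demonstrate performance superior to state-of-the-art methods on synthetic and real-world SR benchmarks.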

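Proposed method

Fine-tune a lightweight time-aware encoder attached to a frozen Stable Diffusion model, conditioning it for SR through multi-scale feature modulation with spatial feature transform (SFT) layers.
Incorporate time-aware guidance so that the conditioning strength adapts across diffusion steps, allowing stronger guidance in early iterations.
Add a controllable feature wrapping (CFW) module that fuses encoder and decoder features, with an adjustable weight w that trades off fidelity against realism.
Apply color correction, in pixel-domain and wavelet-based variants, to suppress color shifts in the diffusion output.
Handle arbitrary resolutions with a progressive aggregation sampling strategy that overlaps extended patches and fuses them with Gaussian weighting during the diffusion iterations.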
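
The progressive aggregation sampling item above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the names `gaussian_weight`, `tile_starts`, and `aggregate_step`, the tile size, the overlap, and the Gaussian width are all assumptions chosen for the example, and `denoise_tile` is a hypothetical stand-in for one denoising step of the frozen diffusion model on a fixed-size patch.

```python
import numpy as np


def gaussian_weight(tile, sigma_frac=0.25):
    """2D Gaussian window peaking at the tile centre (sigma_frac is an assumed value)."""
    coords = np.arange(tile) - (tile - 1) / 2.0
    g = np.exp(-0.5 * (coords / (sigma_frac * tile)) ** 2)
    return np.outer(g, g)


def tile_starts(size, tile, stride):
    """Start offsets that cover [0, size); the last tile sits flush with the border."""
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)
    return starts


def aggregate_step(latent, denoise_tile, tile=64, overlap=32):
    """Run one sampling step on overlapping tiles and blend them with Gaussian weights.

    latent has shape (..., H, W); denoise_tile maps a (..., tile, tile) patch
    to a patch of the same shape (hypothetical placeholder for the model call).
    """
    h, w = latent.shape[-2:]
    if h < tile or w < tile:
        raise ValueError("latent must span at least one tile in each spatial dimension")
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap
    weight = gaussian_weight(tile)
    out = np.zeros(latent.shape, dtype=np.float64)
    norm = np.zeros((h, w), dtype=np.float64)
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            patch = latent[..., y:y + tile, x:x + tile]
            # Accumulate the weighted tile output and the weights that touched each pixel.
            out[..., y:y + tile, x:x + tile] += denoise_tile(patch) * weight
            norm[y:y + tile, x:x + tile] += weight
    # Normalise so that overlapping regions become a weighted average.
    return out / norm
```

Because every pixel is divided by the sum of the weights that touched it, an identity `denoise_tile` returns the input unchanged, which makes a convenient sanity check. In a full sampler this function would be called once per diffusion step, so tiles are merged repeatedly during sampling rather than once at the end.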

Figure 1: Qualitative comparisons of BSRGAN (Zhang et al., 2021b), Real-ESRGAN+ (Wang et al., 2021c), FeMaSR (Chen et al., 2022), LDM (Rombach et al., 2022), and our StableSR on real-world examples. (Zoom in for details.)

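Experimental results

Research questions

RQ1: How can a pre-trained diffusion model be used for real-world blind SR without retraining it?
RQ2: Which lightweight components are needed to condition frozen diffusion priors on LR images while preserving the generative prior?
RQ3: Can the trade-off between fidelity and realism be managed controllably in diffusion-based SR?
RQ4: Can diffusion-based SR reach arbitrary image resolutions without boundary artifacts?
RQ5: Do SR methods based on diffusion priors outperform existing real-world SR baselines on synthetic and real-world benchmarks?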

Key findings

Dataset	PSNR	SSIM	LPIPS	FID	CLIP-IQA	MUSIQ
DIV2K Valid	24.62	0.5970	0.5276	49.49	0.3534	28.57
RealSR	27.30	0.7579	0.3570	-	0.3687	38.26
DRealSR	30.19	0.8148	0.3938	-	0.3744	26.93
DPED-iphone	-	-	-	-	0.4496	45.60

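StableSR outperforms state-of-the-art SR methods on perceptual metrics such as FID, CLIP-IQA, and MUSIQ across synthetic and real-world benchmarks.
Time-aware guidance improves fidelity and sharpness by adaptively adjusting the diffusion conditioning strength during inference.
Controllable feature wrapping provides an adjustable balance between high-fidelity structure and realistic texture, giving a practical fidelity-realism trade-off (best at around w=0.5).
Progressive aggregation sampling enables stable SR at resolutions above 512x512 without tile-boundary artifacts.
Color correction (pixel-domain and wavelet-based) reduces color shifts and improves visual quality.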
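
Two of the findings above, the adjustable weight w and pixel-domain color correction, can be illustrated with a short NumPy sketch. Both functions are assumptions made for illustration rather than the paper's code: `cfw_fuse` assumes the fusion adds a w-scaled residual to the decoder features and uses a plain feature difference as a hypothetical stand-in for the trained fusion layers, and `match_channel_stats` assumes that pixel-domain correction means matching per-channel mean and standard deviation to the LR input.

```python
import numpy as np


def cfw_fuse(dec_feat, enc_feat, w, transform=None):
    """Decoder features plus a w-scaled residual built from encoder and decoder features.

    transform stands in for trained fusion layers (hypothetical); the default
    uses the plain difference, so w=0 keeps the decoder features and w=1
    returns the encoder features.
    """
    if transform is None:
        def transform(enc, dec):
            return enc - dec
    return dec_feat + w * transform(enc_feat, dec_feat)


def match_channel_stats(sr, lr, eps=1e-8):
    """Shift and scale each channel of sr (C x H x W) to the mean and std of lr (C x h x w)."""
    sr_mean = sr.mean(axis=(1, 2), keepdims=True)
    sr_std = sr.std(axis=(1, 2), keepdims=True)
    lr_mean = lr.mean(axis=(1, 2), keepdims=True)
    lr_std = lr.std(axis=(1, 2), keepdims=True)
    # Standardise the SR output per channel, then apply the LR channel statistics.
    return (sr - sr_mean) / (sr_std + eps) * lr_std + lr_mean
```

With the default placeholder, w=0.5 gives the midpoint of the two feature maps. The statistics matching only needs the channel counts to agree, so the LR reference does not have to be upsampled first.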

Figure 2: Framework of StableSR. We first finetune the time-aware encoder that is attached to a fixed pre-trained Stable Diffusion model. Features are combined with trainable spatial feature transform (SFT) layers. Such a simple yet effective design is capable of leveraging rich diffusion prior for

This review was generated by AI and checked by a human editor.