QUICK REVIEW

[Paper Review] ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting

Zongsheng Yue, Jianyi Wang|arXiv (Cornell University)|Jul 23, 2023

Advanced Image Processing Techniques | Cited by 57

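One-line summary

ResShift introduces a diffusion-based SR model that moves between the HR and LR images by shifting the residual between them. It achieves competitive results with only 15 sampling steps and needs no post-hoc acceleration.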

ABSTRACT

Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed due to the requirements of hundreds or even thousands of sampling steps. Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual between them, substantially improving the transition efficiency. Additionally, an elaborate noise schedule is developed to flexibly control the shifting speed and the noise strength during the diffusion process. Extensive experiments demonstrate that the proposed method obtains superior or at least comparable performance to current state-of-the-art methods on both synthetic and real-world datasets, even only with 15 sampling steps. Our code and model are available at https://github.com/zsyOAOA/ResShift.

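Motivation and objectives

Motivate diffusion-based SR that improves inference speed without sacrificing quality.
Design a diffusion process that recovers the HR image from the LR image by shifting the residual.
Develop a flexible noise schedule that controls the shifting speed and the noise strength during diffusion.
Enable training and inference in latent space to reduce the computational cost.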

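Proposed method

Construct a Markov chain between the HR and LR images by gradually shifting the residual e0 = y0 - x0.
Define a transition q(x_t|x_{t-1}, y0) whose drift is proportional to the residual and which carries an adjustable noise term (Eq. 1).
Derive an analytically tractable marginal distribution q(x_t|x0,y0) (Eq. 2) and a tractable reverse process p_theta(x_{t-1}|x_t,y0) (Eq. 4).
Parameterize the reverse mean with a network f_theta that predicts x0 (Eq. 7), and train by simplifying the weighted KL objective to a denoising-style loss (Eq. 8).
Optionally, train in latent space with a VQGAN, operating on latent codes rather than raw images.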
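The steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. The Gaussian forms used here (marginal mean x0 + eta_t * e0 with variance kappa^2 * eta_t, and the matching posterior for the reverse step) are assumed forms consistent with the summary of Eqs. 1, 2, and 4 and should be checked against the paper. `f_theta` is a stand-in callable for the x0-predicting network, the loss omits the per-step weights, and all function names and the `etas` array (an increasing schedule, with eta_0 = 0 implied) are hypothetical.

```python
import numpy as np


def forward_marginal_sample(x0, y0, eta_t, kappa, rng):
    """Draw x_t from the assumed marginal N(x0 + eta_t * e0, kappa^2 * eta_t * I)."""
    e0 = y0 - x0  # residual between the LR and HR images
    noise = rng.standard_normal(x0.shape)
    return x0 + eta_t * e0 + kappa * np.sqrt(eta_t) * noise


def posterior_mean_std(x_t, x0_hat, eta_prev, eta_t, kappa):
    """Mean and std of the assumed posterior q(x_{t-1} | x_t, x0, y0),
    with x0 replaced by the network prediction x0_hat."""
    alpha_t = eta_t - eta_prev  # per-step shift of the residual
    mean = (eta_prev / eta_t) * x_t + (alpha_t / eta_t) * x0_hat
    std = kappa * np.sqrt(eta_prev / eta_t * alpha_t)
    return mean, std


def reverse_sample(f_theta, y0, etas, kappa, rng):
    """Run the reverse chain, starting from a noisy version of the LR image y0."""
    # Assumption: x_T is centred on y0 because eta_T is close to 1.
    x = y0 + kappa * np.sqrt(etas[-1]) * rng.standard_normal(y0.shape)
    for i in range(len(etas) - 1, -1, -1):
        eta_prev = etas[i - 1] if i > 0 else 0.0  # eta_0 = 0 closes the chain
        x0_hat = f_theta(x, y0, i)
        mean, std = posterior_mean_std(x, x0_hat, eta_prev, etas[i], kappa)
        x = mean + std * rng.standard_normal(y0.shape)
    return x


def x0_prediction_loss(f_theta, x0, y0, i, etas, kappa, rng):
    """Unweighted denoising-style loss: mean squared error between f_theta and x0."""
    x_t = forward_marginal_sample(x0, y0, etas[i], kappa, rng)
    return float(np.mean((f_theta(x_t, y0, i) - x0) ** 2))
```

With kappa set to 0 the chain is deterministic, which gives an easy sanity check: a sample at step t is x0 + eta_t * e0, and the posterior mean moves it to x0 + eta_{t-1} * e0. In this sketch both y0 and x0 must have the same shape, which in practice means the LR image is upsampled first or both live in the same latent space.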

Experimental results

Research questions

RQ1: Can a diffusion model tailored to LR-to-HR restoration reduce inference steps while preserving SR fidelity and realism?
RQ2: Does shifting the residual between HR and LR provide a more efficient diffusion process than Gaussian-noise-based diffusion for SR?
RQ3: How does a flexible noise schedule affect the fidelity-realism trade-off in SR results?
RQ4: What is the performance and efficiency of ResShift on synthetic and real-world SR benchmarks compared to state-of-the-art methods?

Key findings

Method	PSNR ↑	SSIM ↑	LPIPS ↓	CLIPIQA ↑	MUSIQ ↑
ESRGAN	20.67	0.448	0.485	0.451	43.615
RealSR-JPEG	23.11	0.591	0.326	0.537	46.981
BSRGAN	24.42	0.659	0.259	0.581	54.697
SwinIR	23.99	0.667	0.238	0.564	53.790
RealESRGAN	24.04	0.665	0.254	0.523	52.538
DASR	24.75	0.675	0.250	0.536	48.337
LDM-15	24.89	0.670	0.269	0.512	46.419
ResShift	25.01	0.677	0.231	0.592	53.660
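The table can be read programmatically to see which method leads on each metric. The sketch below copies the values from the table as-is; the dictionary layout and the helper name `best_per_metric` are illustrative choices, and the arrow in each column header is taken to mean higher-is-better (↑) or lower-is-better (↓).

```python
# Values copied from the table above: (PSNR, SSIM, LPIPS, CLIPIQA, MUSIQ).
ROWS = {
    "ESRGAN": (20.67, 0.448, 0.485, 0.451, 43.615),
    "RealSR-JPEG": (23.11, 0.591, 0.326, 0.537, 46.981),
    "BSRGAN": (24.42, 0.659, 0.259, 0.581, 54.697),
    "SwinIR": (23.99, 0.667, 0.238, 0.564, 53.790),
    "RealESRGAN": (24.04, 0.665, 0.254, 0.523, 52.538),
    "DASR": (24.75, 0.675, 0.250, 0.536, 48.337),
    "LDM-15": (24.89, 0.670, 0.269, 0.512, 46.419),
    "ResShift": (25.01, 0.677, 0.231, 0.592, 53.660),
}
METRICS = ["PSNR", "SSIM", "LPIPS", "CLIPIQA", "MUSIQ"]
HIGHER_IS_BETTER = {"PSNR": True, "SSIM": True, "LPIPS": False,
                    "CLIPIQA": True, "MUSIQ": True}


def best_per_metric(rows):
    """Return the name of the best-scoring method for every metric."""
    best = {}
    for col, metric in enumerate(METRICS):
        pick = max if HIGHER_IS_BETTER[metric] else min
        best[metric] = pick(rows, key=lambda name: rows[name][col])
    return best


if __name__ == "__main__":
    for metric, method in best_per_metric(ROWS).items():
        print(f"{metric}: {method}")
```

On these numbers ResShift leads on PSNR, SSIM, LPIPS, and CLIPIQA, while BSRGAN has the highest MUSIQ.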

With as few as 15 sampling steps, ResShift posts the best PSNR (25.01), SSIM (0.677), LPIPS (0.231), and CLIPIQA (0.592) in the table above; its MUSIQ (53.660) trails BSRGAN (54.697) and SwinIR (53.790).
The proposed residual-shifting diffusion kernel enables a shorter Markov chain than conventional diffusion SR methods, boosting inference efficiency.
A flexible noise schedule (kappa and eta_t) provides a fidelity-realism trade-off and can emulate diffusion dynamics similar to latent diffusion models under certain settings.
In experiments on ImageNet-Test, ResShift surpasses several baselines in PSNR and LPIPS while maintaining strong CLIPIQA and MUSIQ scores on real-world datasets.
Latent-space implementation via VQGAN further reduces training-time overhead without changing the core diffusion formulation.
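To make the role of kappa and eta_t more concrete, here is a hedged sketch of one way such a schedule could be built: sqrt(eta_t) grows geometrically from a small eta_1 to an eta_T close to 1, and an exponent `p` bends the growth curve. The functional form and every default value (`T=15`, `kappa=2.0`, `p=0.3`, `eta_T=0.999`, `min_noise_level=0.04`) are assumptions chosen for illustration, not values stated in this review, and the function name is hypothetical.

```python
import math

import numpy as np


def geometric_eta_schedule(T=15, kappa=2.0, p=0.3, eta_T=0.999, min_noise_level=0.04):
    """Return eta_1..eta_T as an array of length T (illustrative sketch).

    sqrt(eta_t) = sqrt(eta_1) * b0 ** beta_t, where beta_t runs from 0 to T - 1
    and b0 is chosen so that the last entry equals eta_T.
    """
    # Assumption: keep the first-step noise std kappa * sqrt(eta_1) small.
    eta_1 = min((min_noise_level / kappa) ** 2, 0.001)
    b0 = math.exp(math.log(eta_T / eta_1) / (2 * (T - 1)))
    # p reshapes the time axis: p < 1 front-loads the growth, p > 1 delays it.
    beta = (np.arange(T) / (T - 1)) ** p * (T - 1)
    sqrt_eta = math.sqrt(eta_1) * b0 ** beta
    return sqrt_eta ** 2


if __name__ == "__main__":
    etas = geometric_eta_schedule()
    # kappa * sqrt(eta_t) is the noise std at step t in the assumed marginal.
    for t, eta in enumerate(etas, start=1):
        print(t, round(float(eta), 4), round(2.0 * math.sqrt(float(eta)), 4))
```

In this sketch eta_t sets how far the residual has been shifted toward the LR image at step t, while kappa scales the noise added on top, so the two knobs can be tuned separately when exploring the fidelity-realism trade-off.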


This review was generated by AI and checked by a human editor.