[Paper Review] DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
DiffBIR uses a two-stage pipeline that combines a restoration module with a frozen Stable Diffusion prior, enabling realistic and faithful blind image restoration for both general images and face images. It introduces LAControlNet and latent image guidance to balance realism against fidelity.
We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks in a unified framework. DiffBIR decouples blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. Each stage is developed independently but they work seamlessly in a cascaded manner. In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results. For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details. Specifically, IRControlNet is trained based on specially produced condition images without distracting noisy content for stable generation performance. Moreover, we design a region-adaptive restoration guidance that can modify the denoising process during inference without model re-training, allowing users to balance realness and fidelity through a tunable guidance scale. Extensive experiments have demonstrated DiffBIR's superiority over state-of-the-art approaches for blind image super-resolution, blind face restoration and blind image denoising tasks on both synthetic and real-world datasets. The code is available at https://github.com/XPixelGroup/DiffBIR.
Research Motivation and Objectives
- Extend blind image restoration to general images with unknown degradations.
- Combine a degradation-removal stage with a diffusion-prior generation stage for realism.
- Enable user-controlled trade-off between image fidelity and perceptual realism.
- Leverage an injective modulation network (LAControlNet) to adapt Stable Diffusion to restoration while keeping its pre-trained weights frozen.
- Demonstrate superior performance on both blind image super-resolution (BSR) and blind face restoration (BFR) tasks.
Proposed Method
- Use a two-stage pipeline: first, pretrain a SwinIR-based Restoration Module across diverse degradations so that it generalizes.
- Finetune a parallel LAControlNet alongside Stable Diffusion, injecting a condition derived from the Restoration Module output into the latent diffusion process.
- Introduce latent image guidance to enable a controllable fidelity-realness trade-off during diffusion sampling.
- Employ a degradation model that includes blur, resize, noise, and high-order degradations to simulate real-world LQ images.
- Train with L2 pixel loss for the restoration module and a latent diffusion objective for the diffusion stage.
- Allow inference-time control via a gradient-scale parameter that moves the result between I_reg (the Restoration Module output) and I_diff (the diffusion output).
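The degradation model in the list above (blur, resize, noise, repeated for high-order degradation) can be illustrated with a minimal sketch. This is not the authors' implementation: the box blur, nearest-neighbour resize, fixed noise level, and all function names (`box_blur`, `downsample`, `add_noise`, `degrade`) are assumptions chosen to keep the example dependency-free.

```python
import random

def box_blur(img, radius=1):
    """Blur a 2D grayscale image (list of rows) with a mean filter.

    Assumption: a box filter stands in for whatever blur kernels the paper uses.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the window, clipped at the image borders.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out

def downsample(img, factor=2):
    """Nearest-neighbour resize: keep every `factor`-th row and column."""
    return [row[::factor] for row in img[::factor]]

def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clamp each pixel to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def degrade(img, order=2, factor=2, sigma=0.05, seed=0):
    """Apply blur -> resize -> noise `order` times.

    order=1 is a classical single-pass degradation; order>1 repeats the
    sequence, which is what "high-order" refers to in this sketch.
    """
    rng = random.Random(seed)
    for _ in range(order):
        img = add_noise(downsample(box_blur(img), factor), sigma, rng)
    return img

# Toy 8x8 checkerboard standing in for a high-quality image.
hq = [[float((x // 2 + y // 2) % 2) for x in range(8)] for y in range(8)]
lq = degrade(hq)
print(len(lq), len(lq[0]))
```

Pairs of `hq` and `lq` produced this way are the kind of synthetic training data a restoration module could be fitted on with an L2 pixel loss; a real pipeline would randomize the kernel, scale factor, and noise level per sample.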

Experimental Results
Research Questions
- RQ1 Can DiffBIR achieve realistic restoration for general, unknown degradations beyond faces?
- RQ2 How does integrating a pre-trained Stable Diffusion prior affect fidelity and realism in blind restoration?
- RQ3 Does the LAControlNet-based finetuning preserve generative capabilities while enabling task-specific restoration?
- RQ4 Can users control the fidelity-realness trade-off without retraining the model?
- RQ5 How does DiffBIR perform on both BSR and BFR benchmarks compared to state-of-the-art methods?
Key Results
| Dataset | Metric | DDNM | GDP | Real-ESRGAN+ | BSRGAN | SwinIR-GAN | FeMaSR | DiffBIR(Ours) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| RealSRSet | MANIQA↑ | 0.4535 | 0.4581 | 0.5376 | 0.5640 | 0.5295 | 0.5247 | 0.5906 | Best among listed methods |
| RealSRSet | NIQE↓ | 6.8415 | 5.0626 | 5.7401 | 5.6074 | 5.6093 | 5.2353 | 6.0738 | Lower is better |
| Real47 | MANIQA↑ | 0.4813 | 0.5237 | 0.5900 | 0.5889 | 0.5721 | 0.5718 | 0.6293 | Best among listed methods |
| Real47 | NIQE↓ | 6.4768 | 3.9866 | 3.9103 | 4.0338 | 3.9910 | 4.1731 | 3.9240 | Lower is better |
- The authors report that DiffBIR outperforms state-of-the-art methods for BSR and BFR on both synthetic and real-world datasets.
- It achieves the highest MANIQA among the listed methods on both RealSRSet (0.5906) and Real47 (0.6293), while its NIQE is not the lowest on either set.
- For BFR, it delivers strong fidelity and realism, with favorable IDS and FID metrics on synthetic and real datasets.
- The two-stage design, pairing the Restoration Module (RM) with LAControlNet, avoids the over-smoothing and incorrect details that plague single-stage methods.
- Latent image guidance provides a tunable spectrum from faithful restoration to high-realism textures.
- Ablation studies confirm the necessity of the restoration module and of finetuning Stable Diffusion, as well as the efficacy of LAControlNet over ControlNet.
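The fidelity-realness trade-off provided by latent image guidance can be illustrated with a toy sketch. It is a deliberate simplification, not the paper's sampler: it takes a single gradient step on a clean-latent estimate toward a reference latent, using a mean-squared distance. The names `guided_update` and `mse`, the four-element latents, and the scale values are all hypothetical.

```python
def mse(a, b):
    """Mean squared distance between two equally sized latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def guided_update(z_pred, z_ref, scale):
    """Take one gradient step that pulls z_pred toward z_ref.

    The gradient of mse(z_pred, z_ref) with respect to z_pred[i] is
    2 * (z_pred[i] - z_ref[i]) / n; `scale` plays the role of the
    gradient-scale parameter (assumed name).
    """
    n = len(z_pred)
    grad = [2.0 * (p - r) / n for p, r in zip(z_pred, z_ref)]
    return [p - scale * g for p, g in zip(z_pred, grad)]

z_reg = [0.2, 0.4, 0.6, 0.8]   # hypothetical latent of the restoration-module output
z_diff = [0.9, 0.1, 0.7, 0.3]  # hypothetical clean-latent estimate from the sampler

for s in (0.0, 0.5, 1.0, 2.0):
    z = guided_update(z_diff, z_reg, s)
    # Distance to z_reg shrinks as the scale grows.
    print(s, round(mse(z, z_reg), 4))
```

With a scale of 0 the estimate is left untouched (favoring generated detail), and larger scales pull it toward the reference latent (favoring fidelity). In an actual diffusion sampler this kind of correction would be applied at each denoising step rather than once.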

This review was generated by AI and reviewed by a human editor.