QUICK REVIEW

[Paper Review] DiffIR: Efficient Diffusion Model for Image Restoration

Bin Xia, Yulun Zhang|arXiv (Cornell University)|2023. 03. 16.

Image and Signal Denoising Methods|Citations: 16

One-Line Summary

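DiffIR adapts diffusion models to image restoration by using a compact IR prior representation and a two-stage training scheme, achieving state-of-the-art results with far fewer iterations and lower computational cost than previous DM-based IR methods.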

ABSTRACT

Diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process into a sequential application of a denoising network. However, different from image synthesis, image restoration (IR) has a strong constraint to generate results in accordance with ground-truth. Thus, for IR, traditional DMs running massive iterations on a large model to estimate whole images or feature maps is inefficient. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), dynamic IR transformer (DIRformer), and denoising network. Specifically, DiffIR has two training stages: pretraining and training DM. In pretraining, we input ground-truth images into CPEN$_{S1}$ to capture a compact IR prior representation (IPR) to guide DIRformer. In the second stage, we train the DM to directly estimate the same IPR as pretrained CPEN$_{S1}$ only using LQ images. We observe that since the IPR is only a compact vector, DiffIR can use fewer iterations than traditional DM to obtain accurate estimations and generate more stable and realistic results. Since the iterations are few, our DiffIR can adopt a joint optimization of CPEN$_{S2}$, DIRformer, and denoising network, which can further reduce the estimation error influence. We conduct extensive experiments on several IR tasks and achieve SOTA performance while consuming less computational costs. Code is available at https://github.com/Zj-BinXia/DiffIR.

Research Motivation and Goals

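Motivate an efficient use of diffusion models for image restoration, where most pixels of the input image are already given.
Develop a compact IR prior representation (IPR) to guide restoration.
Propose a two-stage training scheme that uses CPEN and a diffusion model for IR.
Enable joint optimization of CPEN$_{S2}$, DIRformer, and the denoising network to reduce estimation error.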

Proposed Method

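Introduce CPEN to extract a compact IR prior representation from ground-truth images.
Propose the dynamic IR transformer (DIRformer), equipped with DMTA and DGFN, to exploit the IPR for restoration.
In Stage 1, train CPEN$_{S1}$ and DIRformer jointly with a reconstruction loss.
In Stage 2, train a diffusion model to estimate the IPR from low-quality images, using a compact latent vector and joint optimization.
Within the diffusion framework, use CPEN$_{S2}$ and the denoising network to refine the IPR iteratively and then restore the image.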
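
The Stage 2 idea of running the diffusion process on a compact vector instead of a full image can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the step count (`num_steps=4`), the linear beta range, and the `predict_noise` callable (a stand-in for the denoising network conditioned on features from CPEN$_{S2}$) are all hypothetical choices. The reverse step is the standard DDPM posterior mean with the random variance term omitted, so sampling is deterministic.

```python
import math
import random


def make_schedule(num_steps, beta_start=0.1, beta_end=0.99):
    """Linear beta schedule (illustrative values, assumed here).

    Returns (alphas, alpha_bars); list index 0 corresponds to step t = 1.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    alphas = [1.0 - (beta_start + step * i) for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return alphas, alpha_bars


def forward_diffuse(z, alpha_bar_T, rng):
    """Sample z_T ~ N(sqrt(alpha_bar_T) * z, (1 - alpha_bar_T) * I) in one shot."""
    scale, sigma = math.sqrt(alpha_bar_T), math.sqrt(1.0 - alpha_bar_T)
    return [scale * zi + sigma * rng.gauss(0.0, 1.0) for zi in z]


def reverse_diffuse(z_T, predict_noise, alphas, alpha_bars):
    """Deterministic reverse process on a compact vector (no variance noise).

    predict_noise(z_t, t) is a hypothetical stand-in for a trained denoiser.
    """
    z_t = list(z_T)
    for t in range(len(alphas), 0, -1):
        a, ab = alphas[t - 1], alpha_bars[t - 1]
        eps_hat = predict_noise(z_t, t)
        coef = (1.0 - a) / math.sqrt(1.0 - ab)
        z_t = [(zi - coef * ei) / math.sqrt(a) for zi, ei in zip(z_t, eps_hat)]
    return z_t


def make_oracle(z0, alpha_bars):
    """Perfect noise predictor that peeks at the clean vector (sanity checks only)."""
    def predict(z_t, t):
        ab = alpha_bars[t - 1]
        return [(zt - math.sqrt(ab) * z) / math.sqrt(1.0 - ab)
                for zt, z in zip(z_t, z0)]
    return predict
```

With the oracle predictor, `reverse_diffuse` recovers the clean vector up to floating-point error, which is a quick way to check that the schedule and update rule agree. In a real system the oracle would be replaced by a learned network that sees only the low-quality image.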

Figure 1: The Mult-Adds are measured on 256 $\times$ 256 inputs. Our DiffIR achieves SOTA performance on IR tasks. Notably, LDM [50] and RePaint [40] are DM-based methods, and DiffIR is 1000 $\times$ more efficient than RePaint while achieving better performance.

Experimental Results

Research Questions

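RQ1: Can a diffusion model operate effectively on a compact IR vector rather than on the whole image in IR tasks?
RQ2: Does two-stage training (ground-truth guidance followed by low-quality-image guidance) improve restoration quality and stability?
RQ3: Does joint optimization of CPEN$_{S2}$, DIRformer, and the denoising network reduce the propagation of estimation errors and the resulting distortion?
RQ4: How does DiffIR compare with state-of-the-art DM-based IR methods on inpainting, SR, and deblurring?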

Key Findings

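DiffIR achieves state-of-the-art performance on several IR tasks while using far fewer iterations and less computation.
The compact IPR produced by CPEN enables effective restoration with a lightweight DIRformer.
Joint optimization of CPEN$_{S2}$, DIRformer, and the denoising network mitigates the effect of estimation errors on restoration quality.
In the experiments, DiffIR is far more efficient than RePaint and LDM and outperforms several DM-based baselines on inpainting, SR, and deblurring.
Ablation studies show the benefits of the DiffIR$_{S2}$ design, of the joint training scheme, and of omitting variance noise in the reverse diffusion process when estimating the IPR.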
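
To make the joint-optimization finding concrete, the sketch below shows one plausible shape for a Stage 2 objective: a reconstruction term on the restored image plus a term that pulls the estimated IPR toward the Stage 1 IPR. This review does not state the actual losses, so the use of L1 for both terms, the `diff_weight` parameter, and the function names are assumptions made for illustration. Flat Python lists stand in for image and IPR tensors.

```python
def l1_mean(pred, target):
    """Mean absolute error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if not pred:
        raise ValueError("inputs must be non-empty")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def stage2_joint_loss(restored, ground_truth, ipr_estimated, ipr_target,
                      diff_weight=1.0):
    """Hypothetical joint objective: reconstruction term + IPR-matching term.

    Returns (total, reconstruction_term, ipr_term) so each part can be logged.
    """
    rec = l1_mean(restored, ground_truth)       # image-space error
    diff = l1_mean(ipr_estimated, ipr_target)   # compact-vector error
    return rec + diff_weight * diff, rec, diff
```

Because both terms feed a single scalar, an automatic-differentiation framework would propagate gradients from the restored image back through the restoration network and into the modules that estimate the IPR, which is the sense in which the three components are trained together.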

Figure 2: The overview of the proposed DiffIR, which consists of DIRformer, CPEN, and denoising network. DiffIR has two training stages: (a) In the first stage, CPEN S1 takes the ground-truth image as input and outputs an IPR $\mathbf{Z}$ to guide DIRformer to restore images. We optimize the CPEN S1

This review was generated by AI and reviewed by a human editor.