QUICK REVIEW

[Paper Review] Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models

Guanhua Zhang, Jiabao Ji|arXiv (Cornell University)|2023. 04. 06.

Generative Adversarial Networks and Image Synthesis | Citations: 17

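One-Line Summary

The paper presents CoPaint-TT, a Bayesian-guided inpainting method that updates the revealed and unrevealed regions coherently during diffusion-based inpainting, reducing mismatches with the reference image and improving coherence and quality.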

ABSTRACT

Image inpainting refers to the task of generating a complete, natural image based on a partially revealed reference image. Recently, many research interests have been focused on addressing this problem using fixed diffusion models. These approaches typically directly replace the revealed region of the intermediate or final generated images with that of the reference image or its variants. However, since the unrevealed regions are not directly modified to match the context, it results in incoherence between revealed and unrevealed regions. To address the incoherence problem, a small number of methods introduce a rigorous Bayesian framework, but they tend to introduce mismatches between the generated and the reference images due to the approximation errors in computing the posterior distributions. In this paper, we propose COPAINT, which can coherently inpaint the whole image without introducing mismatches. COPAINT also uses the Bayesian framework to jointly modify both revealed and unrevealed regions, but approximates the posterior distribution in a way that allows the errors to gradually drop to zero throughout the denoising steps, thus strongly penalizing any mismatches with the reference image. Our experiments verify that COPAINT can outperform the existing diffusion-based methods under both objective and subjective metrics. The codes are available at https://github.com/UCSB-NLP-Chang/CoPaint/.

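Motivation and Objectives

Promote coherent image inpainting that avoids the incoherence between revealed and unrevealed regions produced by existing diffusion-based approaches.
Propose a Bayesian framework that jointly updates the revealed and unrevealed regions during the diffusion process.
Develop computationally feasible algorithms (CoPaint and CoPaint-TT) that minimize the inpainting error across the denoising steps.
Demonstrate improved coherence and quality over existing diffusion-based inpainting methods on CelebA-HQ and ImageNet.
Provide practical variants and analysis (including time travel) for balancing quality and efficiency.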

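Proposed Method

Adopt a fixed, pretrained diffusion model and formulate inpainting as posterior sampling under the constraint that the revealed region matches the reference.
Derive an approximate posterior in which the inpainting constraint is enforced by a Gaussian likelihood centered on the one-step generation, which makes the optimization computationally tractable.
Introduce the one-step generation f_theta^(t)(X_t) as an approximation of the final generation to reduce computation.
Describe the CoPaint algorithm, which optimizes X_T to satisfy the inpainting constraint under regularization by the prior and then successively corrects the subsequent denoising steps.
Introduce additional designs, namely multi-step approximation and time travel (CoPaint-TT), to progressively reduce the approximation error during denoising.
Provide a greedy sampling procedure from the approximate posterior to obtain the final X_0, with the approximation error vanishing at the end so that the inpainting constraint is satisfied.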
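
The optimization idea in the list above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: `toy_eps_model` is a hypothetical stand-in for the pretrained noise predictor, the one-step generation assumes the standard DDIM predicted-clean-image form, the prior is modeled as a simple Gaussian around a given `prior_mean`, and a finite-difference gradient replaces the automatic differentiation a real implementation would use. The names `xi`, `prior_std`, `lr`, and `steps` are illustrative parameters, not values from the paper.

```python
import math

def one_step_generation(x_t, alpha_bar_t, eps_model):
    """DDIM-style one-step estimate of the clean image from a noisy x_t (assumed form)."""
    eps = eps_model(x_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    signal_scale = math.sqrt(alpha_bar_t)
    return [(x - noise_scale * e) / signal_scale for x, e in zip(x_t, eps)]

def masked_mismatch(x_t, reference, mask, alpha_bar_t, eps_model):
    """Squared error between the one-step generation and the reference, revealed pixels only."""
    x0_hat = one_step_generation(x_t, alpha_bar_t, eps_model)
    return sum(m * (r - g) ** 2 for m, r, g in zip(mask, reference, x0_hat))

def objective(x_t, reference, mask, alpha_bar_t, eps_model, prior_mean, xi, prior_std):
    """Gaussian-likelihood mismatch term plus a Gaussian prior penalty (both assumptions)."""
    mismatch = masked_mismatch(x_t, reference, mask, alpha_bar_t, eps_model)
    prior = sum((x - mu) ** 2 for x, mu in zip(x_t, prior_mean))
    return mismatch / (2.0 * xi ** 2) + prior / (2.0 * prior_std ** 2)

def numerical_grad(fn, x, h=1e-5):
    """Central finite differences; a real implementation would use autograd instead."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((fn(up) - fn(down)) / (2.0 * h))
    return grad

def optimize_latent(x_init, reference, mask, alpha_bar_t, eps_model, prior_mean,
                    xi=0.5, prior_std=1.0, lr=0.1, steps=200):
    """Gradient descent on the whole latent, so unrevealed entries move as well."""
    prior_mean = list(prior_mean)
    x = list(x_init)

    def loss(z):
        return objective(z, reference, mask, alpha_bar_t, eps_model,
                         prior_mean, xi, prior_std)

    for _ in range(steps):
        grad = numerical_grad(loss, x)
        x = [v - lr * g for v, g in zip(x, grad)]
    return x

def toy_eps_model(x):
    """Hypothetical noise predictor that couples all pixels through their mean."""
    mean = sum(x) / len(x)
    return [0.5 * mean for _ in x]

reference = [1.0, -1.0, 0.0, 0.0]   # only the first two entries are revealed
mask = [1, 1, 0, 0]
x_init = [0.2, 0.1, -0.3, 0.4]
x_opt = optimize_latent(x_init, reference, mask, 0.5, toy_eps_model, prior_mean=x_init)
print(masked_mismatch(x_init, reference, mask, 0.5, toy_eps_model))
print(masked_mismatch(x_opt, reference, mask, 0.5, toy_eps_model))
```

Because the toy predictor couples every pixel, the gradient of the revealed-region mismatch also reaches the unrevealed entries, which is the joint-update behavior the method aims for. Replacement-based approaches, by contrast, overwrite only the revealed region.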

Figure 2: The trajectory of the gap between $\bm{f}_{\theta}^{(t)}(\tilde{\bm{X}}_{t})$ and $\tilde{\bm{X}}_{0}$ along the unconditional diffusion denoising process. We report the pixel-wise averaged Euclidean distance between the two.
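
For readers unfamiliar with the notation in the caption, the one-step generation in DDIM-style samplers is usually the predicted clean image shown below. This assumes the paper follows the standard DDIM form. The symbols $\bar{\alpha}_t$ (cumulative noise schedule) and $\bm{\epsilon}_{\theta}^{(t)}$ (pretrained noise predictor) are notation introduced here for illustration and do not appear in this review.

```latex
\bm{f}_{\theta}^{(t)}(\bm{X}_t)
  = \frac{\bm{X}_t - \sqrt{1-\bar{\alpha}_t}\,\bm{\epsilon}_{\theta}^{(t)}(\bm{X}_t)}
         {\sqrt{\bar{\alpha}_t}}
```

Under this form, $\bar{\alpha}_t \to 1$ as $t \to 0$, so the noise term vanishes and the estimate approaches $\bm{X}_t$ itself. That is consistent with the abstract's statement that the approximation error drops to zero over the denoising steps.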

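Experimental Results

Research Questions

RQ1: Can a Bayesian framework coherently modify both the revealed and unrevealed regions during diffusion-based inpainting without introducing mismatches?
RQ2: How can posterior sampling be made practical when enforcing the inpainting constraint, and how does the one-step generation control the final output?
RQ3: Do the coherence-focused methods (CoPaint and CoPaint-TT) outperform existing diffusion-based inpainting baselines on standard datasets?
RQ4: How do additional designs such as multi-step approximation and time travel affect inpainting quality and efficiency?
RQ5: Can inpainting quality be preserved or improved while reducing computation relative to prior methods?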

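Key Results

CoPaint and its variant CoPaint-TT achieved better inpainting quality and coherence than several diffusion-based baselines on CelebA-HQ and ImageNet.
CoPaint-TT shows a marked reduction in average LPIPS relative to RePaint on the evaluation datasets (about a 19% relative reduction), and a reduced computational budget is also reported on ImageNet.
The one-step generation approach enables a practical approximation that progressively tightens the inpainting constraint as denoising proceeds and, in the ideal setting, can drive the constraint error to zero at the final step.
Adding time travel and multi-step approximation can further reduce the approximation error at early steps and improve self-consistency and sample quality.
Across the experiments, the CoPaint variants were competitive with the baselines in subjective human evaluation, and CoPaint-TT obtained favorable results in coherence-focused evaluation.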
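
The "about 19% relative reduction" above is presumably computed in the usual way, sketched below. The subscripted LPIPS symbols are shorthand introduced here for the average scores of each method, and the exact averaging over datasets and masks is an assumption, not something this review specifies.

```latex
\text{relative reduction}
  = \frac{\mathrm{LPIPS}_{\text{RePaint}} - \mathrm{LPIPS}_{\text{CoPaint-TT}}}
         {\mathrm{LPIPS}_{\text{RePaint}}}
  \approx 0.19
```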

Figure 3: Time-performance trade-off on CelebA-HQ (left) and ImageNet (right). The x-axis indicates the average time ($\downarrow$) to process one image, and the y-axis is the average LPIPS ($\downarrow$).


This review was generated by AI and checked by a human editor.