QUICK REVIEW

[Paper Review] LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Guangcong Zheng, Xianpan Zhou|arXiv (Cornell University)|2023. 03. 30.

Advanced Image and Video Retrieval Techniques|Citations: 8

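One-Line Summary

LayoutDiffusion is a single-stage diffusion model that enables controllable layout-to-image generation by fusing structural image patches with the layout through a Layout Fusion Module and Object-aware Cross Attention, improving both quality and controllability over prior methods.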

ABSTRACT

Recently, diffusion models have achieved great success in image synthesis. However, when it comes to the layout-to-image generation where an image often has a complex scene of multiple objects, how to make strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that can obtain higher generation quality and greater controllability than the previous works. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout to fuse with the normal layout in a unified form. Moreover, Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) are proposed to model the relationship among multiple objects and designed to be object-aware and position-sensitive, allowing for precisely controlling the spatial related information. Extensive experiments show that our LayoutDiffusion outperforms the previous SOTA methods on FID, CAS by relatively 46.35%, 26.70% on COCO-stuff and 44.29%, 41.82% on VG. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.
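The relative improvements quoted in the abstract are presumably measured against the previous best score on each metric. As a sketch, assuming FID is lower-is-better and CAS is higher-is-better, and using the subscripts "prev" and "ours" as placeholder labels that do not appear in the paper:

```latex
\Delta_{\mathrm{FID}} = \frac{\mathrm{FID}_{\mathrm{prev}} - \mathrm{FID}_{\mathrm{ours}}}{\mathrm{FID}_{\mathrm{prev}}},
\qquad
\Delta_{\mathrm{CAS}} = \frac{\mathrm{CAS}_{\mathrm{ours}} - \mathrm{CAS}_{\mathrm{prev}}}{\mathrm{CAS}_{\mathrm{prev}}}
```

Under this reading, a 46.35% relative gain on FID would mean the FID dropped to roughly 54% of the previous best value.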

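Motivation and Objectives

Improve controllability and quality in layout-to-image generation beyond what text-guided diffusion methods offer.
Develop a unified multimodal fusion mechanism that treats image patches as layout-like objects.
Enable end-to-end, single-stage diffusion conditioned on the layout at every denoising step.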

Proposed Method

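Represent the layout as multi-object embeddings and pass them through the Layout Fusion Module (LFM), which models the relationships among objects and outputs a fused layout embedding.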
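Build structural image patches that carry region information, so that image and layout are unified in a common space.
Propose Object-aware Cross Attention (OaCA) to apply local, object-aware conditioning during diffusion.
Apply classifier-free guidance for controllable diffusion without an additional classifier.
Speed up diffusion sampling with a DPM-solver variant for faster conditional generation.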
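The classifier-free guidance step listed above can be illustrated with a minimal sketch. It assumes the common formulation in which the model is evaluated once with the layout and once with an empty (null) layout, and the two noise predictions are extrapolated by a guidance scale. The function name, the flat-list representation, and the `scale` parameter are illustrative assumptions, not the authors' implementation.

```python
from typing import List


def classifier_free_guidance(
    eps_cond: List[float],
    eps_uncond: List[float],
    scale: float,
) -> List[float]:
    """Blend layout-conditioned and unconditional noise predictions.

    scale = 0 returns the unconditional prediction, scale = 1 returns the
    conditional one, and scale > 1 pushes further toward the condition.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy noise predictions for a two-element "image" (made-up values).
guided = classifier_free_guidance([1.0, 2.0], [0.0, 1.0], scale=2.0)
```

No separate classifier is involved: the guidance signal comes entirely from the difference between the two predictions of the same network.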

Figure 2: The whole pipeline of LayoutDiffusion. The layout, consisting of bounding boxes $b$ and object categories $c$, is transformed into the embeddings $B_{\mathcal{L}}$, $C_{\mathcal{L}}$, $L$. The Layout Fusion Module then fuses the layout embedding $L$ to output the fused layout embedding $L^{\prime}$. ...
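One way to picture the "structural image patch" idea is to give every patch of a feature map its own bounding box, so that a patch can be described in the same form as a layout object. The sketch below assumes normalized `(x0, y0, x1, y1)` boxes and a regular grid; the function name and box convention are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (x0, y0, x1, y1), normalized to [0, 1]; an assumed convention.
Box = Tuple[float, float, float, float]


def patch_boxes(grid_h: int, grid_w: int) -> List[Box]:
    """Return one normalized bounding box per patch, in row-major order."""
    boxes: List[Box] = []
    for row in range(grid_h):
        for col in range(grid_w):
            boxes.append(
                (col / grid_w, row / grid_h, (col + 1) / grid_w, (row + 1) / grid_h)
            )
    return boxes


# A 2x2 grid yields four patches, each covering a quarter of the image.
boxes = patch_boxes(2, 2)
```

With region information attached, each patch can be embedded with the same box encoder that is used for layout objects, which is what makes a unified treatment of image and layout possible.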

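Experimental Results

Research Questions

RQ1: How can treating the fusion of multimodal patches and layout in a single unified form improve layout-to-image generation?
RQ2: Do LFM and OaCA improve image quality, diversity, and precise object-level control over prior methods?
RQ3: Can end-to-end, single-stage diffusion with layout guidance outperform existing GAN-based and diffusion-based approaches on standard benchmarks?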

Key Results

LayoutDiffusion achieves higher generation quality and stronger controllability than prior methods on COCO-Stuff and Visual Genome.
Structural image patches allow the image and the layout to be fused effectively in a unified space.
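LFM improves the global, relational understanding of multiple objects.
OaCA provides object-aware cross attention that improves object awareness and position sensitivity.
Classifier-free guidance and accelerated sampling (DPM-solver) speed up conditional generation while preserving quality.
Quantitatively, LayoutDiffusion outperforms SOTA methods on the evaluated datasets on metrics such as FID, IS, DS, CAS, and YOLOScore.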
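The object-aware, position-sensitive attention mentioned in these results can be sketched with plain scaled dot-product cross attention, where image-patch queries attend over layout-object keys. Concatenating a content embedding with a box embedding to form each token is an assumption made here for illustration; the helper names and toy vectors are hypothetical and do not reproduce the paper's exact OaCA formulation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Each query (image patch) attends over all keys (layout objects)."""
    dim = len(keys[0])
    outputs: List[Vector] = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def object_aware_tokens(content: List[Vector], position: List[Vector]) -> List[Vector]:
    """Hypothetical helper: concatenate content and position (box) embeddings."""
    return [c + p for c, p in zip(content, position)]


# Toy example: one patch, two objects with the same class but different positions.
patch_q = object_aware_tokens([[1.0, 0.0]], [[10.0, 0.0]])
object_k = object_aware_tokens([[1.0, 0.0], [1.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]])
object_v = [[1.0], [0.0]]
attended = cross_attention(patch_q, object_k, object_v)
```

In the toy example both objects share the same content embedding, so only the position part of the key separates them, and the patch attends almost entirely to the object whose position embedding matches its own.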

Figure 3: Visualization of comparison with SOTA methods on COCO-stuff 256 $\times$ 256. LayoutDiffusion has better generation quality and stronger controllability compared to the other methods.


This review was generated by AI and reviewed by a human editor.