QUICK REVIEW

[Paper Review] LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Guangcong Zheng, Xianpan Zhou|arXiv (Cornell University)|Mar 30, 2023

Advanced Image and Video Retrieval Techniques | Cited by 8

One-line summary

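LayoutDiffusion is a one-stage diffusion model that fuses structural image patches with the layout through the Layout Fusion Module and Object-aware Cross Attention, achieving layout-to-image generation with higher quality and controllability than previous methods.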

ABSTRACT

Recently, diffusion models have achieved great success in image synthesis. However, when it comes to the layout-to-image generation where an image often has a complex scene of multiple objects, how to make strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that can obtain higher generation quality and greater controllability than the previous works. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout to fuse with the normal layout in a unified form. Moreover, Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) are proposed to model the relationship among multiple objects and designed to be object-aware and position-sensitive, allowing for precisely controlling the spatial related information. Extensive experiments show that our LayoutDiffusion outperforms the previous SOTA methods on FID, CAS by relatively 46.35%, 26.70% on COCO-stuff and 44.29%, 41.82% on VG. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.

Motivation and objectives

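Improve the controllability and quality of layout-to-image generation beyond what text-guided diffusion methods offer.
Develop a unified multimodal fusion mechanism that treats image patches as layout-like objects.
Achieve end-to-end, one-stage diffusion with layout conditioning at every denoising step.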

Proposed method

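Represents the layout as embeddings of multiple objects and passes them through the Layout Fusion Module (LFM) before they are combined with image features.
Builds structural image patches that carry region information, bringing image and layout into a common space.
Proposes Object-aware Cross Attention (OaCA), which provides local, object-aware conditioning during the diffusion process.
Applies classifier-free guidance to layout-conditioned diffusion, improving controllability without an additional classifier.
Adopts DPM-Solver-style sampling to speed up conditional generation.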
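
The classifier-free guidance item in the list above can be illustrated with a minimal sketch. It uses the commonly cited form of the technique, in which the sampler blends a layout-conditioned and an unconditioned noise prediction; the function name `guided_noise`, the toy values, and the exact scale convention are assumptions for illustration and are not taken from the LayoutDiffusion code.

```python
def guided_noise(eps_cond, eps_uncond, scale):
    """Blend conditional and unconditional noise predictions.

    Common classifier-free guidance form: start from the unconditional
    prediction and move `scale` times along the direction toward the
    layout-conditioned one. scale=1 returns the conditional prediction,
    scale=0 returns the unconditional one.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy values standing in for a model's per-pixel noise predictions
# (hypothetical numbers, not results from the paper).
eps_with_layout = [0.5, -0.25, 1.0]
eps_without_layout = [0.25, 0.0, 1.0]

print(guided_noise(eps_with_layout, eps_without_layout, 1.0))  # [0.5, -0.25, 1.0]
print(guided_noise(eps_with_layout, eps_without_layout, 3.0))  # [1.0, -0.75, 1.0]
```

A scale above 1 pushes the prediction further toward the layout-conditioned direction, which is why no separately trained classifier is needed to strengthen the conditioning.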

Figure 2 : The whole pipeline of LayoutDiffusion. The layout that consisted of bounding box $b$ and objects categories $c$ is transformed into embedding $B_{\mathcal{L}},C_{\mathcal{L}},L$ . Then Layout Fusion Module fuses layout embedding $L$ to output the fused layout embedding $L^{\prime}$ . Fina
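
The structural image patch idea can be sketched as follows: if every patch of a feature map is given its own normalized bounding box, patches and layout objects can be described in the same form. The coordinate convention `(x0, y0, x1, y1)` in `[0, 1]`, the helper names `patch_boxes` and `overlap_area`, and the example object are assumptions made for illustration; how the model embeds and attends over these boxes is not shown here.

```python
def patch_boxes(rows, cols):
    """Return one normalized (x0, y0, x1, y1) box per feature-map patch,
    listed row by row."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((c / cols, r / rows, (c + 1) / cols, (r + 1) / rows))
    return boxes


def overlap_area(a, b):
    """Area of the intersection of two (x0, y0, x1, y1) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


grid = patch_boxes(2, 2)
object_box = (0.0, 0.0, 0.5, 1.0)  # hypothetical object covering the left half

# Indices of patches whose region overlaps the object's bounding box.
touching = [i for i, box in enumerate(grid) if overlap_area(box, object_box) > 0]
print(touching)  # [0, 2]
```

Because patches and objects share one box format, a region-level comparison such as the overlap test above needs no modality-specific code.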

Experimental results

Research questions

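RQ1: How does fusing multimodal image patches and layout in a unified form improve layout-to-image generation?
RQ2: Do LFM and OaCA improve image quality, diversity, and precise object-level controllability over previous methods?
RQ3: Does end-to-end, one-stage diffusion conditioned on the layout outperform existing GAN-based and diffusion-based approaches on standard benchmarks?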

Key findings

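LayoutDiffusion achieves higher generation quality and stronger controllability than previous methods on COCO-Stuff and Visual Genome.
The structural image patch approach effectively fuses image and layout in a unified space.
LFM improves the global and relational understanding of multiple objects in the layout.
OaCA provides object-aware, position-sensitive cross attention, improving object placement and recognizability in the output.
Classifier-free guidance strengthens layout control, and fast sampling with DPM-Solver speeds up conditional generation while preserving quality.
Quantitatively, LayoutDiffusion outperforms SOTA methods on the evaluated datasets on metrics including FID, IS, DS, CAS, and YOLOScore.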

Figure 3 : Visualization of comparison with SOTA methods on COCO-stuff 256 $\times$ 256. LayoutDiffusion has better generation quality and stronger controllability compared to the other methods.


This review was generated by AI and checked by a human editor.