QUICK REVIEW

[Paper Review] LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Linoy Tsaban, Apolinário Passos|arXiv (Cornell University)|Jul 2, 2023

Generative Adversarial Networks and Image Synthesis | References: 16 | Citations: 8

One-Sentence Summary

LEDITS combines DDPM inversion with SEGA's semantic guidance to edit real images in a lightweight, flexible way without changing the model architecture.

ABSTRACT

Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, a significant effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. However, editing proves to be difficult for these generative models due to the inherent nature of editing techniques, which involves preserving certain content from the original image. Conversely, in text-based models, even minor modifications to the text prompt frequently result in an entirely distinct result, making attaining one-shot generation that accurately corresponds to the user's intent exceedingly challenging. In addition, to edit a real image using these state-of-the-art tools, one must first invert the image into the pre-trained model's domain - adding another factor affecting the edit quality, as well as latency. In this exploratory report, we propose LEDITS - a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing, while harnessing the editing capabilities of DDPM inversion as well. This approach achieves versatile edits, both subtle and extensive as well as alterations in composition and style, while requiring no optimization nor extensions to the architecture.

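Motivation and Objectives

Advance real-image editing with text-guided diffusion models and address its main difficulties: preserving content from the original image and inverting the image into the model's domain.
Integrate DDPM inversion with SEGA in a lightweight way to enable semantically guided edits on real images.
Show that combining DDPM inversion with SEGA produces diverse edits while maintaining fidelity and semantic control.
Show that the approach is lightweight and requires no changes to the model architecture.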

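Proposed Method

Run DDPM inversion on the input image to obtain the inverted latents and noise maps.
Encode the target prompt and the SEGA concepts to obtain conditioning vectors.
Run the denoising loop from T down to 1, applying DDPM updates with an epsilon_theta that incorporates SEGA guidance.
Within the SEGA-guided diffusion process, update the latent x_{t-1} using the precomputed noise map Z_t.
Decode the final latent x_0 to produce the edited image.
Compare two editing workflows: pure inversion followed by SEGA edits, and a joint workflow that combines inversion with target-prompt editing. The comparison highlights flexibility and robustness.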
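
The steps above can be sketched on a toy problem. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `toy_eps` is a hypothetical closed-form noise predictor standing in for the pretrained epsilon_theta, conditioning vectors are plain arrays rather than text-encoder outputs, the SEGA term omits warm-up and momentum, the variance is fixed to sigma_t^2 = beta_t, and the final decoding step is skipped because the toy works directly on vectors. All function and parameter names are made up for the example.

```python
import numpy as np

def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; array index i holds the value for timestep i + 1."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def toy_eps(x, abar, cond):
    """Hypothetical stand-in for epsilon_theta: the closed-form noise
    predictor when the data distribution is N(cond, I)."""
    return np.sqrt(1.0 - abar) * (x - np.sqrt(abar) * cond)

def guided_eps(x, abar, c_prompt, c_edit, s_g=1.0, s_e=0.0, lam=0.5):
    """Classifier-free guidance plus a simplified SEGA term."""
    e_null = toy_eps(x, abar, np.zeros_like(x))
    e_prompt = toy_eps(x, abar, c_prompt)
    psi = toy_eps(x, abar, c_edit) - e_null
    # Keep only entries of psi whose magnitude reaches the lam-quantile.
    mask = np.abs(psi) >= np.quantile(np.abs(psi), lam)
    return e_null + s_g * (e_prompt - e_null) + s_e * mask * psi

def mean_step(x, eps, beta, alpha, abar):
    """DDPM mean of x_{t-1} given x_t and the predicted noise."""
    return (x - beta / np.sqrt(1.0 - abar) * eps) / np.sqrt(alpha)

def ddpm_invert(x0, eps_fn, sched, rng):
    """Return x_T and the noise maps z_1..z_T (stored at indices 0..T-1)."""
    betas, alphas, abars = sched
    T = len(betas)
    xs = [x0]
    for i in range(T):
        # Each x_t is noised from x0 with its own independent noise sample.
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(abars[i]) * x0 + np.sqrt(1.0 - abars[i]) * noise)
    zs = [None] * T
    for i in reversed(range(T)):
        eps = eps_fn(xs[i + 1], abars[i])
        mu = mean_step(xs[i + 1], eps, betas[i], alphas[i], abars[i])
        zs[i] = (xs[i] - mu) / np.sqrt(betas[i])
    return xs[T], zs

def denoise(x_T, zs, eps_fn, sched):
    """Reverse process from T down to 1, reusing the stored noise maps."""
    betas, alphas, abars = sched
    x = x_T
    for i in reversed(range(len(betas))):
        mu = mean_step(x, eps_fn(x, abars[i]), betas[i], alphas[i], abars[i])
        x = mu + np.sqrt(betas[i]) * zs[i]
    return x

rng = np.random.default_rng(0)
sched = make_schedule()
x0 = rng.standard_normal(16)        # toy "latent" of the input image
c_prompt = rng.standard_normal(16)  # toy prompt conditioning
c_edit = rng.standard_normal(16)    # toy SEGA edit concept

def source(x, abar):
    return guided_eps(x, abar, c_prompt, c_edit, s_e=0.0)

def edited(x, abar):
    return guided_eps(x, abar, c_prompt, c_edit, s_e=5.0)

x_T, zs = ddpm_invert(x0, source, sched, rng)
x_rec = denoise(x_T, zs, source, sched)    # no edit: recovers x0
x_edit = denoise(x_T, zs, edited, sched)   # SEGA term shifts the result
print(np.abs(x_rec - x0).max(), np.linalg.norm(x_edit - x0))
```

Because the noise maps are computed from the same mean function that the reverse process uses, running the loop with an unchanged noise predictor reproduces the input up to floating-point error. Only the added guidance term moves the result away from it, which is the property that lets an inverted real image serve as the starting point for edits.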

Figure 1: LEDITS- DDPM inversion with semantic guidance for real image editing. Real images edited purely with DDPM inversion and with both DDPM inversion and semantic guidance (LEDITS). In this combined approach we first apply DDPM Inversion on the input image, and then edit by performing the rever

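Experimental Results

Research Questions

RQ1: Can LEDITS edit real images faithfully while supporting both substantial and subtle edits?
RQ2: Does combining DDPM inversion with SEGA allow the changes requested by the target instruction while preserving fidelity to the original image?
RQ3: How does LEDITS compare with pure inversion or Prompt-to-Prompt in flexibility and controllability?
RQ4: Do SEGA guidance vectors retain their robustness and monotonicity within the LEDITS framework?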

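Key Findings

LEDITS produces edits that are qualitatively competitive with state-of-the-art methods, without architectural changes.
Combining DDPM inversion with SEGA semantics provides flexible control.
SEGA guidance within LEDITS retains its robustness and monotonicity properties.
LEDITS supports two editing workflows, achieving greater diversity and versatility than pure inversion or pure SEGA editing alone.
The integration is lightweight and preserves the strengths of both techniques.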
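
The monotonicity finding can be made a little more concrete by looking at the SEGA guidance term in isolation. The sketch below is an assumption-laden illustration, not code from the paper: the noise estimates are random vectors rather than model outputs, `sega_term` is a hypothetical helper name, and warm-up and momentum are left out. It shows only that the term is sparse (restricted to the upper tail of the difference) and that its magnitude grows with the edit scale; it does not reproduce the perceptual results reported in the review.

```python
import numpy as np

def sega_term(eps_null, eps_edit, s_e, lam, reverse=False):
    """Simplified SEGA guidance term: scale the difference between the
    concept-conditioned and unconditioned noise estimates, keeping only
    entries whose magnitude reaches the lam-quantile."""
    psi = eps_edit - eps_null
    if reverse:
        psi = -psi  # steer away from the concept instead of toward it
    threshold = np.quantile(np.abs(psi), lam)
    scale = np.where(np.abs(psi) >= threshold, s_e, 0.0)
    return scale * psi

rng = np.random.default_rng(0)
eps_null = rng.standard_normal(64)  # stand-in for the unconditioned estimate
eps_edit = rng.standard_normal(64)  # stand-in for the concept-conditioned estimate

scales = [0.0, 2.0, 4.0, 8.0]
norms = [float(np.linalg.norm(sega_term(eps_null, eps_edit, s, lam=0.9)))
         for s in scales]
active = int(np.count_nonzero(sega_term(eps_null, eps_edit, 4.0, lam=0.9)))
print(norms, active)
```

With `lam=0.9`, only about a tenth of the entries receive any guidance, and doubling `s_e` doubles the norm of the term, so a larger edit scale pushes the denoising step further along the same sparse direction.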

Figure 2: LEDITS overview. Top: inversion of the input image. We first apply DDPM inversion on the original image to obtain the inverted latents and corresponding noise maps. Bottom: We use the inverted latents to drive the reverse diffusion process with semantic guidance. In each denoising step we

This review was generated by AI and checked by a human editor.