QUICK REVIEW

[Paper Review] Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models

Daiki Miyake, Akihiro Iohara|arXiv (Cornell University)|May 26, 2023

Advanced Vision and Imaging | Cited by 18

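One-line summary

This paper introduces negative-prompt inversion, an image inversion method for diffusion models that relies only on forward propagation. It achieves reconstruction quality close to that of optimization-based methods while running more than 30x faster, which makes rapid editing practical.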

ABSTRACT

In image editing employing diffusion models, it is crucial to preserve the reconstruction fidelity to the original image while changing its style. Although existing methods ensure reconstruction fidelity through optimization, a drawback of these is the significant amount of time required for optimization. In this paper, we propose negative-prompt inversion, a method capable of achieving equivalent reconstruction solely through forward propagation without optimization, thereby enabling ultrafast editing processes. We experimentally demonstrate that the reconstruction fidelity of our method is comparable to that of existing methods, allowing for inversion at a resolution of 512 pixels and with 50 sampling steps within approximately 5 seconds, which is more than 30 times faster than null-text inversion. Reduction of the computation time by the proposed method further allows us to use a larger number of sampling steps in diffusion models to improve the reconstruction fidelity with a moderate increase in computation time.

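Motivation and Objectives

Enable fast, high-fidelity reconstruction of real images for editing with diffusion models.
Eliminate optimization from the inversion process while keeping reconstruction quality on par with optimization-based methods.
Achieve rapid editing of real images by integrating with existing editing frameworks such as prompt-to-prompt.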

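Proposed Method

Replace the optimized null-text embedding in classifier-free guidance (CFG) with the embedding of the actual prompt, which enables reconstruction using forward passes only.
Use DDIM inversion together with CFG to derive the latent trajectory without backpropagation.
Provide, in the Appendix, a theoretical justification that the optimal null-text embedding can be approximated by the conditional text embedding C.
Show that negative-prompt inversion reaches reconstruction quality close to that of null-text inversion while running about 30x faster.
Demonstrate compatibility with image editing methods such as prompt-to-prompt, enabling rapid editing of real images.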
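
The substitution described above can be illustrated with a minimal sketch. CFG forms its noise estimate as `e_uncond + scale * (e_cond - e_uncond)`; if the unconditional slot receives the source prompt embedding instead of a null-text embedding, then reconstructing with the same prompt collapses the estimate to the plain conditional prediction used during DDIM inversion. Everything below is an assumption made for illustration and is not the authors' code: the function names, the scalar stand-ins for latents and embeddings, and `toy_eps`, a hypothetical noise predictor that ignores the latent so that the round trip is exactly invertible.

```python
import math

def ddim_step(z, eps, a_from, a_to):
    """One deterministic DDIM update between cumulative-alpha levels.

    a_to < a_from moves toward noise (inversion);
    a_to > a_from moves toward the image (sampling).
    """
    z0_pred = (z - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * z0_pred + math.sqrt(1.0 - a_to) * eps

def cfg_eps(eps_model, z, t, cond, uncond, scale):
    """Classifier-free guidance: extrapolate from uncond toward cond."""
    e_uncond = eps_model(z, t, uncond)
    e_cond = eps_model(z, t, cond)
    return e_uncond + scale * (e_cond - e_uncond)

def invert(eps_model, z0, alphas, src):
    """DDIM inversion conditioned on the source prompt, without guidance."""
    z = z0
    for t in range(len(alphas) - 1):
        z = ddim_step(z, eps_model(z, t, src), alphas[t], alphas[t + 1])
    return z

def sample(eps_model, z_T, alphas, cond, uncond, scale):
    """DDIM sampling with CFG; `uncond` is the slot being substituted."""
    z = z_T
    for t in range(len(alphas) - 1, 0, -1):
        eps = cfg_eps(eps_model, z, t, cond, uncond, scale)
        z = ddim_step(z, eps, alphas[t], alphas[t - 1])
    return z

def toy_eps(z, t, emb):
    """Hypothetical noise predictor that depends only on the embedding."""
    return 0.1 * emb

ALPHAS = [1.0, 0.9, 0.7, 0.4, 0.1]  # cumulative alphas, clean -> noisy
SRC, NULL, SCALE = 1.0, 0.0, 7.5    # scalar stand-ins for embeddings

z0 = 0.8
z_T = invert(toy_eps, z0, ALPHAS, SRC)
# Source prompt in the unconditional slot: guidance cancels out.
recon_negative_prompt = sample(toy_eps, z_T, ALPHAS, SRC, SRC, SCALE)
# Unoptimized null embedding in the slot: guidance pushes off the trajectory.
recon_null_slot = sample(toy_eps, z_T, ALPHAS, SRC, NULL, SCALE)
```

In this toy setting `recon_negative_prompt` matches `z0` up to floating-point error, while `recon_null_slot` drifts away from it. A real noise predictor depends on the latent, so the reconstruction is only approximate in practice. For editing, the same `sample` call would presumably take the edited prompt embedding as `cond` and keep the source prompt embedding as `uncond`.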

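Experimental Results

Research Questions

RQ1: Can real images be reconstructed with a diffusion model at high fidelity without any optimization?
RQ2: How does replacing the null-text embedding with the prompt embedding affect reconstruction and editing quality?
RQ3: What is the speed-quality trade-off among negative-prompt inversion, null-text inversion, and DDIM inversion with CFG?
RQ4: Is the method compatible with existing editing pipelines such as prompt-to-prompt for real-image editing?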

Key Findings

Method	PSNR ↑	LPIPS ↓	Time (s) ↓
DDIM inversion with CFG	14.05 (0.34)	0.5278 (0.0222)	4.611 (0.028)
Null-text inversion	26.11 (0.81)	0.0745 (0.0067)	129.768 (2.965)
Negative-prompt inversion (Ours)	23.38 (0.66)	0.1603 (0.0155)	4.627 (0.020)

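Negative-prompt inversion needs only forward computation, and its reconstruction quality approaches that of null-text inversion (23.38 vs. 26.11 dB PSNR).
At 512x512 resolution with 50 sampling steps, inversion takes about 5 seconds, roughly 30x faster than null-text inversion.
Reconstruction quality remains substantially better than that of DDIM inversion with CFG (23.38 vs. 14.05 dB PSNR).
Combining negative-prompt inversion with editing methods such as prompt-to-prompt enables fast editing of real images.
Increasing the number of sampling steps further improves reconstruction quality without giving up the speed advantage over optimization-based methods.
Inversion success varies from image to image; several failure cases, particularly in reconstructing human faces, are discussed as limitations.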
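
As a sanity check on the findings above, the speedup and quality gap can be recomputed from the mean values in the results table. This is a small illustrative calculation; the dictionary layout and the `speedup` helper are assumptions, and only the numbers come from the table. The table means give a ratio of about 28x, slightly below the "more than 30 times" quoted in the abstract; this review does not state the reason for the difference.

```python
# Mean values copied from the results table (parenthesized spreads omitted).
results = {
    "DDIM inversion with CFG": {"psnr": 14.05, "lpips": 0.5278, "seconds": 4.611},
    "Null-text inversion": {"psnr": 26.11, "lpips": 0.0745, "seconds": 129.768},
    "Negative-prompt inversion": {"psnr": 23.38, "lpips": 0.1603, "seconds": 4.627},
}

def speedup(baseline, method):
    """How many times faster `method` is than `baseline` (ratio of mean times)."""
    return results[baseline]["seconds"] / results[method]["seconds"]

ours = "Negative-prompt inversion"
speedup_vs_null_text = speedup("Null-text inversion", ours)
# Relative cost compared with plain DDIM inversion with CFG.
overhead_vs_ddim = results[ours]["seconds"] / results["DDIM inversion with CFG"]["seconds"]
# PSNR given up relative to the optimization-based method, in dB.
psnr_gap_db = results["Null-text inversion"]["psnr"] - results[ours]["psnr"]

print(f"speedup vs null-text inversion: {speedup_vs_null_text:.1f}x")
print(f"time relative to DDIM inversion with CFG: {overhead_vs_ddim:.3f}x")
print(f"PSNR gap to null-text inversion: {psnr_gap_db:.2f} dB")
```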


This review was generated by AI and checked by a human editor.