QUICK REVIEW

[Paper Review] Unsupervised Semantic Correspondence Using Stable Diffusion

Eric Hedlin, Gopal Sharma|arXiv (Cornell University)|May 24, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 22

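One-line summary

This paper presents an unsupervised method for semantic correspondence that leverages Stable Diffusion, achieving competitive PCK scores on multiple datasets and improving on prior unsupervised baselines.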

ABSTRACT

Text-to-image diffusion models are now capable of generating images that are often indistinguishable from real images. To generate such images, these models must understand the semantics of the objects they are asked to generate. In this work we show that, without any training, one can leverage this semantic knowledge within diffusion models to find semantic correspondences - locations in multiple images that have the same semantic meaning. Specifically, given an image, we optimize the prompt embeddings of these models for maximum attention on the regions of interest. These optimized embeddings capture semantic information about the location, which can then be transferred to another image. By doing so we obtain results on par with the strongly supervised state of the art on the PF-Willow dataset and significantly outperform (20.9% relative for the SPair-71k dataset) any existing weakly or unsupervised method on PF-Willow, CUB-200 and SPair-71k datasets.

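Motivation and objectives

Motivate unsupervised semantic correspondence using generative diffusion models, which must understand object semantics in order to generate realistic images.
Propose an embedding-optimization approach guided by Stable Diffusion that aligns semantic parts across images.
Evaluate on standard benchmarks (CUB-200, PF-Willow, SPair-71k) against supervised and weakly supervised baselines.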

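Proposed method

Optimize the prompt embeddings of Stable Diffusion so that attention is maximized on the region of interest in the source image.
Use the diffusion model's attention maps and token-level reasoning to locate the corresponding region in another image.
Compare against strongly supervised, weakly supervised, and unsupervised baselines across all datasets using the PCK (percentage of correct keypoints) metric.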
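The optimize-then-transfer idea can be illustrated with a toy sketch. This is not the paper's implementation: the random feature arrays stand in for the image features of a diffusion model, the function names (`attention_map`, `optimize_embedding`, `transfer`) are hypothetical, and the objective (log-attention at the query pixel, maximized by plain gradient ascent) is an assumption chosen for simplicity. Unlike real cross-attention, which normalizes over tokens, this toy normalizes over pixel locations so the map can be read as a distribution over positions.

```python
import numpy as np

def attention_map(features, embedding):
    """Toy attention: softmax over pixel locations of scaled dot products."""
    scores = features @ embedding / np.sqrt(features.shape[1])
    scores = scores - scores.max()          # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def optimize_embedding(features, source_index, steps=200, lr=0.5):
    """Gradient ascent on log attention at the query pixel (assumed objective)."""
    dim = features.shape[1]
    embedding = np.zeros(dim)
    for _ in range(steps):
        attn = attention_map(features, embedding)
        # d/d(embedding) of log attn[source_index]
        grad = (features[source_index] - attn @ features) / np.sqrt(dim)
        embedding += lr * grad
    return embedding

def transfer(target_features, embedding):
    """Predicted correspondence: the target pixel with the highest attention."""
    return int(np.argmax(attention_map(target_features, embedding)))

# Synthetic demo: the target is a shuffled, slightly noisy copy of the source.
rng = np.random.default_rng(0)
source = rng.normal(size=(64, 32))          # 8x8 "pixels", 32-dim features
perm = rng.permutation(64)
target = source[perm] + 0.05 * rng.normal(size=(64, 32))

query = 10                                  # flattened index of the query pixel
embedding = optimize_embedding(source, query)
print("predicted target index:", transfer(target, embedding))
print("true target index:", int(np.where(perm == query)[0][0]))
```

In this toy setting the embedding is the only thing that is optimized; the features stay fixed, which mirrors the "without any training" framing of the abstract.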

Figure 4 : Attention maps for each of the tokens corresponding to the sentence "A picture of a cat"
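As a rough illustration of what per-token attention maps are, the sketch below computes one map per prompt token from standard scaled dot-product cross-attention (softmax over tokens at each pixel). The queries and keys are random placeholders rather than real Stable Diffusion activations, the token count of 5 simply mirrors the five words of the example sentence, and `cross_attention_maps` is a hypothetical helper name.

```python
import numpy as np

def cross_attention_maps(pixel_queries, token_keys, height, width):
    """Return per-token attention maps of shape (num_tokens, height, width).

    pixel_queries: (height * width, d) array, one query per spatial location.
    token_keys:    (num_tokens, d) array, one key per prompt token.
    """
    d = pixel_queries.shape[1]
    logits = pixel_queries @ token_keys.T / np.sqrt(d)     # (H*W, T)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True) # softmax over tokens
    return weights.T.reshape(token_keys.shape[0], height, width)

# Placeholder inputs: a 4x4 feature grid and 5 tokens.
rng = np.random.default_rng(0)
queries = rng.normal(size=(16, 8))
keys = rng.normal(size=(5, 8))
maps = cross_attention_maps(queries, keys, height=4, width=4)
print(maps.shape)
```

Each slice `maps[t]` is the map for token `t`; at any pixel the values across tokens sum to 1.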

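Experimental results

Research questions

RQ1: How good an unsupervised semantic correspondence can be achieved with diffusion-based embeddings, without explicit supervision?
RQ2: Do diffusion-based representations deliver competitive PCK performance against existing unsupervised and weakly supervised methods across standard benchmarks?
RQ3: How much do token-level attention and the optimized embeddings affect correspondence accuracy?
RQ4: How does the proposed method rank against baselines such as DINO+MLS, VGG+MLS, and PWarpC-NC-Net on CUB-200, PF-Willow, and SPair-71k?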

Key findings

Supervision	Method	CUB-200 PCK@0.05	CUB-200 PCK@0.1	PF-Willow PCK@0.05	PF-Willow PCK@0.1	SPair-71k PCK@0.05	SPair-71k PCK@0.1
Strong	CHM [69]	-	-	52.7	79.4	27.2	46.3
Strong	VAT [15]	-	-	52.8	81.6	35.0	55.5
Strong	CATs++ [16]	-	-	56.7	81.2	-	59.8
Strong	PWarpC-NC-Net* res101 [70]	-	-	48.0	76.2	21.5	37.1
Weak	PMD [21]	-	-	40.3	74.7	-	26.5
Weak	PSCNet-SE [52]	-	-	42.6	75.1	-	27.0
Weak	VGG+MLS [56]	18.3	25.8	41.2	63.2	-	27.4
Weak	DINO+MLS [56,71]	52.0	67.0	45.0	66.5	-	31.1
Weak	ASIC [46]	57.9	75.9	53.0	76.3	-	36.9
Weak	PWarpC-NC-Net res101 [70]	-	-	45.0	75.9	18.2	35.3
Unsupervised	DINO+NN [57]	52.8	68.3	40.1	60.1	-	33.3
Unsupervised	Our method	61.6	77.5	53.0	84.3	28.9	45.4

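The proposed method achieves 61.6 PCK@0.05 and 77.5 PCK@0.1 on CUB-200.
It achieves 53.0 PCK@0.05 and 84.3 PCK@0.1 on PF-Willow.
It achieves 28.9 PCK@0.05 and 45.4 PCK@0.1 on SPair-71k.
Compared with the existing unsupervised baseline (DINO+NN), the reported scores improve on every dataset considered: CUB-200 from 52.8/68.3 to 61.6/77.5, PF-Willow from 40.1/60.1 to 53.0/84.3, and SPair-71k PCK@0.1 from 33.3 to 45.4.
The table shows competitive performance across datasets. The largest gains are on PF-Willow PCK@0.1, where 84.3 exceeds every listed baseline including the strongly supervised ones, and on SPair-71k PCK@0.1, where 45.4 is well above the best weakly supervised score (36.9, ASIC).
PWarpC-NC-Net and the other baselines differ in their strengths: on SPair-71k the strongly supervised VAT and CATs++ remain ahead (55.5 and 59.8 PCK@0.1), while the proposed method matches or exceeds every weakly supervised and unsupervised entry in the table.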
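For readers unfamiliar with the metric, PCK@α counts a predicted keypoint as correct when it lies within α times a reference size of the ground-truth location. The sketch below assumes the reference size is the larger side of a reference box given as (height, width); whether that box is the image or the object bounding box depends on the dataset convention and is not stated in this review. The function name `pck` and the sample coordinates are illustrative only.

```python
import numpy as np

def pck(predicted, ground_truth, ref_size, alpha):
    """Fraction of keypoints within alpha * max(ref_size) of the ground truth.

    predicted, ground_truth: (N, 2) arrays of (x, y) coordinates.
    ref_size: (height, width) of the reference box (assumed convention).
    """
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    distances = np.linalg.norm(predicted - ground_truth, axis=1)
    threshold = alpha * max(ref_size)
    return float(np.mean(distances <= threshold))

# Made-up keypoints: errors of 2, 8, and 1 pixels on a 100x80 reference box.
pred = [[10, 10], [50, 50], [90, 90]]
gt = [[12, 10], [50, 58], [90, 91]]
print(pck(pred, gt, ref_size=(100, 80), alpha=0.05))  # threshold 5 px: 2 of 3 correct
print(pck(pred, gt, ref_size=(100, 80), alpha=0.1))   # threshold 10 px: 3 of 3 correct
```

The table reports this fraction as a percentage, so a value of 0.843 here would correspond to 84.3 there.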

Figure 5 : Attention maps for each of the tokens corresponding to the sentence "A bird’s left eye"


This review was generated by AI and checked by a human editor.