QUICK REVIEW

[Paper Review] Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild

Jiin Im, Sisung Liu | arXiv (Cornell University) | Mar 12, 2026

3D Shape Modeling and Analysis | Citations: 0

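Summary in Brief

Shape-of-You (SoY) generates geometry-aware pseudo-labels with a Fused Gromov-Wasserstein (FGW) optimal transport framework that incorporates 3D geometric lifting. It trains a lightweight adapter for robust semantic correspondence without explicit geometric annotations and achieves state-of-the-art results on SPair-71k and AP-10k.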

ABSTRACT

Semantic correspondence is essential for handling diverse in-the-wild images lacking explicit correspondence annotations. While recent 2D foundation models offer powerful features, adapting them for unsupervised learning via nearest-neighbor pseudo-labels has key limitations: it operates locally, ignoring structural relationships, and consequently its reliance on 2D appearance fails to resolve geometric ambiguities arising from symmetries or repetitive features. In this work, we address this by reformulating pseudo-label generation as a Fused Gromov-Wasserstein (FGW) problem, which jointly optimizes inter-feature similarity and intra-structural consistency. Our framework, Shape-of-You (SoY), leverages a 3D foundation model to define this intra-structure in the geometric space, resolving abovementioned ambiguity. However, since FGW is a computationally prohibitive quadratic problem, we approximate it through anchor-based linearization. The resulting probabilistic transport plan provides a structurally consistent but noisy supervisory signal. Thus, we introduce a soft-target loss dynamically blending guidance from this plan with network predictions to build a learning framework robust to this noise. SoY achieves state-of-the-art performance on SPair-71k and AP-10k datasets, establishing a new benchmark in semantic correspondence without explicit geometric annotations. Code is available at Shape-of-You.
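
For reference, the FGW objective that the abstract alludes to is commonly written as in the first line below. The notation is an assumption chosen here for illustration (semantic cost C, intra-structure distance matrices D^s and D^t, transport plan T, trade-off weight alpha) and may differ from the paper's own. The first line uses balanced marginals for simplicity, whereas the review describes an unbalanced variant. The second line is one plausible reading of "anchor-based linearization", with K hypothetical anchor pairs (p_m, q_m) held fixed; it is not a formula quoted from the paper.

```latex
\[
\min_{T \in \Pi(a, b)} \;
(1-\alpha) \sum_{i,j} C_{ij}\, T_{ij}
\;+\; \alpha \sum_{i,j,k,l} \bigl| D^{s}_{ik} - D^{t}_{jl} \bigr|^{2}\, T_{ij}\, T_{kl}
\]
\[
G_{ij} \;=\; \frac{1}{K} \sum_{m=1}^{K} \bigl| D^{s}_{i\,p_m} - D^{t}_{j\,q_m} \bigr|^{2},
\qquad
\min_{T} \; \sum_{i,j} \bigl[ (1-\alpha)\, C_{ij} + \alpha\, G_{ij} \bigr]\, T_{ij}
\]
```

The second term of the first line is quadratic in T, which is where the cost comes from. Once the anchors are fixed, the combined cost in the second line is linear in T and can be handled by an ordinary (entropic) OT solver.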

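Motivation and Objectives

Enable semantic correspondence on in-the-wild images without explicit geometric annotations.
Address the limitations of nearest-neighbor pseudo-labels based on 2D appearance by incorporating global structure.
Formulate pseudo-label generation as an FGW problem and reduce its computational cost through anchor-based linearization.
Train a lightweight adapter network with a soft-target loss to cope with noisy pseudo-labels.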

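Proposed Method

Define the patch sets by pairing each patch's semantic features with 3D coordinates lifted from the image by a 3D foundation model.
Compute an initial semantic unbalanced OT (UOT) plan from the cosine similarity of 2D features, to serve as the source of anchors.
Iteratively refine the plan through an anchor-based linearization of the GW cost, using 3D intra-structure distances and K anchors (K=64).
Fuse the semantic cost and the linearized geometric cost into a total cost, and solve UOT on it to obtain the refined transport plan.
Train a lightweight adapter with a soft-target loss derived from the probabilistic transport plan, combined with a dense correspondence loss for supervision.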
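
The pseudo-label steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the mutual-nearest-neighbor anchor rule (standing in for the paper's anchor selection), the squared-difference geometric cost, and the values of `eps`, `rho`, `alpha`, and `n_refine` are all hypothetical. Only K=64 comes from the review.

```python
import numpy as np

def cosine_cost(fs, ft):
    """Semantic cost: 1 - cosine similarity between patch features."""
    fs = fs / np.linalg.norm(fs, axis=1, keepdims=True)
    ft = ft / np.linalg.norm(ft, axis=1, keepdims=True)
    return 1.0 - fs @ ft.T

def unbalanced_sinkhorn(cost, eps=0.05, rho=1.0, n_iter=200):
    """Entropic UOT with KL-relaxed uniform marginals (standard scaling updates)."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    power = rho / (rho + eps)  # exponent < 1 softens the marginal constraints
    for _ in range(n_iter):
        u = (a / (kernel @ v)) ** power
        v = (b / (kernel.T @ u)) ** power
    return u[:, None] * kernel * v[None, :]

def pairwise_dist(x):
    """Intra-structure distances between 3D points, scaled to unit mean."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d / d.mean()

def select_anchors(plan, k):
    """Assumed rule: keep the k most confident mutual nearest neighbors."""
    row_best = plan.argmax(axis=1)
    col_best = plan.argmax(axis=0)
    src = np.arange(plan.shape[0])
    mutual = col_best[row_best] == src
    src, tgt = src[mutual], row_best[mutual]
    order = np.argsort(-plan[src, tgt])[:k]
    return src[order], tgt[order]

def linearized_gw_cost(ds, dt, anchors_s, anchors_t):
    """Compare each point's distances to the anchors instead of to all points."""
    if len(anchors_s) == 0:
        return np.zeros((ds.shape[0], dt.shape[0]))
    diff = ds[:, anchors_s][:, None, :] - dt[:, anchors_t][None, :, :]
    geo = (diff ** 2).mean(axis=-1)
    return geo / (geo.max() + 1e-12)  # keep the cost in [0, 1]

def fgw_pseudo_labels(fs, ft, xs, xt, k=64, alpha=0.5, n_refine=3):
    """Initial semantic UOT plan, then anchor-based refinement with a fused cost."""
    sem = cosine_cost(fs, ft)
    ds, dt = pairwise_dist(xs), pairwise_dist(xt)
    plan = unbalanced_sinkhorn(sem)
    for _ in range(n_refine):
        a_s, a_t = select_anchors(plan, k)
        geo = linearized_gw_cost(ds, dt, a_s, a_t)
        plan = unbalanced_sinkhorn((1.0 - alpha) * sem + alpha * geo)
    return plan
```

Here `fs` and `ft` are (n, d) and (m, d) feature arrays, and `xs` and `xt` are the corresponding (n, 3) and (m, 3) lifted coordinates. Each refinement solves a linear-cost problem, so the per-iteration work is dominated by building an n x m x K difference tensor rather than the full quadratic GW term.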

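Experimental Results

Research Questions

RQ1: How can pseudo-label generation for semantic correspondence exploit both inter-feature similarity and intra-structural (3D) consistency?
RQ2: Does lifting 2D features into 3D geometry improve robustness to geometric ambiguities (e.g., occlusion, viewpoint change) without explicit 3D annotations?
RQ3: Can anchor-based linearization of GW provide a feasible and effective approximation for geometry-aware supervision?
RQ4: Does a soft-target loss derived from the probabilistic transport plan improve learning under noisy pseudo-labels?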
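
To make RQ4 concrete, one way to picture a soft-target loss that blends the transport plan with the network's own predictions is the sketch below. The blending rule, the weight `beta`, and the function names are assumptions for illustration (the form resembles a bootstrapped cross-entropy); the paper's exact loss and schedule are not given in this review.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def soft_target_loss(logits, plan, beta):
    """Cross-entropy against a blend of plan rows and current predictions.

    logits: (n, m) matching scores from the adapter (hypothetical input).
    plan:   (n, m) non-negative transport plan used as noisy supervision.
    beta:   assumed blending weight in [0, 1]; 0 trusts the plan only.
    """
    log_pred = log_softmax(logits)
    pred = np.exp(log_pred)
    # Turn each plan row into a distribution over target patches.
    plan_rows = plan / (plan.sum(axis=1, keepdims=True) + 1e-12)
    # In an autodiff framework the target would be treated as a constant.
    target = (1.0 - beta) * plan_rows + beta * pred
    return float(-(target * log_pred).sum(axis=1).mean())
```

With `beta = 0` this reduces to plain cross-entropy against the plan. Raising `beta` shifts weight toward the network's own predictions, which is the usual motivation for such blends when the supervision is noisy.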

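Key Findings

SoY achieves a state-of-the-art PCK@0.1 of 67.9% on SPair-71k and 68.0% on the AP-10k intra-species setting.
SoY ranks best or second-best in 17 of 18 categories.
The intra-structure ablation shows that 3D geometric distances yield higher-quality pseudo-labels than 2D or purely semantic intra-structures.
Anchor-based FGW and cycle-consistent anchor selection improve robustness to geometric ambiguities.
The final trained adapter performs geometry-aware matching at inference time without requiring iterative optimization.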
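
For readers unfamiliar with the PCK@0.1 metric quoted in the findings, a minimal sketch of the usual definition follows: a predicted keypoint counts as correct when it lies within alpha times the longer side of the object bounding box from the ground truth. The function name and argument layout are hypothetical, and benchmark protocols differ in details such as per-image versus per-point averaging, so this should not be read as the paper's evaluation code.

```python
import numpy as np

def pck(pred_kps, gt_kps, bbox_hw, alpha=0.1):
    """Fraction of keypoints within alpha * max(bbox height, bbox width).

    pred_kps, gt_kps: (n, 2) arrays of predicted and ground-truth coordinates.
    bbox_hw: (height, width) of the target object's bounding box.
    """
    pred_kps = np.asarray(pred_kps, dtype=float)
    gt_kps = np.asarray(gt_kps, dtype=float)
    threshold = alpha * max(bbox_hw)
    dist = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return float((dist <= threshold).mean())
```

For example, with a 100 x 80 box the threshold at alpha = 0.1 is 10 pixels, so keypoint errors of 2, 20, and 5 pixels give a score of 2/3.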

This review was generated by AI and checked by a human editor.