QUICK REVIEW

[Paper Review] UniStitch: Unifying Semantic and Geometric Features for Image Stitching

Yuan Mei, Lang Nie|arXiv (Cornell University)|Mar 11, 2026

Advanced Image and Video Retrieval Techniques | Citations: 0

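One-Sentence Summary

UniStitch proposes a unified framework that integrates semantic features and geometric keypoints using a Neural Point Transformer and an Adaptive Mixture of Experts, achieving state-of-the-art stitching performance both in-domain and out-of-domain.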

ABSTRACT

Traditional image stitching methods estimate warps from hand-crafted geometric features, whereas recent learning-based solutions leverage semantic features from neural networks instead. These two lines of research have largely diverged along separate evolution, with virtually no meaningful convergence to date. In this paper, we take a pioneering step to bridge this gap by unifying semantic and geometric features with UniStitch, a unified image stitching framework from multimodal features. To align discrete geometric features (i.e., keypoint) with continuous semantic feature maps, we present a Neural Point Transformer (NPT) module, which transforms unordered, sparse 1D geometric keypoints into ordered, dense 2D semantic maps. Then, to integrate the advantages of both representations, an Adaptive Mixture of Experts (AMoE) module is designed to fuse geometric and semantic representations. It dynamically shifts focus toward more reliable features during the fusion process, allowing the model to handle complex scenes, especially when either modality might be compromised. The fused representation can be adopted into common deep stitching pipelines, delivering significant performance gains over any single feature. Experiments show that UniStitch outperforms existing state-of-the-art methods with a large margin, paving the way for a unified paradigm between traditional and learning-based image stitching.

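Motivation and Objectives

Bridge the gap between traditional geometric features and learned semantic features in image stitching.
Develop a pipeline that aligns, fuses, and warps multimodal features for robust panorama creation.
Enable robustness when one modality is unreliable, through latent-space regularization.
Improve the efficiency of high-resolution warping with a new FFD-based TPS that reduces memory usage.
Demonstrate generalization on diverse datasets, including out-of-domain scenarios.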

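Proposed Method

Geometric keypoints and descriptors are extracted from the image pair.
The semantic branch uses ResNet-18 to generate multi-scale semantic maps.
The geometric branch uses the Neural Point Transformer to convert sparse keypoints into dense geometric maps.
Per-cell max pooling projects the keypoint features onto a grid-aligned geometric map.
The two modalities are fused with the Adaptive Mixture of Experts (AMoE) and a latent-space Modality Robustifier (MR).
Global-to-local warps are predicted with an FFD-based TPS that reduces VRAM usage and speeds up inference.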
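
The per-cell max pooling step above can be illustrated with a small NumPy sketch. It is not the authors' Neural Point Transformer: the function name `keypoints_to_grid`, the argument layout, and the choice to leave empty cells at zero are all assumptions made for illustration. It only shows how unordered keypoint features can be scattered into an ordered, grid-aligned map.

```python
import numpy as np

def keypoints_to_grid(xy, feats, image_hw, grid_hw):
    """Scatter per-keypoint features into a dense grid using per-cell max pooling.

    xy       : (N, 2) keypoint pixel coordinates as (x, y)
    feats    : (N, C) one feature vector per keypoint
    image_hw : (H, W) image size in pixels
    grid_hw  : (Gh, Gw) number of grid cells
    Returns a (C, Gh, Gw) map; cells without keypoints are zero (an assumption).
    """
    h, w = image_hw
    gh, gw = grid_hw
    xy = np.asarray(xy, dtype=np.float64).reshape(-1, 2)
    feats = np.asarray(feats, dtype=np.float32)
    c = feats.shape[1]

    # Map each keypoint to the grid cell that contains it.
    cols = np.clip((xy[:, 0] / w * gw).astype(int), 0, gw - 1)
    rows = np.clip((xy[:, 1] / h * gh).astype(int), 0, gh - 1)
    flat = rows * gw + cols

    # Unbuffered element-wise max, so several keypoints in one cell are pooled.
    grid = np.full((gh * gw, c), -np.inf, dtype=np.float32)
    np.maximum.at(grid, flat, feats)

    # Cells that received no keypoint still hold -inf; reset them to zero.
    grid[np.isneginf(grid)] = 0.0
    return grid.reshape(gh, gw, c).transpose(2, 0, 1)
```

Because the result has the same channel-first layout as a convolutional feature map, it can be concatenated or fused with a semantic map of matching resolution, which is the property the method list relies on.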

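Experimental Results

Research Questions

RQ1: Can unifying semantic and geometric features improve the robustness and quality of image stitching?
RQ2: How can unordered keypoints be converted into a grid-aligned geometric representation that lines up with semantic maps?
RQ3: Does adaptive fusion with modality-aware experts improve performance in challenging scenes or when one modality is unreliable?
RQ4: Can high-resolution warps be computed efficiently without sacrificing alignment accuracy?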

Key Findings

Method	mPSNR_easy	mPSNR_moderate	mPSNR_hard	mPSNR_average	mSSIM_easy	mSSIM_moderate	mSSIM_hard	mSSIM_average
APAP	26.77	22.88	18.75	22.39	0.868	0.770	0.587	0.726
SPW	25.82	21.49	15.85	20.52	0.844	0.693	0.434	0.634
LPC	25.01	21.27	17.34	20.82	0.815	0.673	0.485	0.640
UDIS	23.53	19.73	17.42	19.94	0.761	0.545	0.376	0.542
UDIS++	27.58	23.75	20.04	23.41	0.880	0.792	0.632	0.755
DunHuangStitch	27.19	23.05	19.10	22.61	0.875	0.767	0.564	0.718
StabStitch++	29.92	24.93	20.46	24.63	0.927	0.845	0.664	0.797
RopStitch	29.93	24.96	20.60	24.70	0.926	0.845	0.672	0.800
Ours	30.34	25.37	20.90	25.07	0.932	0.857	0.691	0.813
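
The average columns in this table are not plain means of the three difficulty columns. A short check, sketched below, suggests they match a 30% / 30% / 40% weighting of easy / moderate / hard. That weighting is an inference from the numbers, not something stated in this review, and the helper names are hypothetical.

```python
# mPSNR rows copied from the table: (easy, moderate, hard, reported average).
ROWS = {
    "APAP": (26.77, 22.88, 18.75, 22.39),
    "UDIS++": (27.58, 23.75, 20.04, 23.41),
    "Ours": (30.34, 25.37, 20.90, 25.07),
}

# Assumed split weights for easy / moderate / hard (inferred, not confirmed).
WEIGHTS = (0.3, 0.3, 0.4)

def weighted_average(easy, moderate, hard, weights=WEIGHTS):
    """Weighted mean of the three difficulty scores."""
    return sum(w * v for w, v in zip(weights, (easy, moderate, hard)))

def max_abs_gap(table):
    """Largest difference between the recomputed and the reported average."""
    return max(abs(weighted_average(*row[:3]) - row[3]) for row in table.values())
```

For the three rows above the recomputed values agree with the reported averages to within rounding. Not every row matches this closely: the DunHuangStitch mPSNR row recomputes to about 22.71 against a reported 22.61, so the weighting should be treated as a plausible reading rather than a confirmed definition.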

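UniStitch outperforms state-of-the-art methods on both in-domain and out-of-domain datasets. In the table above it reaches an average mPSNR of 25.07 and an average mSSIM of 0.813, against 24.70 and 0.800 for the strongest baseline, RopStitch.
AMoE-based fusion effectively balances semantic and geometric cues, and MR improves robustness when a modality degrades.
The FFD-based TPS substantially reduces memory usage for high-resolution stitching and improves speed without sacrificing alignment quality.
Matched keypoints (with descriptors) give better results than raw keypoints, and learned geometric features offer a large advantage, especially in challenging scenes.
Combining diverse geometric priors (SIFT, SURF, ORB, SuperPoint) for matching yields consistent gains across datasets.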
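
To make the idea of fusion that shifts weight toward the more reliable modality concrete, here is a minimal soft-gating sketch in NumPy. It is a generic two-way gate, not the paper's AMoE module: the experts are reduced to the raw feature maps, and `gated_fusion`, `w_gate`, and `b_gate` are hypothetical names introduced only for this example.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(sem, geo, w_gate, b_gate):
    """Fuse two feature maps with per-location soft gating.

    sem, geo : (C, H, W) semantic and geometric feature maps
    w_gate   : (2, 2C) linear gate weights (hypothetical parameters)
    b_gate   : (2,) gate biases
    Returns the fused (C, H, W) map and the (2, H, W) gate weights.
    """
    stacked = np.concatenate([sem, geo], axis=0)  # (2C, H, W)
    # One logit per modality at every spatial location.
    logits = np.einsum("kc,chw->khw", w_gate, stacked) + b_gate[:, None, None]
    weights = softmax(logits, axis=0)  # sums to 1 over the two modalities
    fused = weights[0] * sem + weights[1] * geo
    return fused, weights
```

With zero gate parameters the two modalities are averaged equally. A strongly positive bias on one modality makes the output follow that modality almost exclusively, which mimics the behavior described when the other modality is compromised.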

This review was generated by AI and checked by a human editor.