QUICK REVIEW

[Paper Review] UniStitch: Unifying Semantic and Geometric Features for Image Stitching

Yuan Mei, Lang Nie|arXiv (Cornell University)|Mar 11, 2026

Advanced Image and Video Retrieval Techniques | Cited by 0

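ONE-SENTENCE SUMMARY

UniStitch proposes a unified framework that fuses semantic features with geometric keypoints through a Neural Point Transformer (NPT) and an Adaptive Mixture of Experts (AMoE), achieving state-of-the-art stitching performance in both in-domain and out-of-domain scenes.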

ABSTRACT

Traditional image stitching methods estimate warps from hand-crafted geometric features, whereas recent learning-based solutions leverage semantic features from neural networks instead. These two lines of research have largely diverged along separate evolution, with virtually no meaningful convergence to date. In this paper, we take a pioneering step to bridge this gap by unifying semantic and geometric features with UniStitch, a unified image stitching framework from multimodal features. To align discrete geometric features (i.e., keypoint) with continuous semantic feature maps, we present a Neural Point Transformer (NPT) module, which transforms unordered, sparse 1D geometric keypoints into ordered, dense 2D semantic maps. Then, to integrate the advantages of both representations, an Adaptive Mixture of Experts (AMoE) module is designed to fuse geometric and semantic representations. It dynamically shifts focus toward more reliable features during the fusion process, allowing the model to handle complex scenes, especially when either modality might be compromised. The fused representation can be adopted into common deep stitching pipelines, delivering significant performance gains over any single feature. Experiments show that UniStitch outperforms existing state-of-the-art methods with a large margin, paving the way for a unified paradigm between traditional and learning-based image stitching.

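MOTIVATION AND OBJECTIVES

Bridge the gap between traditional geometric features and learned semantic features in image stitching.
Develop a pipeline that aligns, fuses, and warps multimodal features for robust panoramic stitching.
Stay robust when one modality is unreliable, by means of latent-space regularization.
Propose a thin-plate spline (TPS) built on free-form deformation (FFD) that reduces memory usage and improves the efficiency of high-resolution warping.
Demonstrate generalization across diverse datasets, including out-of-domain scenes.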

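PROPOSED METHOD

Geometric keypoints and descriptors are extracted from the image pair.
The semantic branch uses ResNet-18 to produce multi-scale semantic maps.
The geometric branch uses the Neural Point Transformer to convert sparse keypoints into dense geometric maps.
Keypoint features are projected into a grid-aligned geometric map, with max pooling applied within each cell.
The two modalities are fused with the Adaptive Mixture of Experts (AMoE) and a latent-space Modality Robustifier (MR).
An FFD-based TPS predicts the global-to-local warp, lowering GPU memory usage and speeding up inference.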
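
The grid projection step above can be illustrated with a small NumPy sketch. It is not the authors' Neural Point Transformer: it only shows the rasterization idea of scattering per-keypoint feature vectors into a coarse grid and keeping the element-wise maximum in each cell. The function name `keypoints_to_grid`, the default cell size, and the zero fill for empty cells are assumptions made for illustration.

```python
import numpy as np


def keypoints_to_grid(xy, feats, image_hw, cell=8):
    """Scatter per-keypoint features into a dense (H/cell, W/cell, C) map.

    xy:       (N, 2) array of keypoint (x, y) pixel coordinates.
    feats:    (N, C) array of per-keypoint feature vectors.
    image_hw: (height, width) of the source image in pixels.
    cell:     side length of one grid cell in pixels (assumed value).
    """
    h, w = image_hw
    gh, gw = -(-h // cell), -(-w // cell)  # ceiling division
    c = feats.shape[1]

    # Start at -inf so the first feature written to a cell always wins the max.
    grid = np.full((gh * gw, c), -np.inf, dtype=np.float64)

    # Map each keypoint to the flat index of the cell that contains it.
    col = np.clip(xy[:, 0].astype(int) // cell, 0, gw - 1)
    row = np.clip(xy[:, 1].astype(int) // cell, 0, gh - 1)
    flat = row * gw + col

    # Unbuffered element-wise max, so keypoints sharing a cell are pooled.
    np.maximum.at(grid, flat, feats)

    # Cells that received no keypoint are filled with zeros (an assumption).
    grid[np.isneginf(grid)] = 0.0
    return grid.reshape(gh, gw, c)
```

The output has a fixed spatial layout regardless of how many keypoints were detected or in which order, which is what lets a sparse, unordered point set be stacked alongside a convolutional feature map.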

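EXPERIMENTAL RESULTS

Research Questions

RQ1: Can semantic and geometric features be effectively unified for image stitching to improve robustness and quality?
RQ2: How can unordered keypoints be converted into a dense, grid-based geometric representation aligned with the semantic maps?
RQ3: Does adaptive fusion with modality-aware experts improve performance in complex scenes or when one modality is unreliable?
RQ4: Can high-resolution warps be computed efficiently without sacrificing alignment accuracy?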

Key Findings

Method	mPSNR_easy	mPSNR_moderate	mPSNR_hard	mPSNR_average	mSSIM_easy	mSSIM_moderate	mSSIM_hard	mSSIM_average
APAP	26.77	22.88	18.75	22.39	0.868	0.770	0.587	0.726
SPW	25.82	21.49	15.85	20.52	0.844	0.693	0.434	0.634
LPC	25.01	21.27	17.34	20.82	0.815	0.673	0.485	0.640
UDIS	23.53	19.73	17.42	19.94	0.761	0.545	0.376	0.542
UDIS++	27.58	23.75	20.04	23.41	0.880	0.792	0.632	0.755
DunHuangStitch	27.19	23.05	19.10	22.61	0.875	0.767	0.564	0.718
StabStitch++	29.92	24.93	20.46	24.63	0.927	0.845	0.664	0.797
RopStitch	29.93	24.96	20.60	24.70	0.926	0.845	0.672	0.800
Ours	30.34	25.37	20.90	25.07	0.932	0.857	0.691	0.813
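
To make the size of the margin concrete, the short Python snippet below recomputes the gap between the "Ours" row and the strongest baseline for the two average columns, using only the numbers in the table above. The dictionary layout and the helper name `margin_over_best_baseline` are illustrative choices.

```python
# Average-column scores copied from the table: (mPSNR_average, mSSIM_average).
scores = {
    "APAP": (22.39, 0.726),
    "SPW": (20.52, 0.634),
    "LPC": (20.82, 0.640),
    "UDIS": (19.94, 0.542),
    "UDIS++": (23.41, 0.755),
    "DunHuangStitch": (22.61, 0.718),
    "StabStitch++": (24.63, 0.797),
    "RopStitch": (24.70, 0.800),
    "Ours": (25.07, 0.813),
}


def margin_over_best_baseline(scores, column, ours="Ours"):
    """Return (best baseline name, ours minus best baseline) for one column."""
    baselines = {name: vals[column] for name, vals in scores.items() if name != ours}
    best = max(baselines, key=baselines.get)
    return best, round(scores[ours][column] - baselines[best], 3)


psnr_best, psnr_gain = margin_over_best_baseline(scores, 0)
ssim_best, ssim_gain = margin_over_best_baseline(scores, 1)
print(psnr_best, psnr_gain)  # RopStitch 0.37
print(ssim_best, ssim_gain)  # RopStitch 0.013
```

On the average columns, the closest competitor is RopStitch, and the reported improvement over it is 0.37 in mPSNR and 0.013 in mSSIM.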

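UniStitch outperforms state-of-the-art methods on both in-domain and out-of-domain datasets, with higher mPSNR and mSSIM.
AMoE-based fusion effectively balances semantic and geometric cues, and MR improves robustness when a modality is degraded.
The FFD-based TPS markedly reduces GPU memory usage and increases speed for high-resolution stitching without compromising alignment quality.
Using matched keypoints (with descriptors) performs better than using raw keypoints alone, and learned geometric features show a clear advantage in challenging scenes.
Incorporating diverse geometric priors (such as SIFT, SURF, ORB, SuperPoint, and their matches) brings consistent gains across datasets.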
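
The adaptive fusion idea in the findings above can be sketched as a softmax-gated mixture of two feature maps. This is a generic mixture-of-experts style gate, not the paper's AMoE module: the identity experts, the mean-pooled gate input, and the names `gated_fusion` and `gate_weights` are all assumptions chosen to keep the example small.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def gated_fusion(semantic, geometric, gate_weights):
    """Fuse two (H, W, C) maps with one scalar gate per modality.

    gate_weights: (2 * C, 2) matrix of a hypothetical linear gating layer.
    Returns the fused (H, W, C) map and the two mixing weights.
    """
    # Summarize each modality by global average pooling, then concatenate.
    summary = np.concatenate(
        [semantic.mean(axis=(0, 1)), geometric.mean(axis=(0, 1))]
    )

    # The gate turns the summary into two non-negative weights that sum to 1.
    weights = softmax(summary @ gate_weights)

    # Convex combination: the gate can down-weight a weak modality.
    fused = weights[0] * semantic + weights[1] * geometric
    return fused, weights
```

In a trained model the gate parameters would be learned, so that the weight on a modality drops when its features are unreliable (for example, few keypoints in a textureless scene). Here they are simply passed in.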


This review was generated by AI and reviewed by a human editor.