QUICK REVIEW

[Paper Review] UniStitch: Unifying Semantic and Geometric Features for Image Stitching

Yuan Mei, Lang Nie | arXiv (Cornell University) | March 11, 2026

Advanced Image and Video Retrieval Techniques | Citations: 0

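One-Line Summary

UniStitch proposes a unified framework that fuses semantic features with geometric keypoints through a Neural Point Transformer and an Adaptive Mixture of Experts, and it achieves state-of-the-art stitching performance in both in-domain and out-of-domain scenarios.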

ABSTRACT

Traditional image stitching methods estimate warps from hand-crafted geometric features, whereas recent learning-based solutions leverage semantic features from neural networks instead. These two lines of research have largely diverged along separate evolution, with virtually no meaningful convergence to date. In this paper, we take a pioneering step to bridge this gap by unifying semantic and geometric features with UniStitch, a unified image stitching framework from multimodal features. To align discrete geometric features (i.e., keypoint) with continuous semantic feature maps, we present a Neural Point Transformer (NPT) module, which transforms unordered, sparse 1D geometric keypoints into ordered, dense 2D semantic maps. Then, to integrate the advantages of both representations, an Adaptive Mixture of Experts (AMoE) module is designed to fuse geometric and semantic representations. It dynamically shifts focus toward more reliable features during the fusion process, allowing the model to handle complex scenes, especially when either modality might be compromised. The fused representation can be adopted into common deep stitching pipelines, delivering significant performance gains over any single feature. Experiments show that UniStitch outperforms existing state-of-the-art methods with a large margin, paving the way for a unified paradigm between traditional and learning-based image stitching.

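Research Motivation and Goals

Bridge the gap between traditional geometric features and learned semantic features in image stitching.
Develop a pipeline that aligns, fuses, and warps multimodal features to produce robust panoramas.
Ensure robustness when one modality is unreliable through latent-space regularization.
Improve the efficiency of high-resolution warping with a new FFD-based TPS that reduces memory usage.
Demonstrate generalization across diverse datasets, including out-of-domain scenarios.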

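Proposed Method

Extract geometric keypoints and descriptors from the image pair.
The semantic branch uses ResNet-18 to produce multi-scale semantic maps.
The geometric branch uses the Neural Point Transformer to convert sparse keypoints into dense geometric maps.
Keypoint features are projected onto a grid-aligned geometric map by cell-wise max pooling.
The modalities are fused with AMoE (Adaptive Mixture of Experts) and latent-space modularity (MR).
Global-to-local warps are predicted with an FFD-based TPS that reduces VRAM usage and speeds up inference.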
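
The cell-wise max pooling step above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `keypoints_to_grid`, the argument layout, and the choice to fill empty cells with zeros are all assumptions made for illustration. It only shows the general idea of scattering unordered keypoint features into a dense, grid-aligned map.

```python
import numpy as np

def keypoints_to_grid(xy, feats, image_hw, grid_hw):
    """Scatter per-keypoint features into a dense grid with cell-wise max pooling.

    xy:       (N, 2) keypoint coordinates as (x, y) in pixels
    feats:    (N, C) per-keypoint feature vectors
    image_hw: (height, width) of the image in pixels
    grid_hw:  (rows, cols) of the target grid
    returns:  (C, rows, cols) map; cells without keypoints are zero (an assumption)
    """
    xy = np.asarray(xy, dtype=np.float64).reshape(-1, 2)
    feats = np.asarray(feats, dtype=np.float64)
    h, w = image_hw
    gh, gw = grid_hw

    # Map pixel coordinates to integer cell indices, clamped to the grid.
    col = np.clip((xy[:, 0] / w * gw).astype(int), 0, gw - 1)
    row = np.clip((xy[:, 1] / h * gh).astype(int), 0, gh - 1)
    flat = row * gw + col

    # Start from -inf so the running maximum is correct for negative features too.
    grid = np.full((gh * gw, feats.shape[1]), -np.inf)
    np.maximum.at(grid, flat, feats)   # unbuffered element-wise max per cell
    grid[np.isneginf(grid)] = 0.0      # cells that received no keypoint

    return grid.reshape(gh, gw, -1).transpose(2, 0, 1)


# Two keypoints fall into cell (0, 0); one falls into cell (3, 6).
g = keypoints_to_grid(
    xy=[[10, 10], [12, 14], [100, 60]],
    feats=[[1, 5], [3, 2], [7, 7]],
    image_hw=(64, 128),
    grid_hw=(4, 8),
)
print(g.shape)     # (2, 4, 8)
print(g[:, 0, 0])  # [3. 5.]
```

Max pooling makes the result independent of keypoint order, which is why it suits unordered point sets. The resulting map has the same spatial layout as a convolutional feature map, so it can be combined with one channel-wise.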

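Experimental Results

Research Questions

RQ1: Can semantic and geometric features be effectively unified to improve the robustness and quality of image stitching?
RQ2: Can unordered keypoints be converted into a grid-structured geometric representation that is aligned with the semantic maps?
RQ3: Does adaptive fusion through modality-aware experts improve performance in challenging scenes or when one modality is unreliable?
RQ4: Can high-resolution warps be computed efficiently without degrading quality?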

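Key Results

UniStitch outperforms state-of-the-art methods on both in-domain and out-of-domain data, achieving higher mPSNR and mSSIM scores.
AMoE-based fusion effectively balances semantic and geometric cues, and MR improves robustness when a modality is degraded.
The FFD-based TPS substantially reduces memory usage and increases speed in high-resolution stitching without harming alignment quality.
Using matched keypoints (with descriptors) yields better results than raw keypoints, and learned geometric features offer a strong advantage, especially in challenging scenes.
Incorporating a range of geometric priors (SIFT, SURF, ORB, SuperPoint, including matching) yields consistent gains across datasets.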
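
To make the idea of fusion that "shifts focus toward more reliable features" concrete, here is a minimal sketch of a generic softmax gate over two modalities. It is not the paper's AMoE module: the linear gate, its parameters `gate_w` and `gate_b`, and the function name `gated_fusion` are hypothetical, and the real module presumably uses learned expert networks rather than this direct weighted sum.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(sem, geo, gate_w, gate_b):
    """Fuse two (C, H, W) feature maps with per-location softmax weights.

    sem, geo: semantic and geometric maps of identical shape (C, H, W)
    gate_w:   (2, 2C) hypothetical linear gate weights
    gate_b:   (2,)    hypothetical gate biases
    returns:  fused map (C, H, W) and the weights (2, H, W)
    """
    stacked = np.concatenate([sem, geo], axis=0)  # (2C, H, W)
    # One logit per modality at every spatial location.
    logits = np.einsum("kc,chw->khw", gate_w, stacked) + gate_b[:, None, None]
    weights = softmax(logits, axis=0)             # sums to 1 over the two modalities
    fused = weights[0] * sem + weights[1] * geo
    return fused, weights


sem = np.ones((3, 2, 2))
geo = np.zeros((3, 2, 2))

# A neutral gate weights both modalities equally.
fused, w = gated_fusion(sem, geo, np.zeros((2, 6)), np.zeros(2))
print(np.allclose(fused, 0.5))  # True

# A gate biased toward the semantic branch effectively ignores the geometric map.
fused, w = gated_fusion(sem, geo, np.zeros((2, 6)), np.array([50.0, -50.0]))
print(np.allclose(fused, 1.0))  # True
```

Because the weights are computed per location, a gate of this kind can lean on one modality in textureless regions and on the other where semantic features are weak, which is the behavior the results above attribute to adaptive fusion.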

This review was generated by AI and reviewed by a human editor.