QUICK REVIEW

[Paper Review] Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement

Hongying Zhang, ShuaiShuai Ma | arXiv (Cornell University) | 2026-03-03

Robotics and Sensor-Based Localization | Citations: 0

One-line summary

The paper proposes SFDE, a three-branch network that jointly learns spatial and frequency-domain representations to improve robustness to viewpoint changes.

ABSTRACT

Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints and constitutes a fundamental technique for visual localization in GNSS-denied environments. Nevertheless, CVGL remains challenging due to severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. Existing methods predominantly rely on spatial domain feature alignment, which is inherently sensitive to large scale viewpoint variations and local disturbances. To alleviate these limitations, this paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from spatial and frequency domains. SFDE adopts a three branch parallel architecture to model global semantic context, local geometric structure, and statistical stability in the frequency domain, respectively, thereby characterizing consistency across domains from the perspectives of scene topology, multiscale structural patterns, and frequency invariance. The resulting complementary features are jointly optimized in a unified embedding space via progressive enhancement and coupled constraints, enabling the learning of cross-view representations with consistency across multiple granularities. Comprehensive experiments show that SFDE achieves competitive performance and in many cases even surpasses state-of-the-art methods, while maintaining a lightweight and computationally efficient design. Our code is available at https://github.com/Mashuaishuai669/SFDE.

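Research motivation and objectives

Address the difficulty of cross-view geo-localization caused by geometric asymmetry and texture inconsistency.
Improve cross-view matching by exploiting complementary spatial and frequency-domain representations.
Develop a multi-level joint learning framework that integrates global semantics, local geometry, and frequency stability.
Introduce a multiscale geometric modeling approach that captures structure ranging from local textures to mid-level patterns.
Demonstrate competitive performance with a lightweight, efficient architecture.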

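Proposed method

A three-branch SFDE network comprising a Global Semantic Consistency Branch (GSCB), a Local Geometric Sensitivity Branch (LGSB), and a Frequency Stability Adjustment Branch (FSAB).
A ConvNeXt-Tiny backbone supplies shared features to all branches.
GSCB uses global pooling and a diversified embedding classifier to provide a global semantic anchor.
LGSB models multiscale geometry using multiscale dilated convolutions, interactive attention, and adaptive spatial pyramid pooling.
FSAB separates the amplitude and phase spectra, applies adaptive frequency reweighting, and uses attention and GELU-based fusion in the frequency domain.
During joint optimization, cross-entropy, contrastive, and cross-domain alignment losses supervise the branches.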
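
The amplitude/phase separation and frequency reweighting attributed to FSAB can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1D real-valued signal in place of a 2D feature map, uses fixed weights where the paper describes learned, adaptive ones, and omits the attention and GELU fusion steps. All function names are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_amplitude_phase(spectrum):
    """Separate a complex spectrum into amplitude and phase components."""
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


def reweight_and_reconstruct(signal, weights):
    """Scale each frequency bin's amplitude, keep its phase, and invert.

    `weights` stands in for the learned reweighting (an assumption here).
    It should be symmetric around the DC bin so the output stays real.
    """
    amplitude, phase = split_amplitude_phase(dft(signal))
    scaled = [w * a for w, a in zip(weights, amplitude)]
    rebuilt = [cmath.rect(a, p) for a, p in zip(scaled, phase)]
    return [value.real for value in idft(rebuilt)]


signal = [1.0, 2.0, 3.0, 4.0]
unchanged = reweight_and_reconstruct(signal, [1.0, 1.0, 1.0, 1.0])
dc_only = reweight_and_reconstruct(signal, [1.0, 0.0, 0.0, 0.0])
```

With all weights set to 1 the signal is recovered up to floating-point error, and keeping only the DC bin collapses it to its mean. Reweighting the amplitude while leaving the phase untouched changes how strongly each frequency contributes without shifting where structures sit in the signal.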

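Experimental results

Research questions

RQ1: Can combining spatial and frequency-domain frameworks improve CVGL robustness under severe viewpoint changes?
RQ2: How do global semantic, local geometric, and frequency-stability signals complement one another in learning cross-view embeddings?
RQ3: Does multiscale geometric modeling improve local-global consistency in UAV-satellite localization?
RQ4: Can adaptive frequency emphasis increase discriminability between cross-domain image pairs?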

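Key results

SFDE achieves competitive performance and in many cases surpasses state-of-the-art methods.
The three-branch design captures information at different levels, improving cross-view alignment.
The lightweight ConvNeXt-Tiny backbone, together with multiscale and frequency-domain enhancement, balances efficiency and accuracy.
LGSB improves robustness to viewpoint distortion and scale changes through multiscale dilated convolutions and adaptive pooling.
FSAB stabilizes cross-domain matching by applying adaptive reweighting to the amplitude and phase spectra.
The architecture delivers strong localization performance while remaining computationally efficient.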
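
To make the multiscale dilated convolution idea behind LGSB concrete, the sketch below applies one small kernel at several dilation rates and averages the results. It is an illustration under stated assumptions, not the paper's module: it works on a 1D list rather than a 2D feature map, expects an odd-length kernel, fuses branches by plain averaging, and uses hypothetical function names and dilation rates.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel)."""
    n = len(signal)
    center = len(kernel) // 2
    output = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < n:
                acc += weight * signal[idx]
        output.append(acc)
    return output


def multiscale_features(signal, kernel, dilations=(1, 2, 4)):
    """Run the same kernel at several dilation rates and average the branches.

    The rates (1, 2, 4) and the averaging step are assumptions for this sketch.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) / len(values) for values in zip(*branches)]


impulse = [0.0] * 9
impulse[4] = 1.0
narrow = dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=1)
wide = dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=4)
fused = multiscale_features(impulse, [1.0, 1.0, 1.0])
```

A three-tap kernel with dilation 1 responds at positions 3 to 5 around the impulse, while dilation 4 responds at positions 0, 4, and 8. The same number of weights therefore covers a much wider context, which is why stacking several rates can capture both fine texture and larger structure.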

This review was generated by AI and reviewed by a human editor.