QUICK REVIEW

[Paper Review] Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement

Hongying Zhang, ShuaiShuai Ma|arXiv (Cornell University)|Mar 3, 2026

Robotics and Sensor-Based Localization|Citations: 0

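One-Line Summary

Proposes SFDE, a cross-view geo-localization method in which a three-branch network jointly learns spatial-domain and frequency-domain representations to improve robustness to viewpoint changes.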

ABSTRACT

Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints and constitutes a fundamental technique for visual localization in GNSS-denied environments. Nevertheless, CVGL remains challenging due to severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. Existing methods predominantly rely on spatial domain feature alignment, which is inherently sensitive to large scale viewpoint variations and local disturbances. To alleviate these limitations, this paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from spatial and frequency domains. SFDE adopts a three branch parallel architecture to model global semantic context, local geometric structure, and statistical stability in the frequency domain, respectively, thereby characterizing consistency across domains from the perspectives of scene topology, multiscale structural patterns, and frequency invariance. The resulting complementary features are jointly optimized in a unified embedding space via progressive enhancement and coupled constraints, enabling the learning of cross-view representations with consistency across multiple granularities. Comprehensive experiments show that SFDE achieves competitive performance and in many cases even surpasses state-of-the-art methods, while maintaining a lightweight and computationally efficient design. Our code is available at https://github.com/Mashuaishuai669/SFDE

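Motivation and Objectives

Address the challenges that geometric asymmetry and texture inconsistency pose for cross-view geo-localization.
Improve cross-view matching by exploiting complementary spatial-domain and frequency-domain representations.
Develop a multi-level joint learning framework that integrates global semantics, local geometry, and frequency stability.
Introduce a multiscale geometric modeling approach that captures structure from local textures to mid-level patterns.
Demonstrate competitive performance with a lightweight, efficient architecture.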

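Proposed Method

A three-branch SFDE network comprising the Global Semantic Consistency Branch (GSCB), the Local Geometric Sensitivity Branch (LGSB), and the Frequency Stability Alignment Branch (FSAB).
A ConvNeXt-Tiny backbone supplies shared features to all branches.
GSCB obtains global semantic anchors using global pooling and a diversified embedding classifier.
LGSB models multiscale geometry using multiscale dilated convolutions, interaction attention, and adaptive spatial pyramid pooling.
FSAB separates the amplitude and phase spectra, applies adaptive frequency reweighting, and uses attention and GELU-based fusion in the frequency domain.
During joint optimization, the branches are supervised by cross-entropy, contrastive, and cross-domain consistency losses.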
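
The amplitude/phase separation and frequency reweighting attributed to FSAB can be illustrated with a minimal sketch. This is not the authors' implementation: it works on a 1D signal rather than 2D feature maps, uses a naive DFT from the standard library, and takes fixed weights where the paper describes adaptive (learned) ones. All function names here are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_amplitude_phase(spectrum):
    """Separate a complex spectrum into amplitude and phase components."""
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


def reweight_and_reconstruct(signal, weights):
    """Scale each frequency's amplitude, keep its phase, and invert.

    `weights` stands in for the adaptive weights (assumption: fixed here).
    For a real-valued result the weights should be symmetric,
    i.e. weights[k] == weights[n - k]; only the real part is returned.
    """
    amplitude, phase = split_amplitude_phase(dft(signal))
    reweighted = [w * a for w, a in zip(weights, amplitude)]
    recombined = [cmath.rect(a, p) for a, p in zip(reweighted, phase)]
    return [value.real for value in idft(recombined)]
```

With all weights set to 1 the signal is reconstructed unchanged, and keeping only the zero-frequency component leaves the signal mean at every position. In a real network the same idea would typically be applied per channel with a 2D FFT and weights predicted from the features.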

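Experimental Results

Research Questions

RQ1: Can combining spatial-domain and frequency-domain frameworks improve CVGL robustness under extreme viewpoint changes?
RQ2: Do global semantic, local geometric, and frequency-stability cues complement one another in cross-view embedding learning?
RQ3: Does multiscale geometric modeling improve local-to-global consistency in UAV-to-satellite localization?
RQ4: Does adaptive frequency enhancement improve discriminability between cross-domain image pairs?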

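Key Findings

SFDE achieves competitive performance and, in some settings, surpasses state-of-the-art methods.
The three-branch design captures information at multiple granularities and improves cross-view consistency.
Combining a lightweight ConvNeXt-Tiny backbone with multiscale and frequency-domain enhancement balances efficiency and accuracy.
LGSB improves robustness to viewpoint distortion and scale variation through multiscale dilated convolutions and adaptive pooling.
FSAB stabilizes cross-domain matching by adaptively reweighting the amplitude and phase spectra.
The architecture delivers strong localization performance while remaining computationally efficient.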
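
To make the multiscale dilated convolution idea behind LGSB concrete, here is a minimal 1D sketch in plain Python. The kernel, the dilation rates (1, 2, 4), and the function names are illustrative assumptions; this review does not state the values used in the paper, and the actual branch operates on 2D feature maps with learned kernels.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1D dilated convolution (cross-correlation) with zero padding."""
    n = len(signal)
    center = len(kernel) // 2
    output = []
    for i in range(n):
        total = 0.0
        for j, weight in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < n:
                total += weight * signal[idx]
        output.append(total)
    return output


def multiscale_features(signal, kernel, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates (assumed values)."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


def receptive_field(kernel_size, dilation):
    """Span of input samples covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1
```

A 3-tap kernel covers 3, 5, and 9 input samples at dilation rates 1, 2, and 4, so stacking the outputs gives each position context at several scales without adding parameters. That is the general reason dilated convolutions help with scale variation.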

This review was generated by AI and checked by a human editor.