QUICK REVIEW

[Paper Review] GRAD-Former: Gated Robust Attention-based Differential Transformer for Change Detection

Durgesh Ameta, Ujjwal Mishra|arXiv (Cornell University)|Mar 1, 2026

Remote-Sensing Image Classification|Citations: 0

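One-line summary

GRAD-Former is a Siamese transformer-based change detection model equipped with Adaptive Feature Relevance and Refinement (AFRAR) and differential attention, designed to detect changes efficiently in very high-resolution remote sensing images. It achieves state-of-the-art results while keeping the parameter count low.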

ABSTRACT

Change detection (CD) in remote sensing aims to identify semantic differences between satellite images captured at different times. While deep learning has significantly advanced this field, existing approaches based on convolutional neural networks (CNNs), transformers and Selective State Space Models (SSMs) still struggle to precisely delineate change regions. In particular, traditional transformer-based methods suffer from quadratic computational complexity when applied to very high-resolution (VHR) satellite images and often perform poorly with limited training data, leading to under-utilization of the rich spatial information available in VHR imagery. We present GRAD-Former, a novel framework that enhances contextual understanding while maintaining efficiency through reduced model size. The proposed framework consists of a novel encoder with Adaptive Feature Relevance and Refinement (AFRAR) module, fusion and decoder blocks. AFRAR integrates global-local contextual awareness through two proposed components: the Selective Embedding Amplification (SEA) module and the Global-Local Feature Refinement (GLFR) module. SEA and GLFR leverage gating mechanisms and differential attention, respectively, which generates multiple softmax maps to capture important features while minimizing the captured irrelevant features. Multiple experiments across three challenging CD datasets (LEVIR-CD, CDD, DSIFN-CD) demonstrate GRAD-Former's superior performance compared to existing approaches. Notably, GRAD-Former outperforms the current state-of-the-art models across all the metrics and all the datasets while using fewer parameters. Our framework establishes a new benchmark for remote sensing change detection performance. Our code will be released at: https://github.com/Ujjwal238/GRAD-Former

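Motivation and objectives

Motivates robust change detection (CD) in VHR remote sensing by addressing noise and the inefficient use of features in existing CNN, transformer, and SSM approaches.
Proposes GRAD-Former with AFRAR (SEA and GLFR) to filter noise and capture global-local context.
Introduces Differential Amalgamation (DA) to integrate pre-change, post-change, and difference semantic features.
Demonstrates state-of-the-art CD performance on multiple public datasets while keeping the parameter count low.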

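Proposed method

Proposes a Siamese transformer-style CD framework consisting of an encoder, a fusion module, and a decoder.
Introduces the AFRAR module, which splits features into SEA and GLFR branches for selective amplification and differential attention, respectively.
SEA is a gated embedding that uses L2 normalization and learnable parameters to amplify relevant channels.
GLFR uses differential multi-head attention, scaling two softmax attention maps with a learnable scalar to produce sparse, noise-robust attention maps.
Differential Amalgamation (DA) concatenates the pre-change features, the post-change features, and their difference, then fuses them with a 1x1 convolution.
The decoder integrates the multi-stage fused features, upsamples them with transposed convolutions, includes residual blocks, and outputs a binary change map.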
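
The GLFR and DA steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the general differential attention formulation, in which a second softmax map is scaled by a scalar `lam` and subtracted from the first, and it assumes DA uses the signed difference `f_post - f_pre`. All function names, weight shapes, and the fixed `lam` value are hypothetical; in a trained model `lam` and the weights would be learned, and the attention would run over multiple heads.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(x, wq, wk, wv, lam):
    """Single-head differential attention (hypothetical helper).

    x:  (n, d_model) token features
    wq: (d_model, 2 * d) query projection, split into two halves
    wk: (d_model, 2 * d) key projection, split into two halves
    wv: (d_model, d_v) value projection
    lam: scalar weight on the second softmax map (learned in practice)
    """
    q1, q2 = np.split(x @ wq, 2, axis=-1)
    k1, k2 = np.split(x @ wk, 2, axis=-1)
    v = x @ wv
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))  # first attention map
    a2 = softmax(q2 @ k2.T / np.sqrt(d))  # second attention map
    attn = a1 - lam * a2                  # mass shared by both maps cancels
    return attn @ v, attn


def differential_amalgamation(f_pre, f_post, w):
    """Concatenate pre, post, and difference features, then apply a 1x1 conv.

    f_pre, f_post: (C, H, W) feature maps from the two Siamese branches
    w: (C_out, 3 * C) weights; a 1x1 convolution is a per-pixel linear map
    The signed difference (post minus pre) is an assumption of this sketch.
    """
    stacked = np.concatenate([f_pre, f_post, f_post - f_pre], axis=0)
    return np.einsum("oc,chw->ohw", w, stacked)


# Toy usage with random weights (d_model = 8, d = 4, d_v = 4).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
wq = rng.normal(size=(8, 8))
wk = rng.normal(size=(8, 8))
wv = rng.normal(size=(8, 4))
out, attn = differential_attention(x, wq, wk, wv, lam=0.5)

# Identical pre/post features give a zero difference channel.
fused = differential_amalgamation(np.ones((2, 3, 3)), np.ones((2, 3, 3)), np.ones((4, 6)))
```

Because each softmax row sums to 1, every row of the combined map sums to `1 - lam`. Attention that both maps assign to the same positions cancels, which is the intuition behind the sparser, less noisy maps attributed to GLFR.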

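Experimental results

Research questions

RQ1: Can AFRAR effectively filter noise and irrelevant information in VHR CD and improve the delineation of change regions?
RQ2: Can differential attention in GLFR improve global-local context modeling while reducing computation?
RQ3: Can the overall GRAD-Former architecture achieve superior CD accuracy with fewer parameters on standard benchmarks?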

Key findings

Type	Method	Publication	CDD F1	CDD IoU	CDD OA	DSIFN-CD F1	DSIFN-CD IoU	DSIFN-CD OA	LEVIR-CD F1	LEVIR-CD IoU	LEVIR-CD OA
Transformer-based	GRAD-Former	-	97.57	95.26	99.43	93.14	87.16	97.65	91.52	84.36	99.14

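GRAD-Former achieves state-of-the-art performance on the LEVIR-CD, DSIFN-CD, and CDD datasets.
The model outperforms existing methods on all reported metrics while keeping the parameter count low.
Ablation analysis shows that AFRAR (SEA and GLFR) and the DA module contribute to the performance gains.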
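
For reference, the three metrics reported in the table (F1, IoU, OA) can be computed from pixel-level confusion counts. The sketch below assumes the usual change detection convention, in which F1 and IoU are measured on the change class and OA over all pixels. The review does not state the exact evaluation protocol, and the function name and sample counts are illustrative only.

```python
def cd_metrics(tp, fp, fn, tn):
    """Return F1, IoU, and overall accuracy from confusion counts.

    tp/fp/fn/tn are pixel counts with "change" as the positive class
    (an assumption of this sketch, not a detail taken from the review).
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion counts are all zero")
    f1_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0   # harmonic mean of precision and recall
    iou = tp / iou_den if iou_den else 0.0    # intersection over union of change pixels
    oa = (tp + tn) / total                    # fraction of correctly labelled pixels
    return {"F1": f1, "IoU": iou, "OA": oa}


# Made-up counts for illustration.
scores = cd_metrics(tp=80, fp=10, fn=10, tn=900)
```

Under these definitions IoU = F1 / (2 - F1), which gives a quick consistency check on reported figures: for the CDD column, 0.9757 / (2 - 0.9757) ≈ 0.9526, in line with the reported IoU of 95.26.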


This review was generated by AI and checked by a human editor.