QUICK REVIEW

[Paper Review] Vision-Language Agents for Interactive Forest Change Analysis

James A. Brock, Ce Zhang|arXiv (Cornell University)|Jan 8, 2026

Remote-Sensing Image Classification | Citations: 0

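One-Line Summary

The paper proposes an LLM-driven agent with a multi-level change interpretation (MCI) vision-language backbone for interactive forest change analysis, and introduces the Forest-Change dataset.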

ABSTRACT

Modern forest monitoring workflows increasingly benefit from the growing availability of high-resolution satellite imagery and advances in deep learning. Two persistent challenges in this context are accurate pixel-level change detection and meaningful semantic change captioning for complex forest dynamics. While large language models (LLMs) are being adapted for interactive data exploration, their integration with vision-language models (VLMs) for remote sensing image change interpretation (RSICI) remains underexplored. To address this gap, we introduce an LLM-driven agent for integrated forest change analysis that supports natural language querying across multiple RSICI tasks. The proposed system builds upon a multi-level change interpretation (MCI) vision-language backbone with LLM-based orchestration. To facilitate adaptation and evaluation in forest environments, we further introduce the Forest-Change dataset, which comprises bi-temporal satellite imagery, pixel-level change masks, and multi-granularity semantic change captions generated using a combination of human annotation and rule-based methods. Experimental results show that the proposed system achieves mIoU and BLEU-4 scores of 67.10% and 40.17% on the Forest-Change dataset, and 88.13% and 34.41% on LEVIR-MCI-Trees, a tree-focused subset of LEVIR-MCI benchmark for joint change detection and captioning. These results highlight the potential of interactive, LLM-driven RSICI systems to improve accessibility, interpretability, and efficiency of forest change analysis. All data and code are publicly available at https://github.com/JamesBrockUoB/ForestChat.

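Motivation and Objectives

Achieve accurate pixel-level forest change detection and meaningful semantic change captioning for complex forest dynamics.
Enable natural language querying across multiple remote sensing image change interpretation (RSICI) tasks.
Provide a dataset and benchmark that support evaluation in forest environments.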

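Proposed Method

Build on a multi-level change interpretation (MCI) vision-language backbone.
Add LLM-based orchestration to drive interactive RSICI workflows.
Create the Forest-Change dataset of bi-temporal satellite imagery with pixel-level change masks and multi-granularity semantic change captions.
Evaluate change detection and change captioning on the Forest-Change and LEVIR-MCI-Trees benchmarks.
Release the data and code publicly to support reproducibility and adaptation.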
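To make the orchestration idea concrete, here is a minimal sketch of an agent that maps a natural language query to one or more RSICI tools. It is not the authors' implementation: the tool functions, the `TOOLS` registry, and the keyword table are all hypothetical, and the keyword match merely stands in for the tool-selection step that an LLM would perform in the real system.

```python
from typing import Callable, Dict, List

# Hypothetical tool stubs. A real system would run the MCI backbone here.
def detect_change(before: str, after: str) -> str:
    return f"mask for {before} -> {after}"

def caption_change(before: str, after: str) -> str:
    return f"caption for {before} -> {after}"

# Registry of available tools, keyed by task name (names are assumptions).
TOOLS: Dict[str, Callable[[str, str], str]] = {
    "change_detection": detect_change,
    "change_captioning": caption_change,
}

# Keyword table standing in for LLM-based tool selection (an assumption).
KEYWORDS: Dict[str, List[str]] = {
    "change_detection": ["mask", "where", "area", "pixel"],
    "change_captioning": ["describe", "caption", "what happened"],
}

def select_tools(query: str) -> List[str]:
    """Return the names of the tools whose keywords appear in the query."""
    q = query.lower()
    return [name for name, words in KEYWORDS.items() if any(w in q for w in words)]

def run_agent(query: str, before: str, after: str) -> Dict[str, str]:
    """Run every selected tool on the bi-temporal image pair."""
    return {name: TOOLS[name](before, after) for name in select_tools(query)}
```

A query such as "Describe the change and show the mask" selects both tools, so a single request can return a change mask and a caption together.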

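Experimental Results

Research Questions

RQ1: Can an LLM-driven agent effectively coordinate RSICI tasks across change detection and change captioning?
RQ2: How well does an MCI vision-language model with LLM orchestration perform on pixel-level change detection and semantic change captioning in forest settings?
RQ3: What impact does the Forest-Change dataset have on benchmarking interactive forest change analysis?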

Key Findings

Dataset	mIoU	BLEU-4
Forest-Change	67.10%	40.17%
LEVIR-MCI-Trees	88.13%	34.41%

On Forest-Change, the system achieves 67.10% mIoU and 40.17% BLEU-4.
On LEVIR-MCI-Trees, the system achieves 88.13% mIoU and 34.41% BLEU-4.
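For readers unfamiliar with mIoU, the sketch below shows one common way to compute it for a predicted change mask against a reference mask. It assumes two classes (0 = no change, 1 = change) and averages per-class IoU over a single mask pair. The paper's exact protocol (class set, per-image versus dataset-level aggregation) is not stated in this review, so treat this as an illustration only.

```python
from typing import List, Sequence

def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int = 2) -> float:
    """Mean intersection-over-union across classes for two equally shaped label masks."""
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: it is in both the intersection and the union of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it extends the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes that appear in neither mask to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0

# Toy 2x3 masks: one changed pixel is missed by the prediction.
pred = [[0, 0, 1],
        [0, 1, 1]]
target = [[0, 1, 1],
          [0, 1, 1]]
score = mean_iou(pred, target)  # IoU(no change) = 2/3, IoU(change) = 3/4
```

In this toy case the single missed pixel lowers the IoU of both classes, because it counts toward the union of the no-change class as well as the change class.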

This review was generated by AI and checked by a human editor.