QUICK REVIEW

[Paper Review] Vision-Language Agents for Interactive Forest Change Analysis

James A. Brock, Ce Zhang|arXiv (Cornell University)|Jan 8, 2026
Remote-Sensing Image Classification | Cited by 0
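ONE-SENTENCE SUMMARY

This paper proposes an LLM-driven agent, built on a multi-level change interpretation (MCI) vision-language backbone, for interactive forest change analysis, and introduces the Forest-Change dataset.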

ABSTRACT

Modern forest monitoring workflows increasingly benefit from the growing availability of high-resolution satellite imagery and advances in deep learning. Two persistent challenges in this context are accurate pixel-level change detection and meaningful semantic change captioning for complex forest dynamics. While large language models (LLMs) are being adapted for interactive data exploration, their integration with vision-language models (VLMs) for remote sensing image change interpretation (RSICI) remains underexplored. To address this gap, we introduce an LLM-driven agent for integrated forest change analysis that supports natural language querying across multiple RSICI tasks. The proposed system builds upon a multi-level change interpretation (MCI) vision-language backbone with LLM-based orchestration. To facilitate adaptation and evaluation in forest environments, we further introduce the Forest-Change dataset, which comprises bi-temporal satellite imagery, pixel-level change masks, and multi-granularity semantic change captions generated using a combination of human annotation and rule-based methods. Experimental results show that the proposed system achieves mIoU and BLEU-4 scores of 67.10% and 40.17% on the Forest-Change dataset, and 88.13% and 34.41% on LEVIR-MCI-Trees, a tree-focused subset of LEVIR-MCI benchmark for joint change detection and captioning. These results highlight the potential of interactive, LLM-driven RSICI systems to improve accessibility, interpretability, and efficiency of forest change analysis. All data and code are publicly available at https://github.com/JamesBrockUoB/ForestChat.

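Research Motivation and Objectives

  • Address two persistent challenges: accurate pixel-level forest change detection and meaningful semantic captioning of forest dynamics
  • Enable natural language querying across multiple remote sensing image change interpretation (RSICI) tasks
  • Provide a dataset and benchmark to support evaluation in forest environments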

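Proposed Method

  • Develop a multi-level change interpretation (MCI) vision-language backbone
  • Combine it with LLM-based orchestration to drive interactive RSICI workflows
  • Create the Forest-Change dataset, comprising bi-temporal imagery, pixel-level change masks, and semantic change captions
  • Evaluate change detection and captioning on the Forest-Change and LEVIR-MCI-Trees benchmarks
  • Release the data and code publicly to support reproduction and adaptation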
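
The orchestration idea in the list above can be illustrated with a minimal sketch of query-to-tool dispatch. Everything here is hypothetical and is not taken from the ForestChat code: the `BiTemporalPair` container, the tool names, and the placeholder tool bodies are invented for illustration, and a simple keyword rule stands in for the LLM's tool selection.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BiTemporalPair:
    """Hypothetical container for a before/after image pair (paths only)."""
    before_path: str
    after_path: str


# Placeholder tools: a real system would call the vision-language backbone here.
def detect_change(pair: BiTemporalPair) -> str:
    return f"change mask for {pair.before_path} -> {pair.after_path}"


def caption_change(pair: BiTemporalPair) -> str:
    return f"change caption for {pair.before_path} -> {pair.after_path}"


TOOLS: Dict[str, Callable[[BiTemporalPair], str]] = {
    "detect_change": detect_change,
    "caption_change": caption_change,
}


def choose_tool(query: str) -> str:
    """Stand-in for the LLM's tool choice: plain keyword matching (an assumption)."""
    q = query.lower()
    if any(word in q for word in ("mask", "where", "pixel")):
        return "detect_change"
    return "caption_change"


def answer(query: str, pair: BiTemporalPair) -> str:
    """Route a natural language query to one tool and return its output."""
    return TOOLS[choose_tool(query)](pair)
```

In this sketch, a query such as "Where did forest loss occur?" is routed to the detection tool, while a request for a description falls through to captioning. An actual agent would presumably let the LLM pick the tool and its arguments rather than rely on fixed keywords.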

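Experimental Results

Research Questions

  • RQ1: Can an LLM-driven agent effectively coordinate RSICI tasks spanning detection and captioning?
  • RQ2: How well does the MCI vision-language model with LLM orchestration perform on pixel-level change detection and semantic captioning in forest scenes?
  • RQ3: What impact does the Forest-Change dataset have on benchmarking interactive forest change analysis?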

Key Findings

  • On Forest-Change, the system achieves an mIoU of 67.10% and a BLEU-4 score of 40.17%.
  • On LEVIR-MCI-Trees, the system achieves an mIoU of 88.13% and a BLEU-4 score of 34.41%.
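
For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for label masks. It assumes a two-class setting (0 = no change, 1 = change) with the IoU averaged over classes; the paper's exact evaluation protocol is not stated in this summary, so treat the function name and defaults as illustrative.

```python
from typing import List, Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int = 2,
) -> float:
    """Mean intersection-over-union across classes for two same-shaped masks."""
    ious: List[float] = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == cls
                in_target = t == cls
                intersection += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:  # skip classes absent from both masks
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 example: one pixel is wrongly predicted as "no change".
predicted = [[0, 0], [1, 1]]
reference = [[0, 1], [1, 1]]
score = mean_iou(predicted, reference)  # (1/2 + 2/3) / 2
```

In the toy example, the no-change class has an IoU of 1/2 and the change class 2/3, so the mean is 7/12, roughly 58.3%.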

This review was generated by AI and checked by a human editor.