QUICK REVIEW

[Paper Digest] A High-Level Survey of Optical Remote Sensing

Panagiotis Koletsis, Efthymiou, Vasilis|arXiv (Cornell University)|Feb 19, 2026

UAV Applications and Optimization | Cited by 0

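One-Sentence Summary

This survey provides a modality-centric overview of RGB optical remote sensing, covering the main tasks, datasets, benchmarks, and emerging trends such as foundation models and multimodal vision-language methods.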

ABSTRACT

In recent years, significant advances in computer vision have also propelled progress in remote sensing. Concurrently, the use of drones has expanded, with many organizations incorporating them into their operations. Most drones are equipped by default with RGB cameras, which are both robust and among the easiest sensors to use and interpret. The body of literature on optical remote sensing is vast, encompassing diverse tasks, capabilities, and methodologies. Each task or methodology could warrant a dedicated survey. This work provides a comprehensive overview of the capabilities of the field, while also presenting key information, such as datasets and insights. It aims to serve as a guide for researchers entering the field, offering high-level insights and helping them focus on areas most relevant to their interests. To the best of our knowledge, no existing survey addresses this holistic perspective.

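Research Motivation and Goals

Provide a unified, RGB-focused overview of optical remote sensing (ORS) that spans multiple tasks and applications.
Summarize the publicly available datasets and benchmarks for RGB ORS tasks.
Identify current trends, such as foundation models, and lay out open research questions and gaps.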

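Proposed Method

Group ORS tasks into classification, detection, segmentation, change detection, vision-language, editing, counting, and other tasks.
Review the publicly available datasets for each task and their characteristics (resolution, domain, classes, scale).
Synthesize recent methodological trends, including CNNs, Transformers, hybrid architectures, and foundation models.
Analyze state-of-the-art performance for each task on popular datasets to derive insights about architectures and methodology.
Discuss open research problems and future directions for RGB-based ORS.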

Figure 1: Main tasks in optical remote sensing.

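Experimental Results

Research Questions

RQ1: What are the core tasks of RGB-based optical remote sensing, and how are they organized?
RQ2: Which datasets and benchmarks are most representative for each RGB ORS task?
RQ3: Which architectural trends (CNN, Transformer, hybrid) dominate each task, and how do foundation models affect RGB ORS?
RQ4: What are the current gaps and open research directions in RGB remote sensing, including efficiency, robustness, and multi-task learning?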

Key Findings

Task	Dataset	Metric	Method	Value
Classification	NWPU-RESISC45	OA	MGDNet	91.41
Classification	LoveDA	mIoU	GeRSP	50.56
Classification	LoveDA	mIoU	EMRT	50.89
Horizontal Object Detection	DIOR	mAP 50	FSoD-Net	71.80
Horizontal Object Detection	DIOR	mAP 50	RSADet	72.20
Oriented Object Detection	DOTA v1	mAP 50	S2ANet	79.42
Oriented Object Detection	DOTA v2	mAP 50	DODNet	80.66
Semantic Segmentation	LoveDA	mIoU	GeRSP	50.56
Instance Segmentation	WHU-Building	mIoU	MRANet	90.59
Binary Change Detection	S2Looking	mIoU	SAM-CD	48.29
Semantic Change Detection	SECOND	mIoU	HGINet	70.76
Image Captioning	RS5M	-	RS5M	-
Visual Grounding	RS5M-RSVG	-	RSVG	-
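
The table reports overall accuracy (OA), mean intersection over union (mIoU), and mAP at an IoU threshold of 0.5. As a reading aid, the following is a minimal sketch of how these quantities are commonly computed; it is not taken from the survey or from any of the listed methods. All function names are hypothetical, class labels are assumed to be integers starting at 0, and boxes are assumed to be axis-aligned (x_min, y_min, x_max, y_max) tuples. The functions return fractions in [0, 1], whereas the table appears to list percentages.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count label pairs: rows are ground truth, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm: List[List[int]]) -> float:
    """OA: correctly labelled samples (the diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # wrongly predicted as c
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes; mAP 50 treats IoU >= 0.5 as a match."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy labels, purely illustrative (not data from the survey).
    cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
    print("OA:", overall_accuracy(cm))
    print("mIoU:", mean_iou(cm))
    print("box IoU:", box_iou((0, 0, 2, 2), (0, 0, 2, 1)))
```

For segmentation and change detection the same confusion matrix is built over pixels rather than images. Oriented detection benchmarks such as DOTA use rotated-box IoU, which this axis-aligned sketch does not cover.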

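CNNs perform well on tasks dominated by local patterns and efficiency requirements, whereas Transformers excel at global context and heterogeneous scenes.
Hybrid CNN-Transformer architectures are increasingly dominant across tasks because their strengths are complementary.
Foundation models are emerging in remote sensing, with several RGB-domain models and multimodal capabilities proposed, but task-specific supervised training remains strong.
Vision-language tasks align well with Transformer designs and benefit from large-scale pretraining and multimodal alignment strategies.
RGB imagery remains dominant across platforms, driving demand for scalable and general-purpose learning frameworks.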

This digest was generated by AI and reviewed by a human editor.