QUICK REVIEW

[Paper Review] A High-Level Survey of Optical Remote Sensing

Panagiotis Koletsis, Vasilis Efthymiou | arXiv (Cornell University) | Feb 19, 2026

UAV Applications and Optimization | Citations: 0

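One-Sentence Summary

This survey provides a modality-centric overview of RGB optical remote sensing, covering the main tasks, datasets, and benchmarks, as well as emerging trends such as foundation models and multimodal vision-language approaches.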

ABSTRACT

In recent years, significant advances in computer vision have also propelled progress in remote sensing. Concurrently, the use of drones has expanded, with many organizations incorporating them into their operations. Most drones are equipped by default with RGB cameras, which are both robust and among the easiest sensors to use and interpret. The body of literature on optical remote sensing is vast, encompassing diverse tasks, capabilities, and methodologies. Each task or methodology could warrant a dedicated survey. This work provides a comprehensive overview of the capabilities of the field, while also presenting key information, such as datasets and insights. It aims to serve as a guide for researchers entering the field, offering high-level insights and helping them focus on areas most relevant to their interests. To the best of our knowledge, no existing survey addresses this holistic perspective.

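Motivation and Objectives

Provide a unified, RGB-focused overview of optical remote sensing (ORS) that spans multiple tasks and applications.
Summarize the public datasets and benchmarks available for RGB ORS tasks.
Identify current trends such as foundation models, and clarify open research topics and gaps.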

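Approach

Categorize ORS tasks into classification, detection, segmentation, change detection, vision-language, editing, counting, and other tasks.
Review the publicly available datasets for each task and their characteristics (resolution, region, classes, size).
Synthesize recent methodological trends, including CNNs, transformers, hybrid architectures, and foundation models.
Analyze per-task state-of-the-art performance on popular datasets to derive architectural and methodological insights.
Discuss open research challenges and future directions for RGB-based ORS.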

Figure 1: Main tasks in optical remote sensing.

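Results

Research Questions

RQ1: What are the core RGB-based optical remote sensing tasks, and how should they be organized?
RQ2: Which datasets and benchmarks are most representative of each RGB ORS task?
RQ3: Which architectural trends (CNNs, transformers, hybrids) dominate across tasks, and how do foundation models affect RGB ORS?
RQ4: What are the current gaps and open research directions in RGB remote sensing, including efficiency, robustness, and multi-task learning?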

Key Findings

Task	Dataset	Metric	Method	Value
Classification	NWPU-RESISC45	OA	MGDNet	91.41
Horizontal Object Detection	DIOR	mAP50	FSoD-Net	71.80
Horizontal Object Detection	DIOR	mAP50	RSADet	72.20
Oriented Object Detection	DOTA v1	mAP50	S2ANet	79.42
Oriented Object Detection	DOTA v2	mAP50	DODNet	80.66
Semantic Segmentation	LoveDA	mIoU	GeRSP	50.56
Semantic Segmentation	LoveDA	mIoU	EMRT	50.89
Instance Segmentation	WHU-Building	mIoU	MRANet	90.59
Binary Change Detection	S2Looking	mIoU	SAM-CD	48.29
Semantic Change Detection	SECOND	mIoU	HGINet	70.76
Image Captioning	RS5M	-	RS5M	-
Visual Grounding	RS5M-RSVG	-	RSVG	-
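
The table reports overall accuracy (OA), mean intersection over union (mIoU), and mAP50. As a rough illustration of what these metrics measure, the sketch below computes OA and mIoU from a confusion matrix, plus the box IoU test that underlies the 0.5 matching threshold in mAP50. It is a minimal sketch, not the evaluation code used by the survey or by any of the listed methods: the function names and the toy confusion matrix are hypothetical, the confusion matrix is assumed to have ground truth in rows and predictions in columns, and the box IoU covers axis-aligned (horizontal) boxes only, so oriented detection would need a rotated-box IoU instead.

```python
def overall_accuracy(cm):
    """OA: correctly classified samples (diagonal) divided by all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """mIoU: mean over classes of TP / (TP + FP + FN).

    Assumes cm[gt][pred]. Classes with no ground-truth and no predicted
    samples are skipped (an assumption; benchmarks differ on this).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # row c: ground truth is c
        fp = sum(cm[r][c] for r in range(n)) - tp   # column c: predicted as c
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """A prediction counts as a true positive for mAP50 when IoU >= 0.5."""
    return box_iou(pred_box, gt_box) >= 0.5


if __name__ == "__main__":
    toy_cm = [[8, 2], [1, 9]]  # hypothetical two-class confusion matrix
    print("OA:", overall_accuracy(toy_cm))
    print("mIoU:", mean_iou(toy_cm))
    print("match at 0.5:", is_match_at_50((0, 0, 2, 2), (0, 0, 2, 1)))
```

For the toy matrix, 17 of 20 samples lie on the diagonal, so OA is 0.85, while the per-class IoUs are 8/11 and 9/12. The gap between the two shows why OA and mIoU values in the table are not directly comparable across tasks.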

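CNNs excel at tasks driven by local patterns and where efficiency matters, while transformers handle global context and heterogeneous scenes.
Hybrid CNN-transformer architectures are becoming dominant across tasks because of their complementary strengths.
Foundation models are emerging in remote sensing, with several proposals for RGB-domain models and multimodal capabilities, yet task-specific supervised learning remains strong.
Vision-language tasks align well with transformer designs and benefit from large-scale pretraining and multimodal alignment strategies.
RGB imagery continues to dominate across platforms, which underscores the importance of scalable, generalizable learning frameworks.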

This review was generated by AI and checked by a human editor.