QUICK REVIEW

[Paper Review] Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

Jing Wu, David Pichler|arXiv (Cornell University)|Mar 4, 2023

Smart Agriculture and AI | Cited by 11

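One-Sentence Summary

This paper extends Agriculture-Vision with raw full-field imagery and a large unlabeled dataset for self-supervised pre-training, integrates the Pixel-to-Propagation Module into MoCo-V2, and evaluates CNN and Swin Transformer backbones on agricultural pattern analysis tasks.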

ABSTRACT

A key challenge for much of the machine learning work on remote sensing and earth observation data is the difficulty in acquiring large amounts of accurately labeled data. This is particularly true for semantic segmentation tasks, which are much less common in the remote sensing domain because of the incredible difficulty in collecting precise, accurate, pixel-level annotations at scale. Recent efforts have addressed these challenges both through the creation of supervised datasets as well as the application of self-supervised methods. We continue these efforts on both fronts. First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) to include raw, full-field imagery for greater experimental flexibility. Second, we extend this dataset with the release of 3600 large, high-resolution (10cm/pixel), full-field, red-green-blue and near-infrared images for pre-training. Third, we incorporate the Pixel-to-Propagation Module (Xie et al., 2021b), originally built on the SimCLR framework, into the framework of MoCo-V2 (Chen et al., 2020b). Finally, we demonstrate the usefulness of this data by benchmarking different contrastive learning approaches on both downstream classification and semantic segmentation tasks. We explore both CNN and Swin Transformer (Liu et al., 2021a) architectures within different frameworks based on MoCo-V2. Together, these approaches enable us to better detect key agricultural patterns of interest across a field from aerial imagery so that farmers may be alerted to problematic areas in a timely fashion to inform their management decisions. Furthermore, the release of these datasets will support numerous avenues of research for computer vision in remote sensing for agriculture.

Motivation and Objectives

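Address the shortage of large-scale, accurately labeled remote sensing data for semantic segmentation in agriculture.
Provide an extended dataset of raw, full-field imagery (AV+) for pre-training and evaluation.
Benchmark self-supervised learning approaches on agricultural pattern analysis tasks using diverse backbones (CNN and Swin Transformer).
Incorporate the Pixel-to-Propagation Module (PPM) into MoCo-V2 and apply Temporal Contrast to AV+ to improve dense prediction tasks.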

Proposed Method

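Release full-field AV+ data containing raw RGB and NIR images for pre-training (3600 images, 10 cm/pixel GSD), and pre-train on it.
Apply MoCo-V2 with instance-level contrastive learning for multi-channel pre-training that includes RGB+NIR.
Integrate the Pixel-to-Propagation Module (PPM) as a pixel-level pretext task and define the PixPro loss for dense representations.
Introduce temporal contrast (TemCo) to exploit the multi-temporal AV+ data, and apply TemCo-PixPro, which combines it with PPM.
Explore a Swin Transformer backbone (Swin-T) with MoCo-based pre-training and multi-head projection, applied to spatio-temporal and pixel-level tasks.
Use two downstream benchmarks on AV+, classification and semantic segmentation, each evaluated with a frozen encoder and with a fine-tuned encoder.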
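
As a rough illustration of the instance-level contrastive objective that MoCo-style methods build on, the sketch below computes an InfoNCE loss for a single query embedding and shows the momentum update of a key encoder. It is a minimal sketch in plain Python, not the authors' implementation: the function names, the toy parameter lists, and the default `temperature` and `momentum` values are assumptions chosen for illustration, and the embeddings are assumed to be non-zero vectors already produced by some encoder (for example from RGB+NIR input).

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length (assumes norm > 0)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss for one query embedding.

    query, positive_key: embeddings of two views of the same image.
    negative_keys: embeddings of other images (e.g. a queue of past keys).
    temperature: illustrative default, not a value taken from the review.
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in negative_keys]
    # Cosine similarity between the query and every key, scaled by temperature.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Cross-entropy with the positive at index 0, using a stable log-sum-exp.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def momentum_update(key_params, query_params, momentum=0.999):
    """Move key-encoder parameters slowly toward the query encoder's.

    Parameters are flat lists of floats here, a hypothetical stand-in
    for real network weights.
    """
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]
```

The loss is small when the query is closest to its positive key and large when a negative key is closer, which is the behavior that pulls two views of the same field together and pushes other fields apart.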

Experimental Results

Research Questions

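RQ1: To what extent do AV+'s raw full-field images and unlabeled data improve pre-training for agricultural pattern analysis?
RQ2: How much gain do MoCo-V2, MoCo-PixPro, TemCo, and TemCo-PixPro bring to downstream classification and segmentation tasks with CNN and Swin backbones?
RQ3: Does introducing PPM and multi-temporal contrast improve dense prediction on aerial agricultural imagery?
RQ4: How do RGB versus RGB+NIR channels affect pre-training and downstream task performance?
RQ5: How well do AV+-pre-trained models transfer to related remote sensing tasks (e.g., EuroSAT), and how do they perform on fine-grained segmentation within AV+?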

Key Findings

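Release of 3600 raw full-field AV+ images for SSL pre-training (more than 3 TB of unlabeled data).
MoCo-PixPro and TemCo-PixPro consistently improve downstream segmentation and classification over MoCo-V2 and ImageNet initialization, and are especially effective with small backbones and frozen encoders.
Swin-T-based MoCo variants perform strongly on segmentation when fully fine-tuned, outperforming ImageNet-initialized backbones in several settings.
The pixel-level pretext task provided by PPM improves segmentation results as backbone capacity increases (from ResNet-18 to Swin-T).
Combining temporal contrast (TemCo), which exploits multi-temporal AV+ imagery, with PPM (TemCo-PixPro) increases temporal sensitivity in pattern analysis.
Compared with the Agriculture-Vision baselines, Swin-T and the SSL pre-training approaches achieve higher mean IoU in several settings, most notably with RGBN channels.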
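
For readers unfamiliar with the mean IoU metric mentioned above, the following is a minimal sketch of its standard definition: per-class intersection over union, averaged across the classes that appear in either the prediction or the ground truth. The function name and the flat-list input format are assumptions for illustration, and the paper's exact evaluation protocol may differ from this plain version.

```python
def mean_iou(pred, target, num_classes):
    """Standard mean IoU over flat sequences of integer class ids.

    pred, target: one class id per pixel, same length.
    Classes absent from both pred and target are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # The pixel belongs to both masks of class p.
            intersection[p] += 1
            union[p] += 1
        else:
            # The pixel belongs to the predicted mask of p
            # and to the ground-truth mask of t.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.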


This review was generated by AI and checked by a human editor.