QUICK REVIEW

[Paper Review] Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

Jing Wu, David Pichler | arXiv (Cornell University) | March 4, 2023

Smart Agriculture and AI | Citations: 11

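One-Line Summary

This paper extends Agriculture-Vision with raw full-field imagery and a large unlabeled dataset to enable self-supervised pre-training, integrates the Pixel-to-Propagation Module into MoCo-V2, and benchmarks CNN and Swin Transformer backbones on agricultural pattern analysis tasks.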

ABSTRACT

A key challenge for much of the machine learning work on remote sensing and earth observation data is the difficulty in acquiring large amounts of accurately labeled data. This is particularly true for semantic segmentation tasks, which are much less common in the remote sensing domain because of the incredible difficulty in collecting precise, accurate, pixel-level annotations at scale. Recent efforts have addressed these challenges both through the creation of supervised datasets as well as the application of self-supervised methods. We continue these efforts on both fronts. First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) to include raw, full-field imagery for greater experimental flexibility. Second, we extend this dataset with the release of 3600 large, high-resolution (10cm/pixel), full-field, red-green-blue and near-infrared images for pre-training. Third, we incorporate the Pixel-to-Propagation Module (Xie et al., 2021b), originally built on the SimCLR framework, into the framework of MoCo-V2 (Chen et al., 2020b). Finally, we demonstrate the usefulness of this data by benchmarking different contrastive learning approaches on both downstream classification and semantic segmentation tasks. We explore both CNN and Swin Transformer (Liu et al., 2021a) architectures within different frameworks based on MoCo-V2. Together, these approaches enable us to better detect key agricultural patterns of interest across a field from aerial imagery so that farmers may be alerted to problematic areas in a timely fashion to inform their management decisions. Furthermore, the release of these datasets will support numerous avenues of research for computer vision in remote sensing for agriculture.

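Motivation and Objectives

Address the shortage of large-scale, accurately labeled remote sensing data for semantic segmentation in agriculture.
Provide an extended raw, full-field dataset (AV+) for pre-training and evaluation.
Benchmark self-supervised learning approaches on agricultural pattern analysis tasks across different backbones (CNN and Swin Transformer).
Introduce the Pixel-to-Propagation Module (PPM) into MoCo-V2 and apply Temporal Contrast to AV+ to improve dense prediction tasks.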
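
For readers unfamiliar with the contrastive objective behind MoCo-style pre-training, the sketch below shows an InfoNCE loss for a single query and a momentum (moving-average) update of key-encoder parameters. It is a minimal pure-Python illustration, not the paper's implementation: the function names, the temperature of 0.2, and the momentum of 0.999 are assumptions chosen for illustration, and real training would operate on batched tensors with a queue of negative keys.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss for one query: negative log-softmax of the positive logit.

    The temperature value is an illustrative assumption.
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in negative_keys]
    # Cosine similarity between the query and every key, scaled by temperature.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # The positive key sits at index 0.
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Move key-encoder parameters slowly toward the query encoder's."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]
```

The loss is small when the query is closest to its positive key and large when a negative key is closer, which is what pushes two views of the same image together in feature space.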

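Proposed Method

Release full-field AV+ data containing raw RGB and NIR imagery for pre-training (3600 images at 10 cm/pixel GSD).
Adapt MoCo-V2 with instance-level contrast for multi-channel (RGB+NIR) pre-training.
Introduce the Pixel-to-Propagation Module (PPM) for a pixel-level pretext task and define the PixPro loss for dense representations.
Introduce Temporal Contrast (TemCo), which exploits multi-temporal AV+ data, and combine it with PPM (TemCo-PixPro) to improve dense prediction.
Explore a Swin Transformer backbone (Swin-T) with MoCo-based pre-training and multi-head projection, applied to temporal and pixel-level tasks.
Evaluate on two downstream benchmarks on AV+, classification and semantic segmentation, with both frozen and fine-tuned encoders.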
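
The pixel-level pretext task can be illustrated with a small sketch of pixel propagation and a PixPro-style consistency loss. It follows the general idea described in the PixPro work (each pixel feature is smoothed by similar pixels, then compared by cosine similarity with the matching pixel in the other view), but it is a simplified pure-Python approximation. The identity `transform`, the exponent `gamma=2.0`, the function names, and the way positive pairs are passed in are all assumptions for illustration, not details confirmed by this review.

```python
import math


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def propagate(features, transform=lambda v: v, gamma=2.0):
    """Smooth each pixel feature using similar pixels from the same view.

    y_i = sum_j max(cos(x_i, x_j), 0) ** gamma * g(x_j)
    `transform` stands in for g; the identity default is an assumption.
    """
    transformed = [transform(x) for x in features]
    out = []
    for xi in features:
        yi = [0.0] * len(transformed[0])
        for xj, gj in zip(features, transformed):
            weight = max(cosine(xi, xj), 0.0) ** gamma
            yi = [a + weight * b for a, b in zip(yi, gj)]
        out.append(yi)
    return out


def pixpro_loss(y_view1, target_view2, y_view2, target_view1, positive_pairs):
    """Symmetric negative-cosine loss over spatially matched pixel pairs.

    y_* are propagated features from the online branch; target_* are
    features from the momentum branch. positive_pairs holds (i, j) index
    pairs: pixel i in view 1 matches pixel j in view 2.
    """
    total = 0.0
    for i, j in positive_pairs:
        total -= cosine(y_view1[i], target_view2[j])
        total -= cosine(y_view2[j], target_view1[i])
    return total / len(positive_pairs)
```

With this formulation the loss approaches -2 when matched pixels agree perfectly across views, so lower values indicate more spatially consistent features.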

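Experimental Results

Research Questions

RQ1: How do raw full-field imagery and unlabeled data contribute to pre-training for agricultural pattern analysis?
RQ2: What benefits do MoCo-V2, MoCo-PixPro, TemCo, and TemCo-PixPro provide for downstream classification and segmentation tasks with CNN and Swin backbones?
RQ3: Do PPM and multi-temporal contrast improve dense prediction on aerial agricultural imagery?
RQ4: How do RGB versus RGB+NIR channels affect pre-training and downstream task performance?
RQ5: How well do AV+-pre-trained models transfer to related remote sensing tasks (such as EuroSAT) and to fine-grained segmentation within AV+?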

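Key Results

Release of AV+, which includes 3600 raw full-field images for SSL pre-training (more than 3 TB of unlabeled data).
MoCo-PixPro and TemCo-PixPro consistently improve downstream segmentation and classification over MoCo-V2 and ImageNet initialization, with larger gains for smaller backbones and frozen encoders.
Swin-T-based MoCo variants perform strongly on segmentation under full fine-tuning and outperform ImageNet-initialized backbones in several settings.
The pixel-level pretext task via PPM improves segmentation results, and the effect grows as backbone capacity increases (from ResNet-18 to Swin-T).
Temporal contrast using multi-temporal AV+ imagery (TemCo), and its combination with PPM (TemCo-PixPro), yields gains in the temporal sensitivity of pattern analysis.
Compared with the Agriculture-Vision benchmark, Swin-T and SSL pre-training approaches achieve higher mean IoU in several configurations, performing especially well with RGBN channels.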
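
Since the segmentation results are reported as mean IoU, a short sketch of the standard metric may help. This is the textbook definition for single-label predictions, written in pure Python; the function name and the choice to skip classes absent from both prediction and ground truth are assumptions, and the benchmark's own evaluation protocol may differ (for example in how overlapping labels are handled).

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is their average.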

This review was generated by AI and reviewed by a human editor.