QUICK REVIEW

[Paper Review] Sparse Data Tree Canopy Segmentation: Fine-Tuning Leading Pretrained Models on Only 150 Images

David Szczecina, Hudson Sun|arXiv (Cornell University)|Jan 16, 2026

Remote Sensing in Agriculture | Cited by 0

One-Sentence Summary

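This study evaluates five modern architectures (YOLOv11, Mask R-CNN, DeepLabV3, Swin-UNet, DINOv2) for tree canopy segmentation under extreme data scarcity and finds that CNN-based models, particularly YOLOv11 and Mask R-CNN, outperform Transformer-based models after fine-tuning on 150 images.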

ABSTRACT

Tree canopy detection from aerial imagery is an important task for environmental monitoring, urban planning, and ecosystem analysis. Simulating real-life data annotation scarcity, the Solafune Tree Canopy Detection competition provides a small and imbalanced dataset of only 150 annotated images, posing significant challenges for training deep models without severe overfitting. In this work, we evaluate five representative architectures, YOLOv11, Mask R-CNN, DeepLabV3, Swin-UNet, and DINOv2, to assess their suitability for canopy segmentation under extreme data scarcity. Our experiments show that pretrained convolution-based models, particularly YOLOv11 and Mask R-CNN, generalize significantly better than pretrained transformer-based models. DeepLabV3, Swin-UNet and DINOv2 underperform likely due to differences between semantic and instance segmentation tasks, the high data requirements of Vision Transformers, and the lack of strong inductive biases. These findings confirm that transformer-based architectures struggle in low-data regimes without substantial pretraining or augmentation and that differences between semantic and instance segmentation further affect model performance. We provide a detailed analysis of training strategies, augmentation policies, and model behavior under the small-data constraint and demonstrate that lightweight CNN-based methods remain the most reliable for canopy detection on limited imagery.

Research Motivation and Objectives

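Evaluate which contemporary architectures handle tree canopy segmentation best when data is extremely limited.
Analyze how inductive bias, pretraining, and model capacity affect generalization on small-data remote sensing tasks.
Compare instance segmentation and semantic segmentation approaches under data scarcity to guide model selection for canopy mapping.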

Proposed Method

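Compare five architectures: YOLOv11 Seg, Mask R-CNN, DeepLabV3, Swin-UNet, and DINOv2.
Fine-tune pretrained weights on the 150-image Solafune canopy dataset with a 4:1 train/validation split.
Attach a dense segmentation head to the DINOv2 backbone while keeping the backbone frozen.
Evaluate with pixel-level accuracy on the validation set and weighted mAP (IoU-based) on the hidden test set.
Provide qualitative results and an analysis of training dynamics under the small-data constraint.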
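The 4:1 split and the IoU measure behind the mAP metric can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `split_train_val` and `mask_iou`, the random seed, and the use of integer image IDs are all assumptions, and the competition's exact mAP weighting is not given in this review, so only the per-mask IoU is shown.

```python
import random


def split_train_val(image_ids, val_fraction=0.2, seed=0):
    """Shuffle image ids reproducibly and split them train/val (4:1 by default).

    Hypothetical helper: the seed and the shuffling strategy are assumptions.
    """
    ids = sorted(image_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks (equally sized 2D lists of 0/1)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks have no union; report 0.0 rather than dividing by zero.
    return inter / union if union else 0.0


# 150 annotated images split 4:1 gives 120 for training and 30 for validation.
train_ids, val_ids = split_train_val(range(150))
```

With only 30 validation images, each image moves a validation score by several percentage points, which is one reason to treat validation numbers on a dataset this small with caution.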

Experimental Results

Research Questions

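RQ1: Under extreme data scarcity, which modern architectures achieve the best instance segmentation performance for canopy segmentation?
RQ2: How do inductive bias (CNN vs. Transformer) and the pretraining scheme affect generalization on a 150-image remote sensing dataset?
RQ3: Why do semantic segmentation models perform poorly on instance-level metrics in this small-data canopy task?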

Key Findings

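The CNN-based models (YOLOv11 and Mask R-CNN) achieve higher weighted mAP on the test set than the Transformer-based models.
YOLOv11 Large achieves the highest test mAP, 0.281; larger variants generally outperform smaller ones.
Mask R-CNN reaches a test mAP of 0.219, with stable training and good generalization.
DeepLabV3, Swin-UNet, and DINOv2 score poorly on test mAP, which is attributed to the mismatch between semantic and instance segmentation and to the high data requirements of Transformers.
Validation mAP overestimates generalization because the validation set is small; larger CNNs capture the task's complexity better.
Qualitative results indicate that CNN-based architectures are less prone to region-level false negatives.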
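One way to see why semantic segmentation output can score poorly on instance-level metrics: a semantic model predicts a single "canopy" mask, which has to be split into instances afterwards. The sketch below assumes that split is done by 4-connected component labeling. That is an illustrative assumption, since this review does not describe the paper's post-processing, and `label_components` is a hypothetical helper.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions in a binary mask (2D list of 0/1).

    Returns (labels, count): labels holds 0 for background and 1..count
    for each connected region.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not labels[r][c]:
                # Unvisited foreground pixel: start a new region, flood-fill it.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Two crowns separated by a one-pixel gap are recovered as two instances.
separated = [[1, 1, 0, 1, 1],
             [1, 1, 0, 1, 1]]
# The same two crowns with touching canopies collapse into one instance.
touching = [[1, 1, 1, 1, 1],
            [1, 1, 1, 1, 1]]

_, n_separated = label_components(separated)
_, n_touching = label_components(touching)
```

Under this assumption, adjacent crowns that touch in the predicted mask are counted as one oversized instance, so an IoU-based instance metric penalizes the prediction even when the pixel-level mask is accurate.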


This review was generated by AI and reviewed by a human editor.