QUICK REVIEW

[Paper Review] Satellite-Based Detection of Looted Archaeological Sites Using Machine Learning

Girmaw Abebe Tadesse, Titien Bartette|arXiv (Cornell University)|Feb 23, 2026

Archaeological Research and Protection | Cited by 0

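One-Sentence Summary

The paper systematically compares end-to-end CNNs trained on raw PlanetScope imagery with traditional ML models built on handcrafted features and foundation-model embeddings for detecting looted archaeological sites in Afghanistan; ImageNet-pretrained CNNs with spatial masking reach the best F1 (0.926), far above the strongest traditional ML configuration (0.710).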

ABSTRACT

Looting at archaeological sites poses a severe risk to cultural heritage, yet monitoring thousands of remote locations remains operationally difficult. We present a scalable and satellite-based pipeline to detect looted archaeological sites, using PlanetScope monthly mosaics (4.7m/pixel) and a curated dataset of 1,943 archaeological sites in Afghanistan (898 looted, 1,045 preserved) with multi-year imagery (2016--2023) and site-footprint masks. We compare (i) end-to-end CNN classifiers trained on raw RGB patches and (ii) traditional machine learning (ML) trained on handcrafted spectral/texture features and embeddings from recent remote-sensing foundation models. Results indicate that ImageNet-pretrained CNNs combined with spatial masking reach an F1 score of 0.926, clearly surpassing the strongest traditional ML setup, which attains an F1 score of 0.710 using SatCLIP-V+RF+Mean, i.e., location and vision embeddings fed into a Random Forest with mean-based temporal aggregation. Ablation studies demonstrate that ImageNet pretraining (even in the presence of domain shift) and spatial masking enhance performance. In contrast, geospatial foundation model embeddings perform competitively with handcrafted features, suggesting that looting signatures are extremely localized. The repository is available at https://github.com/microsoft/looted_site_detection.

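Motivation and Objectives

Address the scalability challenge of monitoring thousands of remote archaeological sites for looting.
Compare end-to-end CNN classifiers trained on raw imagery with traditional ML methods that use handcrafted features and foundation-model embeddings.
Quantify the gains from ImageNet pretraining and spatial masking for looting detection.
Build and share a large-scale, multi-year dataset of looted and preserved sites with spatial footprints.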

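Proposed Method

Use PlanetScope monthly mosaics from 2016–2023 (4.7 m/pixel) to create 1 km × 1 km patches centered on each site.
Evaluate two method families: end-to-end CNN classification on RGB patches, and traditional ML on handcrafted features and foundation-model embeddings.
Incorporate manually annotated site footprints as spatial masks to guide the models.
Compare several CNN backbones (ResNet-18/34/50, EfficientNet-B0/B1) with and without ImageNet pretraining and with and without masking.
Evaluate how temporal aggregation strategies for multi-year imagery (mean, median, concatenation, PCA) affect detection performance.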
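
The patch, masking, and aggregation steps above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function names (patch_side_pixels, apply_footprint_mask, aggregate) are hypothetical, zeroing pixels outside the footprint is an assumed form of masking, and PCA is left out because it has to be fitted across sites and the summary does not say how that was done.

```python
from statistics import mean, median

PIXEL_SIZE_M = 4.7     # PlanetScope mosaic resolution quoted in the summary
PATCH_SIDE_M = 1000.0  # 1 km patch side quoted in the summary


def patch_side_pixels(side_m=PATCH_SIDE_M, pixel_m=PIXEL_SIZE_M):
    """Approximate side length of a patch in pixels."""
    return int(round(side_m / pixel_m))


def apply_footprint_mask(band, mask):
    """Zero out pixels outside the site footprint (assumed masking scheme).

    band: 2D list of pixel values for one image band.
    mask: 2D list of 0/1 flags with the same shape as band.
    """
    return [
        [value if keep else 0.0 for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(band, mask)
    ]


def aggregate(series, strategy):
    """Collapse T per-time-step feature vectors for one site into one vector.

    series: list of T lists, each holding D floats.
    """
    if strategy == "mean":
        return [mean(column) for column in zip(*series)]
    if strategy == "median":
        return [median(column) for column in zip(*series)]
    if strategy == "concat":
        return [value for vector in series for value in vector]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(patch_side_pixels())
    toy_series = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]
    for name in ("mean", "median", "concat"):
        print(name, aggregate(toy_series, name))
```

Mean and median keep the feature dimension fixed at D regardless of how many time steps are available, whereas concatenation grows to T × D and requires every site to have the same number of time steps.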

Figure 1: Overview of the archaeological sites in Afghanistan in this work. The sites comprise 1,045 preserved and 898 looted sites.

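Experimental Results

Research Questions

RQ1: How do end-to-end CNNs trained on raw RGB imagery compare with traditional ML pipelines that use handcrafted features and foundation-model embeddings for looted-site detection?
RQ2: How do ImageNet pretraining and spatial masking affect detection performance?
RQ3: Given temporal label noise, is single-year training more robust than multi-year training, and which aggregation strategy best preserves discriminative information over time?
RQ4: How large a dataset and how many years of imagery are needed to detect looting patterns robustly in Afghanistan?
RQ5: Which features or embeddings are most informative for looting detection?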

Key Findings

Model/Configuration	Accuracy	Precision	Recall	F1	AUROC
EfficientNet-B0	0.923 ± 0.018	0.913 ± 0.037	0.923 ± 0.017	0.918 ± 0.018	0.966 ± 0.015
EfficientNet-B1	0.925 ± 0.013	0.910 ± 0.034	0.933 ± 0.034	0.921 ± 0.014	0.970 ± 0.007
ResNet-18	0.927 ± 0.022	0.904 ± 0.031	0.943 ± 0.016	0.923 ± 0.022	0.968 ± 0.013
ResNet-34	0.917 ± 0.018	0.888 ± 0.038	0.941 ± 0.011	0.913 ± 0.017	0.965 ± 0.006
ResNet-50	0.930 ± 0.016	0.915 ± 0.046	0.940 ± 0.029	0.926 ± 0.015	0.970 ± 0.009
SatCLIP-V + RF + Mean	0.716 ± 0.017	0.674 ± 0.021	0.751 ± 0.018	0.710 ± 0.015	0.781 ± 0.011
Handcrafted + XGB + PCA	0.718 ± 0.013	0.703 ± 0.014	0.678 ± 0.031	0.690 ± 0.018	0.786 ± 0.012
GeoRSCLIP + LR + PCA	0.690 ± 0.022	0.662 ± 0.019	0.674 ± 0.045	0.668 ± 0.030	0.751 ± 0.019
Satlas Pretrain + LR + Concat	0.623 ± 0.021	0.591 ± 0.026	0.610 ± 0.035	0.599 ± 0.022	0.676 ± 0.011
Prithvi EO 2.0 + LR + PCA	0.597 ± 0.038	0.563 ± 0.040	0.570 ± 0.058	0.566 ± 0.048	0.635 ± 0.029
SatMAE + GB + Concat	0.606 ± 0.023	0.577 ± 0.025	0.553 ± 0.033	0.565 ± 0.027	0.640 ± 0.018
DINOv3 + RF + Median	0.596 ± 0.028	0.566 ± 0.031	0.547 ± 0.035	0.556 ± 0.032	0.621 ± 0.022
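
For readers who want to relate the table columns to one another, the sketch below computes accuracy, precision, recall, and F1 from confusion-matrix counts and summarizes per-fold scores as mean ± std. The counts and fold scores are made-up illustrations, not values from the paper, and the use of the sample standard deviation is an assumption since the summary does not say which variant was reported. AUROC is omitted because it needs per-sample scores rather than counts.

```python
from statistics import mean, stdev


def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive (looted) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def summarize(fold_scores):
    """Mean and sample standard deviation across cross-validation folds."""
    return mean(fold_scores), stdev(fold_scores)


if __name__ == "__main__":
    # Hypothetical confusion counts for a single fold.
    print(binary_metrics(tp=90, fp=10, fn=5, tn=95))
    # Hypothetical per-fold F1 scores.
    avg, spread = summarize([0.90, 0.92, 0.94])
    print(f"{avg:.3f} ± {spread:.3f}")
```

Because each column is averaged over folds separately, the reported mean F1 need not equal the harmonic mean of the reported mean precision and mean recall exactly.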

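ImageNet-pretrained CNNs with spatial masking achieve the highest mean F1 (ResNet-50: 0.926 ± 0.015).
The strongest traditional ML setup (SatCLIP-V+RF+Mean) reaches F1 = 0.710 ± 0.015, far below the CNNs.
ImageNet pretraining improves F1 across backbones (largest gain: +0.143 for ResNet-34).
Spatial masking yields a marked improvement by focusing the model on the site footprint (F1 rises from 0.301 to 0.455).
Foundation-model embeddings and handcrafted features can be competitive, but handcrafted texture features (GLCM) provide a strong signal for looting-related texture patterns.
The dataset covers 1,943 sites in Afghanistan (898 looted, 1,045 preserved) and 96 months of PlanetScope data; temporal consistency and year-specific training can reduce label noise.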
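
The dataset figures quoted above can be cross-checked with simple arithmetic, which also gives trivial reference points for reading the table. The sketch below is an illustration, not part of the paper: it assumes the evaluation folds share the overall class balance, which the summary does not state, and the helper name always_positive_f1 is hypothetical.

```python
LOOTED = 898
PRESERVED = 1045
FIRST_YEAR, LAST_YEAR = 2016, 2023


def always_positive_f1(positives, negatives):
    """F1 of a classifier that labels every site as looted."""
    precision = positives / (positives + negatives)
    recall = 1.0
    return 2 * precision * recall / (precision + recall)


total_sites = LOOTED + PRESERVED
months = (LAST_YEAR - FIRST_YEAR + 1) * 12
majority_accuracy = max(LOOTED, PRESERVED) / total_sites
trivial_f1 = always_positive_f1(LOOTED, PRESERVED)

if __name__ == "__main__":
    print(total_sites, months)
    print(f"majority-class accuracy: {majority_accuracy:.3f}")
    print(f"always-looted F1: {trivial_f1:.3f}")
```

The totals agree with the 1,943 sites and 96 months stated in the summary. Under the stated class-balance assumption, always predicting "preserved" would score about 0.54 accuracy and always predicting "looted" would score about 0.63 F1.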

Figure 2: EfficientNet-B1 performance across individual years (2017–2023), pretrained with spatial masking. Error bars show std across folds.


This review was generated by AI and reviewed by human editors.