QUICK REVIEW

[Paper Review] Plant identification in an open-world (LifeCLEF 2016)

Hervé Goëau, Pierre Bonnet | arXiv.org | Sep 25, 2025

Smart Agriculture and AI | References: 9 | Cited by: 53

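One-Sentence Summary

The LifeCLEF 2016 plant identification task evaluates open-set recognition on more than 110K images covering 1000 plant species of Western Europe, comparing CNN-based systems and highlighting the difficulty of rejecting unknown classes.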

ABSTRACT

The LifeCLEF plant identification challenge aims at evaluating plant identification methods and systems at a very large scale, close to the conditions of a real-world biodiversity monitoring scenario. The 2016-th edition was actually conducted on a set of more than 110K images illustrating 1000 plant species living in West Europe, built through a large-scale participatory sensing platform initiated in 2011 and which now involves tens of thousands of contributors. The main novelty over the previous years is that the identification task was evaluated as an open-set recognition problem, i.e. a problem in which the recognition system has to be robust to unknown and never seen categories. Beyond the brute-force classification across the known classes of the training set, the big challenge was thus to automatically reject the false positive classification hits that are caused by the unknown classes. This overview presents more precisely the resources and assessments of the challenge, summarizes the approaches and systems employed by the participating research groups, and provides an analysis of the main outcomes.

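Research Motivation and Objectives

Evaluate large-scale plant identification methods under open-set conditions close to real-world biodiversity monitoring.
Assess robustness to unknown, never-seen plant categories while still recognizing known species.
Provide a benchmark dataset and metrics for studying open-set performance and the rejection of unknown classes.
Analyze how different CNN-based and hybrid approaches perform on a test set rich in distractors.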

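Proposed Method

Use the training set from PlantCLEF 2015, supplemented with ground-truth labels for the test images.
Build a test set from Pl@ntNet queries that contains both known and unknown classes (open set).
Evaluate submissions with mean average precision in the open-set setting (mAP-open) and with a variant focused on invasive species (mAP-open-invasive).
Allow each group up to 4 runs, covering CNN and non-CNN baselines, ensembles, and the use of metadata.
Assess rejection strategies for unknown classes and report performance at different levels of novelty.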
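
This review does not spell out how mAP-open is computed. The following is a minimal sketch under one plausible assumption: each known class is treated as a query, the predictions assigned to that class are ranked by score, and unknown-class test images count as non-relevant wherever they appear. The function names (`average_precision`, `map_open`) and the toy data are hypothetical and are not taken from the paper or the challenge tooling.

```python
from collections import defaultdict


def average_precision(ranked_relevance, n_relevant):
    """AP of one ranked list of booleans (best score first)."""
    if n_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_relevant


def map_open(predictions, ground_truth, known_classes):
    """
    predictions:  list of (image_id, predicted_class, score) tuples.
    ground_truth: dict image_id -> true class, or None for an
                  unknown-class distractor (assumed encoding).
    known_classes: classes to average over.
    """
    by_class = defaultdict(list)
    for image_id, cls, score in predictions:
        by_class[cls].append((score, image_id))

    ap_values = []
    for cls in known_classes:
        n_relevant = sum(1 for true in ground_truth.values() if true == cls)
        if n_relevant == 0:
            continue  # assumption: classes absent from the test set are skipped
        ranked = sorted(by_class[cls], key=lambda pair: pair[0], reverse=True)
        # A distractor has true class None, so it is never relevant.
        relevance = [ground_truth.get(image_id) == cls for _, image_id in ranked]
        ap_values.append(average_precision(relevance, n_relevant))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Toy data: image "d" belongs to a species outside the training set.
truth = {"a": "rose", "b": "rose", "c": "oak", "d": None}
closed_run = [("a", "rose", 0.9), ("b", "rose", 0.8), ("c", "oak", 0.7)]
# Same run, plus a confident false alarm on the distractor.
open_run = closed_run + [("d", "rose", 0.95)]

closed_score = map_open(closed_run, truth, ["rose", "oak"])
open_score = map_open(open_run, truth, ["rose", "oak"])
```

In this toy case the single high-scoring false alarm on the distractor pushes both true "rose" images down the ranking, so `open_score` is lower than `closed_score`. That is the effect a rejection strategy tries to avoid.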

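Experimental Results

Research Questions

RQ1: How do CNN-based plant identification systems perform in an open-world setting with a large number of unknown classes?
RQ2: How do unknown-class distractors affect mAP in open-set plant identification?
RQ3: Do explicit unknown-class rejection strategies improve robustness, and under what novelty conditions?
RQ4: In a stream-like scenario, how does performance degrade as the proportion of unknown queries increases?
RQ5: What are the relative contributions of architecture, ensembling, and metadata to open-set plant identification performance?

Key Findings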

Run	Keywords	Rejection strategy	mAP-open	mAP-open-invasive	mAP-closed
Bluefield Run4	VGGNet, combine outputs from a same observation	thresholds by class (train+validation)	0.742	0.717	0.827
SabanciU GebzeTU Run1	2x(VGGNet,GoogleNet) tuned with resp. 70k, 115k training images	GoogleNet 70k/70k Plant/ImageNet	0.738	0.704	0.806
SabanciU…Run3	SabanciUGebzeTU Run1	Manually removed 90 test images	0.737	0.703	0.807
Bluefield Run3	Bluefield Run 4	thresholds by class	0.736	0.718	0.82
SabanciU…Run2	SabanciUGebzeTU Run1	-	0.736	0.683	0.807
SabanciU…Run4	SabanciUGebzeTU Run1	-	0.735	0.695	0.802
CMP Run1	Bagging of 3xResNet-152	-	0.71	0.653	0.79
LIIR KUL Run3	CaffeNet, VGGNet16, 3xGoogleNet, adding 12k external plant images	threshold	0.703	0.674	0.761
LIIR KUL Run2	LIIR KUL Run 3	threshold	0.692	0.667	0.744
LIIR KUL Run1	LIIR KUL Run 3	threshold	0.669	0.652	0.708
UM Run4	VGGNet16	-	0.669	0.598	0.742
CMP Run2	ResNet-152	-	0.644	0.564	0.729
CMP Run3	ResNet-152 (2015training)	-	0.639	0.59	0.723
QUT Run3	1 "general" GoogleNet, 6 "organ" GoogleNets, observation combination	-	0.629	0.61	0.696
Floristic Run3	GoogleNet, metadata	-	0.627	0.533	0.693
UM Run1	VGGNet16	-	0.627	0.537	0.7
Floristic Run1	GoogleNet	-	0.619	0.541	0.694
Bluefield Run1	VGGNet	thresholds by class	0.611	0.6	0.692
Bluefield Run2	VGGNet	thresholds by class	0.611	0.6	0.693
Floristic Run2	GoogleNet	thresholds by class	0.611	0.538	0.681
QUT Run1	GoogleNet	-	0.601	0.563	0.672
UM Run3	VGGNet16 with dedicated and combined organ & species layers	-	0.589	0.509	0.652
QUT Run2	6 "organ" GoogleNets, observation combination	-	0.564	0.562	0.641
UM Run2	VGGNet16 from scratch (without ImageNet2012)	-	0.481	0.446	0.552
QUT Run4	QUT Run3	threshold	0.367	0.359	0.378
BMETMITRun4	AlexNet & BVWs & metadata	-	0.174	0.144	0.213
BMETMITRun3	AlexNet & BVWs & metadata	threshold by classifier	0.17	0.125	0.197
BMETMITRun1	AlexNet	-	0.169	0.125	0.196
BMETMITRun2	BVWs (fisher vectors)	-	0.066	0.128	0.101
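
Several rows above list "thresholds by class" as their rejection strategy, but the review does not describe how those thresholds were chosen. The sketch below shows one way such a scheme could work, assuming each class threshold is set to a low quantile of the scores that correct validation predictions received for that class. The function names, the `quantile` and `default` parameters, and the data are all hypothetical.

```python
def fit_class_thresholds(validation, quantile=0.1):
    """
    validation: list of (true_class, predicted_class, score) tuples.
    Returns dict class -> threshold, taken as the given quantile of the
    scores of correct predictions for that class (an assumed rule).
    """
    correct_scores = {}
    for true_cls, pred_cls, score in validation:
        if true_cls == pred_cls:
            correct_scores.setdefault(pred_cls, []).append(score)

    thresholds = {}
    for cls, scores in correct_scores.items():
        ordered = sorted(scores)
        index = int(quantile * (len(ordered) - 1))
        thresholds[cls] = ordered[index]
    return thresholds


def reject_unknown(predictions, thresholds, default=0.5):
    """
    predictions: list of (image_id, predicted_class, score) tuples.
    Keeps a prediction only if its score reaches the threshold of its
    predicted class; classes without a fitted threshold use `default`.
    """
    return [
        (image_id, cls, score)
        for image_id, cls, score in predictions
        if score >= thresholds.get(cls, default)
    ]


validation = [
    ("rose", "rose", 0.9),
    ("rose", "rose", 0.6),
    ("rose", "oak", 0.4),   # misclassified, ignored when fitting
    ("oak", "oak", 0.8),
]
thresholds = fit_class_thresholds(validation, quantile=0.0)

test_predictions = [
    ("x", "rose", 0.7),
    ("y", "rose", 0.5),
    ("z", "oak", 0.75),
    ("w", "fern", 0.55),
]
kept = reject_unknown(test_predictions, thresholds)
```

A per-class threshold lets confidently recognized species keep a strict cut-off while rarer or harder species keep a looser one. A single global threshold cannot do that.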

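CNN-based systems dominated the results: 28 of the 29 runs in the table use a CNN, and the only CNN-free run (BMETMITRun2, bag of visual words with Fisher vectors) ranks last on mAP-open.
The best run, Bluefield Run4, reached mAP-open 0.742 (mAP-open-invasive 0.717, mAP-closed 0.827); the highest mAP-open-invasive, 0.718, came from Bluefield Run3. The gains came mainly from combining outputs across images of the same observation.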
Open-set distractors degrade performance across all systems; however, CNNs remain relatively robust to unknown classes.
When novelty is high, mean average precision drops significantly (e.g., below 0.45 when only 25% of queries are known).
Rejection strategies provided limited additional benefits over CNN baselines under moderate novelty, suggesting room for adaptive open-set rejection methods.
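
The table attributes the top runs to combining outputs from images of the same observation, without giving the combination rule. A minimal sketch, assuming the rule is a plain average of per-image class probabilities (the actual systems may have used a different rule), with hypothetical names and data:

```python
from collections import defaultdict


def pool_by_observation(image_scores, image_to_observation):
    """
    image_scores: dict image_id -> dict class -> probability.
    image_to_observation: dict image_id -> observation_id.
    Returns dict observation_id -> dict class -> mean probability.
    A class missing from an image's scores contributes 0 for that image.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for image_id, scores in image_scores.items():
        obs = image_to_observation[image_id]
        counts[obs] += 1
        for cls, probability in scores.items():
            sums[obs][cls] += probability
    return {
        obs: {cls: total / counts[obs] for cls, total in class_sums.items()}
        for obs, class_sums in sums.items()
    }


# Two photos of the same plant (e.g. flower and leaf) and one single photo.
image_scores = {
    "i1": {"rose": 0.6, "oak": 0.4},
    "i2": {"rose": 0.2, "oak": 0.8},
    "i3": {"rose": 1.0},
}
image_to_observation = {"i1": "o1", "i2": "o1", "i3": "o2"}

pooled = pool_by_observation(image_scores, image_to_observation)
best_o1 = max(pooled["o1"], key=pooled["o1"].get)
```

Here image "i1" alone would be labeled "rose", but the pooled observation "o1" is labeled "oak" because the second view is more confident. This shows how several views of one plant can correct a single ambiguous image.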

This review was generated by AI and reviewed by human editors.