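[Paper Digest] Plant identification in an open-world (LifeCLEF 2016)
The LifeCLEF 2016 plant identification task evaluates open-set recognition on more than 110K images covering 1000 Western European plant species, compares CNN-based systems, and highlights the challenge of rejecting unknown classes.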
The LifeCLEF plant identification challenge aims at evaluating plant identification methods and systems at a very large scale, close to the conditions of a real-world biodiversity monitoring scenario. The 2016 edition was conducted on a set of more than 110K images illustrating 1000 plant species living in Western Europe, built through a large-scale participatory sensing platform that was initiated in 2011 and now involves tens of thousands of contributors. The main novelty over the previous years is that the identification task was evaluated as an open-set recognition problem, i.e. a problem in which the recognition system has to be robust to unknown and never seen categories. Beyond the brute-force classification across the known classes of the training set, the big challenge was thus to automatically reject the false positive classification hits that are caused by the unknown classes. This overview presents the resources and assessments of the challenge in more detail, summarizes the approaches and systems employed by the participating research groups, and provides an analysis of the main outcomes.
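Motivation and objectives
- Evaluate large-scale plant identification methods under open-set conditions close to real-world biodiversity monitoring.
- Assess robustness to unknown, never-seen plant categories while still recognizing known species.
- Provide a benchmark dataset and metrics for studying open-set performance and the rejection of unknown classes.
- Analyze how different CNN-based and hybrid approaches perform on a distractor-rich test set.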
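Proposed method
- Use the training set from PlantCLEF 2015, with ground-truth labels supplied for the test images.
- Build a test set from Pl@ntNet queries that contains both known and unknown classes (open set).
- Score submissions with mean average precision in the open-set setting (mAP-open) and a variant focused on invasive species (mAP-open-invasive).
- Allow each group up to 4 runs, including CNN and non-CNN baselines, ensembles, and the use of metadata.
- Evaluate rejection strategies for unknown classes and report performance at different levels of novelty.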
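To make the mAP-open metric from the list above concrete, here is a minimal sketch under stated assumptions. It treats each known class as a query, ranks the predictions made for that class by score, and averages the per-class average precision; images of unknown classes can only appear as false positives in those rankings. The function names, the input layout, and the choice to skip classes that have no test image are illustrative assumptions, not the official evaluation script.

```python
from collections import defaultdict

def average_precision(ranked_hits, n_relevant):
    """AP of one ranked list; ranked_hits[i] is True if prediction i is correct."""
    if n_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank  # precision at this correct hit
    return precision_sum / n_relevant

def map_open(predictions, ground_truth, known_classes):
    """
    predictions:  list of (image_id, class_id, score) triples
    ground_truth: dict image_id -> class_id, or None for an unknown-class image
    Returns the mean of the per-class APs over known classes present in the test set.
    """
    by_class = defaultdict(list)
    for image_id, class_id, score in predictions:
        by_class[class_id].append((score, image_id))

    n_relevant = defaultdict(int)
    for label in ground_truth.values():
        if label is not None:
            n_relevant[label] += 1

    aps = []
    for c in known_classes:
        if n_relevant[c] == 0:
            continue  # assumption: classes without test images are skipped
        ranked = sorted(by_class[c], key=lambda p: p[0], reverse=True)
        hits = [ground_truth.get(image_id) == c for _, image_id in ranked]
        aps.append(average_precision(hits, n_relevant[c]))
    return sum(aps) / len(aps) if aps else 0.0

# Toy example: one unknown-class image "u" ranked between two correct hits.
demo_predictions = [("a", "rose", 0.9), ("u", "rose", 0.8), ("b", "rose", 0.7)]
demo_truth = {"a": "rose", "b": "rose", "u": None}
demo_score = map_open(demo_predictions, demo_truth, ["rose"])  # 5/6, about 0.833
```

In the toy example the distractor `u` pushes the second correct image down to rank 3, so the AP falls from 1.0 to 5/6. This is the effect the open-set setting is designed to measure.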
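Experimental results
Research questions
- RQ1: How do CNN-based plant identification systems perform in an open-world setting with many unknown classes?
- RQ2: How do unknown-class distractors affect mAP in open-set plant identification?
- RQ3: Do explicit unknown-class rejection strategies improve robustness, and under which novelty conditions?
- RQ4: In a stream-like scenario, how does performance degrade as the proportion of unknown queries increases?
- RQ5: What are the relative contributions of architecture, ensembling, and metadata to open-set plant identification performance?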
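The rejection question (RQ3) is easier to reason about with a concrete rule in hand. The sketch below shows one possible form of per-class threshold rejection: a threshold is fitted for each class on held-out validation predictions, and test predictions scoring below their class threshold are dropped. The specific fitting rule (lowest score among correct validation predictions), the default threshold, and all names are assumptions made for illustration; the participants' actual rules are not described in this digest.

```python
def fit_class_thresholds(validation):
    """
    validation: iterable of (predicted_class, score, true_class) triples.
    Hypothetical rule: the threshold of a class is the lowest score at which
    that class was predicted correctly on the validation data.
    """
    thresholds = {}
    for predicted, score, true in validation:
        if predicted == true:
            thresholds[predicted] = min(score, thresholds.get(predicted, score))
    return thresholds

def apply_rejection(predictions, thresholds, default=0.5):
    """
    predictions: list of (image_id, class_id, score) triples.
    Keeps a prediction only if its score reaches the threshold of its class;
    classes without a fitted threshold fall back to `default` (an assumption).
    """
    return [
        (image_id, class_id, score)
        for image_id, class_id, score in predictions
        if score >= thresholds.get(class_id, default)
    ]
```

A stricter rule rejects more unknown-class distractors but also removes more correct hits for known species, so the threshold trades open-set gains against closed-set losses.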
Key findings
| Run | Keywords | Rejection strategy | mAP-open | mAP-open-invasive | mAP-closed |
|---|---|---|---|---|---|
| Bluefield Run4 | VGGNet, combine outputs from a same observation | thresholds by class (train+validation) | 0.742 | 0.717 | 0.827 |
| SabanciU GebzeTU Run1 | 2x(VGGNet,GoogleNet) tuned with resp. 70k, 115k training images | GoogleNet 70k/70k Plant/ImageNet | 0.738 | 0.704 | 0.806 |
| SabanciU…Run3 | SabanciUGebzeTU Run1 | Manually removed 90 test images | 0.737 | 0.703 | 0.807 |
| Bluefield Run3 | Bluefield Run 4 | thresholds by class | 0.736 | 0.718 | 0.82 |
| SabanciU…Run2 | SabanciUGebzeTU Run1 | - | 0.736 | 0.683 | 0.807 |
| SabanciU…Run4 | SabanciUGebzeTU Run1 | - | 0.735 | 0.695 | 0.802 |
| CMP Run1 | Bagging of 3xResNet-152 | - | 0.71 | 0.653 | 0.79 |
| LIIR KUL Run3 | CaffeNet, VGGNet16, 3xGoogleNet, adding 12k external plant images | threshold | 0.703 | 0.674 | 0.761 |
| LIIR KUL Run2 | LIIR KUL Run 3 | threshold | 0.692 | 0.667 | 0.744 |
| LIIR KUL Run1 | LIIR KUL Run 3 | threshold | 0.669 | 0.652 | 0.708 |
| UM Run4 | VGGNet16 | - | 0.669 | 0.598 | 0.742 |
| CMP Run2 | ResNet-152 | - | 0.644 | 0.564 | 0.729 |
| CMP Run3 | ResNet-152 (2015 training) | - | 0.639 | 0.59 | 0.723 |
| QUT Run3 | 1 "general" GoogleNet, 6 "organ" GoogleNets, observation combination | - | 0.629 | 0.61 | 0.696 |
| Floristic Run3 | GoogleNet, metadata | - | 0.627 | 0.533 | 0.693 |
| UM Run1 | VGGNet16 | - | 0.627 | 0.537 | 0.7 |
| Floristic Run1 | GoogleNet | - | 0.619 | 0.541 | 0.694 |
| Bluefield Run1 | VGGNet | thresholds by class | 0.611 | 0.6 | 0.692 |
| Bluefield Run2 | VGGNet | thresholds by class | 0.611 | 0.6 | 0.693 |
| Floristic Run2 | GoogleNet | thresholds by class | 0.611 | 0.538 | 0.681 |
| QUT Run1 | GoogleNet | - | 0.601 | 0.563 | 0.672 |
| UM Run3 | VGGNet16 with dedicated and combined organ & species layers | - | 0.589 | 0.509 | 0.652 |
| QUT Run2 | 6 "organ" GoogleNets, observation combination | - | 0.564 | 0.562 | 0.641 |
| UM Run2 | VGGNet16 from scratch (without ImageNet2012) | - | 0.481 | 0.446 | 0.552 |
| QUT Run4 | QUT Run3 | threshold | 0.367 | 0.359 | 0.378 |
| BME TMIT Run4 | AlexNet & BVWs & metadata | - | 0.174 | 0.144 | 0.213 |
| BME TMIT Run3 | AlexNet & BVWs & metadata | threshold by classifier | 0.17 | 0.125 | 0.197 |
| BME TMIT Run1 | AlexNet | - | 0.169 | 0.125 | 0.196 |
| BME TMIT Run2 | BVWs (fisher vectors) | - | 0.066 | 0.128 | 0.101 |
- CNN-based systems dominated the top results, with the top 26 runs using CNNs.
- The best mAP-open was 0.742 (Bluefield Run4), and the best score for invasive-species monitoring was an mAP-open-invasive of 0.718 (Bluefield Run3), with gains mainly from observation-level pooling.
- Open-set distractors degrade performance across all systems; however, CNNs remain relatively robust to unknown classes.
- When novelty is high, mean average precision drops significantly (e.g., below 0.45 when only 25% of queries are known).
- Rejection strategies provided limited additional benefits over CNN baselines under moderate novelty, suggesting room for adaptive open-set rejection methods.
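Observation-level pooling, credited above for the best run's gains, can be sketched as follows. An observation groups several images of the same plant, so their class scores can be combined before ranking. The sketch averages the scores over all images of an observation and assigns the pooled scores back to each image. Mean pooling, the data layout, and the function name are assumptions for illustration; the table only states that outputs from the same observation were combined, not how.

```python
from collections import defaultdict

def pool_by_observation(image_scores, observation_of):
    """
    image_scores:   dict image_id -> dict class_id -> score
    observation_of: dict image_id -> observation_id
    Returns dict image_id -> pooled class scores, where the pooled score is
    the mean over all images of the same observation (a class missing from
    an image counts as 0 for that image).
    """
    groups = defaultdict(list)
    for image_id in image_scores:
        groups[observation_of[image_id]].append(image_id)

    pooled = {}
    for members in groups.values():
        totals = defaultdict(float)
        for image_id in members:
            for class_id, score in image_scores[image_id].items():
                totals[class_id] += score
        mean = {c: s / len(members) for c, s in totals.items()}
        for image_id in members:
            pooled[image_id] = dict(mean)  # copy so images do not share one dict
    return pooled
```

Pooling lets a confident view of one organ, such as a flower, correct an ambiguous view of another, such as a leaf, from the same plant.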
This digest was generated by AI and reviewed by human editors.