QUICK REVIEW

[Paper Review] Overview of LifeCLEF Plant Identification task 2019: diving into data deficient tropical countries

Hervé Goëau, Pierre Bonnet|Agritrop (Cirad)|Sep 23, 2025

Species Distribution and Climate Change | References: 1 | Citations: 23

One-line summary

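The LifeCLEF 2019 Plant Identification challenge evaluated automated plant identification on data-deficient tropical flora. It used a 10k-species training set and a 742-item field-identified test set, and compared 26 deep learning (DL) systems from 6 teams against human experts. The results show that deep learning lags behind the experts and that tropical plants are particularly hard to identify.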

ABSTRACT

Automated identification of plants has improved considerably thanks to the recent progress in deep learning and the availability of training data. However, this profusion of data only concerns a few tens of thousands of species, while the planet has nearly 369K. The LifeCLEF 2019 Plant Identification challenge (or "PlantCLEF 2019") was designed to evaluate automated identification on the flora of data deficient regions. It is based on a dataset of 10K species mainly focused on the Guiana shield and the Northern Amazon rainforest, an area known to have one of the greatest diversity of plants and animals in the world. As in the previous edition, a comparison of the performance of the systems evaluated with the best tropical flora experts was carried out. This paper presents the resources and assessments of the challenge, summarizes the approaches and systems employed by the participating research groups, and provides an analysis of the main outcomes.

Motivation and objectives

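Evaluate automated plant identification for data-deficient tropical regions (the Guiana Shield and the Northern Amazon) using a new 10k-species training dataset.
Compare the performance of automated systems with that of tropical flora experts on a test set identified in the field.
Analyze the impact of data quality and noise, and examine the potential improvement offered by data sources such as herbarium images.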

Proposed method

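Assemble a 10k-species training dataset from EoL (Encyclopedia of Life) and web sources, documenting its noise and duplicates.
Provide a high-quality evaluation set of 742 field-identified observations.
Evaluate up to 10 runs per team using the Top-1, Top-3, Top-5 and MRR (mean reciprocal rank) metrics, and compare them against the annotations of 5 experts.
Summarize the participants' methods, highlighting the CNN architectures (e.g., Inception-ResNet-v2, Inception-v4, DenseNet) and data augmentation.
Compare human expert and system performance, and analyze the effect of training data volume and noise.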
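
The ranking metrics named above can be illustrated with a minimal sketch. It assumes each run returns a ranked list of candidate species per observation; the function names and the toy labels (`sp_a`, `sp_b`, ...) are hypothetical and are not taken from the challenge's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of observations whose true species is among the first k predictions."""
    hits = sum(1 for preds, t in zip(ranked, truth) if t in preds[:k])
    return hits / len(truth)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Mean of 1/rank of the true species; an observation scores 0 if it is absent."""
    total = 0.0
    for preds, t in zip(ranked, truth):
        if t in preds:
            total += 1.0 / (list(preds).index(t) + 1)
    return total / len(truth)


# Toy data with hypothetical species labels (not from the PlantCLEF test set).
ranked = [
    ["sp_a", "sp_b", "sp_c"],  # true species at rank 1
    ["sp_b", "sp_a", "sp_c"],  # true species at rank 2
    ["sp_c", "sp_b", "sp_d"],  # true species missing
    ["sp_d", "sp_c", "sp_a"],  # true species at rank 3
]
truth = ["sp_a", "sp_a", "sp_a", "sp_a"]

top1 = top_k_accuracy(ranked, truth, 1)
top3 = top_k_accuracy(ranked, truth, 3)
mrr = mean_reciprocal_rank(ranked, truth)
print(top1, top3, round(mrr, 3))  # 0.25 0.75 0.458
```

MRR rewards a correct species that appears lower in the list, which is why a run can have a higher MRR than Top-1 score in the results table.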

Figure 1: Regions of origin of the 10k species selected for PlantCLEF 2019: French Guiana, Suriname, Guyana, Brazil (states of Amapa, Para, Amazonas)

Experimental results

Research questions

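RQ1: How does automated identification of data-deficient tropical flora compare with the performance of tropical flora experts?
RQ2: How do training data quality and noise (duplicates, non-plant images, herbarium sheets, drawings, etc.) affect DL performance?
RQ3: Can deep learning systems narrow the gap with experts when trained on a large, noisy but diverse dataset?
RQ4: What role do data augmentation, class priors, and additional training data (e.g., GBIF) play in system performance?
RQ5: Are herbarium data a viable route to better identification in data-deficient tropical regions?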

Main findings

Team run	Top1 Expert	Top1 Whole	Top3 Expert	Top5 Expert	Top5 Whole	MRR Expert	MRR Whole
Holmes Run 2	0.316	0.247	0.376	0.419	0.357	0.362	0.298
Holmes Run 3	0.282	0.225	0.359	0.376	0.321	0.329	0.274
Holmes Run 1	0.248	0.222	0.325	0.368	0.325	0.302	0.269
CMP Run 7	0.085	0.078	0.145	0.197	0.168	0.124	0.111
CMP Run 2	0.077	0.061	0.145	0.188	0.162	0.117	0.097
CMP Run 6	0.068	0.057	0.154	0.188	0.163	0.112	0.096
CMP Run 1	0.068	0.069	0.145	0.171	0.158	0.107	0.099
CMP Run 3	0.068	0.066	0.128	0.188	0.156	0.110	0.099
CMP Run 4	0.060	0.053	0.128	0.162	0.160	0.097	0.090
MRIM Run 1	0.043	0.042	0.051	0.060	0.088	0.055	0.063
MRIM Run 8	0.034	0.046	0.068	0.103	0.102	0.057	0.068
MRIM Run 7	0.026	0.042	0.085	0.094	0.096	0.053	0.065
datvo06 Run 1	0.026	0.043	0.051	0.060	0.086	0.041	0.061
CMP Run 5	0.026	0.054	0.085	0.085	0.119	0.050	0.078
MRIM Run 10	0.026	0.034	0.068	0.068	0.085	0.047	0.057
MRIM Run 5	0.017	0.036	0.043	0.077	0.082	0.039	0.058
MRIM Run 3	0.017	0.030	0.060	0.077	0.088	0.043	0.054
MRIM Run 2	0.017	0.036	0.043	0.077	0.082	0.039	0.058
MRIM Run 6	0.017	0.028	0.051	0.077	0.078	0.037	0.049
MRIM Run 9	0.017	0.031	0.043	0.068	0.088	0.039	0.055
MRIM Run 4	0.009	0.027	0.060	0.077	0.077	0.038	0.049
MLRG SSN Run 1	0.000	0.000	0.000	0.000	0.000	0.000	0.000
Leowin Run 1	0.000	0.000	0.000	0.000	0.001	0.000	0.000
MLRG SSN Run 2	0.000	0.000	0.000	0.000	0.000	0.000	0.000
MLRG SSN Run 3	0.000	0.012	0.000	0.009	0.027	0.004	0.021
Leowin Run 2	0.000	0.000	0.000	0.000	0.001	0.000	0.000
Expert 1	0.675	-	0.684	0.684	-	0.679	-
Expert 2	0.598	-	0.607	0.607	-	0.603	-
Expert 3	0.376	-	0.402	0.402	-	0.389	-
Expert 4	0.325	-	0.530	0.530	-	0.425	-
Expert 5	0.154	-	0.154	0.154	-	0.154	-

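The DL systems perform markedly worse than the best tropical flora experts on the test set (expert Top-1 accuracy reaches 0.675 at best, with a median of 0.376).
Tropical flora is notably harder than temperate flora: even the top experts score lower, and the gap between experts and machine predictions is larger.
The best automated system reaches roughly half the accuracy of the top expert, a Top-1 gap of about 0.36 on the expert subset (0.316 vs. 0.675).
Noise and data quality (duplicates, non-plant images) have a marked effect on performance, especially for species with few training images. The effect of herbarium sheets and drawings is inconclusive.
Adding herbarium records (GBIF, digitized herbarium specimens) to the training data points to a potential gain: a post-challenge evaluation reached 41% Top-1 accuracy.
A supplementary analysis shows that more training images generally improve the average rank, while a high proportion of duplicates degrades results.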
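
The headline expert-versus-machine comparison can be re-derived from the Top1 Expert column of the results table. The short sketch below uses only values copied from that table; the variable names are illustrative.

```python
from statistics import median

# Top1 Expert scores copied from the results table.
expert_top1 = [0.675, 0.598, 0.376, 0.325, 0.154]  # Expert 1 to Expert 5
best_run_top1 = 0.316  # Holmes Run 2, the best automated run

best_expert = max(expert_top1)
gap = best_expert - best_run_top1      # absolute Top-1 gap to the best expert
ratio = best_run_top1 / best_expert    # best run as a fraction of the best expert
median_expert = median(expert_top1)
experts_beaten = sum(1 for e in expert_top1 if e < best_run_top1)

print(round(gap, 3), round(ratio, 3), median_expert, experts_beaten)
# 0.359 0.468 0.376 1
```

On these figures the best run outperforms only the lowest-scoring expert and stays below the median expert.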

Figure 2: Scores between Experts and Machine


This review was generated by AI and checked by a human editor.