QUICK REVIEW

[Paper Review] Deep learning powered real-time identification of insects using citizen science data

Shivani Chiranjeevi, Mojdeh Sadaati|arXiv (Cornell University)|Jun 4, 2023
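Insect and Arachnid Ecology and Behavior | Cited by 11
One-Sentence Summary

InsectNet uses a large citizen-science insect image dataset, combined with self-supervised learning, transfer learning, OOD detection, and conformal prediction, to identify 2526 insect species with over 96% accuracy, and it provides a public web portal and a reusable ML stack.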

ABSTRACT

Insect-pests significantly impact global agricultural productivity and quality. Effective management involves identifying the full insect community, including beneficial insects and harmful pests, to develop and implement integrated pest management strategies. Automated identification of insects under real-world conditions presents several challenges, including differentiating similar-looking species, intra-species dissimilarity and inter-species similarity, several life cycle stages, camouflage, diverse imaging conditions, and variability in insect orientation. A deep-learning model, InsectNet, is proposed to address these challenges. InsectNet is endowed with five key features: (a) utilization of a large dataset of insect images collected through citizen science; (b) label-free self-supervised learning for large models; (c) improving prediction accuracy for species with a small sample size; (d) enhancing model trustworthiness; and (e) democratizing access through streamlined MLOps. This approach allows accurate identification (>96% accuracy) of over 2500 insect species, including pollinator (e.g., butterflies, bees), parasitoid (e.g., some wasps and flies), predator species (e.g., lady beetles, mantises, dragonflies) and harmful pest species (e.g., armyworms, cutworms, grasshoppers, stink bugs). InsectNet can identify invasive species, provide fine-grained insect species identification, and work effectively in challenging backgrounds. It also can abstain from making predictions when uncertain, facilitating seamless human intervention and making it a practical and trustworthy tool. InsectNet can guide citizen science data collection, especially for invasive species where early detection is crucial. Similar approaches may transform other agricultural challenges like disease detection and underscore the importance of data collection, particularly through citizen science efforts.

Motivation and Objectives

  • Harness citizen science data to build a scalable, accurate insect classifier.
  • Handle metamorphosis and fine-grained species with high intra- and inter-species variability.
  • Improve reliability and trust through OOD detection and conformal prediction.
  • Democratize access via web portal and reusable MLOps workflows.

Proposed Method

  • Assemble a dataset of 6M images from 2526 insect species from iNaturalist (subset of 13M insects, 100k species).
  • Apply label-free self-supervised learning to pre-train models on unlabeled data, followed by finetuning on labeled data.
  • Use AlphaNet to boost accuracy for low-sample species without retraining the base model.
  • Wrap the classifier with energy-based OOD detection to flag uncertain or out-of-distribution images.
  • Employ conformal prediction to output prediction sets that contain the true species with guaranteed confidence (≥97.5%).
  • Provide public access to the model, weights, quantized versions, and MLOps workflows via a web portal.
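The energy-based OOD wrapper listed above can be sketched as follows. This is a minimal illustration, not InsectNet's actual code: it assumes the commonly used free-energy score computed from classifier logits, and the function names, the temperature, and the threshold value are all hypothetical. Mirroring Figure 1, the sketch always returns a prediction and attaches a warning flag when the energy is above the threshold.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T).

    Lower (more negative) energy suggests an in-distribution input.
    """
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse

def classify_with_ood_flag(logits, threshold, temperature=1.0):
    """Return (predicted class index, is_ood flag) for each row of logits.

    `threshold` is an assumed value; in practice it would be tuned on
    held-out in-distribution images.
    """
    logits = np.atleast_2d(logits)
    energy = energy_score(logits, temperature)
    return logits.argmax(axis=-1), energy > threshold

# Toy logits: a confident prediction and a flat, uncertain one.
logits = np.array([[12.0, 0.5, -1.0],
                   [0.2, 0.1, 0.0]])
pred, is_ood = classify_with_ood_flag(logits, threshold=-5.0)
for p, flag in zip(pred, is_ood):
    print(f"class {p}" + (" (warning: possibly out-of-distribution)" if flag else ""))
```

Because the flag is computed from the logits alone, a wrapper of this kind can sit on top of an already trained classifier without retraining it.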
Figure 1: InsectNet in action. After an image is uploaded, InsectNet first performs out-of-distribution (OOD) detection. (Left) If OOD detection is true, InsectNet provides a warning along with its prediction. (Middle) If not OOD, InsectNet produces a prediction with no warning. (Right) Additionally

Experimental Results

Research Questions

  • RQ1: Can InsectNet accurately identify 2526 insect species across life stages using citizen science images?
  • RQ2: Does self-supervised pre-training on large unlabeled datasets improve final species classification performance?
  • RQ3: Can a post hoc wrapper (AlphaNet) improve low-sample species accuracy without harming overall accuracy?
  • RQ4: Do energy-based OOD detection and conformal prediction enhance trust and safe human intervention?
  • RQ5: Is the approach deployable via democratized access (web portal and MLOps) for practical use in IPM and CSPM?

Key Findings

  • InsectNet achieves 96.4% classification accuracy across 2526 species; mean per-species accuracy is 94%.
  • Only 3.40% of species have accuracy below 80%, with most underrepresented species (>1000 images) performing well.
  • SSL pre-training with large unlabeled data and multi-dataset pre-training campaigns significantly boost representation quality.
  • AlphaNet improves low-sample species accuracy from 79.7% to 87.6% (a gain of 7.9 percentage points) at the cost of a 1.3 percentage-point drop in overall accuracy (96.4% to 95.1%).
  • Out-of-distribution detection and conformal prediction provide fail-safes and confidence guarantees (≥97.5%), enabling trusted human intervention.
  • InsectNet can identify insects across metamorphosis stages and in challenging backgrounds, including invasive and beneficial species.
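The conformal-prediction guarantee mentioned in the findings above can be illustrated with a split conformal sketch. This is an assumption-laden example rather than the paper's implementation: it uses the simple score 1 - p(true class) on a held-out calibration set, takes alpha = 0.025 to match the 97.5% level quoted above, and all function names and toy numbers are hypothetical. The guarantee it provides is marginal coverage, assuming calibration and test images are exchangeable.

```python
import math
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.025):
    """Compute the conformal threshold qhat from calibration softmax outputs."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return 1.0  # too few calibration points: every class is included
    return float(np.sort(scores)[k - 1])

def prediction_set(probs, qhat):
    """Return the class indices whose score 1 - p is at most qhat."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(1.0 - probs <= qhat)

# Toy calibration data: 3 images, 2 classes (alpha=0.5 only to keep it tiny).
cal_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
cal_labels = [0, 0, 1]
qhat = calibrate_threshold(cal_probs, cal_labels, alpha=0.5)
print(qhat, prediction_set([0.85, 0.15], qhat))
```

A large prediction set signals an ambiguous image, which is one natural point at which to hand the decision to a human expert.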
Figure 2: InsectNet is able to accurately identify insect species across the life cycle stages. Top left: charismatic species Danaus plexippus (Monarch butterfly), Bottom left: an invasive species Lycorma delicatula (Spotted lanternfly). Right panel: Examples of the ability of InsectNet to accuratel


This review was generated by AI and reviewed by human editors.