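[Paper Summary] Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT
PlantXViT is a lightweight hybrid CNN-ViT model for plant disease identification. It outperforms several recent CNN-based methods on five public datasets and provides explainability through Grad-CAM and LIME.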
Plant diseases are the primary cause of crop losses globally, with an impact on the world economy. To deal with these issues, smart agriculture solutions are evolving that combine the Internet of Things and machine learning for early disease detection and control. Many such systems use vision-based machine learning methods for real-time disease detection and diagnosis. With the advancement in deep learning techniques, new methods have emerged that employ convolutional neural networks for plant disease detection and identification. Another trend in vision-based deep learning is the use of vision transformers, which have proved to be powerful models for classification and other problems. However, vision transformers have rarely been investigated for plant pathology applications. In this study, a Vision Transformer enabled Convolutional Neural Network model called "PlantXViT" is proposed for plant disease identification. The proposed model combines the capabilities of traditional convolutional neural networks with the Vision Transformers to efficiently identify a large number of plant diseases for several crops. The proposed model has a lightweight structure with only 0.8 million trainable parameters, which makes it suitable for IoT-based smart agriculture services. The performance of PlantXViT is evaluated on five publicly available datasets. The proposed PlantXViT network performs better than five state-of-the-art methods on all five datasets. The average accuracy for recognising plant diseases is shown to exceed 93.55%, 92.59%, and 98.33% on Apple, Maize, and Rice datasets, respectively, even under challenging background conditions. The efficiency in terms of explainability of the proposed model is evaluated using gradient-weighted class activation maps and Local Interpretable Model Agnostic Explanation.
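Research Motivation and Objectives
- Smart agriculture needs plant disease identification that is both accurate and explainable.
- Propose a lightweight hybrid architecture that combines CNN and Vision Transformer blocks.
- Show that PlantXViT achieves high accuracy across diverse crop datasets while remaining explainable.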
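Proposed Method
- PlantXViT is composed of two pretrained VGG16 blocks, one inception-v7 block, and four transformer encoder blocks.
- CNN feature maps are split into 5x5 patches, linearly projected, and processed by the four transformer encoder blocks.
- Training uses categorical cross-entropy loss and the Adam optimizer, with a learning rate of 0.0001 and a batch size of 16.
- Explainability is evaluated with Grad-CAM and LIME.
- Preprocessing resizes images to 224x224x3; five public plant disease datasets are used.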
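The patch step above can be illustrated with a minimal sketch. It assumes non-overlapping patches and a toy 10x10 feature map with 2 channels; both the map size and the helper name `extract_patches` are hypothetical and not taken from the paper. Plain lists are used so the example runs without dependencies; a tensor library would normally do this with a reshape.

```python
def extract_patches(feature_map, patch_size):
    """Split an H x W x C feature map (nested lists) into flattened,
    non-overlapping patch_size x patch_size patches in row-major order."""
    height = len(feature_map)
    width = len(feature_map[0])
    if height % patch_size or width % patch_size:
        raise ValueError("feature map size must be divisible by patch size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    # Append every channel value of this spatial position.
                    flat.extend(feature_map[row][col])
            patches.append(flat)
    return patches


# Toy 10 x 10 x 2 feature map (assumed size, for illustration only).
# Each position stores its own (row, col) so patches are easy to inspect.
toy_map = [[[float(r), float(c)] for c in range(10)] for r in range(10)]
tokens = extract_patches(toy_map, patch_size=5)
print(len(tokens), len(tokens[0]))  # 4 patches, each of length 5 * 5 * 2 = 50
```

Each flattened vector would then be passed through a learned linear projection before entering the transformer encoder blocks; that projection is omitted here.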
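Experimental Results
Research Questions
- RQ1: How well does PlantXViT's hybrid architecture perform against recent CNN-based methods across diverse plant disease datasets?
- RQ2: Does integrating ViT blocks with CNN features improve both accuracy and explainability?
- RQ3: How effective are Grad-CAM and LIME at explaining PlantXViT's predictions across datasets?
- RQ4: Which patch size for PlantXViT's ViT component gives the best performance?
- RQ5: How does PlantXViT perform under varying dataset sizes and class imbalance?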
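Key Findings
- PlantXViT achieves high accuracy on five public datasets and outperforms five recent CNN-based methods on all of them.
- Patch-size experiments show that 5x5 patches give the best overall accuracy, precision, recall, and F1 across datasets.
- The explainability analysis (Grad-CAM and LIME) gives insight into the model's decisions and highlights the local regions that contribute to each prediction.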
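- The model is lightweight, with only about 0.8 million trainable parameters, which makes it suitable for IoT-enabled smart agriculture devices.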
- Across datasets, PlantXViT shows strong ROC/AUC performance and competitive Cohen's kappa scores, indicating reliable classification under diverse conditions.
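For readers unfamiliar with Cohen's kappa, the sketch below computes it from a confusion matrix using the standard definition: observed agreement corrected for the agreement expected by chance. The function name `cohens_kappa` and the 2x2 matrix are illustrative assumptions, not results from the paper.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix where
    confusion[i][j] counts samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    n = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Made-up two-class example (not data from the paper).
toy_confusion = [[45, 5], [10, 40]]
kappa = cohens_kappa(toy_confusion)
print(round(kappa, 3))  # 0.7
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, so kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7. Values near 1 indicate agreement well beyond chance, which is why the metric is useful alongside accuracy on imbalanced datasets.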
This summary was generated by AI and reviewed by a human editor.