QUICK REVIEW

[Paper Review] Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models

An Yan, Yu Wang|arXiv (Cornell University)|Oct 4, 2023

Machine Learning in Healthcare | Cited by 9

One-Sentence Summary

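This paper builds robust and interpretable medical image classifiers by using a vision-language model to map images onto medical concepts derived from GPT-4, which yields better generalization and interpretability.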

ABSTRACT

Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new domains (e.g., patients with different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision for trustworthy and safety considerations. In this paper, to address these two limitations, we propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations and thus substantially outperform standard visual encoders and other baselines. Finally, we show how classification with a small number of concepts brings a level of interpretability for understanding model decisions through case studies in real medical data.

Research Motivation and Goals

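Advance medical image classification that is robust to spurious correlations in clinical data.
Propose a framework that uses natural-language concepts from GPT-4 to guide how visual features are used.
Link images to concepts through a vision-language model to produce interpretable predictions.
Show that concept-based classifiers are more robust on confounded datasets while remaining competitive in accuracy on standard benchmarks.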

Proposed Method

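Query GPT-4 zero-shot to extract medical concepts for each disease class.
Use a vision-language model (BioViL) to project visual features into the GPT-4-derived concept space, yielding concept heatmaps.
Compute a concept vector by pooling concept-image similarities, normalize the scores, and feed them into a bias-free linear classifier.
Provide interpretability through the final linear layer's weights (which connect concepts to classes) and through per-instance contribution analysis.
Train with a cross-entropy loss on concept-based logits, where each class logit is a non-negative linear combination of concept scores.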
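The method steps above can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for BioViL patch embeddings and concept-text embeddings, max pooling and per-concept z-score normalization are assumed choices for the unspecified "pooling" and "normalization", and softplus reparameterization is one assumed way to keep the class-concept weights non-negative. The toy data (each class aligned with two hypothetical concepts) and all helper names are hypothetical.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def concept_heatmap(patch_emb: np.ndarray, concept_emb: np.ndarray) -> np.ndarray:
    """(P, D) patch embeddings and (K, D) concept embeddings -> (K, P) similarity map."""
    return l2_normalize(concept_emb) @ l2_normalize(patch_emb).T

def pooled_scores(patch_emb: np.ndarray, concept_emb: np.ndarray) -> np.ndarray:
    """ASSUMPTION: max pooling over patches gives one raw score per concept, shape (K,)."""
    return concept_heatmap(patch_emb, concept_emb).max(axis=1)

def softplus(x: np.ndarray) -> np.ndarray:
    return np.logaddexp(0.0, x)  # log(1 + e^x), never negative

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))  # derivative of softplus

def class_logits(c: np.ndarray, raw_w: np.ndarray) -> np.ndarray:
    """Bias-free linear layer; softplus(raw_w) keeps every class-concept weight non-negative."""
    return softplus(raw_w) @ c  # (C, K) @ (K,) -> (C,)

def ce_loss_and_grad(c: np.ndarray, raw_w: np.ndarray, label: int):
    """Cross-entropy on concept-based logits, plus its gradient w.r.t. raw_w."""
    z = class_logits(c, raw_w)
    z = z - z.max()  # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    loss = float(-np.log(p[label]))
    p[label] -= 1.0  # dL/dlogits = softmax - one_hot
    grad = np.outer(p, c) * sigmoid(raw_w)  # chain rule through softplus
    return loss, grad

# Hypothetical toy setup: P patches, D-dim embeddings, K concepts, C classes.
rng = np.random.default_rng(0)
P, D, K, C = 16, 32, 6, 3
concept_emb = rng.normal(size=(K, D))  # stand-in for concept text embeddings

def make_image(y: int) -> np.ndarray:
    """Synthetic image whose patches partly align with concepts 2y and 2y+1."""
    patches = rng.normal(scale=0.5, size=(P, D))
    patches[:4] += concept_emb[2 * y]
    patches[4:8] += concept_emb[2 * y + 1]
    return patches

labels = rng.integers(0, C, size=60)
raw = np.stack([pooled_scores(make_image(int(y)), concept_emb) for y in labels])
mean, std = raw.mean(axis=0), raw.std(axis=0)
concept_vecs = (raw - mean) / std  # ASSUMPTION: z-score normalization per concept

raw_w = np.zeros((C, K))

def mean_loss(raw_w: np.ndarray) -> float:
    return float(np.mean([ce_loss_and_grad(c, raw_w, int(y))[0]
                          for c, y in zip(concept_vecs, labels)]))

initial = mean_loss(raw_w)  # all logits are equal at init, so this is ln(3)
for _ in range(100):  # plain SGD over the toy dataset
    for c, y in zip(concept_vecs, labels):
        _, g = ce_loss_and_grad(c, raw_w, int(y))
        raw_w -= 0.05 * g
final = mean_loss(raw_w)
acc = float(np.mean([np.argmax(class_logits(c, raw_w)) == y
                     for c, y in zip(concept_vecs, labels)]))
```

Because the linear layer has no bias and its weights are non-negative, a class logit can only rise when the concepts that support it score higher, which is what lets the weights be read as concept importance.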

Experimental Results

Research Questions

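RQ1: Can GPT-4-generated concepts improve robustness to confounding factors in medical imaging?
RQ2: Does projecting visual features into the concept space reduce reliance on spurious correlations while maintaining accuracy?
RQ3: How much interpretability does the concept-based approach provide at the global and instance levels?
RQ4: How does performance differ between standard benchmarks and confounded datasets?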

Key Findings

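On confounded datasets, the method substantially outperforms the baselines, with an average improvement of about 19 percentage points over raw image features.
On challenging datasets with strong confounding factors, the method mitigates spurious correlations better than ERM, Fish, LISA, and BioViL features.
On standard benchmarks without explicit confounding factors, the method performs on par with or better than pure visual encoders and prior CBMs.
The model exposes concept importance through its final-layer weights and provides interpretability by visualizing per-instance concept contributions.
GPT-4-derived concepts achieve higher accuracy on key datasets than other concept sets (ChatGPT, MIMIC-GPT4, Human).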
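Both kinds of interpretability in these findings (global final-layer weights and per-instance contributions) follow from a bias-free linear layer: the predicted class's logit is exactly the sum of weight-times-score terms, one per concept. A minimal NumPy sketch, assuming hypothetical concept names, weights, and scores (none are taken from the paper) and hypothetical helpers `global_importance` and `explain_instance`:

```python
import numpy as np

# Hypothetical concepts, classes, weights, and scores, for illustration only.
concept_names = ["concept_a", "concept_b", "concept_c", "concept_d"]
class_names = ["class_0", "class_1"]
w = np.array([[0.9, 0.0, 0.3, 0.1],   # (C, K) non-negative final-layer weights
              [0.0, 1.2, 0.0, 0.5]])
s = np.array([1.5, -0.4, 0.8, 2.0])   # (K,) normalized concept scores for one image

def global_importance(w, names, cls, top=2):
    """Global view: the concepts with the largest weights for one class."""
    order = np.argsort(-w[cls])[:top]
    return [names[k] for k in order]

def explain_instance(w, s, names, top=3):
    """Instance view: split the predicted class's logit into per-concept terms."""
    logits = w @ s  # no bias, so each logit is a pure sum of w[c, k] * s[k]
    pred = int(np.argmax(logits))
    contrib = w[pred] * s  # per-concept contribution to the predicted logit
    order = np.argsort(-contrib)[:top]
    return pred, logits, contrib, [(names[k], float(contrib[k])) for k in order]

pred, logits, contrib, top_terms = explain_instance(w, s, concept_names)
print(class_names[pred], top_terms)
print(global_importance(w, concept_names, cls=1))
```

Since the normalized scores can be negative, a term can also be negative (for example, `concept_b` would contribute 1.2 × -0.4 = -0.48 to `class_1` here), so a low-scoring concept can count against a class.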


This review was generated by AI and checked by human editors.