QUICK REVIEW

[Paper Review] COVID-VIT: Classification of COVID-19 from CT chest images based on vision transformer models

Xiaohong Gao, Qian Yu | arXiv (Cornell University) | Jul 4, 2021

COVID-19 diagnosis using AI | References: 14 | Cited by: 43

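One-Sentence Summary

This paper compares a vision transformer (ViT) and DenseNet for classifying COVID-19 from chest CT images; on the validation data, ViT achieves a higher F1 score than DenseNet (0.76 vs. 0.72).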

ABSTRACT

This paper responds to the MIA-COV19 challenge to classify COVID from non-COVID cases based on CT lung images. The COVID-19 virus has devastated the world in the last eighteen months by infecting more than 182 million people and causing over 3.9 million deaths. The overarching aim is to predict the diagnosis of the COVID-19 virus from chest radiographs, through the development of explainable vision transformer deep learning techniques, leading to population screening in a more rapid, accurate and transparent way. In this competition, there are 5381 three-dimensional (3D) datasets in total, including 1552 for training, 374 for evaluation and 3455 for testing. While most of the data volumes are in axial view, a number of subjects' data are in coronal or sagittal views, with only 1 or 2 slices in axial view. Hence, while 3D data based classification is investigated, 2D images remain the main focus in this competition. Two deep learning methods are studied: the vision transformer (ViT), based on attention models, and DenseNet, built upon a conventional convolutional neural network (CNN). Initial evaluation results on validation datasets, for which the ground truth is known, indicate that ViT performs better than DenseNet, with F1 scores of 0.76 and 0.72 respectively. Code is available on GitHub.

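Motivation and Objectives

Enable rapid, explainable COVID-19 diagnosis from chest CT images for population screening.
Compare a ViT-based approach with a conventional CNN (DenseNet) on COVID vs. non-COVID classification.
Make use of the 3D CT datasets while focusing on 2D slice classification for practicality and better performance.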

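Proposed Method

Classify COVID-19 from chest CT slices using ViT and DenseNet architectures.
Evaluate performance on a mixed dataset of 3D volumes and individual slices, with the emphasis on 2D images.
Report F1 scores on the validation data to compare the two models (ViT vs. DenseNet).
Release the code on GitHub for reproducibility.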
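
Because the classifiers work on 2D slices while each subject is a 3D scan, slice-level predictions have to be combined into one label per scan at some point. This summary does not say how the authors do that, so the following is only a minimal sketch under an assumed rule (majority vote over slices). The function name `aggregate_by_scan`, the `threshold` parameter, and the scan IDs are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def aggregate_by_scan(slice_preds, threshold=0.5):
    """Combine per-slice labels (1 = COVID, 0 = non-COVID) into one label per scan.

    Assumption (not from the paper): a scan is labeled COVID when the fraction
    of its slices predicted as COVID is at least `threshold`.
    """
    votes = defaultdict(list)
    for scan_id, label in slice_preds:
        votes[scan_id].append(label)
    return {
        scan_id: int(sum(labels) / len(labels) >= threshold)
        for scan_id, labels in votes.items()
    }


# Invented toy predictions: (scan_id, predicted slice label)
slice_preds = [
    ("scan_a", 1), ("scan_a", 1), ("scan_a", 0),
    ("scan_b", 0), ("scan_b", 0), ("scan_b", 1),
    ("scan_c", 1),
]
scan_labels = aggregate_by_scan(slice_preds)
print(scan_labels)
```

A rule of this kind still works when a scan contributes only one or two usable axial slices, as in `scan_c`, although a single slice then decides the whole scan.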

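Experimental Results

Research Questions

RQ1: On the validation data, does ViT outperform DenseNet at classifying COVID-19 from chest CT images?
RQ2: What F1 scores do ViT and DenseNet achieve on this task?
RQ3: In the evaluated setting, how effective is 2D slice-based CT data compared with 3D data for this classification task?
RQ4: Is the method explainable and suitable for real-world population screening?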
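
RQ1 and RQ2 both rest on the F1 score, the harmonic mean of precision and recall. As a reminder of how it is computed, here is a small pure-Python sketch. It computes the binary F1 for the COVID class; the summary does not say whether the reported scores are binary or macro-averaged, so that choice is an assumption, and the labels below are invented toy data rather than results from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy labels (1 = COVID, 0 = non-COVID)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is F1.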

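Key Findings

ViT achieves an F1 score of 0.76 on the validation data, higher than DenseNet's 0.72.
The study uses 5,381 3D datasets in total: 1,552 for training, 374 for evaluation, and 3,455 for testing.
Most data volumes are in axial view, but some subjects' data are in coronal or sagittal views with only 1 or 2 axial slices; 2D images therefore remain the main focus.
The code for the COVID-ViT method is available on GitHub.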
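
The split sizes quoted above can be checked with simple arithmetic. The sketch below uses only the three counts stated in the abstract; the dictionary keys are illustrative names.

```python
# Split sizes as stated in the abstract
splits = {"training": 1552, "evaluation": 374, "testing": 3455}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total = {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The counts add up to the stated 5,381, and the test split is by far the largest at roughly 64% of all scans, so the reported validation F1 scores are based on about 7% of the data.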

This review was generated by AI and reviewed by a human editor.