QUICK REVIEW

[Paper Review] Vision Transformer for COVID-19 CXR Diagnosis using Chest X-ray Feature Corpus

Sang Joon Park, Gwanghyun Kim|arXiv (Cornell University)|Mar 12, 2021

COVID-19 diagnosis using AI|References: 29|Citations: 26

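One-line summary

This paper proposes a Vision Transformer that diagnoses COVID-19 and other infectious diseases from a corpus of low-level chest X-ray features extracted by a pre-trained backbone, and that generalizes well across external datasets.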

ABSTRACT

Under the global COVID-19 crisis, developing a robust diagnosis algorithm for COVID-19 using CXR is hampered by the lack of a well-curated COVID-19 data set, although CXR data with other diseases are abundant. This situation is suitable for a vision transformer architecture that can exploit the abundant unlabeled data using pre-training. However, the direct use of an existing vision transformer that uses the corpus generated by the ResNet is not optimal for correct feature embedding. To mitigate this problem, we propose a novel vision Transformer that uses a low-level CXR feature corpus obtained to extract the abnormal CXR features. Specifically, the backbone network is trained using large public datasets to obtain the abnormal features in routine diagnosis such as consolidation, ground-glass opacity (GGO), etc. Then, the embedded features from the backbone network are used as the corpus for vision transformer training. We examine our model on various external test datasets acquired from totally different institutions to assess the generalization ability. Our experiments demonstrate that our method achieved state-of-the-art performance and has better generalization capability, which are crucial for widespread deployment.

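Motivation and Objectives

Enable robust COVID-19 CXR diagnosis despite limited labeled data by leveraging abundant unlabeled CXRs.
Propose a Vision Transformer that improves feature embedding by using a backbone-derived corpus of low-level CXR features.
Show that the model generalizes well to external datasets from different institutions and devices.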

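Proposed Method

Train a backbone network on large public CXR datasets to extract low-level abnormal features (e.g., consolidation, GGO).
Build the feature corpus from intermediate backbone embeddings taken before PCAM pooling.
Feed the projected features into a Vision Transformer with a class token for image-level diagnosis.
Use a saliency-map-based interpretability method, built on deep Taylor decomposition, for localization.
Evaluate on multiple external datasets using AUC, sensitivity, specificity, and accuracy.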
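
The token-construction steps above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function name `feature_map_to_tokens`, the toy sizes (a 4 x 4 x 8 feature map projected to 6 dimensions), and the random weights are all assumptions made for the example. The Transformer encoder, positional information, and the classification head are omitted.

```python
import random


def feature_map_to_tokens(feature_map, proj_weights, cls_token):
    """Turn an H x W x C feature map into a token sequence for a Transformer.

    feature_map:  H rows of W positions, each a list of C floats
                  (stand-in for an intermediate backbone embedding).
    proj_weights: D rows of C floats (hypothetical linear projection, no bias).
    cls_token:    list of D floats, prepended as the class token.
    """
    tokens = [list(cls_token)]  # position 0 holds the class token
    for row in feature_map:
        for feature in row:
            # Linear projection of one C-dim feature vector to D dims.
            projected = [
                sum(f * w for f, w in zip(feature, weight_row))
                for weight_row in proj_weights
            ]
            tokens.append(projected)
    return tokens


# Toy sizes, chosen only for illustration (not taken from the paper).
H, W, C, D = 4, 4, 8, 6
rng = random.Random(0)
feature_map = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
proj_weights = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(D)]
cls_token = [0.0] * D

tokens = feature_map_to_tokens(feature_map, proj_weights, cls_token)
print(len(tokens), len(tokens[0]))  # 17 6
```

Each spatial position of the feature map becomes one token, so the sequence length is H * W + 1 once the class token is prepended. In a full model, the encoder output at position 0 would be the one passed to the diagnosis head.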

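Experimental Results

Research Questions

RQ1: Can a Vision Transformer trained on a backbone-derived corpus of low-level CXR features outperform a standard ViT and other baselines in COVID-19 CXR diagnosis?
RQ2: Does using the low-level feature corpus improve generalization to unseen data from diverse institutions?
RQ3: Given that the backbone is already pre-trained, does self-supervised pre-training benefit this architecture?
RQ4: For generalization, how much backbone fine-tuning is beneficial (frozen vs. trainable)?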

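Key Findings

Achieves SOTA-level performance and strong generalization on three external datasets (AUC of about 0.91–0.95, mean sensitivity of about 87%, mean specificity of about 91%).
Outperforms the ResNet-50 baseline and ViT-based SOTA models across the external tests.
Making the backbone trainable yields better results across the external datasets than freezing its weights.
Self-supervised pre-training brings little benefit to the proposed model and may slightly hurt performance in some settings.
Provides interpretable saliency visualizations that localize COVID-19 and bacterial infection.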
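
For readers unfamiliar with the reported metrics, the sketch below shows one way AUC, sensitivity, specificity, and accuracy could be computed for a single class treated as binary (one-vs-rest). The labels, scores, and the 0.5 threshold are toy values invented for illustration. They are not data or settings from the paper, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 positive and 4 negative cases (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
accuracy = (tp + tn) / len(y_true)

print(round(sensitivity, 3), round(specificity, 3),
      round(accuracy, 3), round(auc(y_true, scores), 3))
# 0.667 0.75 0.714 0.917
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen operating point, which is why a review typically reports all of them together.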

This review was generated by AI and checked by a human editor.