QUICK REVIEW

[Paper Review] Vision Transformer for COVID-19 CXR Diagnosis using Chest X-ray Feature Corpus

Sang Joon Park, Gwanghyun Kim | arXiv (Cornell University) | 2021. 03. 12.

COVID-19 diagnosis using AI | References: 29 | Citations: 26

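One-Line Summary

This paper presents a Vision Transformer that uses a corpus of low-level chest X-ray features, extracted by a pretrained backbone, to diagnose COVID-19 and other infections, and that generalizes strongly across external datasets.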

ABSTRACT

Under the global COVID-19 crisis, developing a robust diagnosis algorithm for COVID-19 using CXR is hampered by the lack of a well-curated COVID-19 data set, although CXR data with other diseases are abundant. This situation is suitable for a vision transformer architecture that can exploit the abundant unlabeled data using pre-training. However, the direct use of an existing vision transformer that uses the corpus generated by the ResNet is not optimal for correct feature embedding. To mitigate this problem, we propose a novel vision Transformer by using the low-level CXR feature corpus that is obtained to extract the abnormal CXR features. Specifically, the backbone network is trained using large public datasets to obtain the abnormal features in routine diagnosis such as consolidation, ground-glass opacity (GGO), etc. Then, the embedded features from the backbone network are used as corpus for vision transformer training. We examine our model on various external test datasets acquired from totally different institutions to assess the generalization ability. Our experiments demonstrate that our method achieved the state-of-the-art performance and has better generalization capability, which are crucial for a widespread deployment.

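Motivation and Objectives

Achieve robust COVID-19 CXR diagnosis despite limited labeled data by exploiting abundant unlabeled CXR data.
Propose a Vision Transformer that improves embedding quality by using a low-level CXR feature corpus derived from the backbone.
Show that the model generalizes to external datasets collected at different institutions and with different devices.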

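Proposed Method

Train the backbone network on large public CXR datasets to extract features of low-level abnormal findings (e.g., consolidation, GGO).
Build the feature corpus from intermediate backbone embeddings taken before PCAM pooling.
Feed the projected features into a Vision Transformer with a class token to perform image-level diagnosis.
Provide localization through an interpretable saliency map method based on deep Taylor decomposition.
Evaluate on a range of external datasets using AUC, sensitivity, specificity, and accuracy.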
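As a rough illustration of how a backbone feature map could become the input sequence of a Vision Transformer, the sketch below flattens a C x H x W feature map into H*W tokens, applies a linear projection, and prepends a class token. It uses plain Python lists so the shapes are easy to follow. The function names, the toy dimensions, and the projection weights are all assumptions made for illustration; they are not taken from the paper, and positional embeddings and the Transformer layers themselves are omitted.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def feature_map_to_tokens(fmap: FeatureMap) -> List[Vector]:
    """Flatten a C x H x W feature map into H*W tokens, each of length C."""
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    return [
        [fmap[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def project(tokens: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Linear projection without bias; weight has shape [d_model][channels]."""
    return [
        [sum(w * t for w, t in zip(row, token)) for row in weight]
        for token in tokens
    ]


def build_vit_input(fmap: FeatureMap, weight: List[Vector],
                    cls_token: Vector) -> List[Vector]:
    """Return the token sequence [class token, projected feature tokens...]."""
    projected = project(feature_map_to_tokens(fmap), weight)
    return [list(cls_token)] + projected


# Toy example (hypothetical sizes): 2 channels on a 2 x 2 grid, d_model = 3.
toy_fmap = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[5.0, 6.0], [7.0, 8.0]],  # channel 1
]
toy_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_cls = [0.0, 0.0, 0.0]  # in practice this would be a learned parameter

sequence = build_vit_input(toy_fmap, toy_weight, toy_cls)
print(len(sequence), len(sequence[0]))  # number of tokens, embedding size
```

In a real model the sequence would then pass through the Transformer encoder, and the output at the class-token position would feed a classification head for the image-level diagnosis.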

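Experimental Results

Research Questions

RQ1: Does a Vision Transformer trained on the backbone-derived low-level CXR feature corpus outperform a standard ViT and the baselines on COVID-19 CXR diagnosis?
RQ2: Does using the low-level feature corpus improve generalization to data from unseen institutions?
RQ3: Does backbone pretraining benefit this architecture even without self-supervised pretraining?
RQ4: Which degree of backbone fine-tuning favors generalization: frozen or trainable?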

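Key Results

Achieves state-of-the-art performance and strong generalization on three external datasets (AUC about 0.91–0.95, mean sensitivity about 87%, mean specificity about 91%).
Outperforms the ResNet-50 baseline and a ViT-based SOTA model on the external tests.
A trainable backbone yields better results on the external datasets than freezing the backbone weights.
Self-supervised pretraining brings little or no benefit to the proposed model and can slightly degrade performance in some configurations.
Provides localization of COVID-19 and bacterial infection through interpretable saliency visualizations.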
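For readers who want to see how the reported metrics are defined, here is a minimal sketch that computes sensitivity, specificity, accuracy, and AUC for a binary case (for example, one class against the rest). The labels, scores, and the 0.5 threshold are made-up illustration values, not results from the paper, and the paper's exact evaluation protocol may differ. AUC is computed here as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding the scores."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity_accuracy(
        tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = correct/all."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for l, s in zip(y_true, y_score) if l == 1]
    negatives = [s for l, s in zip(y_true, y_score) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

counts = confusion_counts(labels, scores)
sens, spec, acc = sensitivity_specificity_accuracy(*counts)
auc = roc_auc(labels, scores)
print(counts, sens, spec, acc, auc)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of figures are reported side by side.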

This review was generated by AI and reviewed by a human editor.