QUICK REVIEW

[Paper Review] Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification

Ruijie Zhang, Qiaozhe Zhang|arXiv (Cornell University)|2023. 06. 08.

Image Retrieval and Classification Techniques | Citations: 10

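One-line summary

MMIL-Transformer uses a hierarchical MIL framework with messenger (MSG) tokens that makes exact (non-approximate) self-attention feasible on large whole slide images, and it achieves strong results on CAMELYON16 and TCGA-NSCLC.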

ABSTRACT

Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD). The extremely high resolution and limited availability of region-level annotations make employing deep learning methods for WSI-based digital diagnosis challenging. Recently integrating multiple instance learning (MIL) and Transformer for WSI analysis shows very promising results. However, designing effective Transformers for this weakly-supervised high-resolution image analysis is an underexplored yet important problem. In this paper, we propose a Multi-level MIL (MMIL) scheme by introducing a hierarchical structure to MIL, which enables efficient handling of MIL tasks involving a large number of instances. Based on MMIL, we instantiated MMIL-Transformer, an efficient Transformer model with windowed exact self-attention for large-scale MIL tasks. To validate its effectiveness, we conducted a set of experiments on WSI classification tasks, where MMIL-Transformer demonstrate superior performance compared to existing state-of-the-art methods, i.e., 96.80% test AUC and 97.67% test accuracy on the CAMELYON16 dataset, 99.04% test AUC and 94.37% test accuracy on the TCGA-NSCLC dataset, respectively. All code and pre-trained models are available at: https://github.com/hustvl/MMIL-Transformer

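Motivation and goals

Addresses the challenges of weakly labeled, high-resolution WSIs by proposing a scalable multi-level MIL framework.
Introduces MMIL-Transformer, which enables exact local and global self-attention within large sets of instances.
Demonstrates superior WSI classification performance on the CAMELYON16 and TCGA-NSCLC datasets.
Provides configurable grouping and masking mechanisms to balance accuracy against computational cost.
Releases code and pre-trained models for reproducibility and further research.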

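Proposed method

Proposes a differentiable multi-level MIL (MMIL) formulation that groups the original instances into sub-bags, which in turn form a higher-level bag.
Introduces several grouping operators (coordinate-based, embedding-based, random, sequential, and MSA-based) to create sub-bags and enable targeted attention.
Attaches an MSG (messenger) token to each sub-bag and performs self-attention within the sub-bag; the MSG tokens are then used to build the higher-level bag.
Uses a messenger-based generator to merge the sub-bag MSG tokens into the higher-level bag and attaches a CLS token for the final classification.
Introduces an embedding-dimension-wise masking mechanism to reduce the number of active instances and improve performance.
Provides a complexity analysis showing that sub-bag splitting and masking reduce self-attention overhead.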
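
To make the cost argument concrete, here is a minimal sketch that counts pairwise attention scores for one flat bag versus a two-level layout. It is not the paper's own complexity formula: the function names, the 10,000-instance slide, the 100 sub-bags, and the 0.5 mask ratio are all hypothetical, and the count ignores embedding width, attention heads, and layer depth. It assumes one MSG token per sub-bag and a single CLS token, as described in the list above.

```python
import math
import random


def group_sequential(instances, num_subbags):
    """Split instances into contiguous sub-bags (at most num_subbags of them)."""
    size = math.ceil(len(instances) / num_subbags)
    return [instances[i:i + size] for i in range(0, len(instances), size)]


def group_random(instances, num_subbags, seed=0):
    """Shuffle a copy of the instances, then split it into sub-bags."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    return group_sequential(shuffled, num_subbags)


def attention_pairs_flat(num_instances):
    """Pairwise scores for exact attention over one flat bag plus a CLS token."""
    return (num_instances + 1) ** 2


def attention_pairs_two_level(subbags, mask_ratio=0.0):
    """Pairwise scores when attention runs inside each sub-bag, then over MSG tokens."""
    low = 0
    for bag in subbags:
        # Assumption: masking simply drops this fraction of instances per sub-bag.
        kept = math.ceil(len(bag) * (1.0 - mask_ratio))
        low += (kept + 1) ** 2           # +1 for the sub-bag's MSG token
    high = (len(subbags) + 1) ** 2       # all MSG tokens plus one CLS token
    return low + high


instances = list(range(10_000))          # hypothetical patch indices for one slide
subbags = group_sequential(instances, num_subbags=100)
flat_cost = attention_pairs_flat(len(instances))
two_level_cost = attention_pairs_two_level(subbags)
masked_cost = attention_pairs_two_level(subbags, mask_ratio=0.5)
print(flat_cost, two_level_cost, masked_cost)
```

Under these toy settings the flat bag needs 100,020,001 pairwise scores, the two-level layout 1,030,301, and the masked two-level layout 270,301. Attention inside each sub-bag is still exact; the saving comes from never scoring pairs that sit in different sub-bags, with the MSG tokens carrying information between them at the higher level.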

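Experimental results

Research questions

RQ1: Can MMIL handle large-scale MIL on WSIs with exact (non-approximate) self-attention?
RQ2: How do grouping and masking affect the accuracy and efficiency of WSI classification?
RQ3: How does MMIL-Transformer compare with state-of-the-art MIL and Transformer methods on CAMELYON16 and TCGA-NSCLC?
RQ4: What is the effect of the patch encoder (e.g., ResNet vs. ViT) within the MMIL-Transformer framework?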

Key results

Dataset	Method	Accuracy	AUC
CAMELYON16	MMIL-Transformer	0.9341	0.9474
TCGA-NSCLC	MMIL-Transformer	0.9437	0.9904

On CAMELYON16, MMIL-Transformer achieves 96.80% test AUC and 97.67% test accuracy (as reported in the abstract).
On TCGA-NSCLC, MMIL-Transformer achieves 99.04% test AUC and 94.37% test accuracy (as reported in the abstract).
With ResNet-50 as the patch encoder, Table 1 reports 0.9341 accuracy and 0.9474 AUC on CAMELYON16.
On TCGA-NSCLC, Table 1 reports 0.9437 accuracy and 0.9904 AUC for MMIL-Transformer.
The ablation studies show that the grouping type, mask ratio, and multi-level framework significantly affect performance and efficiency: masking improves accuracy, and the higher-level bag construction is what makes exact self-attention possible.
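
Since the results quote both accuracy and AUC, a small sketch of how the two slide-level metrics are typically computed may help when reading them side by side. The labels and scores below are made-up toy values, not data from the paper, and the 0.5 decision threshold is an assumption; this review does not describe the paper's evaluation protocol.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of slides whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Chance that a random positive slide outscores a random negative one.

    Ties count as half a win. This pairwise form is quadratic in the number
    of slides, which is acceptable for a small test set.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy slide-level labels (1 = tumor) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
print(acc, auc)
```

On this toy input the accuracy is 4/6 and the AUC is 8/9. The two differ because accuracy depends on the chosen threshold while AUC depends only on how the scores rank positives against negatives, which is one reason a model can report a higher value on one metric than on the other.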

This review was generated by AI and reviewed by a human editor.