QUICK REVIEW

[Paper Review] PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification

Jian Yu, Joakim Nguyen | arXiv (Cornell University) | 2026. 03. 02.

AI in cancer detection | Citations: 0

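One-Line Summary

PathMoE is an interpretable multimodal framework that fuses H&E slides, pathology reports, and nuclei-level cell graphs through an interaction-aware mixture of experts, classifying pediatric brain tumors while exposing sample-level modality reasoning.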

ABSTRACT

Accurate classification of pediatric central nervous system tumors remains challenging due to histological complexity and limited training data. While pathology foundation models have advanced whole-slide image (WSI) analysis, they often fail to leverage the rich, complementary information found in clinical text and tissue microarchitecture. To this end, we propose PathMoE, an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs via an interaction-aware mixture-of-experts architecture built on state-of-the-art foundation models for each modality. By training specialized experts to capture modality uniqueness, redundancy, and synergy, PathMoE employs an input-dependent gating mechanism that dynamically weights these interactions, providing sample-level interpretability. We evaluate our framework on two dataset-specific classification tasks on an internal pediatric brain tumor dataset (PBT) and external TCGA datasets. PathMoE improves macro-F1 from 0.762 to 0.799 (+0.037) on PBT when integrating WSI, text, and graph modalities; on TCGA, augmenting WSI with graph knowledge improves macro-F1 from 0.668 to 0.709 (+0.041). These results demonstrate significant performance gains over state-of-the-art image-only baselines while revealing the specific modality interactions driving individual predictions. This interpretability is particularly critical for rare tumor subtypes, where transparent model reasoning is essential for clinical trust and diagnostic validation.

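Motivation and Objectives

Classify pediatric brain tumors accurately despite histological heterogeneity and limited training data.
Improve diagnostic performance by exploiting complementary modalities (WSIs, pathology reports, nuclei graphs).
Provide sample-level interpretability by modeling modality contributions and cross-modal interactions.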

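Proposed Method

Encode each modality into a slide-level representation (UNIv2 for images, TITAN for text, and GraphSAGE over nuclei graphs for the graph modality).
Construct nuclei-level graphs from the histology images and obtain graph-level features with attention-based MIL pooling.
Build the interaction-aware mixture of experts (I2MoE) from five experts: image, text, graph, synergy, and redundancy.
Apply a gating network that computes sample-dependent weights over the experts for the final prediction.
Train with a combined classification loss and interaction loss to encourage expert specialization and interpretable gating.
Evaluate on the internal PBT dataset and the external TCGA datasets with 10-fold cross-validation, using macro-F1 as the primary metric.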
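The attention-based MIL pooling step in the method list can be illustrated with a minimal sketch. It assumes the common tanh-attention form (score each node, softmax the scores, take the weighted sum); the review does not state which attention variant PathMoE uses. The parameters `V` and `w`, the toy node embeddings, and the function names are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mil_pool(node_embeddings, v, w):
    """Pool per-nucleus embeddings into one graph-level vector.

    node_embeddings: list of N vectors of length D (stand-ins for GraphSAGE outputs).
    v: hypothetical H x D projection matrix, given as a list of H rows.
    w: hypothetical scoring vector of length H.
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in node_embeddings:
        # Hidden attention features: tanh(V h)
        hidden = [math.tanh(sum(v_ij * h_j for v_ij, h_j in zip(row, h))) for row in v]
        # Scalar attention score: w . tanh(V h)
        scores.append(sum(w_i * z_i for w_i, z_i in zip(w, hidden)))
    alphas = softmax(scores)
    dim = len(node_embeddings[0])
    # Attention-weighted sum of node embeddings
    pooled = [sum(a * h[d] for a, h in zip(alphas, node_embeddings)) for d in range(dim)]
    return pooled, alphas

# Toy example: three nuclei with 2-dimensional embeddings (made-up values).
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[0.5, -0.5], [0.3, 0.8]]
w = [1.0, 1.0]
graph_vector, attention = attention_mil_pool(nodes, V, w)
```

The attention weights sum to one, so they can be read as the relative contribution of each nucleus to the graph-level feature.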

Figure 1: Overview of PathMoE. H&E WSIs, pathology reports, and nuclei graphs are encoded and fused via an interaction-aware mixture-of-experts module. An input-dependent gating network computes sample-specific weights to combine expert predictions into the final tumor classification. A vanilla fus
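The gating step described in the Figure 1 caption can be sketched as a softmax-weighted combination of expert outputs. This is a simplified illustration under stated assumptions, not the authors' implementation: the logits and gate scores below are made-up numbers, the names `EXPERTS` and `fuse_experts` are hypothetical, and in the real model a learned network would produce the gate scores from the input.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The five experts named in the review.
EXPERTS = ["image", "text", "graph", "synergy", "redundancy"]

def fuse_experts(expert_logits, gate_scores):
    """Combine per-expert class logits with sample-specific gate weights.

    expert_logits: dict mapping expert name -> list of class logits.
    gate_scores: dict mapping expert name -> unnormalized gate score for this sample.
    Returns (fused_logits, gate_weights).
    """
    weights = softmax([gate_scores[name] for name in EXPERTS])
    n_classes = len(expert_logits[EXPERTS[0]])
    fused = [
        sum(wt * expert_logits[name][c] for wt, name in zip(weights, EXPERTS))
        for c in range(n_classes)
    ]
    return fused, dict(zip(EXPERTS, weights))

# Made-up sample: the image expert alone favors class 0,
# while the text and graph experts favor class 1.
expert_logits = {
    "image": [2.0, 0.5, 0.1],
    "text": [0.2, 1.5, 0.3],
    "graph": [0.1, 2.5, 0.2],
    "synergy": [0.5, 1.0, 0.4],
    "redundancy": [1.0, 1.0, 0.2],
}
gate_scores = {"image": 0.1, "text": 0.4, "graph": 1.2, "synergy": 0.3, "redundancy": -0.5}

fused, gate_weights = fuse_experts(expert_logits, gate_scores)
predicted_class = max(range(len(fused)), key=lambda c: fused[c])
image_only_class = max(range(3), key=lambda c: expert_logits["image"][c])
```

Because the gate weights are normalized per sample, inspecting `gate_weights` shows which expert dominated a given prediction, which is the kind of sample-level interpretability the framework aims for.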

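Experimental Results

Research Questions

RQ1: Does integrating WSIs, pathology text, and nuclei graphs improve pediatric brain tumor classification over image-only baselines?
RQ2: How do modality interactions (unimodal, synergy, redundancy) affect per-sample predictions and interpretability?
RQ3: Does domain knowledge from cell graphs matter for robustness when text data is noisy or unavailable?
RQ4: Which text encoder yields the best multimodal fusion performance on pathology tasks?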

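Key Results

On the internal PBT dataset, PathMoE with all modalities raises macro-F1 from 0.762 (image-only, EF W) to 0.799 (EF WTG), a gain of 0.037.
On TCGA, adding graph information to images raises macro-F1 from 0.668 (EF W) to 0.709 (EF WG), a gain of 0.041.
The graph modality supplies non-redundant structural priors that improve performance even when text is unreliable or unavailable.
Text encoder quality matters: the domain-aligned TITAN encoder improves PathMoE performance and achieves the highest macro-F1 in the EF WTG and SG WTG configurations.
The learned interaction weights show that graph and text contributions can correct image-only errors, as confirmed by qualitative examples and neuropathologist validation.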
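Since every headline number above is a macro-F1 score, a short sketch of how that metric is computed may help. Macro-F1 averages per-class F1 without weighting by class frequency, so rare subtypes count as much as common ones. The labels below are invented for illustration and are not drawn from PBT or TCGA; `macro_f1` is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)

# Invented labels: class "C" is rare and is missed entirely.
y_true = ["A", "A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "A", "B", "B", "B", "A"]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the per-class F1 values are 0.75 for A, 0.8 for B, and 0 for C, so macro-F1 falls well below accuracy: missing the single rare sample costs a full third of the score.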

This review was generated by AI and reviewed by a human editor.