QUICK REVIEW

[Paper Digest] PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification

Jian Yu, Joakim Nguyen|arXiv (Cornell University)|Mar 2, 2026

AI in cancer detection | Cited by 0

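One-sentence summary

PathMoE is an interpretable multimodal framework that fuses H&E slides, pathology reports, and nuclei-level cell graphs through an interaction-aware mixture-of-experts model, classifying pediatric brain tumors with sample-level reasoning about each modality.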

ABSTRACT

Accurate classification of pediatric central nervous system tumors remains challenging due to histological complexity and limited training data. While pathology foundation models have advanced whole-slide image (WSI) analysis, they often fail to leverage the rich, complementary information found in clinical text and tissue microarchitecture. To this end, we propose PathMoE, an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs via an interaction-aware mixture-of-experts architecture built on state-of-the-art foundation models for each modality. By training specialized experts to capture modality uniqueness, redundancy, and synergy, PathMoE employs an input-dependent gating mechanism that dynamically weights these interactions, providing sample-level interpretability. We evaluate our framework on two dataset-specific classification tasks on an internal pediatric brain tumor dataset (PBT) and external TCGA datasets. PathMoE improves macro-F1 from 0.762 to 0.799 (+0.037) on PBT when integrating WSI, text, and graph modalities; on TCGA, augmenting WSI with graph knowledge improves macro-F1 from 0.668 to 0.709 (+0.041). These results demonstrate significant performance gains over state-of-the-art image-only baselines while revealing the specific modality interactions driving individual predictions. This interpretability is particularly critical for rare tumor subtypes, where transparent model reasoning is essential for clinical trust and diagnostic validation.

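Motivation and objectives

Improve the accuracy of pediatric brain tumor classification despite histological heterogeneity and limited data.
Use complementary modalities (WSI, pathology reports, nuclei graphs) to improve diagnostic performance.
Achieve sample-level interpretability by modeling modality contributions and cross-modal interactions.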

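Proposed method

Encode each modality into a slide-level representation (UNIv2 for images, TITAN for text, GraphSAGE for nuclei graphs).
Build nuclei-level graphs from the histology images and obtain graph-level features through attention MIL pooling.
Use an interaction-aware mixture-of-experts model (I2MoE) with five experts: image, text, graph, synergy, and redundancy.
Apply a gating network that computes sample-specific expert weights for the final prediction.
Train with a joint classification loss and interaction loss to encourage expert specialization and interpretable gating.
Evaluate with 10-fold cross-validation on the internal PBT data and the external TCGA data, using macro-F1 as the primary metric.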
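The pooling and gating steps listed above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the dot-product attention score, the choice to mix experts at the probability level, and all numeric inputs are assumptions made for illustration, and the names `attention_pool` and `gated_prediction` are hypothetical.

```python
import math

# Expert names taken from the method list above; everything else is illustrative.
EXPERTS = ["image", "text", "graph", "synergy", "redundancy"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(node_features, attn_vector):
    """Pool node embeddings into one graph-level vector.

    Simplification (assumption): each node is scored by a dot product with
    attn_vector; a real attention MIL layer would use a small learned network.
    """
    scores = [sum(w * x for w, x in zip(attn_vector, feat)) for feat in node_features]
    weights = softmax(scores)
    dim = len(node_features[0])
    return [
        sum(wt * feat[d] for wt, feat in zip(weights, node_features))
        for d in range(dim)
    ]


def gated_prediction(expert_logits, gate_scores):
    """Mix per-expert class probabilities with sample-specific gate weights.

    Assumption: experts are mixed at the probability level.
    """
    gate = softmax(gate_scores)
    probs = [softmax(logits) for logits in expert_logits]
    n_classes = len(probs[0])
    mixed = [sum(g * p[c] for g, p in zip(gate, probs)) for c in range(n_classes)]
    return dict(zip(EXPERTS, gate)), mixed


# Toy inputs (made up): three nuclei with 2-d embeddings, three tumor classes.
graph_embedding = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.5, -0.5])
expert_logits = [
    [2.0, 0.1, 0.1],  # image expert
    [0.2, 1.5, 0.1],  # text expert
    [0.1, 1.8, 0.2],  # graph expert
    [0.3, 1.0, 0.4],  # synergy expert
    [0.5, 0.5, 0.5],  # redundancy expert
]
gate_scores = [0.2, 1.0, 1.4, 0.1, -0.5]  # hypothetical gating-network outputs
gate_weights, class_probs = gated_prediction(expert_logits, gate_scores)
predicted_class = max(range(len(class_probs)), key=class_probs.__getitem__)
```

In this toy case the largest gate weight falls on the graph expert, and the mixed prediction is class index 1 even though the image expert alone favors class 0. Reading `gate_weights` per sample is the kind of sample-level explanation the digest describes.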

Figure 1: Overview of PathMoE. H&E WSIs, pathology reports, and nuclei graphs are encoded and fused via an interaction-aware mixture-of-experts module. An input-dependent gating network computes sample-specific weights to combine expert predictions into the final tumor classification. A vanilla fus

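Experimental results

Research questions

RQ1: Does integrating WSI, pathology text, and nuclei graphs outperform image-only baselines for pediatric brain tumor classification?
RQ2: How do modality interactions (unimodal, synergistic, redundant) shape each sample's prediction and its interpretability?
RQ3: When text data are noisy or unavailable, is domain knowledge from cell graphs critical for robustness?
RQ4: Which text encoder performs best for pathology-informed multimodal fusion?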

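Key findings

On PBT, PathMoE with all modalities raises macro-F1 from the image-only baseline's 0.762 to 0.799 (EF WTG), a gain of 0.037.
On TCGA, adding graph information to images raises macro-F1 from 0.668 (EF W) to 0.709 (EF WG), a gain of 0.041.
The graph modality provides non-redundant structural priors that improve performance, particularly when text is unreliable or unavailable.
Text encoder quality matters: the domain-aligned TITAN improves PathMoE's performance and achieves the strongest macro-F1 in the EFWTG and SGWTG configurations.
The learned interaction weights show that, in challenging cases, graph and text contributions can correct image-only errors, as supported by qualitative examples and neuropathologist validation.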
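Because every headline number above is a macro-F1 score, a short sketch of how that metric is computed may help. It is a from-scratch illustration in plain Python, not the paper's evaluation code, and the labels "common" and "rare" are hypothetical stand-ins for tumor subtypes.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels: a classifier that always predicts the common subtype.
y_true = ["common", "common", "common", "common", "rare"]
y_pred = ["common", "common", "common", "common", "common"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Here accuracy is 0.8 while macro-F1 is 4/9 (about 0.44), because the missed rare class contributes an F1 of zero with the same weight as the common class. That equal weighting is why macro-F1 suits a task where rare tumor subtypes matter.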


This digest was generated by AI and reviewed by a human editor.