QUICK REVIEW

[Paper Review] PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification

Jian Yu, Joakim Nguyen|arXiv (Cornell University)|Mar 2, 2026

AI in cancer detection|Cited by: 0

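One-line summary

tldr: PathMoE introduces an interpretable multimodal framework that fuses H&E slides, pathology reports, and nuclei-level cell graphs through an interaction-aware mixture-of-experts, and classifies pediatric brain tumors with sample-level reasoning about which modalities drive each prediction.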

ABSTRACT

Accurate classification of pediatric central nervous system tumors remains challenging due to histological complexity and limited training data. While pathology foundation models have advanced whole-slide image (WSI) analysis, they often fail to leverage the rich, complementary information found in clinical text and tissue microarchitecture. To this end, we propose PathMoE, an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs via an interaction-aware mixture-of-experts architecture built on state-of-the-art foundation models for each modality. By training specialized experts to capture modality uniqueness, redundancy, and synergy, PathMoE employs an input-dependent gating mechanism that dynamically weights these interactions, providing sample-level interpretability. We evaluate our framework on two dataset-specific classification tasks on an internal pediatric brain tumor dataset (PBT) and external TCGA datasets. PathMoE improves macro-F1 from 0.762 to 0.799 (+0.037) on PBT when integrating WSI, text, and graph modalities; on TCGA, augmenting WSI with graph knowledge improves macro-F1 from 0.668 to 0.709 (+0.041). These results demonstrate significant performance gains over state-of-the-art image-only baselines while revealing the specific modality interactions driving individual predictions. This interpretability is particularly critical for rare tumor subtypes, where transparent model reasoning is essential for clinical trust and diagnostic validation.

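Motivation and objectives

Achieve accurate pediatric brain tumor classification despite histological heterogeneity and limited training data.
Improve diagnostic performance by exploiting three complementary modalities: WSIs, pathology reports, and nuclei-level graphs.
Provide sample-level interpretability by modeling each modality's contribution and the interactions between modalities.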

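Proposed method

Encode each modality into a slide-level representation (UNIv2 for images, TITAN for text, and GraphSAGE over nuclei graphs for the graph modality).
Build nuclei-level graphs from the histopathology images and obtain graph-level features with attention MIL pooling.
Use an interaction-aware mixture-of-experts (I2MoE) with five experts: image, text, graph, synergy, and redundancy.
Apply a gating network that computes sample-dependent weights over the experts for the final prediction.
Train with a combination of a classification loss and an interaction loss that encourages expert specialization and interpretable gating.
Evaluate with 10-fold cross-validation on the internal PBT data and the external TCGA data, using macro-F1 as the primary metric.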
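
The gating step above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the gate produces one unnormalized score per expert, that a softmax turns those scores into weights, and that the final prediction is a weighted sum of per-expert class logits. The names `EXPERTS`, `softmax`, and `fuse_experts` are hypothetical, and in a real model the gate scores would come from a trained gating network rather than being passed in by hand.

```python
import math

# Hypothetical expert names, mirroring the five experts listed in the method.
EXPERTS = ["image", "text", "graph", "synergy", "redundancy"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_experts(expert_logits, gate_scores):
    """Combine per-expert class logits with sample-specific gate weights.

    expert_logits: dict mapping expert name -> list of class logits
    gate_scores:   dict mapping expert name -> unnormalized gate score
                   (assumed to come from a gating network, not shown here)
    Returns (fused_logits, gate_weights).
    """
    weights = softmax([gate_scores[name] for name in EXPERTS])
    n_classes = len(expert_logits[EXPERTS[0]])
    fused = [0.0] * n_classes
    for name, weight in zip(EXPERTS, weights):
        for c, logit in enumerate(expert_logits[name]):
            fused[c] += weight * logit
    return fused, dict(zip(EXPERTS, weights))


if __name__ == "__main__":
    # Toy values for a two-class problem; not taken from the paper.
    logits = {name: [0.0, 0.0] for name in EXPERTS}
    logits["graph"] = [2.0, 0.0]
    scores = {"image": 0.1, "text": -1.0, "graph": 1.5,
              "synergy": 0.3, "redundancy": 0.0}
    fused, weights = fuse_experts(logits, scores)
    print(fused)
    print(weights)
```

Because the weights sum to one for each sample, they can be read directly as that sample's share of reliance on each expert, which is the kind of sample-level interpretability the review describes.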

Figure 1: Overview of PathMoE. H&E WSIs, pathology reports, and nuclei graphs are encoded and fused via an interaction-aware mixture-of-experts module. An input-dependent gating network computes sample-specific weights to combine expert predictions into the final tumor classification. A vanilla fus

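Experimental results

Research questions

RQ1: Does integrating WSIs, pathology text, and nuclei graphs improve pediatric brain tumor classification over image-only baselines?
RQ2: How do modality interactions (single-modality, synergy, redundancy) affect per-sample predictions and interpretability?
RQ3: When text data is noisy or unavailable, is domain knowledge from cell graphs important for robustness?
RQ4: For pathology-informed tasks, which text encoder yields the best multimodal integration performance?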

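Key findings

With all modalities, PathMoE improves macro-F1 on PBT from 0.762 (image-only, EF W) to 0.799 (EF WTG).
On TCGA, adding graph information to images raises macro-F1 from 0.668 (EF W) to 0.709 (EF WG).
The graph modality provides a non-redundant structural prior that improves performance, especially when text is unreliable or unavailable.
Text encoder quality matters: the domain-adapted TITAN improves PathMoE's performance and achieves the strongest macro-F1 in the EFWTG and SGWTG configurations.
The model's interaction weights, supported by qualitative examples and neuropathologist validation, show that graph and text contributions can correct image-only errors in difficult cases.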
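
Since macro-F1 is the primary metric behind the numbers above, a short sketch of how it is computed may help when reading them. This is a generic stdlib implementation, not the paper's evaluation code; the function name `macro_f1` and the toy labels `"A"` and `"B"` are illustrative assumptions. Macro-F1 averages the per-class F1 scores with equal weight, so a rare subtype counts as much as a common one.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    y_true = ["A", "A", "A", "A", "B", "B"]
    y_pred = ["A", "A", "A", "B", "B", "A"]
    # Class A: F1 = 6/8 = 0.75; class B: F1 = 2/4 = 0.5; mean = 0.625
    print(macro_f1(y_true, y_pred))
```

In this toy case plain accuracy would be 4/6, while macro-F1 is 0.625, because the smaller class B is classified less well and is weighted equally with class A.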

This review was generated by AI and checked by a human editor.