QUICK REVIEW

[Paper Review] Hierarchical Text-Guided Brain Tumor Segmentation via Sub-Region-Aware Prompts

Bahram Mohammadi, Ta Duc Huy|arXiv (Cornell University)|Mar 22, 2026

Brain Tumor Detection and Classification | Citations: 0

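One-sentence summary

TextCSP is a hierarchical text-guided brain tumor segmentation model that combines a soft cascade decoder, sub-region-aware prompts, and text-semantic channel modulators to improve WT, TC, and ET segmentation on TextBraTS. On average it surpasses prior state-of-the-art methods on both Dice and HD95.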

ABSTRACT

Brain tumor segmentation remains challenging because the three standard sub-regions, i.e., whole tumor (WT), tumor core (TC), and enhancing tumor (ET), often exhibit ambiguous visual boundaries. Integrating radiological description texts with imaging has shown promise. However, most multimodal approaches typically compress a report into a single global text embedding shared across all sub-regions, overlooking their distinct clinical characteristics. We propose TextCSP (text-modulated soft cascade architecture), a hierarchical text-guided framework that builds on the TextBraTS baseline with three novel components: (1) a text-modulated soft cascade decoder that predicts WT->TC->ET in a coarse-to-fine manner consistent with their anatomical containment hierarchy; (2) sub-region-aware prompt tuning, which uses learnable soft prompts with a LoRA-adapted BioBERT encoder to generate specialized text representations tailored for each sub-region; (3) text-semantic channel modulators that convert the aforementioned representations into channel-wise refinement signals, enabling the decoder to emphasize features aligned with clinically described patterns. Experiments on the TextBraTS dataset demonstrate consistent improvements across all sub-regions against state-of-the-art methods by 1.7% and 6% on the main metrics Dice and HD95.

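Motivation and objectives

Exploit radiology report text to improve multi-region brain tumor segmentation.
Address the limits of a single output head and a global text embedding by exploiting the anatomical hierarchy ET ⊆ TC ⊆ WT.
Develop a resource-efficient, multi-component framework that aligns linguistic cues with sub-region segmentation.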

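Proposed method

A text-modulated soft cascade decoder with three sequential heads (WT, TC, ET) that mirror anatomical containment.
Sub-region-aware prompt tuning, which pairs a LoRA-adapted BioBERT with per-sub-region soft prompts to produce specialized text representations.
Text-semantic channel modulators (SE-style) that refine decoder features using branch-specific linguistic priors.
Follows the TextBraTS baseline (Swin Transformer visual encoder, BioBERT text encoder, cross-attention fusion, U-Net decoder).
Trained with LoRA on the text encoder (query/value projections), small soft prompts (K=4), and text-conditioned SE modules; optimized with SAM and SGD.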
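
The coarse-to-fine WT→TC→ET idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each finer head is gated by multiplying its probability with the coarser head's probability; the review does not state the exact gating used in TextCSP, and the function names and logits below are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_cascade(wt_logits, tc_logits, et_logits):
    """Chain three per-voxel heads so that P(ET) <= P(TC) <= P(WT).

    Assumption: multiplicative gating by the coarser head's probability.
    The actual TextCSP decoder may combine the heads differently.
    """
    wt, tc, et = [], [], []
    for l_wt, l_tc, l_et in zip(wt_logits, tc_logits, et_logits):
        p_wt = sigmoid(l_wt)
        p_tc = sigmoid(l_tc) * p_wt  # TC can only be likely where WT is likely
        p_et = sigmoid(l_et) * p_tc  # ET can only be likely where TC is likely
        wt.append(p_wt)
        tc.append(p_tc)
        et.append(p_et)
    return wt, tc, et


# Hypothetical logits for four voxels (flattened for brevity).
wt_p, tc_p, et_p = soft_cascade(
    [4.0, 2.0, -3.0, 0.5],
    [3.0, -1.0, 2.0, 0.0],
    [2.5, -2.0, 3.0, -0.5],
)
```

With this gating the containment ET ⊆ TC ⊆ WT holds softly by construction: in the third voxel the TC and ET heads are confident on their own, but the low WT probability suppresses both.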

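Experimental results

Research questions

RQ1: Can a hierarchical text-guided architecture enforce the anatomical containment ET ⊆ TC ⊆ WT during segmentation?
RQ2: Do sub-region-aware prompts and LoRA adaptation improve text-image fusion for WT, TC, and ET compared with a global text embedding?
RQ3: Do text-semantic channel modulators provide additional gains by injecting linguistic priors into region-specific feature maps?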
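
To make the channel modulation in RQ3 concrete, here is a rough SE-style sketch. It assumes the per-channel gates are computed from the text embedding alone through a two-layer bottleneck (ReLU, then sigmoid); the weights, dimensions, and function names are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a value to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def channel_gates(text_embedding, w_reduce, w_expand):
    """Map a text embedding to one gate per feature channel.

    Assumption: SE-style bottleneck driven by text only
    (linear -> ReLU -> linear -> sigmoid).
    """
    hidden = [max(0.0, h) for h in matvec(w_reduce, text_embedding)]
    return [sigmoid(g) for g in matvec(w_expand, hidden)]


def modulate(features, gates):
    """Scale every value of each channel by that channel's gate."""
    return [[g * v for v in channel] for channel, g in zip(features, gates)]


# Hypothetical sizes: 3-dim text embedding, 2 hidden units, 2 channels.
text_vec = [0.5, -1.0, 2.0]
w_reduce = [[0.2, 0.1, 0.4], [-0.3, 0.5, 0.1]]
w_expand = [[1.0, -2.0], [-1.5, 0.5]]
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # two channels, three voxels each

gates = channel_gates(text_vec, w_reduce, w_expand)
refined = modulate(features, gates)
```

Each sub-region branch would receive its own text representation under this reading, so the same decoder features can be emphasized differently for WT, TC, and ET.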

Key findings

Method	ET Dice	WT Dice	TC Dice	Avg Dice	ET HD95	WT HD95	TC HD95	Avg HD95
3D-UNet	80.4	87.3	81.6	83.1	6.11	10.51	8.93	8.17
nnU-Net	82.2	87.5	82.6	84.1	4.27	11.90	8.52	8.23
SegResNet	80.9	88.4	82.3	83.8	6.18	7.28	7.41	6.95
Swin UNETR	81.0	89.5	80.8	83.8	5.95	8.23	7.03	7.07
Nestedformer	82.6	89.5	80.2	84.1	5.08	10.51	8.93	8.17
TextBraTS	83.3	89.9	82.8	85.3	4.58	5.48	5.34	5.13
TextBraTS †	82.8	89.6	82.5	84.9	5.28	8.59	6.77	6.88
TextCSP (Ours)	85.3	90.7	85.1	87.0	3.95	4.98	5.51	4.81
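
For readers unfamiliar with the Dice columns, the sketch below shows how a per-sub-region Dice score can be computed from a label map. It assumes the common BraTS label convention (1 = necrotic core, 2 = edema, 4 = enhancing tumor), which this review does not state; the toy label arrays are made up.

```python
def dice(pred: set, target: set) -> float:
    """Dice coefficient between two sets of voxel indices."""
    if not pred and not target:
        return 1.0  # both empty: treat as perfect agreement
    return 2.0 * len(pred & target) / (len(pred) + len(target))


def region_masks(labels):
    """Build nested WT/TC/ET voxel sets from a flat label list.

    Assumed convention: 1 = necrotic core, 2 = edema, 4 = enhancing tumor.
    """
    return {
        "WT": {i for i, v in enumerate(labels) if v in (1, 2, 4)},
        "TC": {i for i, v in enumerate(labels) if v in (1, 4)},
        "ET": {i for i, v in enumerate(labels) if v == 4},
    }


# Toy ground truth and prediction over eight voxels (hypothetical data).
truth = [0, 2, 2, 1, 4, 4, 0, 0]
pred = [0, 2, 1, 1, 4, 0, 0, 0]

truth_masks = region_masks(truth)
pred_masks = region_masks(pred)
scores = {name: dice(pred_masks[name], truth_masks[name]) for name in ("WT", "TC", "ET")}
```

Because the three regions are nested, a single mislabeled voxel can affect more than one score, which is one reason the table reports each sub-region separately alongside the average.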

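TextCSP achieves a state-of-the-art average Dice of 87.0% on TextBraTS, 1.7 points above the TextBraTS baseline (85.3%).
TextCSP also attains the best average HD95 at 4.81 mm, about 0.32 mm lower than the baseline (5.13 mm).
Among the sub-regions, TC gains the most: +2.6 Dice points over the TextBraTS † row (82.5 → 85.1), or +2.3 over the TextBraTS row (82.8 → 85.1).
In the ablation, the full model (soft cascade, sub-region prompts, LoRA, and text modulation) yields the highest Dice (87.0%) and the lowest HD95 (4.81 mm).
The sequential WT→TC→ET cascade outperforms parallel and partial cascade strategies in overall Dice (87.0%).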
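
The headline gains can be re-derived from the table. The short check below uses only values copied from the TextBraTS, TextBraTS †, and TextCSP rows; the variable names are illustrative.

```python
# Values copied from the results table above.
textbrats = {"ET": 83.3, "WT": 89.9, "TC": 82.8, "avg_dice": 85.3, "avg_hd95": 5.13}
textbrats_dagger = {"ET": 82.8, "WT": 89.6, "TC": 82.5}
textcsp = {"ET": 85.3, "WT": 90.7, "TC": 85.1, "avg_dice": 87.0, "avg_hd95": 4.81}

# Average Dice gain in percentage points.
dice_gain = round(textcsp["avg_dice"] - textbrats["avg_dice"], 1)

# Average HD95 reduction, absolute (mm) and relative (%).
hd95_drop_mm = round(textbrats["avg_hd95"] - textcsp["avg_hd95"], 2)
hd95_drop_pct = round(
    100 * (textbrats["avg_hd95"] - textcsp["avg_hd95"]) / textbrats["avg_hd95"], 1
)

# Per-region Dice gains against both TextBraTS rows.
regions = ("ET", "WT", "TC")
gain_vs_textbrats = {r: round(textcsp[r] - textbrats[r], 1) for r in regions}
gain_vs_dagger = {r: round(textcsp[r] - textbrats_dagger[r], 1) for r in regions}
```

This reproduces the 1.7-point Dice gain and the 0.32 mm HD95 reduction (roughly 6% relative, matching the abstract). The TC gain is 2.3 points against the TextBraTS row and 2.6 against the † row, and TC is the largest gain in both comparisons.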

This review was generated by AI and checked by a human editor.