QUICK REVIEW

[Paper Review] Act Like a Pathologist: Tissue-Aware Whole Slide Image Reasoning

Wentao Huang, Weimin Lyu|arXiv (Cornell University)|Feb 28, 2026

Multimodal Machine Learning Applications | Citations: 0

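One-Line Summary

HistoSelect introduces a hierarchical, question-guided patch selection framework for pathology WSI VQA. It mimics a pathologist's tissue-aware search, cuts visual tokens by about 70%, and achieves state-of-the-art accuracy on multiple datasets.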

ABSTRACT

Computational pathology has advanced rapidly in recent years, driven by domain-specific image encoders and growing interest in using vision-language models to answer natural-language questions about diseases. Yet, the core problem behind pathology question-answering remains unsolved, considering that a gigapixel slide contains far more information than necessary for a given question. Pathologists naturally navigate tissue and morphology complexity by scanning broadly, and zooming in selectively according to the clinical questions. Current models, in contrast, rely on uniform patch sampling or broad attention maps, often attending equally to irrelevant regions while overlooking key visual evidence. In this work, we try to bring models closer to how humans actually examine slides. We propose a question-guided, tissue-aware, and coarse-to-fine retrieval framework, HistoSelect, that consists of two key components: a group sampler that identifies question-relevant tissue regions, followed by a patch selector that retrieves the most informative patches within those regions. By selecting only the most informative patches, our method becomes significantly more efficient: reducing visual token usage by 70% on average, while improving accuracy across three pathology QA tasks. Evaluated on 356,000 question-answer pairs, our approach outperforms existing methods and produces answers grounded in interpretable, pathologist-consistent regions. Our results suggest that bringing human-like search and attention patterns into WSI reasoning is a promising direction for building practical and reliable pathology VLMs.

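Research Motivation and Objectives

Motivate reasoning over gigapixel WSIs in a tissue-aware, question-driven manner.
Define tissue-type prompts in collaboration with pathologists and perform semantic WSI segmentation.
Develop a two-stage pipeline of question-conditioned group sampling and patch selection to identify question-relevant patches.
Use an information-bottleneck-based objective to enforce sparsity and relevance at both the group level and the patch level.
Demonstrate interpretability and clinical reliability through pathologist evaluation and public benchmarks.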
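
For readers unfamiliar with the information bottleneck mentioned above, the first expression below is the standard generic objective. The second is only an assumed two-level variational form, written to match the group-level and patch-level description in this review. The symbols (question u, answer a, group variables g, patch variables s, priors r, and weights \beta_g and \beta_p) are illustrative notation, not taken from the paper.

```latex
% Standard information bottleneck: compress X into Z while keeping Z predictive of Y.
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)

% Assumed hierarchical variational form (illustrative notation only):
% an answer loss plus one KL compression term per selection level.
\mathcal{L} =
  \mathbb{E}_{g, s}\big[ -\log p_\theta(a \mid x_s, u) \big]
  + \beta_g \, \mathrm{KL}\big( q_\phi(g \mid x, u) \,\|\, r(g) \big)
  + \beta_p \, \mathrm{KL}\big( q_\psi(s \mid g, x, u) \,\|\, r(s \mid g) \big)
```

Under this reading, larger \beta_g or \beta_p would push the model toward selecting fewer groups or fewer patches, trading answer likelihood against compression.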

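Proposed Method

Partition each WSI into M tissue groups using pathologist-designed prompts and CLIP-like tissue segmentation.
Compute a group prototype from the patch features within each tissue group and predict a question-conditioned, group-level sampling rate.
Within active tissue groups, rank patches by question relevance and select the top-k patches according to learned probabilities.
Model selection as a hierarchical information bottleneck, using variational posteriors for group and patch selection and a neural LLM decoder for answering.
Optimize a loss that combines the VQA objective with group-level and patch-level compression terms, expressed as KL divergences against learned priors.
Use a straight-through estimator to keep discrete token selection differentiable.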
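
The coarse-to-fine selection described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `select_patches`, the use of dot-product similarity in place of the learned group sampler and patch selector, and the softmax allocation of a fixed token budget are all assumptions. Training details such as the straight-through estimator and the KL terms are omitted because NumPy has no automatic differentiation.

```python
import numpy as np


def select_patches(patch_feats, group_ids, question_emb, budget):
    """Illustrative two-stage, question-conditioned patch selection.

    patch_feats:  (N, D) array of patch features.
    group_ids:    (N,) array assigning each patch to a tissue group.
    question_emb: (D,) question embedding (hypothetical stand-in for a text encoder).
    budget:       maximum number of patches to keep overall.
    """
    # Stage 1: one prototype per tissue group (mean of its patch features).
    groups = np.unique(group_ids)
    protos = np.stack([patch_feats[group_ids == g].mean(axis=0) for g in groups])

    # Question-conditioned group scores turned into sampling rates by a softmax.
    # The paper predicts these rates with a learned module; similarity is assumed here.
    logits = protos @ question_emb
    rates = np.exp(logits - logits.max())
    rates /= rates.sum()

    # Stage 2: inside each group, rank patches by question relevance and keep
    # the top-k, where k is that group's share of the budget (rounded down).
    selected = []
    for g, rate in zip(groups, rates):
        idx = np.flatnonzero(group_ids == g)
        k = min(len(idx), int(rate * budget))
        if k == 0:
            continue  # group is inactive for this question
        scores = patch_feats[idx] @ question_emb
        selected.extend(idx[np.argsort(-scores)[:k]].tolist())

    rate_by_group = dict(zip(groups.tolist(), rates.tolist()))
    return np.array(sorted(selected), dtype=int), rate_by_group


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 16))
    gids = rng.integers(0, 4, size=100)
    question = rng.normal(size=16)
    kept, group_rates = select_patches(feats, gids, question, budget=30)
    print(len(kept), group_rates)
```

Because each group's share is rounded down, the number of kept patches never exceeds `budget`. Groups whose share rounds to zero are skipped entirely, which loosely mirrors the idea of activating only question-relevant tissue groups.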

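Experimental Results

Research Questions

RQ1: How can tissue regions relevant to a clinical question be identified in a WSI?
RQ2: Can two-stage, question-driven patch selection reduce the token load while preserving answer accuracy?
RQ3: Does the information bottleneck objective improve interpretability and reduce redundancy in WSI reasoning?
RQ4: Do pathologist-aligned tissue prompts and selection produce interpretable, well-grounded reasoning regions?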

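Key Findings

HistoSelect reduced visual token usage by 70% on average while maintaining or improving accuracy on three pathology QA tasks.
It achieved state-of-the-art closed-ended VQA accuracy on three benchmarks (SlideBench-VQA, WSI-Bench, and an in-house ovarian dataset), with an average score of 83.80%.
On WSI-Bench, open-ended VQA showed the highest BLEU and ROUGE-L scores for report generation, and the method ranked best on five of six domain-specific VQA metrics.
In the pathologist evaluation, tissue segmentation accuracy showed high agreement, and the patches selected by the model were rated as rare and diagnostically relevant.
Ablation studies showed that both the group sampler and the patch selector matter, and that the best performance is obtained at a moderate token budget (for example, about 5,000 tokens).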

This review was generated by AI and checked by a human editor.