QUICK REVIEW

[Paper Review] HAAF: Hierarchical Adaptation and Alignment of Foundation Models for Few-Shot Pathology Anomaly Detection

Chunze Yang, Wenjie Zhao | arXiv (Cornell University) | Jan 24, 2026

Anomaly Detection Techniques and Applications | Citations: 0

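One-Line Summary

HAAF introduces a framework for hierarchical adaptation and cross-level alignment. It first grounds text prompts in ROI visual information, then uses those prompts to guide the visual encoder for few-shot pathology anomaly detection. It achieves state-of-the-art results, particularly with domain-specific pathology backbones.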

ABSTRACT

Precision pathology relies on detecting fine-grained morphological abnormalities within specific Regions of Interest (ROIs), as these local, texture-rich cues - rather than global slide contexts - drive expert diagnostic reasoning. While Vision-Language (V-L) models promise data efficiency by leveraging semantic priors, adapting them faces a critical Granularity Mismatch, where generic representations fail to resolve such subtle defects. Current adaptation methods often treat modalities as independent streams, failing to ground semantic prompts in ROI-specific visual contexts. To bridge this gap, we propose the Hierarchical Adaptation and Alignment Framework (HAAF). At its core is a novel Cross-Level Scaled Alignment (CLSA) mechanism that enforces a sequential calibration order: visual features first inject context into text prompts to generate content-adaptive descriptors, which then spatially guide the visual encoder to spotlight anomalies. Additionally, a dual-branch inference strategy integrates semantic scores with geometric prototypes to ensure stability in few-shot settings. Experiments on four benchmarks show HAAF significantly outperforms state-of-the-art methods and effectively scales with domain-specific backbones (e.g., CONCH) in low-resource scenarios.

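Motivation and Objectives

Resolve the granularity mismatch that Vision-Language foundation models face in ROI-level pathology anomaly detection.
Propose a sequential cross-modal adaptation pipeline that grounds semantic prompts in ROI visual context.
Combine semantic alignment with geometric prototypes to achieve robust few-shot anomaly detection.
Show that HAAF scales to domain-specific pathology backbones such as CONCH.
Provide a comprehensive ROI-level anomaly detection benchmark spanning multiple histopathology datasets.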

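Proposed Method

Hierarchical intra-modal adaptation with lightweight visual and text adapters yields task-specific multi-scale embeddings.
Cross-Level Scaled Alignment (CLSA) performs Vision-to-Text context injection first, then realizes Text-to-Vision semantic guidance through Multi-Head Cross-Attention.
Dual-Branch Inference fuses a parametric semantic score with a non-parametric prototype distance score for robust few-shot decisions.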
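
The sequential V→T→V ordering described for CLSA can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with a residual connection and no learned projections, whereas the paper uses Multi-Head Cross-Attention. The names `cross_attend` and `clsa_step` are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Single-head scaled dot-product cross-attention with a residual add.

    Assumption: keys and values are the raw context vectors (no learned
    projections), which is a simplification of multi-head attention.
    """
    d = len(queries[0])
    updated = []
    for q in queries:
        # Attention weights of this query over every context vector.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        # Weighted sum of context vectors, added back to the query.
        attended = [sum(w * k[i] for w, k in zip(weights, context)) for i in range(d)]
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def clsa_step(text_prompts, patch_features):
    """Hypothetical two-step alignment: V->T first, then T->V."""
    # Step 1 (Vision-to-Text): prompts absorb ROI visual context.
    adapted_text = cross_attend(text_prompts, patch_features)
    # Step 2 (Text-to-Vision): patches are guided by the adapted prompts.
    guided_patches = cross_attend(patch_features, adapted_text)
    return adapted_text, guided_patches
```

The point of the ordering is that step 2 attends over prompts that already carry ROI context. A parallel design would instead have the patches attend over the original, content-agnostic prompts.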

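Experimental Results

Research Questions

RQ1: Can ROI-grounded, sequential cross-modal alignment improve fine-grained localization of pathological anomalies in few-shot settings?
RQ2: Does CLSA outperform parallel fusion strategies for ROI-level anomaly detection?
RQ3: How well does HAAF scale when paired with domain-specific pathology backbones such as CONCH?
RQ4: Does combining semantic alignment with geometric prototypes provide robustness against prototype contamination in small support sets?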

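Key Findings

HAAF consistently outperforms state-of-the-art methods on four histopathology benchmarks (breast cancer, prostate, colorectal), especially in the 4-shot setting with the CONCH backbone.
The CLSA mechanism, which unlocks the model's semantic potential, yields a marked performance gain over parallel-fusion baselines.
With the CONCH backbone, HAAF sets a new SOTA on all evaluated datasets (AUC up to 91.97% on HIS, 94.05% on SICAPv2, 90.25% on NCT-CRC, and 83.53% on BRACS).
The dual-branch strategy provides robustness against few-shot instability and prototype contamination, maintaining high performance across shot counts (K ∈ {2, 4, 8, 16}).
Ablation studies confirm that the sequential V→T→V interaction is essential and show that HAAF with CLSA outperforms standard PEFT methods (Adapter, LoRA).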
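
To make the dual-branch idea concrete, here is a rough sketch of how a semantic score and a prototype-distance score might be fused. The review does not give the formulas, so everything specific here is an assumption: mean-of-support prototypes for a normal and an abnormal class, Euclidean distance, a two-way softmax, the temperature of 0.07, and the fusion weight `alpha`. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (a class prototype)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def abnormal_probability(normal_logit, abnormal_logit):
    """Two-way softmax returning the abnormal-class probability."""
    m = max(normal_logit, abnormal_logit)
    en, ea = math.exp(normal_logit - m), math.exp(abnormal_logit - m)
    return ea / (en + ea)


def semantic_score(feature, normal_text, abnormal_text, temperature=0.07):
    # Parametric branch: similarity to text embeddings of each class.
    return abnormal_probability(
        cosine(feature, normal_text) / temperature,
        cosine(feature, abnormal_text) / temperature,
    )


def prototype_score(feature, normal_support, abnormal_support):
    # Non-parametric branch: negative distance to K-shot class prototypes.
    return abnormal_probability(
        -euclidean(feature, mean_vector(normal_support)),
        -euclidean(feature, mean_vector(abnormal_support)),
    )


def dual_branch_score(feature, normal_text, abnormal_text,
                      normal_support, abnormal_support, alpha=0.5):
    """Convex combination of the two branches (alpha is an assumed knob)."""
    sem = semantic_score(feature, normal_text, abnormal_text)
    proto = prototype_score(feature, normal_support, abnormal_support)
    return alpha * sem + (1.0 - alpha) * proto
```

In this sketch the semantic branch never reads the support features, so a mislabeled or atypical support sample affects only the prototype branch. This is one plausible reading of why such a fusion could dampen prototype contamination when K is small.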

This review was generated by AI and checked by a human editor.