QUICK REVIEW

[Paper Review] UltrasoundAgents: Hierarchical Multi-Agent Evidence-Chain Reasoning for Breast Ultrasound Diagnosis

Yali Zhu, Kang Zhou|arXiv (Cornell University)|Mar 11, 2026
AI in cancer detection | Cited by 0
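One-Sentence Summary

The paper proposes a hierarchical two-agent framework (a main agent for localization and diagnosis, a sub-agent for fine-grained attributes), trained with a decoupled strategy and trajectory self-distillation, that produces auditable evidence and improves BI-RADS and malignancy prediction for breast ultrasound.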

ABSTRACT

Breast ultrasound diagnosis typically proceeds from global lesion localization to local sign assessment and then evidence integration to assign a BI-RADS category and determine benignity or malignancy. Many existing methods rely on end-to-end prediction or provide only weakly grounded evidence, which can miss fine-grained lesion cues and limit auditability and clinical review. To align with the clinical workflow and improve evidence traceability, we propose a hierarchical multi-agent framework, termed UltrasoundAgents. A main agent localizes the lesion in the full image and triggers a crop-and-zoom operation. A sub-agent analyzes the local view and predicts four clinically relevant attributes, namely echogenicity pattern, calcification, boundary type, and edge (margin) morphology. The main agent then integrates these structured attributes to perform evidence-based reasoning and output the BI-RADS category and the malignancy prediction, while producing reviewable intermediate evidence. Furthermore, hierarchical multi-agent training often suffers from error propagation, difficult credit assignment, and sparse rewards. To alleviate this and improve training stability, we introduce a decoupled progressive training strategy. We first train the attribute agent, then train the main agent with oracle attributes to learn robust attribute-based reasoning, and finally apply corrective trajectory self-distillation with spatial supervision to build high-quality trajectories for supervised fine-tuning, yielding a deployable end-to-end policy. Experiments show consistent gains over strong vision-language baselines in diagnostic accuracy and attribute agreement, together with structured evidence and traceable reasoning.

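Motivation and Objectives

  • Mirror the coarse-to-fine clinical workflow by separating lesion localization from attribute perception and diagnosis.
  • Provide an auditable evidence chain (ROI -> attributes -> BI-RADS/malignancy) with robust, traceable reasoning.
  • Improve the training stability of hierarchical reinforcement learning through oracle-guided curriculum RL and trajectory self-distillation.
  • Demonstrate improved diagnostic accuracy and attribute agreement on public BUS (breast ultrasound) datasets, along with better OOD generalization.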

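Proposed Method

  • A two-agent architecture: the main agent A_M handles full-image lesion localization and evidence integration, while the sub-agent A_S recognizes local attributes on the cropped and zoomed view.
  • Three-stage training. Stage 1 trains A_S with RL to predict four clinical attributes (echogenicity, calcification, boundary type, edge) and to generate interpretable traces.
  • Stage 2 trains A_M with curriculum RL conditioned on ground-truth (GT) attributes, stabilizing high-level reasoning and reducing the impact of perception noise.
  • Stage 3 refines trajectories through corrective trajectory self-distillation and applies supervised fine-tuning (SFT) to yield a deployable end-to-end policy.
  • An explicit ROI -> attributes -> diagnosis evidence chain: crop-and-zoom feeds structured evidence to the main agent.
  • Evaluation uses AUROC, accuracy, BI-RADS accuracy, and Cohen's κ, with comparisons across multiple BUS datasets; ablations include GTbox/GTattr upper bounds and a localization analysis.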
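The ROI -> attributes -> diagnosis flow described in this list can be illustrated with a minimal sketch. It is not the authors' implementation: the `Evidence` record, the `crop_and_zoom` and `run_chain` functions, the nearest-neighbour upsampling, and the three callables standing in for the agents are all hypothetical names and choices made for illustration. The image is a plain list of pixel rows to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Image = List[List[int]]          # grayscale image as rows of pixel values

# The four attributes named in the paper; the key spellings are assumptions.
ATTRIBUTES = ("echogenicity", "calcification", "boundary", "edge")


@dataclass
class Evidence:
    """One reviewable record of the chain: ROI -> attributes -> diagnosis."""
    roi: Box
    attributes: Dict[str, str]
    birads: str
    malignant: bool


def crop_and_zoom(image: Image, box: Box, scale: int = 2, pad: int = 0) -> Image:
    """Crop `box` (plus optional padding) and upsample by nearest neighbour."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    # Clamp the padded box to the image bounds.
    x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
    x1, y1 = min(width, x1 + pad), min(height, y1 + pad)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("box does not overlap the image")
    zoomed: Image = []
    for row in image[y0:y1]:
        # Repeat every pixel `scale` times horizontally, then the row vertically.
        wide = [px for px in row[x0:x1] for _ in range(scale)]
        zoomed.extend(list(wide) for _ in range(scale))
    return zoomed


def run_chain(
    image: Image,
    localize: Callable[[Image], Box],                          # stand-in for A_M, step 1
    describe: Callable[[Image], Dict[str, str]],               # stand-in for A_S
    diagnose: Callable[[Image, Dict[str, str]], Tuple[str, bool]],  # A_M, step 2
) -> Evidence:
    """Run localization, local attribute reading, then evidence integration."""
    roi = localize(image)
    attributes = describe(crop_and_zoom(image, roi))
    missing = [name for name in ATTRIBUTES if name not in attributes]
    if missing:
        raise ValueError(f"sub-agent omitted attributes: {missing}")
    birads, malignant = diagnose(image, attributes)
    return Evidence(roi, attributes, birads, malignant)
```

Keeping the intermediate ROI and attributes in the returned record is what makes the output auditable: a reviewer can check each step rather than only the final label.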
Figure 1 : Hierarchical multi-agent architecture. The main agent analyzes the full image to localize the lesion, triggers a crop-and-zoom operation, and queries the sub-agent on the zoomed view to obtain structured attribute evidence. The main agent then integrates the global context and the attribu

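Experimental Results

Research Questions

  • RQ1: Compared with end-to-end models, does a hierarchical multi-agent system improve the interpretability and traceability of breast ultrasound diagnosis?
  • RQ2: Do crop-and-zoom and explicit attribute reasoning improve the quality of localized evidence and the downstream BI-RADS and malignancy predictions?
  • RQ3: How do oracle-guided curriculum RL and corrective trajectory self-distillation affect training stability and final policy performance?
  • RQ4: What are the main sources of error (localization vs. attribute noise), and how do they affect in-domain and cross-domain performance?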

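Key Findings

  • The proposed method achieves the best in-domain diagnostic performance among comparable baselines (AUC 0.741, Acc 0.813, Bi-Acc 0.515, κ 0.224).
  • The ROI -> attribute -> diagnosis evidence chain, combined with crop-and-zoom, improves attribute evidence quality and diagnostic consistency.
  • Oracle-guided curriculum RL substantially improves performance and localization alignment; removing it drops overall AUC to 0.535 and κ to 0.018.
  • Corrective trajectory self-distillation substantially improves IoU and diagnostic accuracy: overall IoU rises from 0.299 to 0.610, and AUC from 0.726 to 0.741.
  • The OA analysis shows that the GTbox and GTattr upper bounds reach AUC 0.782 and 0.804, respectively, highlighting localization and attribute noise as the key bottlenecks for BI-RADS agreement.
  • Lesion crops generally yield higher attribute F1 (boundary, edge, echogenicity) than full images, supporting the effectiveness of crop-and-zoom for capturing fine-grained evidence.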
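Two of the metrics reported in these findings, IoU and Cohen's κ, have standard definitions that can be sketched briefly. The function names `box_iou` and `cohen_kappa` are illustrative, and the unweighted form of κ is an assumption: this summary does not say whether the paper uses a weighted variant for the ordinal BI-RADS categories.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of cases where both label sets match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the two marginal label distributions.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both sides use a single label
    return (p_o - p_e) / (1.0 - p_e)
```

Because κ subtracts chance agreement, it can stay low even when raw accuracy looks reasonable, which is why the findings report it alongside BI-RADS accuracy.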
Figure 2 : Three-stage training. Stage 1: RL trains $A_{S}$ for attribute recognition. Stage 2: oracle-guided RL trains $A_{M}$ with GT attributes for stable reasoning. Stage 3: refine trajectories and distill via SFT for robustness.
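Stage 3 can be pictured as a data-building step that turns policy rollouts into SFT examples. The sketch below is one hypothetical reading, not the paper's procedure: this summary does not state the correction rule, so the IoU threshold, the replacement of a poorly localized box with the ground-truth box, the filtering on the final malignancy label, and the names `Trajectory` and `build_sft_set` are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Trajectory:
    """A simplified rollout: the localized box and the final diagnosis."""
    box: Box
    gt_box: Box
    pred_malignant: bool
    gt_malignant: bool


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_sft_set(rollouts: List[Trajectory], iou_threshold: float = 0.5) -> List[Trajectory]:
    """Keep correctly diagnosed rollouts and repair poorly localized boxes.

    Assumed rule, for illustration only: a wrong final diagnosis discards the
    rollout; a correct diagnosis with IoU below the threshold keeps the rollout
    but substitutes the ground-truth box (the "spatial supervision").
    """
    kept: List[Trajectory] = []
    for traj in rollouts:
        if traj.pred_malignant != traj.gt_malignant:
            continue
        if box_iou(traj.box, traj.gt_box) < iou_threshold:
            traj = replace(traj, box=traj.gt_box)
        kept.append(traj)
    return kept
```

Under this reading, the resulting set contains only trajectories whose localization and diagnosis both agree with the labels, which is consistent with the reported IoU gain after Stage 3 but should not be taken as confirmation of the mechanism.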


This review was generated by AI and reviewed by human editors.