QUICK REVIEW

[Paper Review] Understanding Task Aggregation for Generalizable Ultrasound Foundation Models

Fangyijie Wang, Tanya Akumu|arXiv (Cornell University)|Mar 18, 2026

Domain Adaptation and Few-Shot Learning|Cited by 0

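ONE-SENTENCE SUMMARY

This paper analyzes how task aggregation affects performance across 27 tasks in a unified ultrasound foundation model (M2DINO), finding that training data scale and task type determine whether transfer is positive or negative and that all-task unified training is generally more stable than clinically-grouped training.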

ABSTRACT

Foundation models promise to unify multiple clinical tasks within a single framework, but recent ultrasound studies report that unified models can underperform task-specific baselines. We hypothesize that this degradation arises not from model capacity limitations, but from task aggregation strategies that ignore interactions between task heterogeneity and available training data scale. In this work, we systematically analyze when heterogeneous ultrasound tasks can be jointly learned without performance loss, establishing practical criteria for task aggregation in unified clinical imaging models. We introduce M2DINO, a multi-organ, multi-task framework built on DINOv3 with task-conditioned Mixture-of-Experts blocks for adaptive capacity allocation. We systematically evaluate 27 ultrasound tasks spanning segmentation, classification, detection, and regression under three paradigms: task-specific, clinically-grouped, and all-task unified training. Our results show that aggregation effectiveness depends strongly on training data scale. While clinically-grouped training can improve performance in data-rich settings, it may induce substantial negative transfer in low-data settings. In contrast, all-task unified training exhibits more consistent performance across clinical groups. We further observe that task sensitivity varies by task type in our experiments: segmentation shows the largest performance drops compared with regression and classification. These findings provide practical guidance for ultrasound foundation models, emphasizing that aggregation strategies should jointly consider training data availability and task characteristics rather than relying on clinical taxonomy alone.

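RESEARCH MOTIVATION AND GOALS

Assess whether heterogeneous ultrasound tasks can be learned jointly without degrading performance.
Explore how task aggregation strategies interact with training data scale across organ systems.
Develop M2DINO, a multi-organ, multi-task framework with task-conditioned Mixture-of-Experts blocks for adaptive capacity allocation.
Systematically compare three training paradigms (task-specific, clinically-grouped, all-task unified) on 27 tasks.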

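PROPOSED METHOD

Propose M2DINO, a DINOv3-based encoder with task-conditioned Mixture-of-Experts modules placed in the last six Transformer layers.
Represent all tasks with a shared spatial feature map and provide task-specific heads for segmentation, detection, classification, and regression.
Under the clinically-grouped (CG) and all-task unified (AU) paradigms, use a unified multi-task loss L = sum_t lambda_t L_t, where the task-specific losses are Dice for segmentation, cross-entropy for classification, L1 for regression, and a detection loss.
Evaluate three training paradigms, task-specific (TS), CG, and AU, on 27 ultrasound tasks covering segmentation, classification, detection, and regression.
Compare performance between data-rich and data-scarce groups to analyze transfer patterns and the risk of negative transfer.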
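The weighted sum L = sum_t lambda_t L_t described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the task keys ("seg", "cls", "reg"), and the weight values are assumptions made for the example, the review does not state the lambda values used in the paper, and the detection loss is omitted because its form is not specified here.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened foreground probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def cross_entropy(probs, label):
    """Cross-entropy for a single sample, given class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def l1_loss(pred, target):
    """Mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multitask_loss(task_losses, weights):
    """L = sum_t lambda_t * L_t over the tasks present in the batch."""
    return sum(weights[task] * loss for task, loss in task_losses.items())


# Toy inputs and hypothetical weights, chosen only for illustration.
losses = {
    "seg": dice_loss([0.9, 0.1, 0.8], [1, 0, 1]),
    "cls": cross_entropy([0.7, 0.2, 0.1], 0),
    "reg": l1_loss([10.0, 12.0], [11.0, 12.0]),
}
weights = {"seg": 1.0, "cls": 1.0, "reg": 0.5}
total = multitask_loss(losses, weights)
```

Summing only over the tasks present in `task_losses` means a batch that contains a subset of tasks still produces a well-defined loss, which is one plausible way to handle mixed-task batches under CG or AU training.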

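EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Which ultrasound tasks can be learned jointly without a significant drop in performance?
RQ2: Under clinically-grouped and all-task unified training, how does training data scale affect positive or negative transfer?
RQ3: Does task type (segmentation, classification, regression, detection) affect aggregation outcomes?
RQ4: Is the all-task unified approach more stable than clinically-grouped training across different levels of data availability?
RQ5: What practical guidelines follow for designing multi-task ultrasound foundation models?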

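KEY FINDINGS

Aggregation effectiveness depends strongly on data scale: data-rich groups benefit under CG and AU, while data-poor groups suffer negative transfer under CG.
Compared with CG, all-task unified training performs more consistently across groups and shows fewer large drops on small datasets.
Segmentation tasks are the most sensitive to the aggregation strategy, with larger performance drops than regression or classification.
In the data-rich obstetrics group, CG and AU outperform TS; in the breast and lung groups, CG often underperforms while AU is more robust.
Across the 27 tasks, AU generally yields more stable cross-task transfer and can regularize learning in data-scarce settings.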
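Positive and negative transfer in the findings above are judged relative to the task-specific baseline. The sketch below shows one way such a comparison could be tabulated. The task names, the scores, and the tolerance are hypothetical placeholders, not results from the paper, and the review does not say how the authors thresholded transfer.

```python
def transfer_delta(aggregated, task_specific, higher_is_better=True):
    """Signed gain of an aggregated paradigm over the task-specific baseline.

    Positive values mean the aggregated model is better, whatever the
    direction of the underlying metric.
    """
    diff = aggregated - task_specific
    return diff if higher_is_better else -diff


def label_transfer(delta, tolerance=0.01):
    """Map a signed gain to 'positive', 'negative', or 'neutral' transfer."""
    if delta > tolerance:
        return "positive"
    if delta < -tolerance:
        return "negative"
    return "neutral"


# Hypothetical scores, NOT results from the paper.
scores = {
    "task_a_seg_dice": {"TS": 0.85, "CG": 0.80, "AU": 0.845},
    "task_b_reg_mae": {"TS": 4.0, "CG": 3.5, "AU": 3.6},
}
# Dice is higher-is-better; mean absolute error is lower-is-better.
higher_is_better = {"task_a_seg_dice": True, "task_b_reg_mae": False}

report = {
    task: {
        paradigm: label_transfer(
            transfer_delta(vals[paradigm], vals["TS"], higher_is_better[task])
        )
        for paradigm in ("CG", "AU")
    }
    for task, vals in scores.items()
}
```

The single absolute tolerance is a simplification: metrics on different scales (Dice in [0, 1] versus an error in physical units) would normally need per-metric or relative thresholds.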

This review was generated by AI and reviewed by human editors.