QUICK REVIEW

[Paper Review] A Semi-Supervised Framework for Breast Ultrasound Segmentation with Training-Free Pseudo-Label Generation and Label Refinement

Ruili Li, Jiayi Ding|arXiv (Cornell University)|Mar 6, 2026

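AI in cancer detection | Citations: 0

One-line summary

This paper introduces a semi-supervised framework that uses appearance-based prompts to generate pseudo labels without any training, then applies dual-teacher refinement and uncertainty-guided contrastive learning, reaching performance close to full supervision with very few labeled BUS images.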

ABSTRACT

Semi-supervised learning (SSL) has emerged as a promising paradigm for breast ultrasound (BUS) image segmentation, but it often suffers from unstable pseudo labels under extremely limited annotations, leading to inaccurate supervision and degraded performance. Recent vision-language models (VLMs) provide a new opportunity for pseudo-label generation, yet their effectiveness on BUS images remains limited because domain-specific prompts are difficult to transfer. To address this issue, we propose a semi-supervised framework with training-free pseudo-label generation and label refinement. By leveraging simple appearance-based descriptions (e.g., dark oval), our method enables cross-domain structural transfer between natural and medical images, allowing VLMs to generate structurally consistent pseudo labels. These pseudo labels are used to warm up a static teacher that captures global structural priors of breast lesions. Combined with an exponential moving average teacher, we further introduce uncertainty entropy weighted fusion and adaptive uncertainty-guided reverse contrastive learning to improve boundary discrimination. Experiments on four BUS datasets demonstrate that our method achieves performance comparable to fully supervised models even with only 2.5% labeled data, significantly outperforming existing SSL approaches. Moreover, the proposed paradigm is readily extensible: for other imaging modalities or diseases, only a global appearance description is required to obtain reliable pseudo supervision, enabling scalable semi-supervised medical image segmentation under limited annotations.

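Motivation and objectives

Reduce the annotation burden of breast ultrasound segmentation.
Use vision–language models to generate reliable pseudo labels without any training.
Refine the pseudo labels with a dual-teacher framework and uncertainty-aware guidance.
Improve boundary accuracy through adaptive contrastive learning in uncertain regions.
Show strong performance with very few labels and demonstrate generalization across datasets.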

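Proposed method

Appearance-Prompted Training-Free Pseudo-Label Generation (APPG) uses appearance-based prompts to guide VLMs (Grounding DINO and SAM) into producing coarse pseudo masks without any training.
Static teacher warm-up: teacher T^A is trained on the pseudo labels and then frozen, so that it captures coarse structural priors of the lesions.
Dual-teacher refinement: with a dynamic EMA teacher T^B and a student S, Uncertainty–Entropy Weighted Fusion (UEWF) merges the pseudo labels from both teachers.
Adaptive uncertainty-guided reverse contrastive learning (AURCL) inverts the predictions in high-uncertainty patches and applies a patch-level contrastive loss, focusing training on hard boundary regions.
Loss formulation: the supervised loss on labeled data is combined with the unsupervised loss from the fused pseudo labels and the contrastive loss (L = L_s + λ_u L_u + λ_c L_c).
Adaptive uncertainty mechanism: pixel-wise entropy and patch-level smoothing weight the pseudo-label fusion, improving boundary discrimination.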
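
The fusion, EMA, and loss steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not give the UEWF formula, so the exp(-entropy) weighting is an assumption, as are the function names, the EMA decay, and the default λ values. Patch-level smoothing is omitted.

```python
import math


def binary_entropy(p, eps=1e-8):
    """Shannon entropy (in nats) of a foreground probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def fuse_teachers(prob_a, prob_b):
    """Fuse two per-pixel foreground probability maps given as flat lists.

    ASSUMPTION: each teacher is weighted by exp(-entropy) at every pixel,
    so the more confident teacher dominates. The paper's exact UEWF rule
    may differ.
    """
    fused = []
    for pa, pb in zip(prob_a, prob_b):
        wa = math.exp(-binary_entropy(pa))
        wb = math.exp(-binary_entropy(pb))
        fused.append((wa * pa + wb * pb) / (wa + wb))
    return fused


def ema_update(teacher_params, student_params, decay=0.99):
    """Exponential moving average update for the dynamic teacher T^B.

    The decay value is a hypothetical default, not taken from the paper.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]


def total_loss(l_s, l_u, l_c, lambda_u=1.0, lambda_c=0.1):
    """L = L_s + lambda_u * L_u + lambda_c * L_c (lambda defaults are placeholders)."""
    return l_s + lambda_u * l_u + lambda_c * l_c
```

With teacher A at 0.5 (maximally uncertain) and teacher B at 0.9, the fused value lands closer to 0.9 than the plain average of 0.7, which is the behavior an entropy-weighted fusion is meant to have.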

Figure 1: To illustrate how different textual prompts affect zero-shot knowledge transfer, we visualize bounding boxes generated from (c) medical terms (“tumor”), (d) radiological attributes (“high density”), and (e) appearance-based descriptions (“dark oval.dark round.dark lobulated”).
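
The prompt-to-mask flow behind these bounding boxes can be sketched as follows. This is a hedged outline only: `detect` and `segment` are hypothetical callables standing in for a text-prompted detector and a box-prompted segmenter (the paper names Grounding DINO and SAM, whose real APIs are not reproduced here), and keeping only the highest-scoring box above `min_score` is an assumption, not a documented step of APPG.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (box, confidence score)


def build_prompt(phrases: Sequence[str]) -> str:
    """Join appearance phrases into one period-separated text prompt."""
    return ".".join(p.strip() for p in phrases if p.strip())


def generate_pseudo_mask(
    image: object,
    phrases: Sequence[str],
    detect: Callable[[object, str], List[Detection]],  # hypothetical detector wrapper
    segment: Callable[[object, Box], object],          # hypothetical segmenter wrapper
    min_score: float = 0.3,                            # hypothetical threshold
) -> Optional[object]:
    """Return a coarse pseudo mask for `image`, or None if nothing is detected.

    No model is trained here: both callables are used purely for inference.
    """
    prompt = build_prompt(phrases)
    detections = [d for d in detect(image, prompt) if d[1] >= min_score]
    if not detections:
        return None
    # ASSUMPTION: keep only the single most confident box.
    best_box, _ = max(detections, key=lambda d: d[1])
    return segment(image, best_box)
```

Passing the three phrases from panel (e) to `build_prompt` reproduces the prompt string quoted in the caption.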

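Experimental results

Research questions

RQ1 Can training-free pseudo labels generated by an appearance-guided VLM provide reliable supervision for BUS segmentation with extremely few labels?
RQ2 Does a dual-teacher framework with uncertainty-based fusion and adaptive contrastive learning improve BUS segmentation quality over single-teacher SSL methods?
RQ3 How well does the proposed method generalize across BUS datasets acquired with different devices and acquisition settings?
RQ4 How much does each component (APPG, static teacher warm-up, UEWF, AURCL) contribute to the final performance?
RQ5 When labels are scarce (e.g., 2.5%), does the method approach fully supervised performance?

Key findings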

Method	Venue	Labeled	Unlabeled	Dice(%)	IoU(%)	Acc(%)
U‑Net (static teacher)	–	0	499	67.34	57.28	93.03
U‑Net	MICCAI	12 (2.5%)	0	50.00	39.71	92.35
U‑Net	MICCAI	51 (10%)	0	65.04	54.77	94.76
U‑Net	MICCAI	103 (20%)	0	70.94	61.79	95.06
U‑Net	MICCAI	517 (100%)	0	81.68	73.74	96.65
MT	NeurIPS’17	–	–	54.09	42.93	92.87
U2PL	CVPR’22	–	–	51.53	41.12	93.26
BCP	CVPR’23	–	–	58.93	49.48	93.89
MCF	CVPR’23	–	–	49.18	39.30	92.33
PH‑Net	CVPR’24	12 (2.5%)	97.5%	55.13	45.28	92.79
PH‑Net	CVPR’24	51 (10%)	97.5%	50.02	38.82	94.66
PH‑Net	CVPR’24	103 (20%)	80%	72.64	63.54	95.19
CSC‑PA	CVPR’25	–	–	58.78	45.97	93.68
PGCL	CVPR’23	–	–	54.26	43.31	92.88
Text‑semiseg	MICCAI’25	–	–	56.85	45.35	93.13
AaU‑ssm	MedIA’24	–	–	53.75	42.82	92.83
Ours (proposed, BUSI)	–	2.5%	–	72.72	63.11	95.08
Ours (proposed, UBB)	–	13 (2.5%)	–	75.75	67.09	96.67

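Even with only 2.5% labeled data, the method achieves Dice, IoU, and accuracy competitive with fully supervised models.
On the BUSI dataset with 2.5% labels, it reaches Dice = 72.72%, IoU = 63.11%, and accuracy = 95.08%, surpassing prior SSL baselines at the same labeling level.
On the cross-dataset UBB benchmark, it reaches Dice = 75.75% with 13 labeled images, 15.99% above the previous best, and exceeds the Dice of a fully supervised U-Net trained on 100% of the labels.
In the ablation study, APPG provides the largest gain, dual-teacher refinement adds a smaller improvement, and AURCL and UEWF contribute further gains, supporting the design.
Across labeling ratios (2.5%, 10%, 20%), the method consistently achieves the highest Dice and IoU among the state-of-the-art SSL methods compared.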
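
For reference, the Dice and IoU figures above follow the standard overlap definitions. The sketch below computes both for a single pair of binary masks, assuming flat 0/1 sequences. The function name and the convention for two empty masks are assumptions, and this review does not state how the paper averages scores over images.

```python
def dice_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    if pred_area + target_area == 0:
        # ASSUMPTION: two empty masks count as a perfect match.
        return 1.0, 1.0
    union = pred_area + target_area - intersection
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Example: the masks overlap on 2 pixels, and each covers 3 pixels.
dice, iou = dice_iou([1, 1, 1, 0], [0, 1, 1, 1])
```

For a single mask pair the two scores are tied by Dice = 2·IoU / (1 + IoU). That identity does not generally hold for scores averaged over a dataset, so it should not be used to cross-check the table rows.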

Figure 2: Overview of the proposed semi-supervised BUS segmentation framework. The pipeline consists of two stages: (1) Appearance-Prompted Pseudo-Label Generation (APPG), where appearance-prompted vision–language models (VLMs) produce initial pseudo labels in a training-free manner; (2) Pseudo-Labe

This review was generated by AI and checked by a human editor.