QUICK REVIEW

[Paper Review] US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound

Ashwath Radhachandran, Vedrana Ivezić|arXiv (Cornell University)|Feb 22, 2026

Ultrasound Imaging and Elastography | Citations: 0

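One-sentence summary

US-JEPA is a JEPA-based method built on the static-teacher SALT objective: it learns by predicting masked latent representations in embedding space and achieves strong linear-probing performance across the eight UltraBench tasks.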

ABSTRACT

Ultrasound (US) imaging poses unique challenges for representation learning due to its inherently noisy acquisition process. The low signal-to-noise ratio and stochastic speckle patterns hinder standard self-supervised learning methods relying on a pixel-level reconstruction objective. Joint-Embedding Predictive Architectures (JEPAs) address this drawback by predicting masked latent representations rather than raw pixels. However, standard approaches depend on hyperparameter-brittle and computationally expensive online teachers updated via exponential moving average. We propose US-JEPA, a self-supervised framework that adopts the Static-teacher Asymmetric Latent Training (SALT) objective. By using a frozen, domain-specific teacher to provide stable latent targets, US-JEPA decouples student-teacher optimization and pushes the student to expand upon the semantic priors of the teacher. In addition, we provide the first rigorous comparison of all publicly available state-of-the-art ultrasound foundation models on UltraBench, a public dataset benchmark spanning multiple organs and pathological conditions. Under linear probing for diverse classification tasks, US-JEPA achieves performance competitive with or superior to domain-specific and universal vision foundation model baselines. Our results demonstrate that masked latent prediction provides a stable and efficient path toward robust ultrasound representations.

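Motivation and objectives

Improve the robustness and data efficiency of ultrasound representation learning, which is affected by noise and speckle artifacts.
Develop a JEPA-based self-supervised framework that operates in latent space with a frozen, domain-specific teacher.
Reduce reliance on pixel-level reconstruction and focus on predicting latent semantic representations.
Evaluate publicly available ultrasound foundation models on UltraBench under a standardized linear-probing protocol.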

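Proposed method

Adopts SALT: a domain-specific teacher (URFM) is frozen and provides stable latent targets.
Uses a masked latent prediction objective that predicts target embeddings from context blocks within the same image.
Incorporates USrc (Ultrasound Region-Conditioning), which restricts masking to the valid ultrasound region and avoids non-anatomical content.
Trains the context encoder (ViT-B/16) and the predictor to minimize the smooth L1 distance to the frozen teacher's embeddings.
Pretrains on a large public ultrasound corpus (about 4.73 million frames from 49 datasets).
Evaluates with a standardized UltraBench linear probe across eight classification tasks.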
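
The masked latent prediction objective described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the list-of-lists embedding layout, the indexing of predictions by patch position, and `beta=1.0` are all assumptions made for illustration. The only details taken from the review are that the loss is a smooth L1 distance and that the teacher's embeddings are frozen targets.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 distance between two equally shaped lists of vectors.

    beta is the switch point between the quadratic and linear regimes
    (an assumed value; the review does not state one).
    """
    total, count = 0.0, 0
    for p_vec, t_vec in zip(pred, target):
        for p, t in zip(p_vec, t_vec):
            diff = abs(p - t)
            if diff < beta:
                total += 0.5 * diff * diff / beta  # quadratic near zero
            else:
                total += diff - 0.5 * beta  # linear for large errors
            count += 1
    return total / count


def masked_latent_loss(student_pred, teacher_emb, target_idx, beta=1.0):
    """Loss over the masked target patches only.

    teacher_emb plays the role of the frozen teacher's output: it is read,
    never updated. student_pred stands in for the predictor's output,
    indexed by patch position (a simplification for this sketch).
    """
    preds = [student_pred[i] for i in target_idx]
    targets = [teacher_emb[i] for i in target_idx]
    return smooth_l1(preds, targets, beta)


# Toy example: three patches with 2-dimensional embeddings.
teacher_emb = [[0.0, 0.0], [1.0, 2.0], [3.0, 3.0]]
student_pred = [[9.0, 9.0], [1.5, 2.0], [3.0, 5.0]]
loss = masked_latent_loss(student_pred, teacher_emb, target_idx=[1, 2])
print(loss)
```

Patch 0 is not a target, so its large error does not contribute. Because the teacher is fixed, only the student and predictor would receive gradients in a real training loop.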

Figure 1 : USrc-JEPA framework. Here we show the model training framework with USrc. URFM is the frozen teacher that extracts target embeddings. The student and predictor are jointly optimized with $\mathcal{L}_{US-JEPA}$ to align with the target.
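
One way to picture region-conditioned masking is rejection sampling over the patch grid: a candidate target block is kept only if every patch in it lies inside the valid ultrasound region. The sketch below is a hypothetical illustration of that idea, not the USrc procedure from the paper; the boolean `valid` grid, the block size, and the function names are assumptions.

```python
import random


def sample_target_block(valid, block_h, block_w, rng, max_tries=1000):
    """Sample a block_h x block_w block of patches fully inside the valid region.

    valid is a 2D list of booleans over the patch grid
    (True = patch lies inside the ultrasound region).
    Returns a list of (row, col) coordinates, or None if no block is found.
    """
    rows, cols = len(valid), len(valid[0])
    for _ in range(max_tries):
        top = rng.randrange(rows - block_h + 1)
        left = rng.randrange(cols - block_w + 1)
        block = [(r, c)
                 for r in range(top, top + block_h)
                 for c in range(left, left + block_w)]
        if all(valid[r][c] for r, c in block):
            return block
    return None


def context_patches(valid, target_block):
    """Context = valid patches that are not covered by the target block."""
    covered = set(target_block)
    return [(r, c)
            for r, row in enumerate(valid)
            for c, ok in enumerate(row)
            if ok and (r, c) not in covered]


# Toy 4x4 patch grid with a fan-like valid region and an empty bottom row.
valid = [
    [False, True, True, False],
    [True, True, True, True],
    [True, True, True, True],
    [False, False, False, False],
]
rng = random.Random(0)
block = sample_target_block(valid, 2, 2, rng)
ctx = context_patches(valid, block)
print(block, ctx)
```

Both the target block and the context exclude patches outside the scan region, so neither the student nor the teacher targets are drawn from non-anatomical content such as borders or overlays.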

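Experimental results

Research questions

RQ1: Can the static-teacher SALT framework improve latent-space ultrasound representations compared with EMA-based JEPA and domain-specific baselines?
RQ2: How does US-JEPA perform under few-shot linear probing across diverse ultrasound tasks?
RQ3: Is the learned latent space robust to the artifacts and corruptions characteristic of ultrasound images?
RQ4: Does restricting targets and context to the valid ultrasound region (USrc) improve representation quality?
RQ5: How do publicly available ultrasound foundation models compare on the standardized UltraBench benchmark under linear probing?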

Key findings

Model	AUL (Macro F1)	BUSBRA (Macro F1)	BUTTERFLY (Macro F1)	FATTY LIVER (Macro F1)	GBCU (Macro F1)	MMOTU (Macro F1)	POCUS (Macro F1)	TN5000 (Macro F1)
DINOv3	64.3 ± 0.6	70.9 ± 1.7	91.7 ± 0.4	55.8 ± 5.5	61.7 ± 0.5	37.2 ± 0.6	91.4 ± 0.4	67.5 ± 0.4
I-JEPA	61.5 ± 1.1	71.2 ± 4.0	90.5 ± 0.6	54.8 ± 1.6	53.7 ± 0.4	35.3 ± 0.6	88.1 ± 0.4	68.9 ± 0.2
UltraSAM	62.6 ± 3.1	70.2 ± 3.1	89.6 ± 2.4	66.9 ± 3.3	43.5 ± 4.9	39.7 ± 1.8	87.3 ± 2.1	63.9 ± 2.0
SAMUS	40.2 ± 0.9	65.9 ± 0.3	91.5 ± 0.0	42.1 ± 0.0	48.8 ± 0.3	20.4 ± 0.2	76.2 ± 0.1	51.7 ± 0.0
EchoCare	49.2 ± 2.4	64.4 ± 0.0	84.1 ± 0.7	42.1 ± 0.0	36.2 ± 0.5	21.1 ± 0.1	73.8 ± 3.9	49.8 ± 3.8
USF-MAE	58.1 ± 1.4	62.9 ± 0.5	91.1 ± 0.3	42.1 ± 0.0	45.9 ± 0.3	28.7 ± 0.3	90.1 ± 0.0	56.3 ± 1.1
USFM	61.6 ± 1.2	74.6 ± 0.5	92.4 ± 0.3	73.6 ± 8.8	67.4 ± 0.6	33.8 ± 0.3	85.7 ± 0.5	65.0 ± 2.6
URFM	71.5 ± 1.1	69.5 ± 2.2	92.1 ± 0.4	82.6 ± 6.0	59.1 ± 1.7	42.7 ± 0.4	91.7 ± 0.3	77.4 ± 0.4
US-JEPA	69.6 ± 1.5	73.8 ± 1.1	90.8 ± 0.3	82.5 ± 1.1	67.0 ± 1.4	52.2 ± 0.2	93.1 ± 0.0	73.1 ± 0.7
USrc-JEPA	67.6 ± 0.5	76.0 ± 1.2	91.5 ± 0.6	89.2 ± 0.9	70.2 ± 0.5	46.8 ± 0.2	92.5 ± 0.1	70.8 ± 1.3
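
Every column in the table reports macro F1. As a reminder of what that metric measures, here is a minimal sketch using the standard definition (the unweighted mean of per-class F1 scores). The review does not describe the evaluation code, so the function below and its toy labels are illustrative assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Imbalanced toy example: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Accuracy on this toy example is 5/6, while macro F1 is 7/9, because the minority class counts as much as the majority class. That property makes macro F1 a natural choice for imbalanced multi-class tasks.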

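US-JEPA and USrc-JEPA achieve state-of-the-art linear-probing performance on five of the eight UltraBench tasks (BUSBRA, FATTY LIVER, GBCU, MMOTU, and POCUS).
On MMOTU (8-class ovarian tumor classification), US-JEPA reaches a macro F1 of 52.2%, exceeding URFM (42.7%) by 9.5 percentage points.
US-JEPA and USrc-JEPA are highly robust to domain-specific corruptions, particularly speckle noise, and outperform the baselines at high corruption levels.
In few-shot settings with less than 10% of the labeled data, US-JEPA improves macro F1 by up to 18% over URFM and USFM.
US-JEPA is competitive with, and often surpasses, domain-specific and universal baselines, while enabling standardized public benchmarking.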
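
The first two findings can be checked directly against the table. The sketch below copies the mean macro F1 values from the table (ignoring the ± spreads), finds the top-scoring model for each task, and computes the MMOTU gap between US-JEPA and URFM. Variable names are arbitrary.

```python
TASKS = ["AUL", "BUSBRA", "BUTTERFLY", "FATTY LIVER",
         "GBCU", "MMOTU", "POCUS", "TN5000"]

# Mean macro F1 per model, in the same task order as TASKS.
SCORES = {
    "DINOv3":    [64.3, 70.9, 91.7, 55.8, 61.7, 37.2, 91.4, 67.5],
    "I-JEPA":    [61.5, 71.2, 90.5, 54.8, 53.7, 35.3, 88.1, 68.9],
    "UltraSAM":  [62.6, 70.2, 89.6, 66.9, 43.5, 39.7, 87.3, 63.9],
    "SAMUS":     [40.2, 65.9, 91.5, 42.1, 48.8, 20.4, 76.2, 51.7],
    "EchoCare":  [49.2, 64.4, 84.1, 42.1, 36.2, 21.1, 73.8, 49.8],
    "USF-MAE":   [58.1, 62.9, 91.1, 42.1, 45.9, 28.7, 90.1, 56.3],
    "USFM":      [61.6, 74.6, 92.4, 73.6, 67.4, 33.8, 85.7, 65.0],
    "URFM":      [71.5, 69.5, 92.1, 82.6, 59.1, 42.7, 91.7, 77.4],
    "US-JEPA":   [69.6, 73.8, 90.8, 82.5, 67.0, 52.2, 93.1, 73.1],
    "USrc-JEPA": [67.6, 76.0, 91.5, 89.2, 70.2, 46.8, 92.5, 70.8],
}
JEPA_VARIANTS = {"US-JEPA", "USrc-JEPA"}

# Top-scoring model per task, by mean macro F1.
winners = {
    task: max(SCORES, key=lambda model: SCORES[model][i])
    for i, task in enumerate(TASKS)
}
jepa_wins = [task for task in TASKS if winners[task] in JEPA_VARIANTS]

# Gap between US-JEPA and its teacher URFM on MMOTU, in percentage points.
mmotu = TASKS.index("MMOTU")
gap = round(SCORES["US-JEPA"][mmotu] - SCORES["URFM"][mmotu], 1)

print(winners)
print(len(jepa_wins), jepa_wins)
print(gap)
```

The remaining three tasks are led by URFM (AUL, TN5000) and USFM (BUTTERFLY). This comparison uses means only, so it says nothing about whether small differences fall within the reported spreads.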

Figure 2 : Distribution of pretraining data. To characterize the dataset composition at the organ level, we report the distribution of a. temporal sequences, including videos and volumes ( $n_{v}$ ), and b. individual static frames ( $n_{f}$ ).

This review was generated by AI and checked by a human editor.