QUICK REVIEW

[Paper Review] US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound

Ashwath Radhachandran, Vedrana Ivezić | arXiv (Cornell University) | Feb 22, 2026

Ultrasound Imaging and Elastography | Citations: 0

One-Line Summary

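US-JEPA applies a SALT-based JEPA approach with a static (frozen) teacher. It learns representations by predicting masked embeddings in latent space and achieves strong linear-probing performance across the eight UltraBench tasks.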

ABSTRACT

Ultrasound (US) imaging poses unique challenges for representation learning due to its inherently noisy acquisition process. The low signal-to-noise ratio and stochastic speckle patterns hinder standard self-supervised learning methods relying on a pixel-level reconstruction objective. Joint-Embedding Predictive Architectures (JEPAs) address this drawback by predicting masked latent representations rather than raw pixels. However, standard approaches depend on hyperparameter-brittle and computationally expensive online teachers updated via exponential moving average. We propose US-JEPA, a self-supervised framework that adopts the Static-teacher Asymmetric Latent Training (SALT) objective. By using a frozen, domain-specific teacher to provide stable latent targets, US-JEPA decouples student-teacher optimization and pushes the student to expand upon the semantic priors of the teacher. In addition, we provide the first rigorous comparison of all publicly available state-of-the-art ultrasound foundation models on UltraBench, a public dataset benchmark spanning multiple organs and pathological conditions. Under linear probing for diverse classification tasks, US-JEPA achieves performance competitive with or superior to domain-specific and universal vision foundation model baselines. Our results demonstrate that masked latent prediction provides a stable and efficient path toward robust ultrasound representations.
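The abstract contrasts SALT with an online teacher. That teacher is typically updated as an exponential moving average of the student weights. The equations below use the standard textbook form, not notation taken from the paper:

- θ_t denotes the student weights at step t.
- θ̄_t denotes the teacher weights.
- τ is a momentum coefficient close to 1.

With a static teacher the update disappears, so the targets stay fixed for the whole run. In US-JEPA, θ̄_0 would correspond to the frozen URFM weights.

```latex
\begin{aligned}
\text{EMA teacher:}\quad & \bar{\theta}_{t+1} = \tau\,\bar{\theta}_{t} + (1-\tau)\,\theta_{t}, \qquad 0 < \tau < 1 \\
\text{Static teacher (SALT):}\quad & \bar{\theta}_{t} = \bar{\theta}_{0} \quad \text{for all } t
\end{aligned}
```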

Motivation and Objectives

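Improve the robustness and data efficiency of ultrasound representation learning, which is hampered by noise and speckle artifacts.
Develop a JEPA-based self-supervised framework that operates in latent space using a frozen, domain-specific teacher.
Reduce reliance on pixel-level reconstruction and focus instead on predicting latent semantic representations.
Evaluate publicly available ultrasound foundation models in a standardized way on UltraBench using linear probing.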
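Linear probing keeps the pretrained encoder frozen and trains only a linear classifier on its embeddings. The resulting scores therefore reflect the quality of the frozen representation.

The pure-Python sketch below illustrates the idea with a softmax-regression head fit by full-batch gradient descent. Several details are assumptions for illustration:

- The function names, learning rate and epoch count are hypothetical choices.
- The toy feature vectors stand in for encoder embeddings.

UltraBench's actual probe configuration is not described in this review.

```python
import math
import random


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200, seed=0):
    """Fit a softmax-regression head on frozen feature vectors.

    Only W and b are updated; the features never change, which is what
    makes this a linear probe of a frozen encoder.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            logits = [sum(wj * xj for wj, xj in zip(W[c], x)) + b[c] for c in range(num_classes)]
            m = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                err = exps[c] / total - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_W[c][j] += err * x[j]
        # Full-batch gradient-descent step on the averaged gradient
        for c in range(num_classes):
            b[c] -= lr * grad_b[c] / n
            for j in range(dim):
                W[c][j] -= lr * grad_W[c][j] / n
    return W, b


def predict(W, b, x):
    """Return the arg-max class for one feature vector."""
    scores = [sum(wj * xj for wj, xj in zip(W[c], x)) + b[c] for c in range(len(W))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy "frozen embeddings": two linearly separable classes
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labs = [0, 0, 1, 1]
W, b = train_linear_probe(feats, labs, num_classes=2)
print([predict(W, b, x) for x in feats])
```

Because only the head is trained, comparing several encoders under the same probe isolates differences in their representations rather than in fine-tuning.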

Proposed Method

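Adopts SALT: a domain-specific teacher (URFM) is frozen and provides stable latent targets.
Uses a masked latent prediction objective: embeddings of target blocks are predicted from a context block of the same image.
Incorporates USrc (Ultrasound Region-Conditioning), which restricts masking to the valid ultrasound field of view so that non-anatomical content is avoided.
Trains a context encoder (ViT-B/16) and a predictor to minimize the smooth L1 distance between predicted embeddings and the frozen teacher's embeddings.
Pretrains on a large public ultrasound corpus (about 4.73 million frames from 49 datasets).
Evaluates with standardized UltraBench linear probes across eight classification tasks.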
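The USrc constraint on masking can be pictured on the patch grid. The sketch below rests on several illustrative assumptions:

- The input is 224×224 and split into 16×16 patches, giving a 14×14 grid for ViT-B/16.
- The field-of-view mask is a hypothetical wedge shape.
- Target blocks are 3×3 patches.

It keeps only target blocks that fall entirely inside the valid region and uses the remaining valid patches as context. The sizes, the wedge shape and the rejection-sampling rule are not the paper's exact masking recipe.

```python
import random

GRID = 14  # assumed: 224x224 input with 16x16 patches -> 14x14 tokens (ViT-B/16)


def wedge_valid_region(grid=GRID):
    """Hypothetical field-of-view mask: a wedge that widens with depth,
    loosely mimicking a sector scan. Returns a set of (row, col) patches."""
    center = (grid - 1) / 2.0
    valid = set()
    for r in range(grid):
        half_width = 1.0 + 0.5 * r
        for c in range(grid):
            if abs(c - center) <= half_width:
                valid.add((r, c))
    return valid


def sample_region_conditioned_blocks(valid, grid=GRID, num_targets=4,
                                     block_hw=(3, 3), max_tries=200, seed=0):
    """Rejection-sample target blocks lying entirely inside `valid`.

    Target blocks may overlap one another; the context is every valid
    patch not covered by a target, so padding patches are never used.
    """
    rng = random.Random(seed)
    h, w = block_hw
    targets, covered = [], set()
    for _ in range(max_tries):
        if len(targets) == num_targets:
            break
        r0 = rng.randrange(grid - h + 1)
        c0 = rng.randrange(grid - w + 1)
        block = {(r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w)}
        if block <= valid:  # skip blocks that touch the non-anatomical area
            targets.append(block)
            covered |= block
    context = valid - covered
    return targets, context


valid = wedge_valid_region()
targets, context = sample_region_conditioned_blocks(valid)
print(len(valid), len(targets), len(context))
```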

Figure 1: USrc-JEPA framework. Here we show the model training framework with USrc. URFM is the frozen teacher that extracts target embeddings. The student and predictor are jointly optimized with $\mathcal{L}_{\text{US-JEPA}}$ to align with the target.
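The objective in Figure 1 compares predictor outputs with frozen-teacher embeddings at the target positions only. Below is a minimal pure-Python sketch of that comparison using the standard smooth L1 formula. It assumes β = 1 and mean reduction, since the review states neither, and the helper names are hypothetical.

In a PyTorch implementation, the teacher's forward pass would typically run under `torch.no_grad()`, so gradients reach only the student and predictor.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean elementwise smooth L1: 0.5*d^2/beta if |d| < beta, else |d| - 0.5*beta."""
    total, count = 0.0, 0
    for p_vec, t_vec in zip(pred, target):
        for p, t in zip(p_vec, t_vec):
            d = abs(p - t)
            total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
            count += 1
    return total / count


def latent_prediction_loss(predicted, teacher_embeddings, target_indices, beta=1.0):
    """Compare predictor outputs with frozen-teacher embeddings at target patches.

    predicted: dict mapping patch index -> embedding produced by the predictor
    teacher_embeddings: per-patch embeddings from the frozen teacher (constants)
    target_indices: masked patch positions that contribute to the loss
    """
    preds = [predicted[i] for i in target_indices]
    targets = [teacher_embeddings[i] for i in target_indices]
    return smooth_l1(preds, targets, beta)


# Toy 2-D embeddings for three patches; patches 1 and 2 are targets
teacher = [[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]]
predicted = {1: [1.5, 2.0], 2: [0.0, -1.0]}
loss = latent_prediction_loss(predicted, teacher, target_indices=[1, 2])
print(loss)  # 0.65625
```

Smooth L1 is quadratic for small errors and linear for large ones. A single badly predicted patch, such as the 3.0 error above, therefore contributes less than it would under squared error.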

Experimental Results

Research Questions

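RQ1: Can the static-teacher SALT framework learn better latent ultrasound representations than EMA-based JEPAs and domain-specific baselines?
RQ2: How does US-JEPA perform under few-shot linear probing across diverse ultrasound tasks?
RQ3: Is the learned latent space robust to artifacts and corruptions specific to ultrasound images?
RQ4: Does restricting targets and context to the valid ultrasound region (USrc) improve representation quality?
RQ5: How do public ultrasound foundation models compare on the standardized UltraBench benchmark under linear probing?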
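RQ3 concerns robustness to ultrasound-specific corruptions. One common way to simulate speckle is multiplicative noise, I' = clip(I · (1 + σn), 0, 1), with n drawn from a standard normal distribution.

The sketch below applies that model at a hypothetical severity schedule. The review does not say which corruption model or levels the paper used, so treat both as assumptions.

```python
import random

# Hypothetical severity schedule (noise std per level); not taken from the paper
SPECKLE_SIGMAS = [0.0, 0.1, 0.2, 0.35, 0.5, 0.7]


def add_speckle(image, severity, seed=0):
    """Multiplicative speckle: I' = clip(I * (1 + sigma * n), 0, 1), n ~ N(0, 1).

    `image` is a 2-D list of intensities in [0, 1]. Dark pixels stay dark
    because the noise scales with the signal, unlike additive noise.
    """
    sigma = SPECKLE_SIGMAS[severity]
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px * (1.0 + sigma * rng.gauss(0.0, 1.0)))) for px in row]
        for row in image
    ]


clean = [[0.0, 0.25, 0.5, 0.75] for _ in range(4)]
noisy = add_speckle(clean, severity=3)
print(noisy[0])
```

A robustness curve would encode both clean and corrupted images with the frozen encoder, then report probe accuracy at each severity level.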

Key Findings

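US-JEPA and USrc-JEPA achieve state-of-the-art linear-probing performance on 5 of the 8 UltraBench tasks.
On MMOTU (8-class ovarian tumor classification), US-JEPA reaches a macro F1 of 52.2%, outperforming URFM by 9.5%.
US-JEPA and USrc-JEPA are highly robust to domain-specific corruptions, especially speckle noise, and outperform the baselines at high corruption severities.
In few-shot settings with less than 10% of the labeled data, US-JEPA improves macro F1 by up to 18% over URFM and USFM.
US-JEPA is competitive with, and often outperforms, domain-specific and universal baselines. The paper's UltraBench comparison also provides a standardized public benchmark for ultrasound foundation models.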
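Macro F1, the metric reported for MMOTU above, averages per-class F1 scores without weighting by class frequency. Minority tumor classes therefore count as much as common ones.

A minimal pure-Python sketch follows. It scores a class absent from both labels and predictions as 0; that convention is an assumption here, and library implementations may handle the case differently.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1, where F1_c = 2TP / (2TP + FP + FN)."""
    f1s = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true or predicted samples scores 0
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_classes=3)
print(round(score, 4))  # 0.6556
```

Here the per-class F1 scores are 0.5, 0.8 and 2/3, so the macro average is about 0.656.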

Figure 2: Distribution of pretraining data. To characterize the dataset composition at the organ level, we report the distribution of a. temporal sequences, including videos and volumes ($n_{v}$), and b. individual static frames ($n_{f}$).


This review was generated by AI and checked by a human editor.