QUICK REVIEW

[Paper Review] Zero-shot-Learning Cross-Modality Data Translation Through Mutual Information Guided Stochastic Diffusion

Zihao Wang, Yingyu Yang|arXiv (Cornell University)|Jan 31, 2023

Cancer-related molecular mechanisms research|Citations: 8

One-line summary

The paper introduces MIDiffusion, a zero-shot unsupervised cross-modality data translation method that uses a local-wise mutual information (LMI) layer to guide diffusion-based translation without source-domain training data.

ABSTRACT

Cross-modality data translation has attracted great interest in image computing. Deep generative models (e.g., GANs) show performance improvement in tackling those problems. Nevertheless, as a fundamental challenge in image translation, the problem of Zero-shot-Learning Cross-Modality Data Translation with fidelity remains unanswered. This paper proposes a new unsupervised zero-shot-learning method named Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which learns to translate the unseen source data to the target domain. The MIDiffusion leverages a score-matching-based generative model, which learns the prior knowledge in the target domain. We propose a differentiable local-wise-MI-Layer ($LMI$) for conditioning the iterative denoising sampling. The $LMI$ captures the identical cross-modality features in the statistical domain for the diffusion guidance; thus, our method does not require retraining when the source domain is changed, as it does not rely on any direct mapping between the source and target domains. This advantage is critical for applying cross-modality data translation methods in practice, as a reasonable amount of source domain dataset is not always available for supervised training. We empirically show the advanced performance of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.

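Motivation and objectives

Address the need for cross-modality translation without requiring paired data or source-domain data at training time.
Propose a diffusion-based framework guided by local-wise MI (LMI) to achieve zero-shot translation.
Avoid reliance on cycle consistency, adversarial training, or pretrained generators for conditioning.
Demonstrate improved translation quality (fidelity and realism) across multiple medical imaging modalities.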

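Proposed method

Introduce a differentiable LMI layer that guides the diffusion process, using the statistical similarity between the source and target modalities as the conditioning signal.
Define LMI as a measure of the local statistical dependence between the source and target modalities, estimated with kernel density estimation over neighborhood patches.
Embed LMI conditioning in both the forward perturbation and backward denoising steps, enabling zero-shot guidance without source-domain training data.
Provide efficient operators (Definitions 4–5 and Proposition 1) that make LMI computation tractable and GPU-friendly.
Train the score network s_theta with a loss that includes LMI guidance as the conditioning signal (Equation 12), and sample via the reverse SDE (Equation 13).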
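The local MI idea in the list above can be illustrated with a small NumPy sketch. It is not the paper's LMI layer: it swaps the differentiable kernel density estimate for a plain joint histogram, uses non-overlapping patches, and every name in it (`mutual_information`, `local_mutual_information`, `patch`, `bins`) is hypothetical. It only shows why a patch-wise MI map can score agreement between two images whose intensities differ but are statistically dependent.

```python
import numpy as np


def mutual_information(a, b, bins=8):
    """Histogram-based MI (in nats) between two equally sized arrays in [0, 1].

    Assumption: a joint histogram stands in for the kernel density estimate.
    """
    joint, _, _ = np.histogram2d(
        a.ravel(), b.ravel(), bins=bins, range=[[0.0, 1.0], [0.0, 1.0]]
    )
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def local_mutual_information(src, tgt, patch=8, bins=8):
    """MI per non-overlapping patch; returns an (H // patch, W // patch) map."""
    h, w = src.shape
    rows, cols = h // patch, w // patch
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * patch, (i + 1) * patch),
                  slice(j * patch, (j + 1) * patch))
            out[i, j] = mutual_information(src[sl], tgt[sl], bins)
    return out


rng = np.random.default_rng(0)
src = rng.random((32, 32))
related = 1.0 - src                 # inverted contrast, fully dependent on src
unrelated = rng.random((32, 32))    # independent of src

lmi_related = local_mutual_information(src, related)
lmi_unrelated = local_mutual_information(src, unrelated)
print(lmi_related.mean(), lmi_unrelated.mean())
```

The contrast-inverted image scores a higher average local MI than the independent one even though its pixel values never match the source, which is the property that makes MI usable as a cross-modality similarity. Histogram estimates on small patches are biased upward, so the absolute values are not meaningful on their own.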

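Experimental results

Research questions

RQ1: Can zero-shot cross-modality data translation be achieved without seeing the source modality during training?
RQ2: Does local MI guidance improve the realism and fidelity of translation compared with GAN-based and other diffusion-based baselines?
RQ3: How does MIDiffusion perform in terms of fidelity and realism across diverse medical modality pairs such as CT↔MR, T1↔FLAIR, and PD↔T1?

Key findings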

Dataset	Method	Modalities	SSIM (Tar)	SSIM (Src)	MSE	MI	PSNR	FID
GoldAtlas	CycleGAN (sup, few-shot 2%)	CT→MR	0.04	0.03	614.02	1.16	20.53	202.43
GoldAtlas	CycleGAN (sup, few-shot 2%)	MR→CT	0.03	0.02	819.59	1.13	19.08	281.35
GoldAtlas	StyleGAN (unsup, inversion)	CT→MR	0.13	0.04	788.76	1.09	20.09	213.47
GoldAtlas	StyleGAN (unsup, inversion)	MR→CT	0.08	0.07	570.91	1.12	21.17	170.83
GoldAtlas	SDEdit (unsup)	CT→MR	0.003	0.01	766.40	1.11	19.50	237.27
GoldAtlas	SDEdit (unsup)	MR→CT	0.01	0.04	996.71	1.10	18.58	223.44
GoldAtlas	MIDiffusion (unsup)	CT→MR	0.06	0.11	523.18	1.08	21.66	245.82
GoldAtlas	MIDiffusion (unsup)	MR→CT	0.12	0.08	392.35	1.17	23.03	194.35
CuRIOUS	CycleGAN (sup, few-shot ~6%)	T1→FLAIR	-0.006	0.81	1747.13	1.08	16.04	186.59
CuRIOUS	CycleGAN (sup, few-shot ~6%)	FLAIR→T1	0.005	0.02	3145.05	1.05	13.82	331.89
CuRIOUS	StyleGAN (unsup, inversion)	T1→FLAIR	0.003	0.12	1880.62	1.04	15.83	261.47
CuRIOUS	StyleGAN (unsup, inversion)	FLAIR→T1	-0.003	0.19	1570.83	1.05	16.62	229.73
CuRIOUS	SDEdit (unsup)	T1→FLAIR	0.011	0.01	1558.22	1.04	16.42	131.70
CuRIOUS	SDEdit (unsup)	FLAIR→T1	0.005	0.01	2165.42	1.03	15.14	141.89
CuRIOUS	MIDiffusion (unsup)	T1→FLAIR	0.07	-0.08	1226.40	1.08	17.65	146.77
CuRIOUS	MIDiffusion (unsup)	FLAIR→T1	0.15	0.23	1175.11	1.08	18.02	157.98
IXI	CycleGAN (sup, few-shot 11%)	PD→T1	0.12	0.14	1154.19	1.17	17.65	141.95
IXI	CycleGAN (sup, few-shot 11%)	T1→PD	0.16	0.16	876.99	1.19	18.86	113.67
IXI	StyleGAN (unsup, inversion)	PD→T1	0.02	0.06	6609.13	1.08	10.17	266.52
IXI	StyleGAN (unsup, inversion)	T1→PD	0.21	0.37	2319.78	1.14	14.65	199.12
IXI	SDEdit (unsup)	PD→T1	0.09	0.06	1619.14	1.15	16.19	68.60
IXI	SDEdit (unsup)	T1→PD	0.10	0.06	1753.82	1.16	15.95	80.81
IXI	MIDiffusion (unsup)	PD→T1	0.11	0.19	1652.81	1.17	16.35	129.12
IXI	MIDiffusion (unsup)	T1→PD	0.18	0.26	1301.91	1.13	17.13	132.46
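
For readers unfamiliar with the MSE and PSNR columns above, the following is a minimal sketch of how these two metrics are commonly computed. It assumes 8-bit intensities with a peak value of 255; this review does not state the paper's actual evaluation protocol (normalization, per-image averaging), so the helper names `mse` and `psnr` and the `peak` default are illustrative assumptions.

```python
import numpy as np


def mse(a, b):
    """Mean squared error; inputs are cast to float to avoid uint8 wraparound."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 is an assumption)."""
    err = mse(a, b)
    if err == 0.0:
        return float("inf")   # identical images
    return float(10.0 * np.log10(peak ** 2 / err))


reference = np.full((4, 4), 100, dtype=np.uint8)
translated = np.full((4, 4), 110, dtype=np.uint8)

print(mse(reference, translated))    # 100.0
print(psnr(reference, translated))   # about 28.13 dB
```

Lower MSE and higher PSNR both indicate a translation closer to the reference image, so the two columns should move in opposite directions.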

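Across datasets, MIDiffusion is reported to improve translation fidelity over the GAN-based and diffusion-based baselines. In the table, it has the lowest MSE and the highest PSNR in all four GoldAtlas and CuRIOUS settings, while the SSIM and MI gains are less uniform. Realism (FID) is described as competitive, although one of the baselines has a lower FID in every setting.
On GoldAtlas and CuRIOUS, the zero-shot, unsupervised MIDiffusion outperforms the few-shot CycleGAN baselines (2% and ~6% settings) on MSE and PSNR, which indicates strong zero-shot generalization.
Compared with SDEdit, MIDiffusion has a higher target SSIM in every setting, and a higher source SSIM and lower MSE in most settings (the exceptions are source SSIM for T1→FLAIR on CuRIOUS and MSE for PD→T1 on IXI). Its FID is higher than SDEdit's in five of the six settings.
On IXI, results are competitive rather than best: the few-shot CycleGAN (11%) has lower MSE and higher PSNR in both directions, and SDEdit has the lowest FID.
Conditioning through LMI guidance provides semantic consistency without requiring a separate generator or test-time inversion.
Translation from unseen modalities is effective, but iterative sampling is costly (hundreds of diffusion steps).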
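
The sampling cost noted in the last finding comes from the structure of reverse-SDE sampling itself: each step needs one evaluation of the score function. The toy sketch below shows this with an Euler-Maruyama sampler. It is an unguided, one-dimensional example with an analytic Gaussian score, not the paper's Equation 13 and without any LMI guidance; `CountingScore`, `reverse_sde_sample`, and the constants are hypothetical. In a real model each call would be a full neural network forward pass.

```python
import numpy as np

# Toy setup (assumption): target N(MU, S0^2), forward SDE dx = G dw, so that
# p_t = N(MU, S0^2 + G^2 t) and the score is known in closed form.
MU, S0, G = 2.0, 0.5, 1.0


class CountingScore:
    """Analytic score of p_t that also counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        return -(x - MU) / (S0 ** 2 + G ** 2 * t)


def reverse_sde_sample(score, n_samples, n_steps, t_end=1.0, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE from t_end down to 0."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    # Start from the exact marginal at t_end (possible only in this toy case).
    x = MU + np.sqrt(S0 ** 2 + G ** 2 * t_end) * rng.standard_normal(n_samples)
    for k in range(n_steps):
        t = t_end - k * dt
        noise = rng.standard_normal(n_samples)
        # Step backward in time: drift term g^2 * score, plus fresh noise.
        x = x + G ** 2 * score(x, t) * dt + G * np.sqrt(dt) * noise
    return x


score = CountingScore()
samples = reverse_sde_sample(score, n_samples=2000, n_steps=500)
print(score.calls)                   # 500: one score evaluation per step
print(samples.mean(), samples.std())
```

The number of score evaluations grows linearly with `n_steps`, which is why samplers that need hundreds of steps are slow compared with a single forward pass through a GAN generator.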

This review was generated by AI and checked by a human editor.