QUICK REVIEW

[논문 리뷰] Zero-shot-Learning Cross-Modality Data Translation Through Mutual Information Guided Stochastic Diffusion

Zihao Wang, Yingyu Yang|arXiv (Cornell University)|2023. 01. 31.

Cancer-related molecular mechanisms research인용 수 8

한 줄 요약

이 논문은 소스 도메인 학습 데이터 없이도 로컬-와이즈 상호 정보(LMI) 계층을 이용해 확산 기반 번역을 안내하는 제로샷 비지도 교차 모달 데이터 번역 방법인 MIDiffusion을 소개한다.

ABSTRACT

Cross-modality data translation has attracted great interest in image computing. Deep generative models ( extit{e.g.}, GANs) show performance improvement in tackling those problems. Nevertheless, as a fundamental challenge in image translation, the problem of Zero-shot-Learning Cross-Modality Data Translation with fidelity remains unanswered. This paper proposes a new unsupervised zero-shot-learning method named Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which learns to translate the unseen source data to the target domain. The MIDiffusion leverages a score-matching-based generative model, which learns the prior knowledge in the target domain. We propose a differentiable local-wise-MI-Layer ($LMI$) for conditioning the iterative denoising sampling. The $LMI$ captures the identical cross-modality features in the statistical domain for the diffusion guidance; thus, our method does not require retraining when the source domain is changed, as it does not rely on any direct mapping between the source and target domains. This advantage is critical for applying cross-modality data translation methods in practice, as a reasonable amount of source domain dataset is not always available for supervised training. We empirically show the advanced performance of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.

연구 동기 및 목표

학습 중에 페어링된 데이터나 소스 도메인 데이터가 필요하지 않은 교차 모달 번역의 필요성을 해결한다.
로컬-와이즈 상호 정보에 의해 안내되는 확산 기반 프레임워크를 제안하여 제로샷 번역을 가능하게 한다.
Conditioning을 위해 사이클-일관성, 적대적 학습, 또는 사전 학습된 제네레이터에 의존하지 않는다.
다양한 의학 영상 모달리티에 걸친 번역의 충실도(충실성 및 현실성) 향상을 보여준다.

제안 방법

교차 모달 번역을 위해 VE-SDE(Variance Exploding SDE) 기반 점수 기반 확산 모델을 채택한다.
원본 모달리티와 대상 모달리티 간의 통계적 유사성으로 확산 과정을 조건화하기 위해 차별 가능 로컬-와이즈 MI(LMI) 계층을 도입한다.
로컬 통계 의존성을 커널 밀도 추정치와 이웃 패치를 통해 측정하도록 LMI를 정의한다.
LMI 조건화를 순방향 교란 및 역 노이즈 제거 단계에 모두 포함시켜 소스 도메인 학습 데이터 없이 제로샷 가이드를 가능하게 한다.
정다면으로 LMI를 계산하는 실용적 연산자(정의 4–5 및 보행 1)로 GPU 친화적이고 계산 가능한 방법을 제공한다.
LMI 가이던스를 조건 신호로 포함하는 손실(식(12))로 s_theta 점수 네트워크를 학습하고 역 SDE(식(13))를 통해 샘플링을 수행한다.

실험 결과

연구 질문

RQ1학습 중에 소스 모달리티를 보지 않고도 제로샷 교차 모달 데이터 번역이 가능할까?
RQ2로컬-와이즈 상호 정보 가이던스가 GAN 기반 및 다른 확산 기반 기준선에 비해 번역의 현실성과 충실도를 향상시키는가?
RQ3다양한 의학 영상 모달리티 쌍(CT↔MR, T1↔FLAIR, PD↔T1)에서 MIDiffusion은 충실도와 현실성 측면에서 어떠한 성능을 보이는가?

주요 결과

데이터 세트	방법	모달리티	SSIM (Tar)	SSIM (Src)	MSE	MI	PSNR	FID
GoldAtlas	CycleGAN (sup, few-shot 2%)	CT→MR	0.04	0.03	614.02	1.16	20.53	202.43
GoldAtlas	CycleGAN (sup, few-shot 2%)	MR→CT	0.03	0.02	819.59	1.13	19.08	281.35
GoldAtlas	StyleGAN (unsup, inversion)	CT→MR	0.13	0.04	788.76	1.09	20.09	213.47
GoldAtlas	StyleGAN (unsup, inversion)	MR→CT	0.08	0.07	570.91	1.12	21.17	170.83
GoldAtlas	SDEdit (unsup)	CT→MR	0.003	0.01	766.40	1.11	19.50	237.27
GoldAtlas	SDEdit (unsup)	MR→CT	0.01	0.04	996.71	1.10	18.58	223.44
GoldAtlas	MIDiffusion (unsup)	CT→MR	0.06	0.11	523.18	1.08	21.66	245.82
GoldAtlas	MIDiffusion (unsup)	MR→CT	0.12	0.08	392.35	1.17	23.03	194.35
CuRIOUS	CycleGAN (sup, few-shot ~6%)	T1→FLAIR	-0.006	0.81	1747.13	1.08	16.04	186.59
CuRIOUS	CycleGAN (sup, few-shot ~6%)	FLAIR→T1	0.005	0.02	3145.05	1.05	13.82	331.89
CuRIOUS	StyleGAN (unsup, inversion)	T1→FLAIR	0.003	0.12	1880.62	1.04	15.83	261.47
CuRIOUS	StyleGAN (unsup, inversion)	FLAIR→T1	-0.003	0.19	1570.83	1.05	16.62	229.73
CuRIOUS	SDEdit (unsup)	T1→FLAIR	0.011	0.01	1558.22	1.04	16.42	131.70
CuRIOUS	SDEdit (unsup)	FLAIR→T1	0.005	0.01	2165.42	1.03	15.14	141.89
CuRIOUS	MIDiffusion (unsup)	T1→FLAIR	0.07	-0.08	1226.40	1.08	17.65	146.77
CuRIOUS	MIDiffusion (unsup)	FLAIR→T1	0.15	0.23	1175.11	1.08	18.02	157.98
IXI	CycleGAN (sup, few-shot 11%)	PD→T1	0.12	0.14	1154.19	1.17	17.65	141.95
IXI	CycleGAN (sup, few-shot 11%)	T1→PD	0.16	0.16	876.99	1.19	18.86	113.67
IXI	StyleGAN (unsup, inversion)	PD-T1	0.02	0.06	6609.13	1.08	10.17	266.52
IXI	StyleGAN (unsup, inversion)	T1→PD	0.21	0.37	2319.78	1.14	14.65	199.12
IXI	SDEdit (unsup)	PD-T1	0.09	0.06	1619.14	1.15	16.19	68.60
IXI	SDEdit (unsup)	T1→PD	0.10	0.06	1753.82	1.16	15.95	80.81
IXI	MIDiffusion (unsup)	PD-T1	0.11	0.19	1652.81	1.17	16.35	129.12
IXI	MIDiffusion (unsup)	T1→PD	0.18	0.26	1301.91	1.13	17.13	132.46

MIDiffusion은 여러 데이터셋에서 GAN 기반 및 확산 기반 기준선보다 더 높은 번역 충실도(더 나은 SSIM, 더 낮은 MSE, 더 높은 MI)와 경쟁력 있는 현실성(낮은 FID)을 보인다.
GoldAtlas 및 CuRIOUS 데이터셋에서 제로샷 비지도 MIDiffusion이 Few-shot CycleGAN보다 우수하여 강한 제로샷 일반화를 시사한다.
MIDiffusion은 소스 및 대상 도메인에 대한 SSIM을 더 잘 달성하고 번역 오차를 줄이며 SDEdit보다 현실성을 유지하는 경우가 많다.
GoldAtlas, CuRIOUS 및 IXI 데이터셋에서 MIDiffusion은 CycleGAN, StyleGAN 및 SDEdit 기준선에 비해 SSIM(Tar 및 Src), MSE, MI, PSNR, FID에서 우수하거나 경쟁적이다.
LMI 기반 조건화는 별도의 제네레이터나 테스트 시 역전이 없이도 의미론적 일관성을 제공한다.
이 방법은 보였지 않은 모달리티를 효과적으로 번역할 수 있지만 다수의 확산 단계에 의한 높은 반복 샘플링 비용이 필요하다.

더 나은 연구,지금 바로 시작하세요

연구 설계부터 논문 작성까지, 연구 시간을 획기적으로 줄여보세요.

카드 등록 없음 · 무료 플랜 제공

이 리뷰는 AI가 만들고, 인간 에디터가 검토했습니다.