QUICK REVIEW

[Paper Review] Making Reconstruction FID Predictive of Diffusion Generation FID

Tongda Xu, Mingwei He|arXiv (Cornell University)|Mar 5, 2026

Advanced Neuroimaging Techniques and Applications | Cited by: 0

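One-Sentence Summary

The paper proposes interpolated FID (iFID), a simple metric based on interpolation in latent space that correlates strongly with diffusion generation FID (gFID), to address the reconstruction-generation dilemma. It shows that rFID correlates with sample quality in the refinement phase while iFID aligns with sample quality in the navigation phase, offers an explanation grounded in diffusion generalization and hallucination, and releases the source code.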

ABSTRACT

It is well known that the reconstruction FID (rFID) of a VAE is poorly correlated with the generation FID (gFID) of a latent diffusion model. We propose interpolated FID (iFID), a simple variant of rFID that exhibits a strong correlation with gFID. Specifically, for each element in the dataset, we retrieve its nearest neighbor (NN) in the latent space and interpolate their latent representations. We then decode the interpolated latent and compute the FID between the decoded samples and the original dataset. Additionally, we refine the claim that rFID correlates poorly with gFID, by showing that rFID correlates with sample quality in the diffusion refinement phase, whereas iFID correlates with sample quality in the diffusion navigation phase. Furthermore, we provide an explanation for why iFID correlates well with gFID, and why reconstruction metrics are negatively correlated with gFID, by connecting to results in the diffusion generalization and hallucination. Empirically, iFID is the first metric to demonstrate a strong correlation with diffusion gFID, achieving Pearson linear and Spearman rank correlations approximately 0.85. The source code is provided in https://github.com/tongdaxu/Making-rFID-Predictive-of-Diffusion-gFID.
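
For readers unfamiliar with the metric, FID compares two image sets by fitting a Gaussian to the features of each set (in practice, Inception network features) and computing the Fréchet distance between the two Gaussians. The sketch below is a minimal NumPy illustration of that distance only. The function name `frechet_distance` and the random arrays standing in for image features are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative when
    # both covariances are positive semi-definite.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Random stand-ins for image features (assumption: real pipelines use
# Inception features of the dataset and of the decoded samples).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 0.5  # same covariance, mean shifted by 0.5 in every dimension

print(frechet_distance(real, real))
print(frechet_distance(real, shifted))
```

With identical covariances the trace terms cancel, so the second value reduces to the squared distance between the two means.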

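Motivation and Objectives

Motivate the need for a metric that predicts diffusion generation quality from VAE reconstructions.
Propose iFID, a simple latent-space interpolation variant of rFID, and demonstrate its strong correlation with gFID.
Refine the understanding of how rFID relates to diffusion sample quality in the refinement and navigation phases.
Explain why iFID correlates with diffusion performance and why standard reconstruction metrics can fail.
Evaluate iFID on ImageNet across a variety of VAEs and diffusion models.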

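Proposed Method

Define rFID and gFID in the latent diffusion setting (VAE encoder, decoder g, diffusion solver Φ).
Introduce iFID as the FID between the original images and the decoded interpolated latents ẑ = 0.5(z + NN(z)), where NN(z) is the nearest neighbor of z in latent space.
Evaluate rFID/iFID/gFID correlations (Pearson, PCC, and Spearman rank, SRCC) across the diffusion trajectory and its phases.
Verify robustness by varying the interpolation type (linear, spherical, masked), the interpolation strength α, and the size of the nearest-neighbor set.
Analyze why iFID agrees with diffusion quality by connecting it to the literature on diffusion generalization and hallucination.
Compare iFID against reconstruction metrics and non-reconstruction losses (Diffusion Loss, EQ/SE/VF/GMM Loss).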
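
The latent construction behind iFID can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, the L2 distance used for the nearest-neighbor search is an assumption, random arrays stand in for VAE latents, and the slerp formula is the common textbook one, which may differ from the paper's spherical variant. Decoding with g and the FID computation are omitted.

```python
import numpy as np

def nearest_neighbors(latents):
    """Index of each latent's nearest neighbor, excluding itself (L2 assumed)."""
    flat = latents.reshape(len(latents), -1)
    sq = (flat ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T  # squared L2 distances
    np.fill_diagonal(dist, np.inf)  # a point must not match itself
    return dist.argmin(axis=1)

def lerp(z0, z1, alpha=0.5):
    """Linear interpolation; alpha = 0.5 gives 0.5 * (z0 + z1)."""
    return (1.0 - alpha) * z0 + alpha * z1

def slerp(z0, z1, alpha=0.5):
    """Textbook spherical interpolation, applied row-wise to (n, d) latents."""
    cos = (z0 * z1).sum(axis=1) / (
        np.linalg.norm(z0, axis=1) * np.linalg.norm(z1, axis=1)
    )
    omega = np.arccos(np.clip(cos, -1.0, 1.0))[:, None]
    sin_omega = np.sin(omega)
    parallel = sin_omega < 1e-6  # fall back to lerp weights when nearly parallel
    denom = np.where(parallel, 1.0, sin_omega)
    w0 = np.where(parallel, 1.0 - alpha, np.sin((1.0 - alpha) * omega) / denom)
    w1 = np.where(parallel, alpha, np.sin(alpha * omega) / denom)
    return w0 * z0 + w1 * z1

# Random stand-ins for VAE encoder outputs, flattened to (n, d).
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
nn_idx = nearest_neighbors(z)
z_hat = lerp(z, z[nn_idx], alpha=0.5)           # linear variant
z_hat_spherical = slerp(z, z[nn_idx], alpha=0.5)  # spherical variant
print(nn_idx, z_hat.shape, z_hat_spherical.shape)
```

In a full pipeline, `z_hat` would be passed through the decoder g and the resulting images compared with the original dataset via FID.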

Figure 1: Left two plots: The rFID values of VAEs are uncorrelated, or even negatively correlated, with the gFID values of diffusion models. Right two plots: The iFID metric exhibits a strong positive correlation with the gFID values of diffusion models.

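Experimental Results

Research Questions

RQ1: Can iFID provide a stronger and more reliable proxy for diffusion gFID than rFID across VAEs?
RQ2: How do rFID and iFID relate to diffusion sample quality in the refinement and navigation phases?
RQ3: Why does iFID correlate with gFID, from the standpoint of training-data interpolation and latent-space structure?
RQ4: Which properties of the latent space (connectivity, validity of interpolations) affect diffusion generation quality?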

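Key Findings

iFID correlates strongly with diffusion gFID (Pearson and Spearman ≈ 0.85, across models and settings).
rFID correlates with diffusion sample quality in the refinement phase, whereas iFID correlates with quality in the navigation phase.
Reconstruction metrics (PSNR, SSIM, LPIPS) are negatively correlated with diffusion gFID, illustrating the reconstruction-generation dilemma.
iFID predicts gFID better than non-reconstruction metrics and the diffusion loss, capturing the validity of interpolated latent representations.
In the robustness analysis, iFID is stable across linear, spherical, and masked interpolation, across subset sizes for NN(z), and when top-K nearest neighbors are used; spherical interpolation yields the highest correlation.
The authors provide intuition linking iFID to diffusion generalization and hallucination, explaining why interpolation in latent space reflects generation performance.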
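
The two correlation measures reported above can be computed as in the sketch below. It is a minimal NumPy illustration under stated assumptions: the function names are hypothetical, the Spearman version assumes no tied values, and the scores are synthetic placeholders, not results from the paper.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient (PCC)."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman rank correlation (SRCC); assumes no tied values."""
    def rank(v):
        return np.argsort(np.argsort(v))  # 0-based ranks, valid without ties
    return pearson(rank(x), rank(y))

# Synthetic placeholder scores, NOT results from the paper:
# one proxy-metric value and one gFID value per hypothetical VAE.
metric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
gfid = metric ** 2  # monotone but non-linear relationship

print(pearson(metric, gfid))
print(spearman(metric, gfid))
```

A monotone but non-linear relationship, as in this toy data, yields a perfect rank correlation while the linear correlation stays below one, which is why reporting both measures is informative.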

Figure 2: The refinement and navigation phases are key components of the sampling process for SiT-XL trained with SD-VAE. In the refinement phase (small $t$), the sample generated from the noisy source is nearly identical to the source. In contrast, during the navigation phase (large $t$), the sam


This review was generated by AI and checked by a human editor.