QUICK REVIEW

[Paper Review] Making Reconstruction FID Predictive of Diffusion Generation FID

Tongda Xu, Mingwei He|arXiv (Cornell University)|Mar 5, 2026
Advanced Neuroimaging Techniques and Applications | Cited by 0
One-sentence summary

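The paper proposes interpolated FID (iFID), a simple latent-space interpolation metric that correlates strongly with the generation FID (gFID) of diffusion models and addresses the reconstruction-generation dilemma. It shows that rFID tracks sample quality in the refinement phase while iFID tracks it in the navigation phase, offers an explanation grounded in diffusion generalization and hallucination, and releases the code.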

ABSTRACT

It is well known that the reconstruction FID (rFID) of a VAE is poorly correlated with the generation FID (gFID) of a latent diffusion model. We propose interpolated FID (iFID), a simple variant of rFID that exhibits a strong correlation with gFID. Specifically, for each element in the dataset, we retrieve its nearest neighbor (NN) in the latent space and interpolate their latent representations. We then decode the interpolated latent and compute the FID between the decoded samples and the original dataset. Additionally, we refine the claim that rFID correlates poorly with gFID by showing that rFID correlates with sample quality in the diffusion refinement phase, whereas iFID correlates with sample quality in the diffusion navigation phase. Furthermore, we provide an explanation for why iFID correlates well with gFID, and why reconstruction metrics are negatively correlated with gFID, by connecting to results on diffusion generalization and hallucination. Empirically, iFID is the first metric to demonstrate a strong correlation with diffusion gFID, achieving Pearson linear and Spearman rank correlations of approximately 0.85. The source code is available at https://github.com/tongdaxu/Making-rFID-Predictive-of-Diffusion-gFID.
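
For reference, the quantities named in the abstract can be written out as below. The first line is the standard Fréchet distance between Gaussians fitted to image features. The remaining lines restate the rFID and iFID constructions in set notation. The symbols E (encoder), g (decoder), μ, Σ, and the index i are notational assumptions made for this sketch and may differ from the paper's own notation.

```latex
\mathrm{FID}(A, B) = \lVert \mu_A - \mu_B \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_A + \Sigma_B - 2\,(\Sigma_A \Sigma_B)^{1/2} \right)

z_i = E(x_i), \qquad
\hat{z}_i = \tfrac{1}{2}\bigl( z_i + \mathrm{NN}(z_i) \bigr)

\mathrm{rFID} = \mathrm{FID}\bigl( \{ g(z_i) \}_i,\ \{ x_i \}_i \bigr), \qquad
\mathrm{iFID} = \mathrm{FID}\bigl( \{ g(\hat{z}_i) \}_i,\ \{ x_i \}_i \bigr)
```

Here μ and Σ are the mean and covariance of the feature embeddings of each image set, so the only difference between rFID and iFID is which latents are passed to the decoder.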

Motivation and objectives

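  • Motivate the need for a metric that predicts diffusion generation quality from VAE reconstructions;
  • Propose iFID, a simple variant of rFID based on linear interpolation in latent space, and show that it correlates strongly with gFID;
  • Refine the understanding of how rFID relates to diffusion sample quality in the refinement and navigation phases;
  • Explain why iFID correlates with diffusion performance and why standard reconstruction metrics can fail;
  • Evaluate iFID on ImageNet across a range of VAEs and diffusion models.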

Proposed method

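  • Define rFID and gFID in the latent diffusion setting (VAE encoder, decoder g, diffusion solver Φ);
  • Define iFID as the FID between the original images and the decodings of interpolated latents ẑ = 0.5(z + NN(z)), where NN(z) is the nearest neighbor of z in latent space;
  • Evaluate how rFID, iFID, and gFID correlate (PCC and SRCC) along the diffusion trajectory and across its phases;
  • Ablate the interpolation type (linear, spherical, masked), the interpolation strength α, and the size of the nearest-neighbor set to test robustness;
  • Analyze why iFID reflects diffusion quality by connecting it to the literature on diffusion generalization and hallucination;
  • Compare iFID against reconstruction metrics and non-reconstruction losses (Diffusion Loss, EQ/SE/VF/GMM Loss).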
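
The latent-space step of iFID (nearest-neighbor retrieval followed by interpolation) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes latents are flattened to plain lists of floats, that the nearest neighbor is taken under Euclidean distance, and that a brute-force search is acceptable. The function names `nearest_neighbor`, `lerp`, `slerp`, and `interpolated_latents` are hypothetical. Decoding the result and computing the FID against the original images are not shown.

```python
import math


def nearest_neighbor(latents, i):
    """Index of the latent closest to latents[i], excluding i itself.

    Assumption: Euclidean distance on flattened latents, brute-force search.
    """
    best_j, best_d = None, math.inf
    for j, z in enumerate(latents):
        if j == i:
            continue
        d = math.dist(latents[i], z)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def lerp(z0, z1, alpha=0.5):
    """Linear interpolation; alpha=0.5 gives 0.5 * (z0 + z1)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(z0, z1)]


def slerp(z0, z1, alpha=0.5):
    """Spherical interpolation, falling back to lerp in degenerate cases."""
    n0, n1 = math.hypot(*z0), math.hypot(*z1)
    if n0 == 0.0 or n1 == 0.0:
        return lerp(z0, z1, alpha)
    cos = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos)))
    s = math.sin(omega)
    if s < 1e-8:  # (anti)parallel vectors: the slerp weights are ill-defined
        return lerp(z0, z1, alpha)
    w0 = math.sin((1 - alpha) * omega) / s
    w1 = math.sin(alpha * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolated_latents(latents, alpha=0.5, mix=lerp):
    """For each latent z, return mix(z, NN(z), alpha)."""
    return [
        mix(z, latents[nearest_neighbor(latents, i)], alpha)
        for i, z in enumerate(latents)
    ]


# Toy 2-D latents, chosen only to make the behaviour easy to follow.
toy = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
z_hat = interpolated_latents(toy)  # linear, alpha = 0.5
```

Passing `mix=slerp` switches to spherical interpolation, one of the variants listed in the ablation. For real datasets the quadratic brute-force search would likely be replaced by a batched or approximate nearest-neighbor search.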
Figure 1: Left two plots: The rFID values of VAEs are uncorrelated, or even negatively correlated, with the gFID values of diffusion models. Right two plots: The iFID metric exhibits a strong positive correlation with the gFID values of diffusion models.

Experimental results

Research questions

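  • RQ1: Is iFID a stronger and more reliable proxy for diffusion gFID than rFID?
  • RQ2: How do rFID and iFID relate to diffusion sample quality in the refinement and navigation phases?
  • RQ3: Why does iFID correlate with gFID, in terms of interpolation between training data and the structure of the latent space?
  • RQ4: Which latent-space properties (connectivity, validity of interpolations) affect diffusion generation quality?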

Key findings

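  • iFID correlates strongly with diffusion gFID (Pearson and Spearman ≈ 0.85), consistently across models and settings;
  • rFID correlates with diffusion sample quality in the refinement phase, whereas iFID correlates with it in the navigation phase;
  • Reconstruction metrics (PSNR, SSIM, LPIPS) are negatively correlated with diffusion gFID, reflecting the reconstruction-generation dilemma;
  • iFID predicts gFID better than non-reconstruction metrics and diffusion loss, suggesting that it captures the validity of interpolated latent representations;
  • Robustness analysis shows that iFID is fairly stable with respect to the interpolation method (linear, spherical, masked), the size of the subset searched for NN(z), and the use of top-K neighbors; spherical interpolation gives the highest correlation;
  • The authors give an intuitive explanation linking iFID to diffusion generalization and hallucination, clarifying how latent-space interpolation reflects generation performance.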
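
The two correlation measures quoted above, the Pearson linear correlation (PCC) and the Spearman rank correlation (SRCC), can be computed as in the sketch below. It uses only the standard library and treats SRCC as the Pearson correlation of average ranks. The function names are hypothetical, and the numbers in `toy_x` and `toy_y` are made-up placeholders that are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Placeholder values only (NOT results from the paper): one entry per VAE,
# e.g. a proxy metric in toy_x and the matching diffusion gFID in toy_y.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 4.0, 8.0, 16.0]
pcc = pearson(toy_x, toy_y)
srcc = spearman(toy_x, toy_y)
```

In this toy case the relationship is monotone but not linear, so SRCC is exactly 1 while PCC is below 1. That difference is why reporting both measures is informative when judging a proxy metric.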
Figure 2: The refinement and navigation phases are key components of the sampling process for SiT-XL trained with SD-VAE. In the refinement phase (small $t$), the sample generated from the noisy source is nearly identical to the source. In contrast, during the navigation phase (large $t$), the sam

This review was generated by AI and reviewed by human editors.