QUICK REVIEW

[Paper Review] Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry

Yong-Hyun Park, Mingi Kwon|arXiv (Cornell University)|Jul 24, 2023

Advanced Neuroimaging Techniques and Applications|Cited by 13

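One-Sentence Summary

The paper analyzes the latent space of diffusion models with the pullback metric, derives a local latent basis, uses it to edit images in $\mathbf{x}$-space with a single edit step, and studies how the geometry evolves across timesteps and text prompts.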

ABSTRACT

Despite the success of diffusion models (DMs), we still lack a thorough understanding of their latent space. To understand the latent space $\mathbf{x}_t \in \mathcal{X}$, we analyze them from a geometrical perspective. Our approach involves deriving the local latent basis within $\mathcal{X}$ by leveraging the pullback metric associated with their encoding feature maps. Remarkably, our discovered local latent basis enables image editing capabilities by moving $\mathbf{x}_t$, the latent space of DMs, along the basis vector at specific timesteps. We further analyze how the geometric structure of DMs evolves over diffusion timesteps and differs across different text conditions. This confirms the known phenomenon of coarse-to-fine generation, as well as reveals novel insights such as the discrepancy between $\mathbf{x}_t$ across timesteps, the effect of dataset complexity, and the time-varying influence of text prompts. To the best of our knowledge, this paper is the first to present image editing through $\mathbf{x}$-space traversal, editing only once at specific timestep $t$ without any additional training, and providing thorough analyses of the latent structure of DMs. The code to reproduce our experiments can be found at https://github.com/enkeejunior1/Diffusion-Pullback.
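
The pullback construction mentioned in the abstract can be written out as a short worked derivation. This is a sketch in notation assumed here rather than quoted from the abstract: $f$ denotes the network's encoding map from $\mathcal{X}$ to a feature space $\mathcal{H}$ equipped with the Euclidean inner product, $J_{\mathbf{x}}$ is the Jacobian of $f$ at $\mathbf{x}_t$, and $\mathbf{u} = J_{\mathbf{x}} \mathbf{v}$ is the image of a tangent vector $\mathbf{v}$.

$$
\|\mathbf{v}\|_{\mathrm{pb}}^{2} \triangleq \langle \mathbf{u}, \mathbf{u} \rangle_{\mathcal{H}} = \mathbf{v}^{\top} J_{\mathbf{x}}^{\top} J_{\mathbf{x}} \mathbf{v}
$$

$$
\mathbf{v}_{1} = \arg\max_{\|\mathbf{v}\| = 1} \|\mathbf{v}\|_{\mathrm{pb}}^{2}, \qquad J_{\mathbf{x}} = U \Lambda V^{\top}, \qquad \mathbf{v}_{i} = V_{:,i}, \quad \mathbf{u}_{i} = U_{:,i}
$$

Under these assumptions, the directions in $\mathcal{X}$ that change the features the most are the right singular vectors of $J_{\mathbf{x}}$, ordered by singular value, and each $\mathbf{v}_i$ is paired with a left singular vector $\mathbf{u}_i$ in $\mathcal{H}$.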

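Motivation and Objectives

Understand the latent space of diffusion models (DMs) beyond its role in forward noise prediction.
Introduce a Riemannian-geometry framework that defines a local latent basis in $\mathcal{X}$ via the pullback metric.
Edit images by traversing $\mathcal{X}$ along the discovered basis vectors at a fixed timestep.
Analyze how the latent geometry evolves across diffusion timesteps and differs across text prompts.
Show that a single edit step suffices, with no additional training.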

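Proposed Method

Define the pullback metric from the Euclidean structure of the feature space $\mathcal{H}$ (the U-Net bottleneck) and the Jacobian $J_{\mathbf{x}}$ of the map from $\mathcal{X}$ to $\mathcal{H}$.
Compute the local latent basis $\{\mathbf{v}_i\}$ as the leading right singular vectors of $J_{\mathbf{x}}$ (via SVD or the power method).
Edit the latent $\mathbf{x}$ with x-space guidance, which perturbs it along a basis vector and uses the difference between the $\epsilon$-model outputs: $\tilde{\mathbf{x}}_{XG} = \mathbf{x} + \gamma[\epsilon_\theta(\mathbf{x}+\mathbf{v}) - \epsilon_\theta(\mathbf{x})]$.
Apply parallel transport in $\mathcal{H}$ to move local basis vectors between the tangent spaces of different samples $\mathbf{x}$, enabling cross-sample editing.
Run DDIM inversion and generation so that editing requires no additional training.
Optionally condition the basis on a text prompt to obtain semantically meaningful editing directions.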
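
The basis computation and the x-space guidance step listed above can be sketched on a toy model. This is a minimal illustration, not the authors' implementation: `encode` and `eps_model` are hypothetical stand-ins for the U-Net bottleneck map and the noise predictor, the weights are random, and the Jacobian is built by finite differences only because the toy dimensions are tiny. A real network would more plausibly use automatic differentiation and the power method.

```python
import numpy as np

rng = np.random.default_rng(0)
N_X, N_H = 8, 4                       # toy sizes for x-space and the bottleneck H
W_enc = rng.normal(size=(N_H, N_X))   # hypothetical "encoder" weights
W_dec = rng.normal(size=(N_X, N_H))   # hypothetical "decoder" weights


def encode(x):
    """Toy stand-in for the map x -> h (the U-Net bottleneck feature)."""
    return np.tanh(W_enc @ x)


def eps_model(x):
    """Toy stand-in for the noise predictor eps_theta(x)."""
    return W_dec @ encode(x)


def jacobian(f, x, step=1e-6):
    """Central finite-difference Jacobian of f at x, one column per input dim."""
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = step
        cols.append((f(x + e) - f(x - e)) / (2.0 * step))
    return np.stack(cols, axis=1)


def local_basis(x, k):
    """Leading k right singular vectors of J_x (as rows) and their singular values."""
    _, sing, vt = np.linalg.svd(jacobian(encode, x), full_matrices=False)
    return vt[:k], sing[:k]


def x_space_guidance(x, v, gamma):
    """x_tilde = x + gamma * [eps(x + v) - eps(x)], as in the formula above."""
    return x + gamma * (eps_model(x + v) - eps_model(x))


x = rng.normal(size=N_X)                      # toy latent at some timestep
basis, sing = local_basis(x, k=2)             # local latent basis {v_1, v_2}
x_edit = x_space_guidance(x, basis[0], gamma=0.5)
```

Here `gamma=0.5` and `k=2` are arbitrary illustrative values. In an actual pipeline the edited latent would then be denoised by the usual DDIM generation steps.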

Figure 1: Conceptual illustration of local geometric structure. (a) The local basis $\{\mathbf{v}_{1},\mathbf{v}_{2},\cdots\}$ of the local latent subspace $\mathcal{T}_{{\mathbf{x}}_{t}}$ within the latent space $\mathcal{X}$ is interlinked with the local basis $\{\mathbf{u}_{1},\mathbf{u}_{2},\cdo

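Experimental Results

Research Questions

RQ1: How can a meaningful local geometric structure be constructed for the latent space $\mathcal{X}$ of diffusion models?
RQ2: Can the local latent basis discovered through pullback geometry enable semantically meaningful image editing without additional training?
RQ3: How does the latent structure evolve over diffusion timesteps, and how does it vary with dataset complexity and prompts?
RQ4: To what extent can editing directions be transferred across samples via parallel transport in the feature space?
RQ5: How do text conditions affect the geometry of the latent and tangent spaces?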

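Key Findings

A local latent basis can be found through the pullback metric defined by the Jacobian between $\mathcal{X}$ and $\mathcal{H}$.
Moving along a basis vector at a given timestep yields semantically meaningful image edits without additional training.
During generation, the latent basis shifts from low-frequency to high-frequency components, confirming coarse-to-fine behavior.
The tangent spaces of different samples become less similar as the diffusion process proceeds, and the effect depends on dataset complexity.
Similar prompts yield similar tangent spaces, and the influence of the prompt weakens at later timesteps.
Parallel transport in $\mathcal{H}$ allows editing directions to be transferred across samples when the tangent spaces are sufficiently aligned.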
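
Two of the findings above, that tangent spaces grow apart and that directions can be transferred between them, rest on comparing and mapping between subspaces. The sketch below shows one standard way to do both. It is an assumption for illustration, since this summary does not state which distance or transport rule the paper uses: subspaces are compared by the geodesic distance built from principal angles, and a direction is transported by projecting it onto the target subspace and rescaling it to its original length. The bases are random toy data, and all function names are hypothetical.

```python
import numpy as np


def orthonormal_basis(matrix):
    """Orthonormal basis (as columns) for the column space of `matrix`."""
    q, _ = np.linalg.qr(matrix)
    return q


def geodesic_distance(basis_a, basis_b):
    """Geodesic distance between two subspaces from their principal angles."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.sqrt(np.sum(angles ** 2)))


def transport_by_projection(u, target_basis):
    """Project u onto span(target_basis), then rescale to the length of u."""
    projected = target_basis @ (target_basis.T @ u)
    norm = np.linalg.norm(projected)
    if norm < 1e-12:
        raise ValueError("u is orthogonal to the target tangent space")
    return projected * (np.linalg.norm(u) / norm)


rng = np.random.default_rng(0)
tangent_a = orthonormal_basis(rng.normal(size=(6, 2)))  # toy tangent space, sample A
tangent_b = orthonormal_basis(rng.normal(size=(6, 2)))  # toy tangent space, sample B

dist_same = geodesic_distance(tangent_a, tangent_a)     # close to zero
dist_diff = geodesic_distance(tangent_a, tangent_b)     # larger for unrelated spaces
u_moved = transport_by_projection(tangent_a[:, 0], tangent_b)
```

A small distance means the two tangent spaces are nearly aligned, which is the regime in which a transported direction stays close to the original one. When the spaces are far apart, the projection discards most of the vector and the transferred edit is less likely to keep its meaning.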

Figure 2: Image editing with the discovered latent basis. (a) Schematic depiction of our image editing procedure. ① An input image is subjected to DDIM inversion, resulting in an initial noisy sample $\mathbf{x}_{T}$ . ② The sample $\mathbf{x}_{T}$ is progressively denoised until reaching the point

This review was generated by AI and checked by a human editor.