QUICK REVIEW

[Paper Review] Structural Pruning for Diffusion Models

Gongfan Fang, Xinyin Ma|arXiv (Cornell University)|May 18, 2023

Music and Audio Processing | Cited by 17

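One-Sentence Summary

Diff-Pruning is a structural pruning method based on Taylor expansion. It compresses a pre-trained diffusion model by scoring weights only over informative timesteps and discarding non-contributory ones, which cuts FLOPs by roughly 50% at 10–20% of the original training cost while preserving the model's generative behavior.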

ABSTRACT

Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training. The essence of Diff-Pruning is encapsulated in a Taylor expansion over pruned timesteps, a process that disregards non-contributory diffusion steps and ensembles informative gradients to identify important weights. Our empirical assessment, undertaken across several datasets, highlights two primary benefits of our proposed method: 1) Efficiency: it enables approximately a 50% reduction in FLOPs at a mere 10% to 20% of the original training expenditure; 2) Consistency: the pruned diffusion models inherently preserve generative behavior congruent with their pre-trained models. Code is available at https://github.com/VainF/Diff-Pruning.

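Research Motivation and Goals

Motivation: compress Diffusion Probabilistic Models (DPMs) to reduce their training and inference overhead.
Propose a pruning method designed specifically for diffusion models (Diff-Pruning).
Develop a Taylor-expansion-based criterion that identifies important weights and the timesteps to exclude from the importance estimate.
Show that the pruned models maintain, and in some cases improve, generation quality and consistency across diverse datasets.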

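Proposed Method

Treat model pruning as the removal of whole weight substructures (for example, entire channels), which leaves a structurally sparse parameter matrix.
Estimate parameter importance with a Taylor expansion of the per-timestep loss L_t, and aggregate the contributions across timesteps (a variant of Eq. 7).
Make the pruning timestep-aware by thresholding the relative loss L_t/L_max to decide which timesteps are dropped from the expansion (Eq. 9/10).
Accumulate gradients over the remaining subset of timesteps to compute a robust importance score for each parameter (Eq. 10).
Prune the pre-trained diffusion model in one shot, then fine-tune it on the target dataset.
Evaluate efficiency (parameters, MACs), quality (FID), and consistency (SSIM) on several datasets (CIFAR-10, CelebA-HQ, LSUN, ImageNet-1K).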
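The scoring steps listed above can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' implementation: the function names, the threshold value, the grouping of weights, and all numbers are assumptions made up for the example, and the gradients are hard-coded rather than computed from a real diffusion loss.

```python
# Toy sketch of thresholded Taylor importance. All names and values are hypothetical.

def select_timesteps(losses, threshold):
    """Keep timesteps whose relative loss L_t / L_max exceeds the threshold."""
    l_max = max(losses)
    return [t for t, loss in enumerate(losses) if loss / l_max > threshold]

def taylor_importance(weights, grads_per_step, kept_steps):
    """Score each weight as |w * (sum of its gradients over the kept timesteps)|."""
    scores = []
    for i, w in enumerate(weights):
        accumulated = sum(grads_per_step[t][i] for t in kept_steps)
        scores.append(abs(w * accumulated))
    return scores

def group_scores(scores, groups):
    """Sum per-weight scores inside each structural group (e.g. one channel)."""
    return [sum(scores[i] for i in group) for group in groups]

def groups_to_prune(group_importance, ratio):
    """Return indices of the lowest-scoring groups, up to the given ratio."""
    n_prune = int(len(group_importance) * ratio)
    order = sorted(range(len(group_importance)), key=lambda g: group_importance[g])
    return sorted(order[:n_prune])

# Hypothetical per-timestep losses and gradients for a model with 4 weights.
losses = [0.90, 0.50, 0.04, 0.01]
weights = [0.5, -1.0, 2.0, 0.1]
grads_per_step = [
    [0.2, -0.1, 0.0, 0.4],
    [0.1, 0.3, 0.0, -0.2],
    [5.0, 5.0, 5.0, 5.0],   # large gradients at a low-loss step, dropped by the threshold
    [-3.0, 2.0, 1.0, 0.0],  # also dropped
]
groups = [[0, 1], [2, 3]]   # two structural groups of two weights each

kept = select_timesteps(losses, threshold=0.05)
scores = taylor_importance(weights, grads_per_step, kept)
per_group = group_scores(scores, groups)
pruned = groups_to_prune(per_group, ratio=0.5)
print("kept timesteps:", kept)
print("groups to prune:", pruned)
```

With these made-up numbers only the first two timesteps pass the threshold, so the noisy gradients at the low-loss steps never enter the score. In a real setting the gradients would come from backpropagating the diffusion loss at sampled timesteps, and the selected groups would then be removed before fine-tuning.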

Figure 1 : Diff-Pruning leverages Taylor expansion at pruned timesteps to estimate the importance of weights, where early steps focus on local details like edges and color and later ones pay more attention to contents such as object and shape. We propose a simple thresholding method to trade off the

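Experimental Results

Research Questions

RQ1: Can structural pruning accurately identify and remove redundant components of a diffusion model without extensive re-training?
RQ2: How do the choice of pruned timesteps and the choice of pruned weights affect content generation and detail generation in diffusion models?
RQ3: Across datasets and model types (DDPMs, LDMs), what is the trade-off between pruning ratio, recovery cost, and the quality of generated samples?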

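Key Findings

Diff-Pruning achieves substantial compression, cutting FLOPs by roughly 50% while using only 10%–20% of the original training cost.
Compared with the pre-trained models, the pruned models maintain, and in some cases improve, generative behavior and sample consistency at a fraction of the training budget (for example, on LSUN Church about 0.5M training steps versus 4.4M for the pre-trained model).
Timesteps that contribute content are not confined to the end of the diffusion process; pruning has to weight timesteps by importance to balance content against detail.
A full Taylor expansion over all timesteps can accumulate noisy gradients; a thresholded, partial Taylor expansion improves pruning accuracy.
On LSUN Church/Bedroom and on ImageNet-1K with LDMs, the pruned models reach competitive FID/SSIM with markedly fewer parameters and MACs than the baselines.
On CIFAR-10 and CelebA-HQ, Diff-Pruning consistently outperforms random, magnitude-based, and naive Taylor pruning.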
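To make the baseline criteria named in the last finding concrete, the sketch below scores the channels of a toy layer with random, magnitude, and naive Taylor criteria. It is only an illustration under assumptions: the layer, its gradients, and the helper names are hypothetical, and the paper's own baselines may differ in detail.

```python
import random

# Toy comparison of pruning criteria on one layer.
# Each row holds the weights of one output channel. All values are made up.

def magnitude_scores(weight_rows):
    """Magnitude criterion: sum of |w| per channel."""
    return [sum(abs(w) for w in row) for row in weight_rows]

def taylor_scores(weight_rows, grad_rows):
    """Naive Taylor criterion: sum of |w * g| per channel."""
    return [
        sum(abs(w * g) for w, g in zip(w_row, g_row))
        for w_row, g_row in zip(weight_rows, grad_rows)
    ]

def random_scores(n_channels, seed=0):
    """Random criterion: an arbitrary score per channel."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_channels)]

def keep_top(scores, n_keep):
    """Return the indices of the n_keep highest-scoring channels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])

weight_rows = [[2.0, -2.0], [0.1, 0.2], [1.0, 1.0], [0.3, -0.1]]
grad_rows = [[0.0, 0.01], [3.0, 2.0], [0.5, 0.5], [0.1, 0.1]]

kept_by_magnitude = keep_top(magnitude_scores(weight_rows), n_keep=2)
kept_by_taylor = keep_top(taylor_scores(weight_rows, grad_rows), n_keep=2)
kept_by_random = keep_top(random_scores(len(weight_rows)), n_keep=2)
print("magnitude keeps:", kept_by_magnitude)
print("taylor keeps:", kept_by_taylor)
print("random keeps:", kept_by_random)
```

In this toy layer the magnitude criterion keeps channel 0 because its weights are large, while the Taylor criterion drops it because its gradients are near zero, so removing it would barely change the loss. Diff-Pruning, as summarized above, refines the Taylor criterion further by restricting which timesteps contribute gradients.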

Figure 2: Generated images of the pre-trained models [18] (left) and the pruned models (right) on LSUN Church and LSUN Bedroom. SSIM measures the similarity between generated images.


This review was generated by AI and checked by human editors.