QUICK REVIEW

[Paper Review] Timestep-Aware Block Masking for Efficient Diffusion Model Inference

Haodong He, Yuan Gao|arXiv (Cornell University)|Mar 20, 2026

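Generative Adversarial Networks and Image Synthesis | Cited by 0

One-Sentence Summary

The paper introduces learned per-timestep binary masks that skip block computation in diffusion models, giving faster sampling on DDPM, LDM, DiT, and PixArt with minimal quality loss. The masks are trained end to end, one timestep at a time, with feature-fidelity, sparsity, and bimodal regularization terms, together with timestep-aware loss scaling and knowledge-guided mask rectification.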

ABSTRACT

Diffusion Probabilistic Models (DPMs) have achieved great success in image generation but suffer from high inference latency due to their iterative denoising nature. Motivated by the evolving feature dynamics across the denoising trajectory, we propose a novel framework to optimize the computational graph of pre-trained DPMs on a per-timestep basis. By learning timestep-specific masks, our method dynamically determines which blocks to execute or bypass through feature reuse at each inference stage. Unlike global optimization methods that incur prohibitive memory costs via full-chain backpropagation, our method optimizes masks for each timestep independently, ensuring a memory-efficient training process. To guide this process, we introduce a timestep-aware loss scaling mechanism that prioritizes feature fidelity during sensitive denoising phases, complemented by a knowledge-guided mask rectification strategy to prune redundant spatial-temporal dependencies. Our approach is architecture-agnostic and demonstrates significant efficiency gains across a broad spectrum of models, including DDPM, LDM, DiT, and PixArt. Experimental results show that by treating the denoising process as a sequence of optimized computational paths, our method achieves a superior balance between sampling speed and generative quality. Our code will be released.

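Motivation and Objectives

Reduce the inference cost of diffusion models by exploiting the stability of feature dynamics across timesteps.
Propose a per-timestep binary mask framework that skips or reuses block computation without retraining the base model.
Keep training memory-efficient by optimizing the mask for each timestep independently.
Preserve generation quality by combining timestep-aware loss scaling with knowledge-guided mask rectification.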

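Proposed Method

Train a T×B binary mask m, where T is the number of timesteps and B is the number of network blocks; entry m[t, b] decides whether block b is computed at timestep t or its cached features are reused.
Freeze the diffusion model parameters and optimize the mask for each timestep end to end with a feature-fidelity loss, so that the masked model matches the original.
Apply L1 sparsity and bimodal regularization to s, the continuous relaxation of m in [0,1], to push s toward binary values.
Introduce per-timestep loss weights based on the feature change delta[t], so that fidelity is prioritized during sensitive denoising phases.
Apply knowledge-guided post-processing rules that rectify the mask by propagating dependencies across blocks and timesteps, further accelerating inference.
Provide an architecture-agnostic method that applies to both UNet-style CNNs and diffusion transformers (e.g., ResBlock/AttnBlock in U-Net, MHA/MLP in DiT).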
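
To make the compute-or-reuse decision concrete, here is a minimal Python sketch of one way a T×B mask could gate block execution at inference time. It is an illustration under stated assumptions, not the authors' implementation: the names `masked_step` and `run_demo` are hypothetical, the blocks are toy functions, and the sketch assumes each block is residual and that the cached feature is the block's residual branch output.

```python
import numpy as np


def masked_step(blocks, h, mask_t, cache):
    """Run one denoising step, computing or bypassing each block.

    blocks : list of callables; each returns the residual branch of a block
    mask_t : 0/1 sequence of length len(blocks); 1 = compute, 0 = reuse
    cache  : dict {block index: residual from the most recent computation}
    """
    for b, block in enumerate(blocks):
        if mask_t[b] == 1 or b not in cache:
            # Compute the block (also forced when nothing is cached yet).
            cache[b] = block(h)
        # Residual update; a bypassed block contributes its cached output.
        h = h + cache[b]
    return h


def run_demo():
    """Toy run with T=3 timesteps and B=3 blocks; returns per-block call counts."""
    calls = [0, 0, 0]

    def make_block(i, scale):
        def block(h):
            calls[i] += 1
            return scale * h
        return block

    # Assumption: toy linear blocks stand in for ResBlock/AttnBlock/MHA/MLP.
    blocks = [make_block(0, 0.1), make_block(1, 0.2), make_block(2, 0.3)]
    mask = np.array([[1, 1, 1],
                     [1, 0, 1],
                     [0, 0, 1]])
    h, cache = np.ones(4), {}
    for t in range(mask.shape[0]):
        h = masked_step(blocks, h, mask[t], cache)
    return calls, h


calls, h = run_demo()
print(calls)  # [2, 1, 3]: 6 block evaluations instead of 9
```

In this toy run the mask removes three of the nine block evaluations. How much wall-clock time that saves in a real model depends on the relative cost of the skipped blocks, which the sketch does not model.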

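Experimental Results

Research Questions

RQ1: Can per-timestep masks accelerate diffusion model inference without retraining the original model?
RQ2: How can these masks be trained efficiently so that they deliver high speedups while preserving generation quality?
RQ3: What roles do the timestep-aware loss and mask rectification play in maintaining fidelity while maximizing speedup?
RQ4: Does the method generalize across diffusion architectures (DDPM, LDM, DiT, PixArt) and datasets?

Key Findings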

Method	Extra Data	Training Time↓	MACs↓	Speed↑	FID↓
DDPM [13]	–	–	0.61T	1.00×	4.19
DDPM*	–	–	0.61T	1.00×	4.25
Diff-Pruning [7]	✓	–	0.34T	1.37×	5.29
CT [42]*	✓	–	–	1.62×	4.68
DeepCache [30]	✗	–	0.35T	1.61×	4.70
Ours	✗	0.2h	0.34T	1.63×	4.66

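The method achieves meaningful speedups across architectures: 1.63× for DDPM on CIFAR-10, 1.31×–1.63× on the LSUN variants, and 2.75× for LDM-4-G on ImageNet, with competitive FID/IS scores.
On CIFAR-10, LSUN-Bedroom, and LSUN-Churches, the method is faster than the baselines (DeepCache, Diff-Pruning, CT) while matching or slightly improving FID.
For DiT-XL/2 on ImageNet at 256×256, quality is comparable to or better than L2C with a larger speedup (1.67×); results at 512×512 are competitive.
Ablations show that stochastic mask sampling outperforms Gumbel-Softmax in this setting, and that mask rectification and timestep-aware loss scaling substantially increase the speedup with minimal quality loss.
The trained mask values cluster near 0 or 1, indicating stable, decisive block-skipping decisions.
Mask training needs only Gaussian noise as input and does not modify the pretrained model weights.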
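
The near-binary mask values reported above follow from the regularizers in the method summary. The Python sketch below shows one plausible form of those terms for a single timestep. The summary does not give the paper's exact formulas, so everything here is an assumption: the bimodal penalty is taken to be s(1 - s), the timestep weight is taken to be delta[t] normalized to mean 1, and the function names and the coefficients `lam_sparse` and `lam_bimodal` are hypothetical.

```python
import numpy as np


def sparsity_loss(s):
    # L1 penalty on scores in [0, 1]: favors bypassing blocks (s near 0).
    return float(np.mean(np.abs(s)))


def bimodal_loss(s):
    # Assumed form s * (1 - s): zero at s = 0 or s = 1, largest at s = 0.5.
    s = np.asarray(s, dtype=float)
    return float(np.mean(s * (1.0 - s)))


def timestep_weights(delta):
    # Assumed form: weights proportional to the feature change, with mean 1,
    # so timesteps where features change more get a larger fidelity weight.
    delta = np.asarray(delta, dtype=float)
    return delta / delta.mean()


def mask_objective(fidelity, s, w_t, lam_sparse=0.1, lam_bimodal=0.1):
    # Per-timestep objective: weighted fidelity plus the two regularizers.
    return (w_t * fidelity
            + lam_sparse * sparsity_loss(s)
            + lam_bimodal * bimodal_loss(s))


def binarize(s, threshold=0.5):
    # Turn relaxed scores into a hard compute (1) / reuse (0) decision.
    return (np.asarray(s, dtype=float) >= threshold).astype(np.int64)


scores = np.array([0.02, 0.97, 0.5])    # relaxed mask for one timestep
print(binarize(scores))                 # [0 1 1]
print(timestep_weights([1.0, 3.0]))     # [0.5 1.5]
```

Because the objective depends only on the scores of a single timestep, each timestep can be optimized on its own, which is consistent with the memory-efficient training the abstract describes.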

This review was generated by AI and reviewed by a human editor.