QUICK REVIEW

[Paper Review] Timestep-Aware Block Masking for Efficient Diffusion Model Inference

Haodong He, Yuan Gao|arXiv (Cornell University)|Mar 20, 2026

Generative Adversarial Networks and Image Synthesis|Citations: 0

One-sentence summary

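The method introduces a learned binary mask for each timestep of a diffusion model and skips the masked-out computation, accelerating sampling across DDPM, LDM, DiT, and PixArt while keeping quality degradation minimal. The masks are trained end to end for each timestep using feature-fidelity, sparsity, and bimodal regularization terms with timestep-aware loss scaling, and are then refined by knowledge-guided mask rectification.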

ABSTRACT

Diffusion Probabilistic Models (DPMs) have achieved great success in image generation but suffer from high inference latency due to their iterative denoising nature. Motivated by the evolving feature dynamics across the denoising trajectory, we propose a novel framework to optimize the computational graph of pre-trained DPMs on a per-timestep basis. By learning timestep-specific masks, our method dynamically determines which blocks to execute or bypass through feature reuse at each inference stage. Unlike global optimization methods that incur prohibitive memory costs via full-chain backpropagation, our method optimizes masks for each timestep independently, ensuring a memory-efficient training process. To guide this process, we introduce a timestep-aware loss scaling mechanism that prioritizes feature fidelity during sensitive denoising phases, complemented by a knowledge-guided mask rectification strategy to prune redundant spatial-temporal dependencies. Our approach is architecture-agnostic and demonstrates significant efficiency gains across a broad spectrum of models, including DDPM, LDM, DiT, and PixArt. Experimental results show that by treating the denoising process as a sequence of optimized computational paths, our method achieves a superior balance between sampling speed and generative quality. Our code will be released.

Motivation and objectives

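Reduce the inference cost of diffusion models by exploiting feature dynamics that remain stable across timesteps.
Propose a per-timestep binary mask framework that skips block computation or reuses features, without retraining the base model.
Optimize the mask for each timestep independently, which keeps training memory-efficient.
Incorporate timestep-aware loss scaling and knowledge-guided mask rectification to preserve generation quality.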

Proposed method

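Train a T×B binary mask m, where T is the number of timesteps and B is the number of network blocks; entry m[t, b] decides whether block b is computed at timestep t or its cached features are reused.
Freeze the diffusion model parameters and optimize the mask for each timestep by end-to-end training with a feature-fidelity loss that matches the original model's features.
Use a continuous relaxation s ∈ [0,1] of m, with L1 sparsity and bimodal regularization pushing s toward binary values.
Introduce timestep-aware loss weighting based on the feature variation delta[t], prioritizing fidelity during sensitive denoising phases.
Apply knowledge-guided post-processing rules that rectify the mask by propagating dependencies across blocks and across timesteps, further accelerating inference.
Provide an architecture-agnostic approach that applies to both UNet-style CNNs and diffusion transformers (e.g., ResBlock/AttnBlock in U-Net, MHA/MLP in DiT).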
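The block-skipping idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `run_masked_blocks`, the toy blocks, the residual form `h + f(h)`, the convention that a mask entry of 1 means "execute", and the choice to cache each block's most recent residual output are all assumptions made for illustration. The real sampler update between timesteps is replaced by a simple stand-in.

```python
import numpy as np

def run_masked_blocks(blocks, mask, x_init):
    """Run a toy residual network for mask.shape[0] steps with block skipping.

    mask[t, b] == 1 -> execute block b at step t and refresh its cache.
    mask[t, b] == 0 -> skip block b and reuse its cached residual output.
    """
    num_steps, num_blocks = mask.shape
    assert num_blocks == len(blocks)
    cache = [None] * num_blocks  # most recent residual output of each block
    executed = 0
    x = x_init
    for t in range(num_steps):
        h = x
        for b, block in enumerate(blocks):
            # A block with an empty cache has to run regardless of the mask.
            if mask[t, b] == 1 or cache[b] is None:
                cache[b] = block(h, t)
                executed += 1
            h = h + cache[b]  # residual connection, freshly computed or reused
        x = h  # stand-in for the sampler update between timesteps (assumption)
    return x, executed


# Hypothetical toy blocks standing in for ResBlock / attention / MLP branches.
toy_blocks = [lambda h, t: 0.1 * h, lambda h, t: -0.05 * h]
toy_mask = np.array([[1, 1],
                     [0, 1],
                     [1, 0]])
x_masked, n_masked = run_masked_blocks(toy_blocks, toy_mask, np.ones(4))
x_dense, n_dense = run_masked_blocks(toy_blocks, np.ones_like(toy_mask), np.ones(4))
print(n_masked, n_dense)  # prints: 4 6
```

In this toy run the masked schedule executes 4 block calls instead of 6; the saving in a real model depends on how many entries of the learned mask are zero and on the cost of each skipped block.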

Experimental results

Research questions

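RQ1: Can per-timestep masking accelerate diffusion model inference without retraining the original model?
RQ2: How can the masks be trained efficiently so that a large speedup is achieved while quality is preserved?
RQ3: What roles do the timestep-aware loss and mask rectification play in maintaining fidelity and maximizing acceleration?
RQ4: Does the method generalize across different diffusion architectures and datasets, such as DDPM, LDM, DiT, and PixArt?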

Key findings

Method	Extra Data	Training Time↓	MACs↓	Speed↑	FID↓
DDPM [13]	–	0.61T	1×	1.00	4.19
DDPM*	–	0.61T	1×	1.00	4.25
Diff-Pruning [7]	✓	0.34T	1.37×	1.37	5.29
CT [42] *	✓	–	–	1.62×	4.68
DeepCache [30]	✗	0.35T	1.61×	1.61	4.70
Ours	✗	0.2h	0.34T	1.63×	4.66

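Achieves meaningful speedups across multiple architectures: 1.63× for DDPM on CIFAR-10, 1.31×–1.63× on the LSUN family, and 2.75× for LDM-4-G on ImageNet, with competitive FID/IS scores.
On CIFAR-10, LSUN-Bedroom, and LSUN-Churches, consistently achieves speedups relative to the baselines (DeepCache, Diff-Pruning, CT) while maintaining or slightly improving FID.
For DiT-XL/2 on ImageNet, matches or exceeds the quality of L2C with greater acceleration (1.67× at 256×256; competitive at 512×512).
Ablations show that random mask sampling outperforms Gumbel-Softmax in this setting, and that mask rectification and timestep-aware loss scaling substantially increase the speedup with minimal quality loss.
The learned block-selection mask values converge close to 0 or 1, indicating stable, deterministic block skipping.
Mask training requires only Gaussian noise as input and does not modify the pre-trained model weights.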
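As a rough illustration of the mask-training ingredients named in this review (feature fidelity, L1 sparsity, bimodal regularization, and a timestep-dependent fidelity weight), the sketch below shows one plausible form of the objective for a single timestep. The exact terms are not given here, so everything concrete is an assumption: the mean-squared fidelity term, the `s * (1 - s)` bimodal penalty, the coefficients `lam_sparse` and `lam_bimodal`, the 0.5 threshold, and the names `mask_objective` and `binarize`. How `fidelity_weight` is derived from delta[t] is deliberately left open.

```python
import numpy as np

def mask_objective(s, feat_masked, feat_full, fidelity_weight,
                   lam_sparse=0.01, lam_bimodal=0.01):
    """Illustrative loss for the relaxed mask s of one timestep (assumed form)."""
    fidelity = np.mean((feat_masked - feat_full) ** 2)  # match the frozen model's features
    sparsity = np.sum(np.abs(s))                        # L1: favors skipping blocks (1 = execute)
    bimodal = np.sum(s * (1.0 - s))                     # zero only when every entry is 0 or 1
    return fidelity_weight * fidelity + lam_sparse * sparsity + lam_bimodal * bimodal

def binarize(s, threshold=0.5):
    """Turn relaxed mask values into hard execute/skip decisions (assumed threshold)."""
    return (np.asarray(s) >= threshold).astype(int)

# Relaxed values near 0 or 1 incur almost no bimodal penalty; 0.5 is penalized most.
s_example = np.array([0.02, 0.97, 0.5])
feats = np.zeros(8)  # identical features, so the fidelity term is zero here
loss_example = mask_objective(s_example, feats, feats, fidelity_weight=2.0)
m_example = binarize(s_example)
```

With values already close to 0 or 1, as reported for the trained masks, thresholding changes the relaxed mask very little, which is what makes the hard skip decisions stable.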


This review was generated by AI and checked by a human editor.