[Paper Review] TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation
TRACT significantly improves single- and few-step diffusion sampling by distilling through a transitive closure approach, achieving state-of-the-art FID scores for 1-step DDIM on CIFAR-10 and 64×64 ImageNet without changing architecture.
Denoising Diffusion models have demonstrated their proficiency for generative sampling. However, generating good samples often requires many iterations. Consequently, techniques such as binary time-distillation (BTD) have been proposed to reduce the number of network calls for a fixed architecture. In this paper, we introduce TRAnsitive Closure Time-distillation (TRACT), a new method that extends BTD. For single-step diffusion, TRACT improves FID by up to 2.4x on the same architecture, and achieves new single-step Denoising Diffusion Implicit Models (DDIM) state-of-the-art FID (7.4 for ImageNet64, 3.8 for CIFAR10). Finally, we tease apart the method through extended ablations. The PyTorch implementation will be released soon.
Research Motivation and Objectives
- Reduce the inference cost of diffusion models by enabling single- or few-step sampling without architecture changes.
- Identify limitations of binary time-distillation (BTD) such as objective degeneracy and SWA incompatibility.
- Propose TRACT to distill outputs across time steps via transitive closure with self-teaching to maintain quality with few phases.
- Show that TRACT achieves state-of-the-art or competitive FID with 1-2 steps on CIFAR-10 and 64×64 ImageNet and analyze ablations.
Proposed Method
- Extend binary time-distillation (BTD) to Transitive Closure Time-Distillation (TRACT), reducing the number of distillation phases from log2(T) to a small constant (1–2).
- Train a student to distill the teacher’s inference from step t to an earlier step t' < t in a single network call (equations 6–9).
- Generate the targets for these multi-step jumps with a self-teacher, an exponential moving average (EMA) of the student weights, which chains the jumps via transitive closure (Algorithm 1).
- Adapt TRACT to VE/EDM settings with RK and DDIM-VE teachers and derive corresponding targets and losses (equations 11–15).
- Mitigate objective degeneracy by limiting distillation phases and leveraging self-teaching with EMA and inference-time EMA.
- Provide training details including group-based distillation, loss weighting, and EMA updates (Appendix references).
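
The self-teaching idea in the bullets above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the scalar student `w[t]`, the linear `teacher_step`, the schedule `alpha`, and every hyperparameter are hypothetical choices made for the example. The target for a jump from t is built by taking one teacher step to t-1 and then, unless t-1 is already the end of the group, letting the EMA self-teacher finish the jump.

```python
import random


def alpha(t):
    # Toy schedule (assumption): the exact toy solution scales x by alpha(s) / alpha(t).
    return 1.0 + t


def teacher_step(x, t):
    # One toy teacher step from t to t-1; a stand-in for a single DDIM step.
    return x * alpha(t - 1) / alpha(t)


def ema_update(ema, params, mu):
    # Exponential moving average of the student parameters (the self-teacher).
    return [mu * e + (1.0 - mu) * p for e, p in zip(ema, params)]


def train_tract_toy(T=4, steps=3000, lr=0.05, mu=0.9, batch=32, seed=0):
    rng = random.Random(seed)
    w = [0.0] * (T + 1)   # student: jump(x, t) = w[t] * x, index 0 is unused
    w_ema = list(w)       # self-teacher weights
    for _ in range(steps):
        t = rng.randint(1, T)
        grad = 0.0
        for _ in range(batch):
            x = rng.gauss(0.0, 1.0)
            x_prev = teacher_step(x, t)
            # Transitive closure: the self-teacher completes the jump from t-1 to 0.
            target = x_prev if t - 1 == 0 else w_ema[t - 1] * x_prev
            # Gradient of the squared error (w[t] * x - target)^2 with respect to w[t].
            grad += 2.0 * (w[t] * x - target) * x
        w[t] -= lr * grad / batch
        w_ema = ema_update(w_ema, w, mu)
    return w, w_ema


if __name__ == "__main__":
    student, self_teacher = train_tract_toy()
    print([round(v, 4) for v in student[1:]])
```

In this toy setting the exact jump from step t to step 0 scales x by 1/(1+t), so the learned `w[t]` can be checked against that value. A single distillation phase takes the student from T teacher steps to one step, which is the property the method relies on to avoid long chains of phases.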

Experimental Results
Research Questions
- RQ1: Can TRACT achieve high-quality samples with 1–2 inference steps without architectural changes?
- RQ2: Does reducing the number of distillation phases mitigate objective degeneracy and enable effective SWA?
- RQ3: How does TRACT perform with VE/EDM teachers and alternative samplers (RK, DDIM-VE) across CIFAR-10 and 64×64 ImageNet?
Key Findings
| Dataset | Method | NFEs | FID | Parameters |
|---|---|---|---|---|
| CIFAR-10 | TRACT-EDM-256M ∗ | 1 | 3.78 ± 0.01 | 56M |
| CIFAR-10 | TRACT-96M ∗ | 1 | 4.17 ± 0.03 | 56M |
| CIFAR-10 | TRACT-256M | 1 | 4.45 ± 0.05 | 60M |
| CIFAR-10 | BTD-96M [44] | - | 9.12 | 60M |
| CIFAR-10 | TRACT-96M | 2 | 3.32 ± 0.02 | 60M |
| CIFAR-10 | TRACT-EDM-256M ∗ | 2 | 3.55 ± 0.01 | 56M |
| CIFAR-10 | TRACT-EDM-96M ∗ | 2 | 3.75 ± 0.02 | 56M |
| CIFAR-10 | BTD-96M [44] | - | 4.51 | 60M |
| 64×64 ImageNet | TRACT-96M | 1 | 7.43 ± 0.07 | 296M |
| 64×64 ImageNet | TRACT-EDM-96M ∗ | 1 | 7.52 ± 0.05 | 296M |
- 1-step TRACT improves CIFAR-10 FID from 9.12 (BTD) to 4.45 with the same architecture and training budget.
- On 64×64 ImageNet, 1-step TRACT reaches an FID of 7.4, improving over the BTD baselines.
- 2-step TRACT-96M reaches 3.32 ± 0.02 FID on CIFAR-10.
- TRACT-EDM-256M achieves 3.78 ± 0.01 FID with 1 NFE on CIFAR-10, and TRACT-EDM-96M achieves 3.75 ± 0.02 FID with 2 NFEs on CIFAR-10 (Table 1 and related text).
- On 64×64 ImageNet, TRACT-96M achieves 7.43 ± 0.07 FID with 1 NFE and TRACT-EDM-96M achieves 7.52 ± 0.05 FID with the same setup (Table 2 and related text).
- Ablations show best performance with a 2-phase schedule (1024→32→1) and EMA-based self-teaching; more phases degrade performance due to objective degeneracy.
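
The phase counts behind the ablation result can be made concrete with a little arithmetic. The sketch below assumes that each phase splits the timesteps into contiguous groups of equal size, that steps are numbered 1..T, and that every step in a group jumps to the group's lower boundary. The function names and the indexing convention are hypothetical and chosen only for illustration.

```python
def btd_num_phases(teacher_steps, student_steps=1):
    # Binary time-distillation halves the step count once per phase.
    phases = 0
    steps = teacher_steps
    while steps > student_steps:
        if steps % 2 != 0:
            raise ValueError("step count must halve evenly")
        steps //= 2
        phases += 1
    return phases


def tract_phases(schedule):
    # For a schedule such as [1024, 32, 1], return one
    # (steps_in, steps_out, group_size) tuple per distillation phase.
    phases = []
    for steps_in, steps_out in zip(schedule, schedule[1:]):
        if steps_in % steps_out != 0:
            raise ValueError("each phase must divide the step count evenly")
        phases.append((steps_in, steps_out, steps_in // steps_out))
    return phases


def group_end(t, group_size):
    # Step t' that a student at step t (1-indexed) jumps to in one call.
    return ((t - 1) // group_size) * group_size


if __name__ == "__main__":
    print(btd_num_phases(1024))
    print(tract_phases([1024, 32, 1]))
    print(group_end(1024, 32), group_end(33, 32), group_end(32, 32))
```

For a 1024-step teacher, binary halving needs 10 phases, whereas the 1024→32→1 schedule uses 2 phases with groups of 32 steps each.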

This review was generated by AI and checked by a human editor.