QUICK REVIEW

[Paper Review] C$^2$FG: Control Classifier-Free Guidance via Score Discrepancy Analysis

Jiayang Gao, Tianyi Zheng|arXiv (Cornell University)|Mar 9, 2026

Model Reduction and Neural Networks | Citations: 0

One-Line Summary

The paper analyzes classifier-free guidance (CFG) theoretically, proves upper bounds on the score discrepancy between conditional and unconditional distributions, and introduces C2FG, a training-free guidance method whose time-dependent, exponentially decaying weight improves conditional generation with diffusion models across tasks and backbones.

ABSTRACT

Classifier-Free Guidance (CFG) is a cornerstone of modern conditional diffusion models, yet its reliance on the fixed or heuristic dynamic guidance weight is predominantly empirical and overlooks the inherent dynamics of the diffusion process. In this paper, we provide a rigorous theoretical analysis of the Classifier-Free Guidance. Specifically, we establish strict upper bounds on the score discrepancy between conditional and unconditional distributions at different timesteps based on the diffusion process. This finding explains the limitations of fixed-weight strategies and establishes a principled foundation for time-dependent guidance. Motivated by this insight, we introduce Control Classifier-Free Guidance (C$^2$FG), a novel, training-free, and plug-in method that aligns the guidance strength with the diffusion dynamics via an exponential decay control function. Extensive experiments demonstrate that C$^2$FG is effective and broadly applicable across diverse generative tasks, while also exhibiting orthogonality to existing strategies.

Research Motivation and Objectives

Motivate why fixed CFG weights are sub-optimal due to time-varying conditional-unconditional score discrepancies in diffusion processes.
Provide rigorous upper bounds on score discrepancies for VP-SDE and VE-SDE to justify a time-dependent guidance strategy.
Propose C2FG, a training-free method with an exponential decay guidance function that aligns with diffusion dynamics.
Demonstrate C2FG’s effectiveness across diverse diffusion backbones (e.g., Stable Diffusion, EDM2, U-ViT, DiT, SiT) and tasks (image and text-to-image).
Show orthogonality and compatibility of C2FG with existing CFG enhancements (e.g., interval guidance) and samplers (SDE/ODE).

Proposed Method

Theoretical bounds on the difference between conditional and unconditional scores for VP-SDE and VE-SDE (Theorems 1 and 2) showing an exponential decay of discrepancy over time.
Interpretation of score discrepancy and PDF evolution via Harnack-type inequalities (Theorems 3 and 4) to motivate a time-varying guidance weight.
Design of Control Classifier-Free Guidance (C2FG): replace fixed CFG weight with an exponential decay control function ω(t) = ω0 exp(λ(1 - t/tmax)).
Integration of C2FG into sampling as: ε̂_c^ω(x_t) = ε̂_∅(x_t) + ω(t)[ε̂_c(x_t) − ε̂_∅(x_t)].
Theoretical interpretation of interval guidance as a special case within the C2FG framework, enabling flexible combinations with existing strategies.
Training-free implementation that works with multiple backbones and samplers (SDE/ODE).
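
The schedule and the guided noise estimate listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `c2fg_weight` and `guided_eps` are hypothetical, the NumPy arrays stand in for real network outputs, and the time convention (t = t_max is pure noise, sampling runs toward t = 0) is an assumption that the review does not state explicitly.

```python
import math
import numpy as np


def c2fg_weight(t, t_max=1.0, omega0=1.0, lam=math.log(2.0)):
    """Guidance weight omega(t) = omega0 * exp(lam * (1 - t / t_max)).

    Formula as written in the review; the defaults are illustrative.
    """
    return omega0 * math.exp(lam * (1.0 - t / t_max))


def guided_eps(eps_uncond, eps_cond, t, t_max=1.0, omega0=1.0, lam=math.log(2.0)):
    """Combine the two noise predictions with the time-dependent weight.

    eps_uncond and eps_cond are placeholders for the unconditional and
    conditional model outputs at the same x_t.
    """
    w = c2fg_weight(t, t_max=t_max, omega0=omega0, lam=lam)
    return eps_uncond + w * (eps_cond - eps_uncond)


if __name__ == "__main__":
    # Stand-in arrays for the two network outputs (hypothetical values).
    eps_u = np.zeros(4)
    eps_c = np.ones(4)
    for t in (1.0, 0.5, 0.0):
        print(t, c2fg_weight(t), guided_eps(eps_u, eps_c, t))
```

With ω0 = 1 and λ = ln 2, the weight equals ω0 at t = t_max and ω0·e^λ = 2 at t = 0, so as a function of t it decays exponentially from 2 down to 1. A fixed-weight CFG baseline corresponds to λ = 0, which makes ω(t) = ω0 for every t.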

Experimental Results

Research Questions

RQ1: Can the score discrepancy between conditional and unconditional distributions in CFG be bounded rigorously across VP-SDE and VE-SDE?
RQ2: How does the conditional-unconditional score discrepancy evolve over the diffusion timeline, and can this guide a principled time-dependent guidance strategy?
RQ3: Can a training-free, time-dependent guidance weight improve conditional generation quality across diverse diffusion architectures and tasks?
RQ4: Is the proposed C2FG mechanism orthogonal to and compatible with existing CFG enhancements (e.g., interval guidance) and different samplers?
RQ5: Do empirical results corroborate the theoretical bounds and demonstrate state-of-the-art performance improvements?

Key Findings

Model	FID	IS	sFID	Precision	Recall
DiT-XL/2 (ω=1.5, ODE)	2.29	276.8	4.6	0.83	0.57
DiT-XL/2 (Rectified Diffusion, ω=1.5, ODE)	2.13	/	/	0.83	0.58
DiT-XL/2 + Ours (ω0=1, λ=ln 2, ODE)	2.07	291.5	4.6	0.83	0.59
SiT-XL/2 (REPA) (ω=1.35, SDE)	1.80	284.0	4.5	0.81	0.61
SiT-XL/2 (REPA) + Ours (ω0=1, λ=1, SDE)	1.51	315.0	4.6	0.80	0.62
SiT-XL/2 (REPA, Interval) (ω=1.8, tl=0, th=0.7, SDE)	1.42	305.7	4.7	0.80	0.65
SiT-XL/2 (REPA, Interval) + Ours (ω0=1.8, λ=0.03, SDE)	1.41	308.0	4.7	0.80	0.65
SiT-XL/2 (REPA) (ω=1.8, ODE)	3.64	366.0	4.9	0.86	0.54
SiT-XL/2 (REPA) + Ours (ω0=1.7, λ=0.15, ODE)	3.40	364.2	4.7	0.86	0.55
SiT-XL/2 (REPA, Interval) (ω=1.8, tl=0, th=0.7, ODE)	1.56	283.1	4.6	0.78	0.66
SiT-XL/2 (REPA, Interval) + Ours (ω0=1.8, λ=0.03, ODE)	1.54	286.0	4.6	0.78	0.66

The score discrepancy between conditional and unconditional outputs decays exponentially in the reparameterized diffusion time, which justifies a time-dependent guidance weight.
Theoretical bounds (Theorems 1–2) hold for VP-SDE and VE-SDE, explaining limitations of fixed CFG weights.
Harnack-type PDF inequalities (Theorems 3–4) provide complementary insights into density evolution, supporting exponential guidance schedules.
C2FG, a training-free exponential-decay guidance function, integrates smoothly into various sampling schemes and backbones, improving FID/IS across tasks.
Empirical results show C2FG yielding consistent gains across diffusion architectures (DiT, SiT, U-ViT, EDM2, Stable Diffusion) and datasets (ImageNet, MS-COCO), often outperforming strong baselines while remaining orthogonal to interval guidance.
C2FG remains effective with ODE/SDE samplers and high-resolution settings, indicating robustness and generality.

This review was generated by AI and checked by a human editor.