QUICK REVIEW

[Paper Review] Structural Pruning for Diffusion Models

Gongfan Fang, Xinyin Ma | arXiv (Cornell University) | May 18, 2023

Music and Audio Processing | Citations: 17

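One-Line Summary

Diff-Pruning is a structural pruning method based on a Taylor expansion over a selected subset of timesteps. It compresses pre-trained diffusion models, reducing FLOPs by roughly 50% at only 10–20% of the original training cost while preserving generative behavior.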

ABSTRACT

Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training. The essence of Diff-Pruning is encapsulated in a Taylor expansion over pruned timesteps, a process that disregards non-contributory diffusion steps and ensembles informative gradients to identify important weights. Our empirical assessment, undertaken across several datasets, highlights two primary benefits of our proposed method: 1) Efficiency: it enables approximately a 50% reduction in FLOPs at a mere 10% to 20% of the original training expenditure; 2) Consistency: the pruned diffusion models inherently preserve generative behavior congruent with their pre-trained models. Code is available at https://github.com/VainF/Diff-Pruning.

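Motivation and Objectives

Motivate the need for compression to reduce the training and inference overhead of Diffusion Probabilistic Models (DPMs).
Propose a pruning method tailored to diffusion models (Diff-Pruning).
Develop a Taylor-expansion-based criterion for identifying important weights and the timesteps to prune.
Demonstrate across several datasets that pruning can preserve or improve generation quality and consistency.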

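Proposed Method

Prunes the model by removing entire weight substructures, rather than producing unstructured sparse matrices, to obtain sparsified parameter matrices.
Uses a Taylor expansion of the per-timestep loss L_t to estimate parameter importance and the influence across timesteps (a variant of Equation 7).
Introduces timestep-aware pruning by selecting the timesteps to prune with a threshold on the relative loss L_t/L_max (Equation 9/10).
Accumulates gradients over the remaining subset of timesteps to compute a robust importance score for each parameter (Equation 10).
Applies one-shot pruning to the pre-trained diffusion model, then fine-tunes it on the target dataset.
Evaluates parameter count, MACs, FID, and consistency (SSIM) on multiple datasets (CIFAR-10, CelebA-HQ, LSUN, ImageNet-1K).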
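The thresholding and gradient-accumulation steps above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation. It assumes that timesteps with relative loss L_t/L_max above the threshold are kept, and that the score is the absolute value of a weight times its gradient summed over the kept timesteps. The function names, the toy losses and gradients, and the threshold of 0.05 are all hypothetical.

```python
# Toy sketch (not the authors' code): thresholded Taylor importance.
# All names and numbers below are hypothetical illustrations.

def select_timesteps(losses, threshold):
    """Return indices t whose relative loss L_t / L_max exceeds the threshold."""
    l_max = max(losses)
    return [t for t, loss in enumerate(losses) if loss / l_max > threshold]

def taylor_importance(weights, grads_per_t, kept):
    """Score each weight as |theta_i * sum over kept t of dL_t/dtheta_i|."""
    scores = []
    for i, theta in enumerate(weights):
        accumulated = sum(grads_per_t[t][i] for t in kept)
        scores.append(abs(theta * accumulated))
    return scores

def group_importance(scores, groups):
    """Sum per-weight scores inside each structural group (e.g. one channel)."""
    return [sum(scores[i] for i in group) for group in groups]

# Assumed per-timestep losses: the last two steps are nearly converged.
losses = [1.0, 0.5, 0.02, 0.01]
weights = [2.0, -1.0, 0.5, 3.0]
# Assumed gradients dL_t/dtheta_i, one row per timestep.
grads_per_t = [
    [0.25, 0.5, -0.5, 0.0],
    [0.25, -0.5, 0.0, 0.125],
    [4.0, 4.0, 4.0, 4.0],
    [4.0, 4.0, 4.0, 4.0],
]
# Two structural groups of two weights each.
groups = [[0, 1], [2, 3]]

kept = select_timesteps(losses, threshold=0.05)
scores = taylor_importance(weights, grads_per_t, kept)
per_group = group_importance(scores, groups)
# The group with the lowest accumulated score is the first pruning candidate.
prune_first = min(range(len(groups)), key=lambda g: per_group[g])
print(kept, scores, per_group, prune_first)
```

In this constructed data, scoring over all four timesteps instead of the kept subset changes which group ranks lowest. That only shows that timestep selection can change the ranking; it is not evidence about real models.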

Figure 1 : Diff-Pruning leverages Taylor expansion at pruned timesteps to estimate the importance of weights, where early steps focus on local details like edges and color and later ones pay more attention to contents such as object and shape. We propose a simple thresholding method to trade off the

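Experimental Results

Research Questions

RQ1: Can structural pruning accurately identify and remove redundant components of diffusion models without re-training?
RQ2: How does timestep pruning affect content generation versus detail generation?
RQ3: How do the trade-offs among pruning ratio, recovery effort, and generated sample quality play out across datasets and model types (DDPMs, LDMs)?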

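Key Findings

Diff-Pruning achieves substantial compression, reducing FLOPs by about 50% while using only about 10%–20% of the original training cost.
Pruned models tend to maintain or improve on the generative behavior and sample consistency of the pre-trained models (e.g., 0.5M versus 4.4M training steps on LSUN Church).
Timesteps that contribute to content are not confined to the end of the diffusion process, so balancing content and detail requires weighting by timestep importance.
A Taylor expansion over all timesteps can accumulate noisy gradients; a partial Taylor expansion with thresholding improves pruning accuracy.
On LSUN Church/Bedroom and ImageNet-1K (LDM), the pruned models achieve strong FID/SSIM with comparatively few parameters and MACs.
Diff-Pruning consistently outperforms random, magnitude, and naive Taylor pruning on CIFAR-10 and CelebA-HQ.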
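As a rough back-of-the-envelope check on the roughly 50% FLOPs figure, the sketch below counts multiply-accumulate operations (MACs) for a single convolution layer before and after removing channels. It assumes the same fraction of input and output channels is removed, so MACs scale with the square of the kept fraction. The layer shape and the 30% ratio are hypothetical and are not taken from the paper. Real U-Nets also contain attention, skip connections, and other layers that do not follow this simple scaling.

```python
# Toy arithmetic (hypothetical layer and ratio, not the paper's configuration).

def conv_macs(c_in, c_out, kernel, h_out, w_out):
    """MACs of a dense 2D convolution, ignoring bias."""
    return c_in * c_out * kernel * kernel * h_out * w_out

def pruned_channels(channels, ratio):
    """Number of channels left after removing the given fraction."""
    return max(1, round(channels * (1 - ratio)))

dense = conv_macs(128, 128, 3, 32, 32)
kept_channels = pruned_channels(128, 0.3)
pruned = conv_macs(kept_channels, kept_channels, 3, 32, 32)
remaining_fraction = pruned / dense
print(kept_channels, dense, pruned, round(remaining_fraction, 3))
```

Under these assumptions, removing about 30% of the channels on both sides of the layer leaves just under half of its MACs. This suggests why a moderate structural pruning ratio can correspond to a FLOPs reduction near 50%.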

Figure 2: Generated images of the pre-trained models [18] (left) and the pruned models (right) on LSUN Church and LSUN Bedroom. SSIM measures the similarity between generated images.


This review was generated by AI and reviewed by a human editor.