QUICK REVIEW

[Paper Review] Improved Techniques for Training Consistency Models

Yang Song, Prafulla Dhariwal|arXiv (Cornell University)|2023. 10. 22.

Generative Adversarial Networks and Image Synthesis|Citations: 12

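One-line summary

This paper removes EMA from the teacher, replaces LPIPS with a Pseudo-Huber loss, and introduces a lognormal noise schedule and a discretization curriculum, which together yield substantially better FID in one-step and two-step sampling without distillation.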

ABSTRACT

Consistency models are a nascent family of generative models that can sample high quality data in one step without the need for adversarial training. Current consistency models achieve optimal sample quality by distilling from pre-trained diffusion models and employing learned metrics such as LPIPS. However, distillation limits the quality of consistency models to that of the pre-trained diffusion model, and LPIPS causes undesirable bias in evaluation. To tackle these challenges, we present improved techniques for consistency training, where consistency models learn directly from data without distillation. We delve into the theory behind consistency training and identify a previously overlooked flaw, which we address by eliminating Exponential Moving Average from the teacher consistency model. To replace learned metrics like LPIPS, we adopt Pseudo-Huber losses from robust statistics. Additionally, we introduce a lognormal noise schedule for the consistency training objective, and propose to double total discretization steps every set number of training iterations. Combined with better hyperparameter tuning, these modifications enable consistency models to achieve FID scores of 2.51 and 3.25 on CIFAR-10 and ImageNet $64 \times 64$ respectively in a single sampling step. These scores mark a 3.5$\times$ and 4$\times$ improvement compared to prior consistency training approaches. Through two-step sampling, we further reduce FID scores to 2.24 and 2.77 on these two datasets, surpassing those obtained via distillation in both one-step and two-step settings, while narrowing the gap between consistency models and other state-of-the-art generative models.

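Research motivation and goals

Improve consistency training (CT) so that it matches or exceeds consistency distillation (CD) without relying on pre-trained diffusion models.
Remove the bias introduced by, and the dependence on, learned metrics such as LPIPS in training and evaluation.
Develop a training curriculum and noise schedule that raise one-step sample quality while remaining scalable.
Provide a deeper theoretical understanding of CT and address a weakness overlooked in earlier analysis.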

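Proposed method

Remove EMA from the teacher network to correct a flaw in the earlier CT analysis.
Replace LPIPS with a Pseudo-Huber loss, giving a robust training objective that does not depend on a learned metric.
Introduce a lognormal noise schedule for the CT objective.
Adopt a curriculum that doubles the total number of discretization steps every set number of training iterations, with the aim of improving data efficiency.
Show that, with these changes, CT outperforms CD on CIFAR-10 and ImageNet 64x64 in both one-step and two-step sampling.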
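The three training-side ingredients in the list above can be illustrated with a minimal sketch. It assumes the common Pseudo-Huber form sqrt(||x - y||^2 + c^2) - c, a plain lognormal draw for the noise level, and a capped doubling rule for the step count. The function names and every numeric value (`c`, `log_mean`, `log_std`, `initial_steps`, `double_every`, `max_steps`) are hypothetical choices for illustration, not settings reported in this review.

```python
import math
import random


def pseudo_huber(x, y, c):
    """Pseudo-Huber distance sqrt(||x - y||^2 + c^2) - c between two vectors."""
    squared_l2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(squared_l2 + c * c) - c


def sample_noise_level(rng, log_mean, log_std):
    """Draw a noise level whose logarithm is Gaussian (a lognormal draw)."""
    return rng.lognormvariate(log_mean, log_std)


def discretization_steps(iteration, initial_steps, double_every, max_steps):
    """Double the step count every `double_every` iterations, capped at max_steps."""
    doublings = iteration // double_every
    return min(initial_steps * 2 ** doublings, max_steps)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# With c = 0 the Pseudo-Huber distance reduces to the plain L2 distance.
print(pseudo_huber([0.0, 0.0], [3.0, 4.0], c=0.0))  # 5.0

# Lognormal noise levels are always positive (illustrative parameters).
print(sample_noise_level(rng, log_mean=-1.0, log_std=2.0) > 0.0)  # True

# Hypothetical schedule: start at 10 steps, double every 1000 iterations, cap at 1280.
print([discretization_steps(k, 10, 1000, 1280) for k in (0, 999, 1000, 2500, 10**6)])
# [10, 10, 20, 40, 1280]
```

For large differences the Pseudo-Huber distance grows roughly linearly rather than quadratically, which is why it is described as robust to outliers; for small differences it behaves like a scaled squared error.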

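Experimental results

Research questions

RQ1: Can consistency training (CT) reach higher sample quality than consistency distillation (CD) without a pre-trained diffusion model?
RQ2: Does removing EMA from the teacher network resolve the theoretical flaw in the CT analysis and improve practical performance?
RQ3: Do a robust loss such as Pseudo-Huber and a lognormal noise schedule improve CT without learned metrics such as LPIPS?
RQ4: How do the discretization-step schedule and the training curriculum affect one-step and multi-step sample quality?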
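To make RQ2 concrete, here is a minimal sketch of what "removing EMA from the teacher" can look like, assuming the teacher is normally maintained as an exponential moving average of the student parameters. The helper `update_teacher` and the parameter lists are hypothetical; setting the decay to zero makes the teacher an exact copy of the current student.

```python
def update_teacher(teacher, student, ema_decay):
    """Return new teacher parameters: ema_decay * teacher + (1 - ema_decay) * student."""
    return [ema_decay * t + (1.0 - ema_decay) * s for t, s in zip(teacher, student)]


student = [0.2, -1.0, 3.5]   # toy student parameters
teacher = [0.0, 0.0, 0.0]    # toy teacher parameters

# With a non-zero decay the teacher lags behind the student.
lagging = update_teacher(teacher, student, ema_decay=0.9)

# With decay 0 (no EMA) the teacher simply tracks the student.
print(update_teacher(teacher, student, ema_decay=0.0))  # [0.2, -1.0, 3.5]
```

In an actual training loop the teacher output would additionally be treated as a constant during backpropagation; that detail is omitted here because the sketch uses plain Python lists.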

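Key results

With the proposed changes, CT reaches an FID of 2.51 on CIFAR-10 and 3.25 on ImageNet 64x64 in a single sampling step.
Two-step sampling lowers FID to 2.24 on CIFAR-10 and 2.77 on ImageNet 64x64, outperforming CD in both the one-step and two-step settings.
Replacing LPIPS with a Pseudo-Huber loss removes the dependence on learned metrics and the associated evaluation bias.
The lognormal noise schedule and the discretization-step curriculum substantially improve sample quality and training efficiency.
With these improvements, CT outperforms prior CT and, without distillation, narrows the gap to other state-of-the-art generative models such as leading diffusion models and GANs.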
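The review does not spell out how one-step and two-step sampling differ, so the following is a sketch under stated assumptions. It follows the usual multistep recipe for consistency models: map pure noise to a sample, re-noise that sample to an intermediate level, then map it back once more. To keep it runnable without a trained network, it uses a toy one-dimensional Gaussian data distribution, for which the consistency function has a closed form. `SIGMA_DATA`, `EPS`, `T_MAX`, and `tau` are illustrative values, not settings taken from this review.

```python
import math
import random

SIGMA_DATA = 0.5  # std of the toy data distribution N(0, SIGMA_DATA^2) (assumption)
EPS = 0.002       # smallest noise level, where the function is the identity (assumption)
T_MAX = 80.0      # largest noise level, used for the initial noise (assumption)


def toy_consistency_fn(x, t):
    """Closed-form consistency function for Gaussian data: maps a noisy x at level t to level EPS."""
    return x * math.sqrt(SIGMA_DATA**2 + EPS**2) / math.sqrt(SIGMA_DATA**2 + t**2)


def one_step_sample(f, rng):
    """One network evaluation: draw noise at T_MAX and map it directly to a sample."""
    x_T = rng.gauss(0.0, T_MAX)
    return f(x_T, T_MAX)


def two_step_sample(f, rng, tau):
    """Two evaluations: sample once, re-noise to level tau, then map back again."""
    x = one_step_sample(f, rng)
    x_tau = x + math.sqrt(tau**2 - EPS**2) * rng.gauss(0.0, 1.0)
    return f(x_tau, tau)


rng = random.Random(0)
samples = [two_step_sample(toy_consistency_fn, rng, tau=0.8) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(round(std, 1))  # close to SIGMA_DATA
```

With an exact consistency function both samplers recover the data distribution; with a learned, imperfect model the second evaluation gives a chance to correct errors from the first, which is consistent with the lower two-step FID reported above.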


This review was generated by AI and reviewed by a human editor.