QUICK REVIEW

[Paper Review] Diffusion Models Beat GANs on Image Classification

Soumik Mukhopadhyay, Matthew Gwilliam | arXiv (Cornell University) | 2023-07-17

Generative Adversarial Networks and Image Synthesis | Citations: 16

One-Line Summary

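This paper shows that a pretrained diffusion model can serve as a unified self-supervised representation learner: it achieves strong image classification performance, outperforms BigBiGAN in both generation and classification at 256x256 resolution, and enables competitive transfer learning on fine-grained visual classification (FGVC).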

ABSTRACT

While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which uses a single pre-training stage to address both families of tasks simultaneously. We identify diffusion models as a prime candidate. Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc. Such models involve training a U-Net to iteratively predict and remove noise, and the resulting model can synthesize high fidelity, diverse, novel images. The U-Net architecture, as a convolution-based architecture, generates a diverse set of feature representations in the form of intermediate feature maps. We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification. We explore optimal methods for extracting and using these embeddings for classification tasks, demonstrating promising results on the ImageNet classification task. We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods such as BigBiGAN for classification tasks. We investigate diffusion models in the transfer learning regime, examining their performance on several fine-grained visual classification datasets. We compare these embeddings to those generated by competing architectures and pre-trainings for classification tasks.
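To make the feature-extraction idea concrete, here is a minimal NumPy sketch of the f_theta(x0, t, b) extractor described in the method section: noise a clean input x0 to timestep t with the closed-form forward process, run it through the first b blocks, and keep that activation. Several pieces are assumptions, not the paper's code: the linear beta schedule (1e-4 to 0.02 over 1000 steps, as in DDPM) is assumed, and `make_toy_block` is a hypothetical stand-in for an ADM U-Net block (a 1x1 channel-mixing layer with ReLU that ignores the timestep embedding a real block receives). The helper names `linear_alpha_bar`, `q_sample`, and `extract_features` are likewise hypothetical.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed, DDPM-style)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def make_toy_block(c_in, c_out, rng):
    """Hypothetical stand-in for a U-Net block: 1x1 conv (channel mixing) + ReLU.
    Unlike a real ADM block, it ignores the timestep embedding."""
    weight = rng.standard_normal((c_out, c_in)) / np.sqrt(c_in)
    return lambda h: np.maximum(np.einsum("oc,chw->ohw", weight, h), 0.0)

def extract_features(x0, t, b, blocks, alpha_bar, rng):
    """Hypothetical f(x0, t, b): noise x0 to step t, run the first b blocks, return block b's output."""
    h = q_sample(x0, t, alpha_bar, rng)
    for block in blocks[:b]:  # b counts blocks from 1
        h = block(h)
    return h

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.standard_normal((3, 16, 16))  # a toy 3-channel "image"
blocks = [make_toy_block(3, 32, rng), make_toy_block(32, 64, rng)]
feats = extract_features(x0, t=90, b=2, blocks=blocks, alpha_bar=alpha_bar, rng=rng)
print(feats.shape)  # (64, 16, 16)
```

In a real setup the blocks would be the pretrained ADM U-Net's own modules (for example, captured with a forward hook), and t and b would be picked by the ablations the review describes.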

Motivation and Goals

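Argues for unified unsupervised representation learning that supports both discriminative and generative tasks.
Demonstrates that diffusion model embeddings are discriminative enough for high-accuracy image classification.
Explores effective strategies for extracting and pooling diffusion features for classification.
Evaluates how well diffusion-derived features transfer to fine-grained visual classification (FGVC).
Characterizes representations across diffusion layers and timesteps, and compares them with other pre-training methods using CKA.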
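CKA compares two representations of the same inputs in a way that is invariant to rotations and isotropic scaling of the feature space, which is what makes cross-model comparisons (diffusion vs. ResNet/ViT) meaningful. Below is a minimal sketch of the linear variant (Kornblith et al., 2019); whether the paper uses linear or kernel CKA is not stated in this review, so the linear form is an assumption, and `linear_cka` is a hypothetical helper name.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representations of the same n inputs.
    x: (n, d1), y: (n, d2); rows are examples, columns are features."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    self_x = np.linalg.norm(x.T @ x, ord="fro")
    self_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (self_x * self_y))

rng = np.random.default_rng(0)
acts_a = rng.standard_normal((200, 32))            # e.g. pooled features from one layer
q, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # random orthogonal matrix
acts_b = 2.5 * acts_a @ q                           # rotated and rescaled copy
acts_c = rng.standard_normal((200, 48))             # unrelated features
print(round(linear_cka(acts_a, acts_b), 6))        # 1.0
```

The rotated, rescaled copy scores 1.0, while unrelated features score well below 1, which is the property the layer-by-layer and model-to-model comparisons rely on.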

Proposed Method

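Uses a pretrained unconditional guided-diffusion model (ADM U-Net, 256x256) and extracts features from intermediate blocks at selected diffusion timesteps.
Defines the feature extractor f_theta(x0, t, b) as the activation after block b when the noised input x_t (x0 noised to timestep t) is passed through the U-Net.
Evaluates linear probes, MLP/CNN/attention heads, and several pooling strategies for turning feature maps into vectors for classification.
Compares the diffusion-based classifier with BigBiGAN and MAGE baselines on ImageNet-1k in terms of accuracy and FID.
Assesses transferability to FGVC datasets and analyzes the representations with centered kernel alignment (CKA).
Runs ablations over timestep t, block index b, and pooling size to identify the best feature-extraction setting.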
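Pooling is what turns a C x H x W feature map into the fixed-length vector a head consumes, and the pool size directly sets that length: 1x1 pooling yields C features, 2x2 yields 4C. The sketch below assumes average pooling with spatial sizes divisible by the pool size; the 512-channel, 16x16 map and the helper names (`adaptive_avg_pool`, `pooled_vector`, `linear_probe_logits`) are hypothetical, not the actual ADM block shapes.

```python
import numpy as np

def adaptive_avg_pool(feat, k):
    """Average-pool a (C, H, W) feature map down to (C, k, k).
    This sketch assumes H and W are divisible by k."""
    c, h, w = feat.shape
    if h % k or w % k:
        raise ValueError("H and W must be divisible by k in this sketch")
    return feat.reshape(c, k, h // k, k, w // k).mean(axis=(2, 4))

def pooled_vector(feat, k):
    """Flatten the pooled map: k=1 gives C features, k=2 gives 4*C features."""
    return adaptive_avg_pool(feat, k).reshape(-1)

def linear_probe_logits(vec, weight, bias):
    """Linear probe: one affine map on frozen features; only weight and bias are trained."""
    return weight @ vec + bias

rng = np.random.default_rng(0)
feat = rng.standard_normal((512, 16, 16))  # hypothetical block-b activation
for k in (1, 2):
    vec = pooled_vector(feat, k)
    weight = np.zeros((1000, vec.size))    # 1000 ImageNet classes; untrained placeholder
    logits = linear_probe_logits(vec, weight, np.zeros(1000))
    print(k, vec.size, logits.shape)
# 1 512 (1000,)
# 2 2048 (1000,)
```

Larger pools keep more spatial layout at the cost of a wider probe, which is the trade-off the pooling-size ablation explores.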

Experimental Results

Research Questions

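RQ1: Can diffusion model embeddings be repurposed for discriminative image classification without fine-tuning the diffusion model?
RQ2: What is the best way to pool and classify diffusion features (linear, MLP, CNN, or attention head)?
RQ3: How do diffusion-derived representations compare with GAN-based and self-supervised baselines on ImageNet and FGVC tasks?
RQ4: Do diffusion features transfer well to fine-grained classification datasets, and how sensitive are they to hyperparameters across tasks?
RQ5: How do diffusion representations change across layers and diffusion timesteps, as measured by CKA?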

Key Results

Method	ImageNet-1k accuracy	FID
BigBiGAN*	60.8%	28.54
MAGE	78.9%	9.10
U-Net Encoder	64.32%	n/a
GD (L, pool 1x1)	61.95%	26.21
GD (L, pool 2x2)	64.96%	26.21
GD (Attention)	71.89%	26.21
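The FID column measures generation quality: Inception-v3 features of real and generated images are each summarized by a Gaussian (mean mu and covariance S), and FID is the Fréchet distance between the two, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}). A minimal sketch of that distance, assuming the means and covariances are already computed (the Inception feature extraction is omitted); `frechet_distance` and `_sqrtm_psd` are hypothetical helper names.

```python
import numpy as np

def _sqrtm_psd(m):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    m = 0.5 * (m + m.T)  # symmetrize to absorb round-off
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}) between two Gaussians."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}), whose argument is symmetric PSD.
    covmean_trace = np.trace(_sqrtm_psd(s1_half @ sigma2 @ s1_half))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace)

mu = np.zeros(4)
eye = np.eye(4)
print(round(frechet_distance(mu, eye, np.array([1.0, 2.0, 0.0, 0.0]), eye), 6))  # 5.0
```

Lower is better: identical Gaussians give 0, and with equal covariances the distance reduces to the squared gap between means.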

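With frozen diffusion features (b=24, t=90), a linear head on 1x1-pooled features, GD (L, pool 1x1), already reaches 61.95% accuracy on ImageNet-1k, outperforming BigBiGAN (60.8%) in classification.
Richer heads widen the margin: GD (L, pool 2x2) reaches 64.96% and GD (Attention) 71.89%. All GD variants share the same FID of 26.21 because they reuse the same pretrained generator; they surpass BigBiGAN but remain below MAGE (78.9%).
A linear probe on diffusion features reaches 61.86% on ImageNet-1k with b=24, t=150; attention heads are also evaluated, and features from Stable Diffusion likewise support classification.
On the FGVC datasets, diffusion features are competitive: on Aircraft they frequently outperform the SimCLR-based baseline across several heads, while gaps remain in other cases.
CKA analysis shows that early layers are more similar across models, whereas bottleneck layers produce discriminative representations that resemble those of ResNet and ViT.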
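The attention head beats the linear probes in the table. One common way to build such a head, sketched here as an assumption rather than the paper's exact architecture, is single-query attention pooling: each spatial position of the feature map becomes a token, a learned query scores the tokens, and the softmax-weighted average of their values feeds a linear classifier. The function `attention_pool_head` and all sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool_head(feat, query, w_key, w_value, w_cls, b_cls):
    """Hypothetical single-query attention pooling + linear classifier over a (C, H, W) map."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T                # (HW, C): one token per spatial position
    keys = tokens @ w_key                            # (HW, D)
    values = tokens @ w_value                        # (HW, D)
    scores = keys @ query / np.sqrt(query.shape[0])  # (HW,) scaled dot-product scores
    weights = softmax(scores)                        # attention distribution over positions
    pooled = weights @ values                        # (D,) attention-weighted average
    return w_cls @ pooled + b_cls, weights           # class logits (K,), attention weights

rng = np.random.default_rng(0)
C, D, K = 64, 32, 10  # hypothetical channel, head, and class sizes
feat = rng.standard_normal((C, 8, 8))
w_key, w_value = rng.standard_normal((C, D)), rng.standard_normal((C, D))
w_cls, b_cls = rng.standard_normal((K, D)), np.zeros(K)

logits, attn = attention_pool_head(feat, rng.standard_normal(D), w_key, w_value, w_cls, b_cls)
print(logits.shape, attn.shape)  # (10,) (64,)
```

With a zero query the attention weights are uniform, so the head reduces to 1x1 average pooling followed by a linear map. Letting the weights concentrate on informative positions is one plausible reason attention heads beat plain pooling, though the review does not test that explanation.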


This review was generated by AI and reviewed by a human editor.