QUICK REVIEW

[Paper Review] Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

Chenyu Zheng, Guoqiang Wu|arXiv (Cornell University)|2023. 02. 05.

Generative Adversarial Networks and Image Synthesis | Citations: 8

One-Line Summary

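This paper analyzes the difference between discriminative and generative linear classifiers for multiclass classification on deep representations. It proves that, under mild assumptions, multiclass naive Bayes needs O(log n) samples to approach its asymptotic error, whereas the corresponding multiclass logistic regression needs O(n) samples, where n is the feature dimension. It also develops a multiclass H-consistency bound framework and validates the results empirically.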

ABSTRACT

A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes parameters in the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the classifier in linear evaluation except for the default logistic regression. Inspired by the statistical efficiency of naive Bayes, the paper revisits the classical topic on discriminative vs. generative classifiers. Theoretically, the paper considers the surrogate loss instead of the zero-one loss in analyses and generalizes the classical results from binary cases to multiclass ones. We show that, under mild assumptions, multiclass naive Bayes requires $O(\log n)$ samples to approach its asymptotic error while the corresponding multiclass logistic regression requires $O(n)$ samples, where $n$ is the feature dimension. To establish it, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss, which are of independent interests. Simulation results on a mixture of Gaussian validate our theoretical findings. Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the number of data increases. Besides, naive Bayes shows promise in few-shot cases and we observe the "two regimes" phenomenon in pre-trained supervised models. Our code is available at https://github.com/ML-GSAI/Revisiting-Dis-vs-Gen-Classifiers.

Motivation and Objectives

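Revisit the classical comparison of discriminative vs. generative classifiers in the context of linear evaluation of deep models.
Generalize the results of Ng & Jordan (2001) from the binary setting to the multiclass setting.
Introduce a multiclass H-consistency bound framework and derive an explicit bound for the logistic loss.
Validate the theoretical results empirically on synthetic Gaussian mixtures and on pre-trained deep vision models across several datasets.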

Proposed Method

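Develops a multiclass H-consistency bound framework that relates the surrogate loss to the zero-one loss.
Derives an explicit multiclass bound between the logistic loss and the zero-one loss (Theorem 3.3).
Sample complexity analysis: naive Bayes needs O(log n) samples and logistic regression needs O(n) samples to approach their asymptotic errors (Theorems 3.2 and 3.4).
Defines pair-activation and misclassification-gap constructs (e.g., Δa_Gen, G̃(τ)) to bound the effect of the training sample.
Uses mild distributional assumptions and concentration tools to bound the estimation differences.
Validates the theory with Gaussian mixture simulations and with deep model experiments on CIFAR-10/100.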
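
As a rough guide to the bound framework listed above, an $\mathcal{H}$-consistency bound in the general literature has the shape below. The notation ($R_{\ell}$ for the expected loss, $R^{*}_{\ell,\mathcal{H}}$ for its infimum over the hypothesis set $\mathcal{H}$, and the function $\Gamma$) is assumed here for illustration and is not copied from the paper. Minimizability-gap terms are omitted, and the exact multiclass statement and constants are those of the paper's Theorem 3.3.

$$
R_{\ell_{0\text{-}1}}(h) - R^{*}_{\ell_{0\text{-}1},\mathcal{H}} \;\le\; \Gamma\!\left( R_{\ell}(h) - R^{*}_{\ell,\mathcal{H}} \right) \quad \text{for all } h \in \mathcal{H},
$$

where $\Gamma$ is a non-decreasing function with $\Gamma(t) \to 0$ as $t \to 0$. Read this way, driving the surrogate excess risk toward zero within $\mathcal{H}$ also drives the zero-one excess risk toward zero, which is what allows the analysis to work with the logistic loss instead of the zero-one loss.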

Experimental Results

Research Questions

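RQ1: What relative sample efficiency do multiclass naive Bayes and multiclass logistic regression show on deep representations under a surrogate loss?
RQ2: Can H-consistency bounds be extended to the multiclass setting, and can an explicit bound be obtained for the logistic loss?
RQ3: Do deep representations exhibit the two-regime phenomenon between discriminative and generative classifiers, and how does the pre-training mode affect it?
RQ4: How do these theoretical results show up in linear evaluation on CIFAR-10/100 with various pre-trained backbones?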

Key Findings

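Multiclass naive Bayes approaches its asymptotic error with O(log n) samples, whereas multiclass logistic regression needs O(n) samples.
The multiclass H-consistency framework and the explicit bound for the logistic loss are established, enabling distribution-independent control of the zero-one loss.
Gaussian mixture simulations validate the theoretical sample complexity results.
In experiments on CIFAR-10/100 with several pre-trained vision models, naive Bayes tends to converge faster as the amount of data grows, and the two-regime phenomenon is observed in supervised pre-trained models.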
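
The kind of comparison behind the findings above can be sketched as a small simulation. This is a minimal illustration using only the Python standard library, not the authors' code (which is available at the repository linked in the abstract). All function names, the class count `k`, the feature dimension `n`, the training sizes `m`, and the optimizer settings are assumptions chosen for illustration. It fits a Gaussian naive Bayes model with a pooled per-feature variance and a softmax logistic regression on samples from an isotropic Gaussian mixture, then reports test accuracy at a few training sizes.

```python
import math
import random


def sample_mixture(rng, means, m, sigma=1.0):
    """Draw m labelled points from an equal-weight isotropic Gaussian mixture."""
    data = []
    for _ in range(m):
        y = rng.randrange(len(means))
        data.append(([rng.gauss(mu, sigma) for mu in means[y]], y))
    return data


def fit_naive_bayes(data, k, n):
    """Gaussian naive Bayes: smoothed priors, class means, pooled per-feature variance."""
    counts = [0] * k
    sums = [[0.0] * n for _ in range(k)]
    for x, y in data:
        counts[y] += 1
        for j in range(n):
            sums[y][j] += x[j]
    # A class with no samples falls back to a zero mean vector.
    mu = [[sums[c][j] / max(counts[c], 1) for j in range(n)] for c in range(k)]
    var = [0.0] * n
    for x, y in data:
        for j in range(n):
            var[j] += (x[j] - mu[y][j]) ** 2
    var = [max(v / len(data), 1e-2) for v in var]  # floor avoids division by zero
    log_prior = [math.log((counts[c] + 1) / (len(data) + k)) for c in range(k)]
    return log_prior, mu, var


def predict_naive_bayes(model, x):
    log_prior, mu, var = model
    # The variance is shared across classes, so the log-normalizer cancels.
    scores = [
        log_prior[c] - sum((x[j] - mu[c][j]) ** 2 / (2 * var[j]) for j in range(len(x)))
        for c in range(len(mu))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def fit_logistic(data, k, n, steps=100, lr=0.1):
    """Multiclass logistic (softmax) regression trained by batch gradient descent."""
    w = [[0.0] * (n + 1) for _ in range(k)]  # last column is the bias
    for _ in range(steps):
        grad = [[0.0] * (n + 1) for _ in range(k)]
        for x, y in data:
            xb = x + [1.0]
            logits = [sum(wj * xj for wj, xj in zip(w[c], xb)) for c in range(k)]
            top = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - top) for v in logits]
            z = sum(exps)
            for c in range(k):
                err = exps[c] / z - (1.0 if c == y else 0.0)
                for j in range(n + 1):
                    grad[c][j] += err * xb[j]
        for c in range(k):
            for j in range(n + 1):
                w[c][j] -= lr * grad[c][j] / len(data)
    return w


def predict_logistic(w, x):
    xb = x + [1.0]
    scores = [sum(wj * xj for wj, xj in zip(wc, xb)) for wc in w]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predict, model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


rng = random.Random(0)
k, n = 3, 20  # number of classes and feature dimension (illustrative values)
means = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
test = sample_mixture(rng, means, 300)
results = {}
for m in (6, 30, 150):  # training set sizes (illustrative values)
    train = sample_mixture(rng, means, m)
    nb_acc = accuracy(predict_naive_bayes, fit_naive_bayes(train, k, n), test)
    lr_acc = accuracy(predict_logistic, fit_logistic(train, k, n), test)
    results[m] = (nb_acc, lr_acc)
    print(f"m={m:4d}  naive Bayes acc={nb_acc:.3f}  logistic acc={lr_acc:.3f}")
```

A single seed at one feature dimension is noisy and says nothing about scaling. To probe the O(log n) vs. O(n) behaviour one would average over many seeds and sweep `n`, recording how many samples each classifier needs to come within a fixed margin of its large-sample accuracy.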


This review was generated by AI and reviewed by a human editor.