QUICK REVIEW

[Paper Review] Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

Chenyu Zheng, Guoqiang Wu|arXiv (Cornell University)|Feb 5, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 8

One-Sentence Summary

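In the setting of linear evaluation on deep representations, the paper analyzes multiclass discriminative and generative linear classifiers, showing that multiclass naive Bayes needs only O(log n) samples to approach its asymptotic error whereas multiclass logistic regression needs O(n), where n is the feature dimension. It also proposes a multiclass H-consistency framework and validates the findings empirically.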
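To make the summary concrete, here is an informal paraphrase of the two sample-complexity statements. The notation (ε for the expected zero-one error, m for the number of training samples, ε₀ for the tolerance, h_Gen and h_Dis for the two classifiers) is assumed for illustration; the paper's own theorems are stated via surrogate losses and carry constants and probability qualifiers that are suppressed here.

```latex
% Informal paraphrase (assumed notation). Constants depending on \epsilon_0,
% the confidence level and the number of classes are hidden inside O(\cdot).
% n: feature dimension; m: number of training samples;
% \varepsilon(h, m): expected zero-one error of h trained on m samples;
% \varepsilon(h, \infty): its asymptotic error.
\begin{align*}
\varepsilon(h_{\mathrm{Gen}}, m) &\le \varepsilon(h_{\mathrm{Gen}}, \infty) + \epsilon_0
  && \text{once } m = O(\log n) \quad \text{(multiclass naive Bayes)},\\
\varepsilon(h_{\mathrm{Dis}}, m) &\le \varepsilon(h_{\mathrm{Dis}}, \infty) + \epsilon_0
  && \text{once } m = O(n) \quad \text{(multiclass logistic regression)}.
\end{align*}
```

Ignoring constants, raising the feature dimension from n = 512 to n = 2048 multiplies the O(n) requirement by 4, but only adds ln 4 ≈ 1.39 to ln n (from about 6.24 to about 7.62).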

ABSTRACT

A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes the parameters of the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the classifier in linear evaluation beyond the default logistic regression. Inspired by the statistical efficiency of naive Bayes, we revisit the classical topic of discriminative vs. generative classifiers. Theoretically, we consider the surrogate loss instead of the zero-one loss in our analyses and generalize the classical results from binary cases to multiclass ones. We show that, under mild assumptions, multiclass naive Bayes requires $O(\log n)$ samples to approach its asymptotic error while the corresponding multiclass logistic regression requires $O(n)$ samples, where $n$ is the feature dimension. To establish this, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for the logistic loss, which are of independent interest. Simulation results on a mixture of Gaussians validate our theoretical findings. Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases. Besides, naive Bayes shows promise in few-shot cases, and we observe the "two regimes" phenomenon in pre-trained supervised models. Our code is available at https://github.com/ML-GSAI/Revisiting-Dis-vs-Gen-Classifiers.
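The abstract's setting can be sketched as a small simulation: both classifiers are linear heads on fixed features, compared as the number of training samples grows. This is a minimal sketch under stated assumptions, not the authors' code (that lives in the linked repository): the synthetic Gaussian-mixture features stand in for frozen backbone features, and the dimensions, class means, and optimizer settings are hypothetical placeholders. Naive Bayes here assumes a pooled (shared) diagonal covariance, which makes its decision rule linear, and logistic regression is plain softmax regression trained by gradient descent.

```python
import numpy as np


def make_mixture(m, means, sigma, rng):
    """Sample m labelled points from an equal-weight Gaussian mixture N(means[y], sigma^2 I)."""
    k, n = means.shape
    y = rng.integers(0, k, size=m)
    x = means[y] + sigma * rng.standard_normal((m, n))
    return x, y


def fit_naive_bayes(x, y, k):
    """Gaussian naive Bayes with a pooled diagonal covariance, returned as linear weights (W, b)."""
    n = x.shape[1]
    mu = np.zeros((k, n))
    prior = np.full(k, 1e-12)  # tiny prior for classes absent from the sample (avoids log(0))
    for c in range(k):
        xc = x[y == c]
        if len(xc) > 0:
            mu[c] = xc.mean(axis=0)
            prior[c] = len(xc) / len(x)
    var = ((x - mu[y]) ** 2).mean(axis=0) + 1e-6  # pooled per-feature variance
    # log p(x | c) + log pi_c is linear in x once the class-independent x^2 terms cancel.
    w = mu / var
    b = -0.5 * (mu ** 2 / var).sum(axis=1) + np.log(prior)
    return w, b


def fit_logistic(x, y, k, lr=0.1, steps=500, l2=1e-3):
    """Multiclass (softmax) logistic regression trained by full-batch gradient descent."""
    m, n = x.shape
    w, b = np.zeros((k, n)), np.zeros(k)
    onehot = np.eye(k)[y]
    for _ in range(steps):
        logits = x @ w.T + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - onehot) / m  # gradient of the mean cross-entropy w.r.t. the logits
        w -= lr * (g.T @ x + l2 * w)
        b -= lr * g.sum(axis=0)
    return w, b


def error(w, b, x, y):
    """Zero-one test error of the linear classifier argmax_c (w_c . x + b_c)."""
    return float(np.mean(np.argmax(x @ w.T + b, axis=1) != y))


rng = np.random.default_rng(0)
k, n, sigma = 5, 100, 1.0                  # hypothetical: 5 classes, 100-dim "features"
means = 0.3 * rng.standard_normal((k, n))  # hypothetical class means
x_test, y_test = make_mixture(5000, means, sigma, rng)

for m in (10, 50, 200, 1000):
    x_tr, y_tr = make_mixture(m, means, sigma, rng)
    nb_err = error(*fit_naive_bayes(x_tr, y_tr, k), x_test, y_test)
    lr_err = error(*fit_logistic(x_tr, y_tr, k), x_test, y_test)
    print(f"m={m:5d}  naive Bayes err={nb_err:.3f}  logistic err={lr_err:.3f}")
```

Writing naive Bayes as explicit weights (W, b) keeps the comparison apples to apples: both methods produce the same kind of linear head, and only the way the weights are estimated differs (closed-form moment estimates vs. iterative likelihood maximization).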

Motivation and Goals

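Revisit the classical comparison between discriminative and generative classifiers in the context of linear evaluation on deep representations.
Generalize the results of Ng & Jordan (2001) from binary to multiclass classification.
Introduce a multiclass H-consistency framework and derive an explicit bound for the logistic loss.
Validate the theory empirically on synthetic Gaussian mixtures and, across datasets, on pre-trained deep vision models.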
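For orientation, an H-consistency bound in the literature has roughly the generic shape below. This template is an assumption included for readability; the paper's multiclass framework and its explicit Γ for the logistic loss are not reproduced here.

```latex
% Generic template (assumed; not the paper's explicit bound).
% \mathcal{H}: hypothesis set, e.g. linear classifiers on frozen features.
% \mathcal{R}_\ell(h) = \mathbb{E}_{(x,y)}[\ell(h, x, y)], \quad
% \mathcal{R}^*_\ell(\mathcal{H}) = \inf_{h \in \mathcal{H}} \mathcal{R}_\ell(h).
\[
\mathcal{R}_{\ell_{0\text{-}1}}(h) - \mathcal{R}^{*}_{\ell_{0\text{-}1}}(\mathcal{H})
\;\le\;
\Gamma\!\left(\mathcal{R}_{\ell_{\log}}(h) - \mathcal{R}^{*}_{\ell_{\log}}(\mathcal{H})\right)
\quad \text{for all } h \in \mathcal{H},
\]
% where \Gamma is non-decreasing with \Gamma(0) = 0.
```

Unlike Bayes-consistency, which is asymptotic and concerns the class of all measurable functions, a bound of this shape is non-asymptotic and specific to the restricted class H, which is what makes it usable for a linear head on fixed features. Refined versions in the literature add minimizability-gap terms to both sides.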

Proposed Method

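Build a multiclass H-consistency bound framework that relates the surrogate loss to the zero-one loss.
Derive an explicit multiclass bound relating the logistic loss to the zero-one loss (Theorem 3.3).
Analyze sample complexity: naive Bayes requires O(log n) samples while logistic regression requires O(n), where n is the feature dimension (Theorems 3.2 and 3.4).
Define pairwise activation and misclassification-gap quantities (e.g., Δa_Gen, G̃(τ)) to characterize how the number of training samples affects the error.
Make mild assumptions on the data distribution and use concentration tools to bound the estimation gap.
Validate the theory with simulations on Gaussian mixtures and deep-model experiments on CIFAR-10/100.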
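The link between a surrogate loss and the zero-one loss can be illustrated with the simplest possible fact: for any linear classifier's scores, the multiclass logistic (softmax cross-entropy) loss divided by ln 2 is pointwise at least the zero-one loss, because a misclassified example has some rival score at least as large as the true one, which forces p_y ≤ 1/2. This is only a sketch of that pointwise inequality, not the paper's H-consistency bound (Theorem 3.3), which is a sharper statement about excess risks; the random scores are hypothetical.

```python
import numpy as np


def logistic_loss(scores, y):
    """Multiclass logistic (softmax cross-entropy) loss per example, natural log."""
    z = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_norm = np.log(np.exp(z).sum(axis=1))
    return log_norm - z[np.arange(len(y)), y]


def zero_one_loss(scores, y):
    """1.0 if the top-scoring class (argmax, ties to the lowest index) differs from the label."""
    return (np.argmax(scores, axis=1) != y).astype(float)


rng = np.random.default_rng(0)
scores = rng.standard_normal((1000, 10))  # hypothetical linear-classifier scores, 10 classes
y = rng.integers(0, 10, size=1000)
ell = logistic_loss(scores, y)
zo = zero_one_loss(scores, y)

# Misclassified => some rival has p_j >= p_y, so p_y <= 1/2 and ell >= ln 2.
assert np.all(ell / np.log(2) >= zo - 1e-12)
print(f"mean logistic/ln2 = {ell.mean() / np.log(2):.3f} >= mean 0-1 = {zo.mean():.3f}")
```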

Experimental Results

Research Questions

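RQ1: On deep representations, how sample-efficient is multiclass naive Bayes compared with multiclass logistic regression when the analysis uses a surrogate loss?
RQ2: Can H-consistency bounds be extended to the multiclass setting, yielding an explicit bound for the logistic loss?
RQ3: Do deep representations exhibit a "two regimes" phenomenon between discriminative and generative classifiers, and how does the pre-training paradigm affect it?
RQ4: How do these theoretical results manifest across different pre-trained backbones under linear evaluation on CIFAR-10/100?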

Key Findings

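Multiclass naive Bayes approaches its asymptotic error with O(log n) samples, whereas multiclass logistic regression needs O(n).
The paper establishes a multiclass H-consistency framework and an explicit bound for the logistic loss, enabling distribution-independent control of the zero-one loss.
Simulations on Gaussian mixtures confirm the theoretical sample-complexity results.
Experiments on CIFAR-10/100 with a variety of pre-trained vision models show that naive Bayes converges faster as the amount of data increases, and a "two regimes" phenomenon is observed for supervised pre-trained models.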
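The "two regimes" phenomenon (in the classical picture, the generative classifier is ahead with few samples and the discriminative one overtakes it with many) can be located on a pair of learning curves with a small helper. The function name and the rule used here (first training-set size at which logistic regression is strictly better, provided naive Bayes was ahead at the smallest size) are hypothetical choices for illustration; the numbers in the usage line are made up, not results from the paper.

```python
import numpy as np


def find_crossover(sizes, err_nb, err_lr):
    """Return the smallest training-set size at which logistic regression's error is strictly
    below naive Bayes's, or None if the curves do not show an NB-first, LR-later pattern."""
    order = np.argsort(sizes)
    s = np.asarray(sizes)[order]
    gap = np.asarray(err_lr, dtype=float)[order] - np.asarray(err_nb, dtype=float)[order]
    if gap[0] <= 0:
        return None  # LR already ties or wins at the smallest size: no NB-favoured regime
    overtaken = np.nonzero(gap < 0)[0]
    return int(s[overtaken[0]]) if overtaken.size else None


# Made-up learning curves, for illustration only.
sizes = [1, 5, 10, 50, 100]
print(find_crossover(sizes, [0.50, 0.30, 0.25, 0.22, 0.21], [0.70, 0.40, 0.26, 0.18, 0.15]))  # prints 50
```

Returning None when naive Bayes never leads, or never loses its lead, keeps the helper from reporting a crossover on curves that show only one regime.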

This review was generated by AI and reviewed by human editors.