QUICK REVIEW

[Paper Review] Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

Chenyu Zheng, Guoqiang Wu|arXiv (Cornell University)|Feb 5, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 8

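One-line summary

Summary: This paper compares multiclass discriminative and generative classifiers in deep representation learning. It shows that, under mild assumptions, naive Bayes can approach its asymptotic error with O(log n) samples, whereas logistic regression requires O(n), where n is the feature dimension. It builds a multiclass H-consistency bound framework and validates the findings empirically.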

ABSTRACT

A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes parameters in the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the classifier in linear evaluation except for the default logistic regression. Inspired by the statistical efficiency of naive Bayes, the paper revisits the classical topic on discriminative vs. generative classifiers. Theoretically, the paper considers the surrogate loss instead of the zero-one loss in analyses and generalizes the classical results from binary cases to multiclass ones. We show that, under mild assumptions, multiclass naive Bayes requires $O(\log n)$ samples to approach its asymptotic error while the corresponding multiclass logistic regression requires $O(n)$ samples, where $n$ is the feature dimension. To establish it, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss, which are of independent interests. Simulation results on a mixture of Gaussian validate our theoretical findings. Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the number of data increases. Besides, naive Bayes shows promise in few-shot cases and we observe the "two regimes" phenomenon in pre-trained supervised models. Our code is available at https://github.com/ML-GSAI/Revisiting-Dis-vs-Gen-Classifiers.

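Motivation and objectives

Revisit the classical comparison of discriminative and generative classifiers in the context of linear evaluation of deep models.
Generalize the results of Ng & Jordan (2001) from the binary setting to the multiclass setting.
Introduce a multiclass H-consistency bound framework and derive an explicit bound for the logistic loss.
Validate the theoretical results with experiments on synthetic mixture data and pre-trained deep vision models.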

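Proposed method

Develop a multiclass H-consistency bound framework that relates the surrogate loss to the zero-one loss.
Derive an explicit multiclass bound between the logistic loss and the zero-one loss (Theorem 3.3).
Analyze sample complexity: naive Bayes requires O(log n) samples, whereas logistic regression requires O(n) samples (Theorems 3.2 and 3.4).
Define constructs for pairwise activations and misclassification gaps (e.g., Δa_Gen, G̃(τ)) to bound the effect of the training samples.
Assume mild distributional conditions and use concentration tools to bound the estimation gap.
Validate the theory with simulations (Gaussian mixture) and deep-model experiments on CIFAR-10/100.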
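The simulation step above can be illustrated with a minimal sketch. This is not the authors' code (that is in the repository linked in the abstract); it is a small NumPy experiment under assumptions chosen here for illustration: n = 50 features, 3 classes, unit-variance Gaussian classes with randomly drawn means, a pooled-variance Gaussian naive Bayes, and softmax logistic regression trained by plain gradient descent. All function names are hypothetical.

```python
import numpy as np

def sample_mixture(rng, means, m):
    """Draw m labeled points from an equal-weight mixture of unit-variance Gaussians."""
    k, n = means.shape
    y = rng.integers(0, k, size=m)
    x = means[y] + rng.standard_normal((m, n))
    return x, y

def fit_naive_bayes(x, y, k, eps=1e-3):
    """Gaussian naive Bayes with per-class means and a pooled per-feature variance."""
    m, n = x.shape
    mu = np.zeros((k, n))
    counts = np.bincount(y, minlength=k)
    for c in range(k):
        if counts[c] > 0:  # classes unseen in a tiny sample keep a zero mean
            mu[c] = x[y == c].mean(axis=0)
    var = ((x - mu[y]) ** 2).mean(axis=0) + eps   # pooled, smoothed variance
    log_prior = np.log((counts + 1.0) / (m + k))  # Laplace-smoothed class prior
    return mu, var, log_prior

def predict_naive_bayes(model, x):
    mu, var, log_prior = model
    # Log-posterior up to a constant that is shared by all classes.
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2 / var).sum(axis=2)
    return np.argmax(log_prior - 0.5 * sq, axis=1)

def fit_logistic(x, y, k, steps=300, lr=0.1, l2=1e-3):
    """Multiclass (softmax) logistic regression, full-batch gradient descent."""
    m, n = x.shape
    w = np.zeros((n, k))
    b = np.zeros(k)
    onehot = np.eye(k)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - onehot) / m                         # gradient w.r.t. the logits
        w -= lr * (x.T @ g + l2 * w)
        b -= lr * g.sum(axis=0)
    return w, b

def predict_logistic(model, x):
    w, b = model
    return np.argmax(x @ w + b, axis=1)

def run_simulation(n=50, k=3, sizes=(10, 30, 100, 300, 1000), seed=0):
    """Return {m: (naive Bayes test error, logistic regression test error)}."""
    rng = np.random.default_rng(seed)
    means = 0.5 * rng.standard_normal((k, n))        # assumed class means
    x_test, y_test = sample_mixture(rng, means, 2000)
    results = {}
    for m in sizes:
        x_tr, y_tr = sample_mixture(rng, means, m)
        nb = fit_naive_bayes(x_tr, y_tr, k)
        lr_model = fit_logistic(x_tr, y_tr, k)
        results[m] = (
            float(np.mean(predict_naive_bayes(nb, x_test) != y_test)),
            float(np.mean(predict_logistic(lr_model, x_test) != y_test)),
        )
    return results

results = run_simulation()
for m, (err_nb, err_lr) in results.items():
    print(f"m={m:5d}  naive Bayes error={err_nb:.3f}  logistic regression error={err_lr:.3f}")
```

Plotting the two error columns against m gives the kind of convergence curve the paper's simulations examine. Because the data here satisfy the naive Bayes independence assumption exactly, this toy setup favors the generative classifier more than real deep features would.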

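Experimental results

Research questions

RQ1: Under a surrogate loss on deep representations, what is the relative sample efficiency of multiclass naive Bayes and multiclass logistic regression?
RQ2: Can H-consistency bounds be extended to the multiclass setting, and can an explicit bound for the logistic loss be obtained?
RQ3: Do deep representations exhibit the "two regimes" phenomenon for discriminative vs. generative classifiers, and how does the pre-training mode affect it?
RQ4: How do these theoretical results manifest in the linear evaluation setting on CIFAR-10/100 across various pre-trained backbones?

Main findings

Multiclass naive Bayes approaches its asymptotic error with O(log n) samples, whereas multiclass logistic regression requires O(n) samples.
A multiclass H-consistency bound framework and an explicit bound for the logistic loss are established, enabling distribution-independent control of the zero-one loss.
Simulations on a Gaussian mixture validate the theoretical sample-complexity results.
Empirical results with several pre-trained vision models on CIFAR-10/100 show that naive Bayes consistently converges faster as the amount of data grows, and point to a "two regimes" phenomenon in supervised pre-trained models.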
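For readers who prefer symbols, the sample-complexity finding has roughly the following shape. The notation is assumed here for illustration and is not taken from the paper: m is the number of training samples, n the feature dimension, and the subscript ∞ marks the infinite-sample limit of each classifier. The precise assumptions and constants are in Theorems 3.2 and 3.4.

```latex
% Assumed notation (not the paper's): h_{\mathrm{NB},m} and h_{\mathrm{LR},m} are the
% naive Bayes and logistic regression classifiers fit on m samples,
% \varepsilon(\cdot) is the classification error, \epsilon_0 > 0 is a fixed tolerance.
\begin{align*}
\varepsilon(h_{\mathrm{NB},m}) &\le \varepsilon(h_{\mathrm{NB},\infty}) + \epsilon_0
  && \text{with high probability once } m = O(\log n), \\
\varepsilon(h_{\mathrm{LR},m}) &\le \varepsilon(h_{\mathrm{LR},\infty}) + \epsilon_0
  && \text{with high probability once } m = O(n).
\end{align*}
```

The two right-hand sides refer to different asymptotic errors. In the classical binary analysis of Ng & Jordan (2001), the discriminative classifier has the lower asymptotic error, which is why a crossover between the two classifiers ("two regimes") can appear as m grows.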


This review was generated by AI and checked by a human editor.