QUICK REVIEW

[Paper Review] Classification with Deep Neural Networks and Logistic Loss

Zihan Zhang, Lei Shi|arXiv (Cornell University)|Jul 31, 2023

Stochastic Gradient Optimization Techniques|Cited by 8

One-line summary

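This paper develops a new oracle-type approach to generalization analysis that yields sharp convergence rates for fully connected ReLU deep neural network classifiers trained with the logistic (cross-entropy) loss, even when the target function is unbounded.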

ABSTRACT

Deep neural networks (DNNs) trained with the logistic loss (i.e., the cross entropy loss) have made impressive advancements in various binary classification tasks. However, generalization analysis for binary classification with DNNs and logistic loss remains scarce. The unboundedness of the target function for the logistic loss is the main obstacle to deriving satisfactory generalization bounds. In this paper, we aim to fill this gap by establishing a novel and elegant oracle-type inequality, which enables us to deal with the boundedness restriction of the target function, and using it to derive sharp convergence rates for fully connected ReLU DNN classifiers trained with logistic loss. In particular, we obtain optimal convergence rates (up to log factors) only requiring the Hölder smoothness of the conditional class probability $η$ of data. Moreover, we consider a compositional assumption that requires $η$ to be the composition of several vector-valued functions of which each component function is either a maximum value function or a Hölder smooth function only depending on a small number of its input variables. Under this assumption, we derive optimal convergence rates (up to log factors) which are independent of the input dimension of data. This result explains why DNN classifiers can perform well in practical high-dimensional classification problems. Besides the novel oracle-type inequality, the sharp convergence rates given in our paper also owe to a tight error bound for approximating the natural logarithm function near zero (where it is unbounded) by ReLU DNNs. In addition, we justify our claims for the optimality of rates by proving corresponding minimax lower bounds. All these results are new in the literature and will deepen our theoretical understanding of classification with DNNs.

Motivation and objectives

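Motivates and analyzes binary classification with deep neural networks trained with the logistic (cross-entropy) loss.
Overcomes the problem of an unbounded target function to derive sharp generalization bounds.
Provides convergence rates under Hölder smoothness and under a compositional assumption.
Establishes optimality through minimax lower bounds and discusses the implications for high-dimensional data.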
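
The unboundedness of the target follows from a standard fact about the logistic loss. The sketch below uses notation assumed for illustration only ($\phi$, $f^*$, and labels $Y \in \{-1, 1\}$ are not defined in this review): the minimizer of the conditional logistic risk is the log-odds of $η$.

```latex
\phi(t) = \log\left(1 + e^{-t}\right), \qquad
f^*(x) = \operatorname*{arg\,min}_{z \in \mathbb{R}}
  \Big[ \eta(x)\,\phi(z) + \big(1 - \eta(x)\big)\,\phi(-z) \Big]
  = \log\frac{\eta(x)}{1 - \eta(x)}.
```

As $η(x)$ approaches 0 or 1, $|f^*(x)|$ grows without bound, so arguments that rely on a bounded target function do not apply directly.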

Proposed method

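Develops an oracle-type inequality that upper-bounds the excess $\phi$-risk without requiring the target function to be bounded.
Links the $\phi$-risk to the misclassification risk through the logistic loss and its corresponding calibration inequality.
Establishes convergence rates for the excess logistic risk of fully connected ReLU DNN classifiers trained by empirical logistic risk minimization.
Introduces a compositional assumption on the conditional class probability function $η$ to obtain convergence rates that are independent of the input dimension.
Derives a tight error bound for approximating the natural logarithm near zero by ReLU DNNs, and proves minimax lower bounds that support the optimality claims.
Works with classes of fully connected ReLU networks characterized by bounds on depth, width, and parameter norms.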
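
To make "empirical logistic risk minimization over fully connected ReLU networks" concrete, here is a minimal NumPy sketch. It is not the authors' code: the synthetic $η$, the network sizes, the initialization, and the use of plain full-batch gradient descent are all assumptions chosen for illustration, and the paper's analysis concerns the risk minimizer itself rather than any particular optimizer. Labels are taken in {-1, +1} and the classifier is the sign of the network output.

```python
import numpy as np

def logistic_loss(t):
    """phi(t) = log(1 + exp(-t)), evaluated stably."""
    return np.logaddexp(0.0, -t)

def init_params(sizes, rng):
    """Random weights for a fully connected net; the scaling is an illustrative choice."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the real-valued output f(X) and the input fed to each layer."""
    layer_inputs, h = [], X
    for W, b in params[:-1]:
        layer_inputs.append(h)
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layer
    layer_inputs.append(h)
    W, b = params[-1]
    return (h @ W + b).ravel(), layer_inputs

def empirical_logistic_risk(params, X, y):
    """Average of phi(y * f(x)) over the sample, with labels y in {-1, +1}."""
    f, _ = forward(params, X)
    return float(logistic_loss(y * f).mean())

def gradient_step(params, X, y, lr):
    """One full-batch gradient descent step on the empirical logistic risk."""
    f, layer_inputs = forward(params, X)
    # d/df of mean phi(y f) is -y * sigmoid(-y f) / n; the tanh form avoids overflow.
    delta = (-y * 0.5 * (1.0 - np.tanh(0.5 * y * f)) / len(y))[:, None]
    updated = list(params)
    for i in range(len(params) - 1, -1, -1):
        W, b = params[i]
        a = layer_inputs[i]
        updated[i] = (W - lr * (a.T @ delta), b - lr * delta.sum(axis=0))
        if i > 0:
            # Backpropagate through layer i and the ReLU that produced a.
            delta = (delta @ W.T) * (a > 0)
    return updated

def demo(n=400, steps=500, lr=0.1, seed=0):
    """Train on synthetic data; returns (initial risk, final risk, training error)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    # Hypothetical conditional class probability eta(x) = P(Y = 1 | X = x).
    eta = 1.0 / (1.0 + np.exp(-4.0 * (X[:, 0] + X[:, 1] - 1.0)))
    y = np.where(rng.uniform(size=n) < eta, 1.0, -1.0)
    params = init_params([2, 16, 16, 1], rng)
    initial = empirical_logistic_risk(params, X, y)
    for _ in range(steps):
        params = gradient_step(params, X, y, lr)
    final = empirical_logistic_risk(params, X, y)
    f, _ = forward(params, X)
    train_error = float(np.mean(np.where(f >= 0, 1.0, -1.0) != y))
    return initial, final, train_error

if __name__ == "__main__":
    print(demo())
```

The quantity bounded in the paper is the excess of the population logistic risk over its infimum; the sketch only reports the empirical risk on the training sample, which is the objective being minimized.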

Results

Research questions

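RQ1: What generalization bounds can be established for DNN classifiers trained with the logistic loss without assuming a bounded target function?
RQ2: What are the optimal convergence rates under Hölder smoothness of $η$ and under a compositional assumption that reduces the effective input dimension?
RQ3: Can dimension-free convergence rates be obtained under piecewise smoothness conditions on the decision boundary or under margin/noise conditions?
RQ4: How tight are the convergence rates compared with the minimax lower bounds for this problem?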

Main findings

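Establishes an oracle-type inequality for the logistic-loss setting that removes the boundedness requirement on the target function.
Derives the optimal convergence rate for the excess logistic risk of fully connected neural network (FNN) classifiers under Hölder smoothness of $η$: $\left(\frac{(\log n)^5}{n}\right)^{\frac{\beta}{\beta+d}}$, where $n$ is the sample size, $\beta$ the Hölder smoothness, and $d$ the input dimension.
Uses the calibration inequality to obtain bounds on the excess misclassification risk that are nearly optimal, up to logarithmic factors.
Shows dimension-free convergence rates under the compositional structure of $η$, that is, rates that do not depend on the input dimension $d$.
Provides a tight error bound for approximating the natural logarithm and proves matching minimax lower bounds that confirm optimality.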
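
The Hölder-smoothness rate quoted in the findings above can be evaluated numerically to see how the exponent $\beta/(\beta+d)$ degrades as the input dimension grows. The short sketch below omits constants and uses hypothetical values of $n$, $\beta$, and $d$ chosen only for illustration. The dimension-free rate under the compositional assumption is not reproduced because its exact form is not stated in this review.

```python
import math

def rate_exponent(beta, d):
    """Exponent beta / (beta + d) appearing in the quoted rate."""
    return beta / (beta + d)

def holder_rate(n, beta, d):
    """Evaluate ((log n)^5 / n) ** (beta / (beta + d)); constants are omitted."""
    return (math.log(n) ** 5 / n) ** rate_exponent(beta, d)

if __name__ == "__main__":
    n, beta = 10**9, 2.0  # hypothetical sample size and smoothness
    for d in (2, 10, 100):
        print(f"d={d:3d}  exponent={rate_exponent(beta, d):.4f}  "
              f"rate={holder_rate(n, beta, d):.4f}")
```

Because of the $(\log n)^5$ factor, the base $(\log n)^5 / n$ drops below 1 only for fairly large $n$. Once it does, a larger $d$ pushes the exponent toward 0 and the rate toward 1, which is the dimension dependence that the compositional assumption is meant to avoid.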


This review was generated by AI and checked by a human editor.