QUICK REVIEW

[Paper Review] Classification with Deep Neural Networks and Logistic Loss

Zihan Zhang, Lei Shi | arXiv (Cornell University) | 2023. 07. 31.

Stochastic Gradient Optimization Techniques | Citations: 8

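One-Line Summary

This paper presents a new oracle-type generalization analysis for fully connected ReLU DNN classifiers trained with the logistic (cross-entropy) loss in binary classification, yielding sharp convergence rates even though the target function is unbounded.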

ABSTRACT

Deep neural networks (DNNs) trained with the logistic loss (i.e., the cross entropy loss) have made impressive advancements in various binary classification tasks. However, generalization analysis for binary classification with DNNs and logistic loss remains scarce. The unboundedness of the target function for the logistic loss is the main obstacle to deriving satisfactory generalization bounds. In this paper, we aim to fill this gap by establishing a novel and elegant oracle-type inequality, which enables us to deal with the boundedness restriction of the target function, and using it to derive sharp convergence rates for fully connected ReLU DNN classifiers trained with logistic loss. In particular, we obtain optimal convergence rates (up to log factors) only requiring the Hölder smoothness of the conditional class probability $η$ of data. Moreover, we consider a compositional assumption that requires $η$ to be the composition of several vector-valued functions of which each component function is either a maximum value function or a Hölder smooth function only depending on a small number of its input variables. Under this assumption, we derive optimal convergence rates (up to log factors) which are independent of the input dimension of data. This result explains why DNN classifiers can perform well in practical high-dimensional classification problems. Besides the novel oracle-type inequality, the sharp convergence rates given in our paper also owe to a tight error bound for approximating the natural logarithm function near zero (where it is unbounded) by ReLU DNNs. In addition, we justify our claims for the optimality of rates by proving corresponding minimax lower bounds. All these results are new in the literature and will deepen our theoretical understanding of classification with DNNs.

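Motivation and Objectives

Motivate and analyze binary classification with deep neural networks trained with the logistic (cross-entropy) loss.
Derive sharp generalization bounds by overcoming the unboundedness of the target function.
Provide convergence rates under Hölder smoothness and under a compositional assumption.
Establish optimality via minimax lower bounds and discuss the implications for high-dimensional data.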

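Proposed Method

Develop an oracle-type inequality that bounds the excess φ-risk without requiring the target function to be bounded.
Use the logistic loss and an associated calibration inequality to connect the φ-risk to the misclassification risk.
Establish convergence rates for the excess logistic risk of fully connected ReLU DNN classifiers trained via empirical logistic risk minimization.
Introduce a compositional assumption on the conditional class probability function $η$ to achieve dimension-free rates.
Derive a tight error bound for approximating the natural logarithm near zero by ReLU DNNs, and prove minimax lower bounds that confirm optimality.
Characterize spaces of fully connected ReLU networks with bounded depth, width, and parameter norms.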
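To make the unboundedness issue concrete, here is a minimal sketch, assuming labels in {-1, +1} and the logistic loss φ(t) = log(1 + e^{-t}). Under that convention the pointwise minimizer of the logistic risk is log(η / (1 - η)), which diverges as η approaches 0 or 1. The function names (`logistic_loss`, `target_function`, `empirical_logistic_risk`) are illustrative and are not taken from the paper.

```python
import math

def logistic_loss(t: float) -> float:
    """phi(t) = log(1 + exp(-t)), evaluated stably for large |t|."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))

def target_function(eta: float) -> float:
    """Pointwise minimizer of the logistic risk when P(Y=1|x) = eta, 0 < eta < 1."""
    return math.log(eta) - math.log1p(-eta)

def empirical_logistic_risk(scores, labels) -> float:
    """Average of phi(y * f(x)) over a sample; labels are in {-1, +1}."""
    return sum(logistic_loss(y * s) for s, y in zip(scores, labels)) / len(scores)

# The target grows without bound as eta approaches 1 (and symmetrically 0).
for eta in (0.5, 0.9, 0.999, 1 - 1e-9):
    print(f"eta={eta}  target={target_function(eta):.3f}")
```

In this sketch, a classifier would be the sign of a score function chosen to minimize `empirical_logistic_risk` over a network class; the network itself is omitted here.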

Results

Research Questions

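RQ1: Can generalization bounds for DNN classifiers trained with the logistic loss be established even when the target function is unbounded?
RQ2: What are the optimal convergence rates under Hölder smoothness of $η$, and what rates hold under a compositional assumption that reduces the dependence on the input dimension?
RQ3: Can dimension-free rates be derived under piecewise smooth decision boundaries or margin/noise conditions?
RQ4: How tight are these rates relative to the minimax lower bounds for this problem?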

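Key Results

Established an oracle-type inequality for the logistic-loss setting that removes the boundedness requirement on the target function.
Derived optimal convergence rates for the excess logistic risk under Hölder smoothness of $η$: the rate is $\left((\log n)^5 / n\right)^{β/(β+d)}$.
Obtained, via the calibration inequality, upper bounds on the excess misclassification error that are near-optimal (up to log factors).
Derived rates under the compositional structure on $η$ that are independent of the input dimension $d$.
Provided a tight error bound for approximating the logarithm and minimax lower bounds that confirm optimality.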
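As a rough illustration of how the Hölder-smoothness rate reported above depends on the input dimension, the sketch below evaluates ((log n)^5 / n)^(β/(β+d)) with all constants omitted. The helper name `holder_rate` and the chosen values of n, β, and d are assumptions made for illustration only; the dimension-free rate under the compositional assumption is not modeled here.

```python
import math

def holder_rate(n: int, beta: float, d: int) -> float:
    """((log n)^5 / n) ** (beta / (beta + d)), with constants omitted."""
    return (math.log(n) ** 5 / n) ** (beta / (beta + d))

# For a fixed sample size and smoothness, a larger input dimension d
# shrinks the exponent beta / (beta + d) and slows the rate.
n = 10**12
for d in (2, 10, 100):
    print(f"d={d:>3}  exponent={2.0 / (2.0 + d):.4f}  rate={holder_rate(n, 2.0, d):.4g}")
```

The slowdown with growing d is the usual curse of dimensionality, which is why a rate that does not depend on d is of interest for high-dimensional data.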

This review was generated by AI and reviewed by a human editor.