QUICK REVIEW

[Paper Review] Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

Sumio Watanabe | arXiv (Cornell University) | 2010-04-14

Machine Learning and Algorithms | References: 46 | Citations: 2,337

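One-Line Summary

In singular learning models, the Bayes cross-validation loss and WAIC are asymptotically equivalent as random variables, and the sum of the Bayes generalization error and the Bayes cross-validation error is governed by the real log canonical threshold and the singular structure of the model (asymptotically $2\lambda/n$, where $n$ is the number of training samples).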

ABSTRACT

In regular statistical models, the leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of the cross-validation remains unknown. In previous studies, we established the singular learning theory and proposed a widely applicable information criterion, the expectation value of which is asymptotically equal to the average Bayes generalization loss. In the present paper, we theoretically compare the Bayes cross-validation loss and the widely applicable information criterion and prove two theorems. First, the Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable. Therefore, model selection and hyperparameter optimization using these two values are asymptotically equivalent. Second, the sum of the Bayes generalization error and the Bayes cross-validation error is asymptotically equal to $2λ/n$, where $λ$ is the real log canonical threshold and $n$ is the number of training samples. Therefore the relation between the cross-validation error and the generalization error is determined by the algebraic geometrical structure of a learning machine. We also clarify that the deviance information criteria are different from the Bayes cross-validation and the widely applicable information criterion.
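In symbols, the quantities the abstract compares can be sketched as follows, assuming the usual setting of singular learning theory: a model $p(x \mid w)$, a prior $\varphi(w)$, a true distribution $q(x)$, and an inverse temperature $\beta$ ($\beta = 1$ gives the ordinary posterior). The labels $E_w$, $T(n)$, $V(n)$, $S$, and $S_n$ are notation chosen here for readability, not quoted from the paper.

```latex
% Tempered posterior and posterior expectation
p(w \mid X^n) \propto \varphi(w) \prod_{i=1}^{n} p(X_i \mid w)^{\beta},
\qquad E_w[f(w)] = \int f(w)\, p(w \mid X^n)\, dw

% Leave-one-out predictive rewritten as full-posterior expectations
E_w^{(-i)}[p(X_i \mid w)] = \frac{E_w[p(X_i \mid w)^{1-\beta}]}{E_w[p(X_i \mid w)^{-\beta}]},
\qquad \mathrm{CvL}(n) = -\frac{1}{n} \sum_{i=1}^{n} \log E_w^{(-i)}[p(X_i \mid w)]

% WAIC: Bayes training loss plus scaled functional variance
\mathrm{WAIC}(n) = T(n) + \frac{\beta}{n} V(n), \qquad
T(n) = -\frac{1}{n} \sum_{i=1}^{n} \log E_w[p(X_i \mid w)], \qquad
V(n) = \sum_{i=1}^{n} \Big\{ E_w\big[(\log p(X_i \mid w))^2\big] - E_w[\log p(X_i \mid w)]^2 \Big\}

% Errors measured against the true and empirical entropies
S = -\int q(x) \log q(x)\, dx, \qquad S_n = -\frac{1}{n} \sum_{i=1}^{n} \log q(X_i)
\mathrm{Bg}(n) = -\int q(x) \log E_w[p(x \mid w)]\, dx - S, \qquad
\mathrm{Cv}(n) = \mathrm{CvL}(n) - S_n

% The two theorems, specialized to beta = 1
\mathrm{CvL}(n) = \mathrm{WAIC}(n) + O_p(n^{-2}), \qquad
\mathrm{Bg}(n) + \mathrm{Cv}(n) = \frac{2\lambda}{n} + o_p\!\left(\frac{1}{n}\right)
```

The leave-one-out identity holds because removing $X_i$ multiplies the posterior density by $p(X_i \mid w)^{-\beta}$ up to normalization, so cross-validation can be computed from a single set of full-posterior draws.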

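Motivation and Goals

Argues that AIC and BIC are inadequate for singular models, so a reliable estimate of the generalization error is needed.
Defines Bayes cross-validation and WAIC within singular learning theory and establishes their asymptotic behavior.
Characterizes the relations among cross-validation, WAIC, and the Bayes generalization error through algebraic-geometric invariants.
Investigates whether the real log canonical threshold and the singular fluctuation determine the asymptotic behavior of the generalization and cross-validation errors.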

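Proposed Method

Defines the Bayesian learning framework: a prior, a posterior with inverse temperature $\beta$, and the predictive distribution.
Introduces the Bayes cross-validation loss CvL(n) and rewrites its leave-one-out predictive as an expectation over the full posterior.
Expresses CvL(n) and WAIC(n) through the functional cumulants $Y_k(n)$ ($k = 1, \dots, 4$) and their generating function.
Proves Theorems 1 and 2: (i) the expansions of CvL(n) and WAIC(n) coincide up to $O_p(n^{-3/2})$, and up to $O_p(n^{-2})$ when $\beta = 1$; (ii) the sum $\mathrm{Bg}(n) + \mathrm{Cv}(n)$ is asymptotically $2\lambda/(\beta n)$ plus a term involving the singular fluctuation $\nu$.
Relates the results to the real log canonical threshold $\lambda$ and other birational invariants of the model.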
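To make the leave-one-out construction and the WAIC formula concrete, here is a minimal Python sketch. It assumes a hypothetical toy model, $x \sim N(w, 1)$ with prior $w \sim N(0, 10^2)$, chosen only because its tempered posterior can be sampled exactly; this model is regular, a special case of the singular setting the paper studies. The helper names (`log_mean_exp`, `sample_tempered_posterior`, `cvl_and_waic`) are illustrative, not from the paper. CvL(n) is estimated from full-posterior draws via $E_w^{(-i)}[p(X_i \mid w)] = E_w[p^{1-\beta}] / E_w[p^{-\beta}]$, and WAIC(n) as the Bayes training loss plus $\beta V(n)/n$.

```python
import math
import random
import statistics


def log_mean_exp(values):
    """Numerically stable log((1/S) * sum_s exp(values[s]))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def sample_tempered_posterior(xs, beta, prior_sd, num_draws, rng):
    """Exact draws from p(w | X^n) ∝ N(w; 0, prior_sd^2) * prod_i N(x_i; w, 1)^beta.

    By conjugacy the tempered posterior is Gaussian with the precision and mean below.
    """
    precision = 1.0 / prior_sd**2 + beta * len(xs)
    mean = beta * sum(xs) / precision
    return [rng.gauss(mean, 1.0 / math.sqrt(precision)) for _ in range(num_draws)]


def log_lik(x, w):
    """log N(x; w, 1) for the hypothetical unit-variance Gaussian model."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - w) ** 2


def cvl_and_waic(xs, draws, beta):
    """Estimate CvL(n) and WAIC(n) from draws of the tempered posterior."""
    n = len(xs)
    cvl = training_loss = func_var = 0.0
    for x in xs:
        lps = [log_lik(x, w) for w in draws]
        # Leave-one-out predictive: E^{(-i)}[p] = E[p^(1-beta)] / E[p^(-beta)]
        loo_log_pred = (log_mean_exp([(1 - beta) * lp for lp in lps])
                        - log_mean_exp([-beta * lp for lp in lps]))
        cvl -= loo_log_pred / n
        # Bayes training loss T(n) = -(1/n) sum_i log E_w[p(X_i | w)]
        training_loss -= log_mean_exp(lps) / n
        # Functional variance V(n) = sum_i Var_w[log p(X_i | w)]
        func_var += statistics.pvariance(lps)
    waic = training_loss + beta * func_var / n
    return cvl, waic


rng = random.Random(0)
xs = [rng.gauss(1.0, 1.0) for _ in range(100)]
draws = sample_tempered_posterior(xs, beta=1.0, prior_sd=10.0, num_draws=4000, rng=rng)
cvl, waic = cvl_and_waic(xs, draws, beta=1.0)
print(f"CvL = {cvl:.5f}  WAIC = {waic:.5f}  difference = {cvl - waic:.2e}")
```

With $\beta = 1$ the printed difference should be small compared with the losses themselves, in line with the $O_p(n^{-2})$ agreement. In a genuinely singular model (for example a normal mixture) no closed-form posterior exists, so MCMC draws would replace the exact sampler while `cvl_and_waic` stays the same.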

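Experimental Results

Research Questions

RQ1: Are the Bayes cross-validation loss and WAIC asymptotically equivalent as random variables in singular learning models?
RQ2: How are the Bayes generalization error, the cross-validation error, and WAIC related through the real log canonical threshold $\lambda$ and the singular fluctuation $\nu$?
RQ3: What role does the algebraic-geometric structure of the model ($\lambda$, $\nu$) play in the asymptotic behavior of these criteria?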

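Key Findings

The Bayes cross-validation loss and WAIC are asymptotically equivalent as random variables (Corollary 1): $\mathrm{CvL}(n) = \mathrm{WAIC}(n) + O_p(n^{-3/2})$, improving to $O_p(n^{-2})$ when $\beta = 1$.
Both CvL(n) and WAIC(n) admit expansions governed by the same functional cumulants $Y_1(n)$, $Y_2(n)$, $Y_3(n)$.
The sum of the Bayes generalization error and the Bayes cross-validation error satisfies $\mathrm{Bg}(n) + \mathrm{Cv}(n) = (\beta - 1)V(n)/n + 2\lambda/(\beta n) + o_p(1/n)$, where $V(n)$ is the functional variance; for $\beta = 1$ this reduces to $2\lambda/n + o_p(1/n)$.
The real log canonical threshold $\lambda$ and the singular fluctuation $\nu$ are birational invariants that govern this asymptotic behavior, tying CV and WAIC to the algebraic structure of the model.
The paper also shows that, in this singular setting, the deviance information criteria (DIC) differ from Bayes CV and WAIC.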
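The $\beta = 1$ case of the sum formula can be checked numerically where everything is closed-form. The sketch below assumes a hypothetical regular model $x \sim N(w, 1)$ with prior $w \sim N(0, 10^2)$ and true distribution $q = N(0.5, 1)$; for a regular model with $d$ parameters $\lambda = d/2$, so here $2\lambda/n = 1/n$ and $n(\mathrm{Bg}(n) + \mathrm{Cv}(n))$ should be close to 1 for large $n$. The Gaussian predictive and its leave-one-out versions are exact, so no sampling is needed. Function names are illustrative.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """log N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predictive(sum_x, count, prior_sd):
    """Bayes predictive N(m, 1 + s^2) for x ~ N(w, 1), w ~ N(0, prior_sd^2), beta = 1.

    Returns (m, 1 + s^2), where N(m, s^2) is the posterior after `count` points summing to `sum_x`.
    """
    precision = 1.0 / prior_sd**2 + count
    return sum_x / precision, 1.0 + 1.0 / precision


def bg_and_cv(xs, true_mean, prior_sd):
    """Exact Bg(n) and Cv(n) when the true distribution is q = N(true_mean, 1)."""
    n, total = len(xs), sum(xs)
    # Bg(n) = KL(q || predictive) = BgL(n) - S, in closed form for two Gaussians
    m, v = predictive(total, n, prior_sd)
    bg = 0.5 * math.log(v) + (1.0 + (true_mean - m) ** 2) / (2 * v) - 0.5
    # Cv(n) = CvL(n) - S_n, using the exact leave-one-out predictive for each X_i
    cv = 0.0
    for x in xs:
        m_i, v_i = predictive(total - x, n - 1, prior_sd)
        cv += (log_normal_pdf(x, true_mean, 1.0) - log_normal_pdf(x, m_i, v_i)) / n
    return bg, cv


rng = random.Random(1)
n = 2000
xs = [rng.gauss(0.5, 1.0) for _ in range(n)]
bg, cv = bg_and_cv(xs, true_mean=0.5, prior_sd=10.0)
# For this regular 1-parameter model lambda = 1/2, so 2*lambda/n = 1/n.
print(f"n*Bg = {n * bg:.3f}  n*Cv = {n * cv:.3f}  n*(Bg+Cv) = {n * (bg + cv):.3f}")
```

Individually, $n\,\mathrm{Bg}(n)$ and $n\,\mathrm{Cv}(n)$ vary from one dataset to the next, but their fluctuations largely cancel in the sum, which is why the theorem holds as a statement about random variables and not only in expectation. Reproducing the singular case would require a model with $\lambda < d/2$ and posterior sampling, which is beyond this sketch.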


This review was generated by AI and reviewed by a human editor.