QUICK REVIEW

[Paper Review] User-friendly introduction to PAC-Bayes bounds

Pierre Alquier|arXiv (Cornell University)|Oct 21, 2021

Neural Networks and Applications | Citations: 24

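One-line summary

This tutorial introduces PAC-Bayes bounds, surveys bounds ranging from simple to advanced, outlines their derivation via Catoni's method and the Donsker-Varadhan variational formula, shows how they apply to randomized and aggregated predictors, and points out connections to deep learning.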

ABSTRACT

Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, to some probability distribution. Randomized predictors are obtained by sampling in a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of D. McAllester, these tools have been considerably improved in many directions (we will for example describe a simplified version of the localization technique of O. Catoni that was missed by the community, and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds received considerable attention: for example there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent success is the successful application of these bounds to neural networks by G. Dziugaite and D. Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.

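Motivation and objectives

Provide an accessible introduction to PAC-Bayes theory for researchers new to the field.
Explain how PAC-Bayes bounds extend the union-bound argument from finite sets to infinite predictor spaces.
Show how to derive and interpret empirical and oracle PAC-Bayes bounds using Catoni's framework.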

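Proposed method

Fix a prior distribution on the predictor space and derive risk bounds that hold uniformly over all data-dependent posterior distributions.
Upper-bound the deviation between the empirical risk and the true risk using Hoeffding's inequality and the Markov/Chernoff inequality.
Apply the Donsker-Varadhan variational formula to obtain the Gibbs posterior as the solution of the bound-minimization problem.
Derive explicit bounds for finite and general predictor sets, including the Gibbs posterior formulation.
Discuss extensions to aggregated and randomized predictors, and (in outline) to non-i.i.d. and heavy-tailed settings.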
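
The derivation steps listed above can be sketched in a few lines. This is a sketch under assumptions not stated in the review: the loss takes values in [0, C], the sample S consists of n i.i.d. observations, r(θ) and R(θ) denote the empirical and true risk, π is the prior, ρ ranges over all distributions on the predictor set, and λ > 0 and δ ∈ (0, 1) are fixed in advance. The symbols follow common PAC-Bayes notation and may differ from the paper's.

```latex
\begin{align*}
% Step 1 (Hoeffding's lemma, fixed \theta):
\mathbb{E}_S\!\left[ e^{\lambda (R(\theta) - r(\theta))} \right]
  &\le e^{\lambda^2 C^2 / (8n)} \\
% Step 2 (integrate over the prior, Fubini):
\mathbb{E}_S\, \mathbb{E}_{\theta \sim \pi}\!\left[ e^{\lambda (R(\theta) - r(\theta))} \right]
  &\le e^{\lambda^2 C^2 / (8n)} \\
% Step 3 (Donsker-Varadhan: \log \mathbb{E}_\pi[e^{h}] = \sup_\rho \{ \mathbb{E}_\rho[h] - \mathrm{KL}(\rho \| \pi) \}):
\mathbb{E}_S \exp\!\Big( \sup_{\rho} \big\{ \lambda\, \mathbb{E}_{\theta \sim \rho}[R(\theta) - r(\theta)]
  - \mathrm{KL}(\rho \| \pi) \big\} \Big)
  &\le e^{\lambda^2 C^2 / (8n)} \\
% Step 4 (Markov/Chernoff): with probability at least 1 - \delta, for all \rho simultaneously,
\mathbb{E}_{\theta \sim \rho}[R(\theta)]
  &\le \mathbb{E}_{\theta \sim \rho}[r(\theta)] + \frac{\lambda C^2}{8n}
  + \frac{\mathrm{KL}(\rho \| \pi) + \log(1/\delta)}{\lambda}
\end{align*}
% The right-hand side is minimized over \rho by the Gibbs posterior:
\[
\hat{\rho}_\lambda(d\theta) \;\propto\; e^{-\lambda r(\theta)}\, \pi(d\theta)
\]
```

Because the supremum over ρ sits inside the expectation in Step 3, the final inequality holds for every posterior at once, including posteriors chosen after seeing the data.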

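Results

Research questions

RQ1: How can PAC-Bayes bounds be derived from simple concentration inequalities?
RQ2: What role does the KL divergence play when extending bounds from finite to infinite sets?
RQ3: How does the Gibbs posterior emerge, within the PAC-Bayes framework, as the data-dependent distribution that minimizes the risk bound?
RQ4: What do PAC-Bayes bounds imply for empirical risk minimization and for the aggregation of predictors?
RQ5: How do empirical PAC-Bayes bounds relate to oracle PAC-Bayes bounds and to fast-rate results?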

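Key findings

A simple PAC-Bayes bound (Catoni's bound) upper-bounds the posterior-averaged risk, for any posterior distribution, by the posterior-averaged empirical risk plus a term involving the KL divergence between the posterior and the prior.
The Gibbs posterior minimizes the PAC-Bayes bound in the sense of Corollary 2.3, tying together the empirical risk, the prior, and complexity through the KL divergence.
For finite Θ, the bound recovers an ERM-like form with a log(M) term, where M is the number of predictors in Θ; this guides the choice of λ and leads to explicit convergence rates.
Empirical bounds can certify generalization and motivate ERM, whereas oracle bounds give theoretical limits on the performance achievable as the amount of data grows.
The framework builds a bridge to bounds for deep learning (non-vacuous bounds and data-dependent priors) and connects to related approaches (mutual information bounds, Bayesian posteriors).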
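
The finite-Θ findings above can be checked numerically. The sketch below is illustrative only: it assumes a loss bounded in [0, C], a uniform prior over M predictors, and made-up empirical risks; the function names (`kl_divergence`, `catoni_bound`, `gibbs_posterior`) and all numeric values are hypothetical and do not come from the paper. It evaluates a Catoni-style bound for the Gibbs posterior, for the point mass on the ERM, and for the prior itself.

```python
import numpy as np

def kl_divergence(rho, prior):
    """KL(rho || prior) for discrete distributions; zero-mass terms contribute 0."""
    mask = rho > 0
    return float(np.sum(rho[mask] * np.log(rho[mask] / prior[mask])))

def catoni_bound(rho, emp_risk, prior, lam, n, delta, C=1.0):
    """Right-hand side of a Catoni-style bound for the posterior rho (assumed form)."""
    avg_emp_risk = float(rho @ emp_risk)
    complexity = (kl_divergence(rho, prior) + np.log(1.0 / delta)) / lam
    return avg_emp_risk + lam * C**2 / (8 * n) + complexity

def gibbs_posterior(emp_risk, prior, lam):
    """Gibbs posterior: weights proportional to prior * exp(-lam * empirical risk)."""
    log_w = np.log(prior) - lam * emp_risk
    log_w -= log_w.max()  # shift for numerical stability before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical setup: M predictors, n samples, made-up empirical risks.
rng = np.random.default_rng(0)
M, n, delta, C = 50, 1000, 0.05, 1.0
emp_risk = rng.uniform(0.1, 0.5, size=M)
prior = np.full(M, 1.0 / M)

# Value of lam that balances the two lam-dependent terms for a point-mass posterior.
lam = np.sqrt(8 * n * np.log(M / delta)) / C

gibbs = gibbs_posterior(emp_risk, prior, lam)
erm = np.zeros(M)
erm[np.argmin(emp_risk)] = 1.0  # point mass on the empirical risk minimizer

bound_gibbs = catoni_bound(gibbs, emp_risk, prior, lam, n, delta, C)
bound_erm = catoni_bound(erm, emp_risk, prior, lam, n, delta, C)
bound_prior = catoni_bound(prior, emp_risk, prior, lam, n, delta, C)

print("Gibbs posterior bound:", bound_gibbs)
print("ERM (point mass) bound:", bound_erm)
print("Prior bound:", bound_prior)
```

With this choice of λ, the point-mass bound reduces to the minimum empirical risk plus C·sqrt(log(M/δ) / (2n)), which is the ERM-like form with a log(M) term; the Gibbs posterior should never give a larger bound value than either alternative, since it minimizes the right-hand side over all posteriors.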


This review was generated by AI and checked by a human editor.