QUICK REVIEW

[Paper Review] Deep learning generalizes because the parameter-function map is biased towards simple functions

Guillermo Valle-Pérez, Chico Q. Camargo | arXiv (Cornell University) | May 22, 2018

Gaussian Processes and Bayesian Inference | References: 87 | Citations: 85

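One-line summary

This paper argues that the parameter-function map of DNNs is exponentially biased towards simple functions, and that this bias acts as an intrinsic regularizer that enables good generalization. It then links the bias to generalization performance using algorithmic information theory and a PAC-Bayes bound with a Gaussian-process prior.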

ABSTRACT

Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias helps explain why deep networks generalize well on real world problems. This picture also facilitates a novel PAC-Bayes approach where the prior is taken over the DNN input-output function space, rather than the more conventional prior over parameter space. If we assume that the training algorithm samples parameters close to uniformly within the zero-error region then the PAC-Bayes theorem can be used to guarantee good expected generalization for target functions producing high-likelihood training sets. By exploiting recently discovered connections between DNNs and Gaussian processes to estimate the marginal likelihood, we produce relatively tight generalization PAC-Bayes error bounds which correlate well with the true error on realistic datasets such as MNIST and CIFAR10 and for architectures including convolutional and fully connected networks.
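The probability-complexity bound from algorithmic information theory that the abstract relies on is commonly quoted in the form below. This is a sketch of our reading, not a formula restated from this page. For a parameter-function map, P(f) is the probability that randomly sampled parameters produce function f. K̃(f) is a computable approximation to the Kolmogorov complexity of f, such as an LZ-based measure. The constants a > 0 and b depend on the map but not on f.

```latex
P(f) \le 2^{-a\,\tilde{K}(f) - b}
\quad\Longleftrightarrow\quad
-\log_2 P(f) \ge a\,\tilde{K}(f) + b
```

Because this is only an upper bound, high-probability functions must have low approximate complexity. Low-complexity functions, however, are not guaranteed to be likely.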

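Motivation and Objectives

Argue, using a bound from algorithmic information theory (AIT), that the parameter-function map of DNNs is biased towards simple functions.
Demonstrate this simplicity bias empirically on Boolean tasks, MNIST, and CIFAR-10, for both a small DNN and larger architectures (CNNs, FCNs).
Introduce a PAC-Bayes framework whose prior is taken over input-output functions and estimated with a Gaussian process, and use it to evaluate and upper-bound generalization.
Show that GP estimates of the marginal likelihood roughly reproduce the behavior of the neural networks and yield useful generalization bounds across architectures and datasets.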
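The Boolean-function experiment can be sketched in a few lines. The code below is a minimal illustration under our own assumptions, not the authors' code. It samples random fully connected ReLU networks on n = 5 Boolean inputs and records the 2^n-bit truth table each network computes. It then scores each table with a symmetrised Lempel-Ziv (LZ76) phrase count. Several choices are ours and should be treated as hypothetical:

- the architecture (5 -> 40 -> 40 -> 1),
- the initialization scales,
- the exact LZ variant,
- the helper names `sample_functions`, `lz76_phrases` and `lz_complexity`.

```python
import itertools
from collections import Counter

import numpy as np


def lz76_phrases(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of a bit string."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (overlap with the current phrase is allowed, as in LZ76).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_complexity(s: str) -> float:
    """Symmetrised LZ measure (assumed variant): log2(n) times the mean
    phrase count of s and its reverse; log2(n) for constant strings."""
    n = len(s)
    if len(set(s)) == 1:
        return float(np.log2(n))
    return float(np.log2(n) * (lz76_phrases(s) + lz76_phrases(s[::-1])) / 2)


def sample_functions(n_inputs=5, width=40, n_samples=20_000, seed=0):
    """Sample random ReLU MLPs (n_inputs -> width -> width -> 1) and count
    how often each Boolean function (truth table) appears. Hypothetical setup."""
    rng = np.random.default_rng(seed)
    # All 2^n Boolean inputs in a fixed (lexicographic) order.
    X = np.array(list(itertools.product([0, 1], repeat=n_inputs)), dtype=float)
    counts = Counter()
    for _ in range(n_samples):
        # Gaussian weights scaled by 1/sqrt(fan_in); unit-variance biases (assumption).
        W1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, width))
        W2 = rng.normal(0.0, 1.0 / np.sqrt(width), (width, width))
        w3 = rng.normal(0.0, 1.0 / np.sqrt(width), width)
        b1, b2, b3 = rng.normal(size=width), rng.normal(size=width), rng.normal()
        h = np.maximum(0.0, X @ W1 + b1)
        h = np.maximum(0.0, h @ W2 + b2)
        out = h @ w3 + b3
        # Threshold the real-valued output at 0 to get a 2^n-bit truth table.
        counts["".join("1" if o > 0 else "0" for o in out)] += 1
    return counts


if __name__ == "__main__":
    counts = sample_functions()
    total = sum(counts.values())
    for table, c in counts.most_common(5):
        print(table, c / total, round(lz_complexity(table), 2))
```

With enough samples, one would expect the most frequent truth tables (for example, constant functions) to have low LZ complexity. That is the kind of simplicity bias the paper reports. The exact frequencies depend on the assumed architecture and initialization.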

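Proposed Method

Define the parameter-function map M: Θ → F of a neural model and analyze its simplicity bias.
Apply the probability-complexity bound from algorithmic information theory to relate a function's probability P(f) to its descriptional complexity K(f).
For DNNs on discrete Boolean functions, estimate P(f) empirically by sampling parameters and counting how often each function occurs.
Use a Gaussian process (GP) approximation to estimate the prior P(f) over functions, and compute the marginal likelihood P(U) of the training data U.
Apply a PAC-Bayes bound with the GP-based prior to obtain a bound on expected generalization that tracks the actual generalization error across datasets.
Validate the GP approximation by comparing GP-based marginal likelihoods with empirical NN probabilities.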
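The GP step requires P(U): the prior probability that a function drawn from the GP labels every training input correctly. The sketch below makes two assumptions of ours:

- the network is fully connected with ReLU activations, so its NNGP kernel follows the standard arc-cosine recursion;
- a naive Monte Carlo estimator stands in for the approximate inference the authors actually use. It draws f ~ N(0, K) and counts how often sign(f) matches the labels.

This is only practical for small training sets, because P(U) typically becomes very small as m grows. The function names and hyperparameters (σ_w² = 2, σ_b² = 0.1, two hidden layers) are hypothetical.

```python
import numpy as np


def relu_nngp_kernel(X, depth=2, sigma_w2=2.0, sigma_b2=0.1):
    """NNGP kernel of a fully connected ReLU net with `depth` hidden layers,
    via the arc-cosine recursion. Hyperparameters are illustrative."""
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        sd = np.sqrt(np.diag(K))
        norms = np.outer(sd, sd)
        cos_t = np.clip(K / norms, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance K.
        K = sigma_b2 + sigma_w2 / (2 * np.pi) * norms * (np.sin(theta) + (np.pi - theta) * cos_t)
    return K


def mc_marginal_likelihood(X, y, n_samples=100_000, seed=0, jitter=1e-6, **kernel_kw):
    """Naive Monte Carlo estimate of P(U): the fraction of GP prior draws
    f ~ N(0, K) whose signs match the 0/1 labels y at every training input."""
    rng = np.random.default_rng(seed)
    K = relu_nngp_kernel(X, **kernel_kw) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    f = rng.standard_normal((n_samples, len(X))) @ L.T  # each row ~ N(0, K)
    return float(np.mean(np.all((f > 0) == (np.asarray(y) > 0), axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(6, 8))
    y = (X[:, 0] > 0).astype(int)  # hypothetical labels
    p = mc_marginal_likelihood(X, y)
    print(p, np.log(p) if p > 0 else "too rare to estimate")
```

With a fixed seed, every prior draw matches exactly one labeling, so the estimates over all 2^m labelings sum to 1. This gives a convenient sanity check. The natural log of the estimate is what a PAC-Bayes bound consumes.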

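Experimental Results

Research Questions

RQ1: Does the parameter-function map of DNNs show a strong bias towards simple functions?
RQ2: Can a PAC-Bayes bound built on algorithmic information theory and a function-space prior (via Gaussian processes) explain the generalization observed in over-parameterized networks?
RQ3: Do empirical complexity measures (e.g., Lempel-Ziv complexity) correlate with function probability under random parameter sampling?
RQ4: Can a GP-approximated prior reproduce the NN marginal likelihood closely enough to yield meaningful generalization bounds on real datasets?
RQ5: Do SGD-like training and GP-based Bayesian sampling support the interpretation that optimization is biased towards simple, high-probability functions?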

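Key Findings

The parameter-function map of DNNs is exponentially biased towards simple (low-complexity) functions, so the distribution P(f) is extremely skewed.
Experiments on Boolean-function DNNs and on larger architectures (CNNs and FCNs) show that high-probability functions have low Lempel-Ziv complexity and score low on other proxies for Kolmogorov complexity.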
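The Gaussian-process approximation reproduces the NN marginal likelihood of finite-width networks reasonably well, which makes it a practical estimator of P(U) for the PAC-Bayes bound.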
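The PAC-Bayes bounds track the actual generalization error on MNIST, Fashion-MNIST, and CIFAR-10, for both CNN and FC architectures.
SGD-like training and GP-based Bayesian sampling support the interpretation that optimization is biased towards simple, high-probability functions.
The proposed function-space PAC-Bayes bound gives relatively tight generalization bounds that match the generalization trends observed across datasets.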
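The sketch below shows how a marginal-likelihood estimate turns into an error bound. It assumes the realizable PAC-Bayes form we understand the paper to use: with probability at least 1 − δ over training sets U of size m,

    −ln(1 − ε) ≤ (ln(1/P(U)) + ln(2m/δ)) / (m − 1),

where ε is the expected generalization error. Solving for ε gives the function below. The helper name and the example numbers are hypothetical and are not results from the paper.

```python
import math


def pac_bayes_error_bound(log_pu: float, m: int, delta: float = 0.05) -> float:
    """Upper bound on expected generalization error, assuming the realizable
    PAC-Bayes form  -ln(1 - eps) <= (ln(1/P(U)) + ln(2m/delta)) / (m - 1).
    `log_pu` is the natural log of the marginal likelihood P(U)."""
    if m < 2:
        raise ValueError("need at least two training examples")
    if log_pu > 0:
        raise ValueError("log P(U) must be <= 0")
    rhs = (-log_pu + math.log(2 * m / delta)) / (m - 1)
    return 1.0 - math.exp(-rhs)


if __name__ == "__main__":
    # Hypothetical inputs: log P(U) = -2000 for m = 10000 training examples.
    print(pac_bayes_error_bound(-2000.0, 10_000, delta=0.05))
```

With these hypothetical inputs the bound is roughly 0.18. A less negative log P(U), meaning a training set that is more likely under the prior, tightens the bound. This is the mechanism by which the simplicity bias enters the guarantee.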


This review was generated by AI and checked by a human editor.