QUICK REVIEW

[Paper Review] Approximation and Estimation for High-Dimensional Deep Learning Networks

Andrew R. Barron, Jason M. Klusowski|arXiv (Cornell University)|Sep 10, 2018

Machine Learning and Algorithms | References: 35 | Citations: 42

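One-line summary

This paper derives risk (mean squared error) bounds for deep ramp networks with $\ell^1$-type weight controls and shows a minimax-like rate that depends on $\log d$ and the depth $L$ rather than directly on the number of parameters.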

ABSTRACT

It has been experimentally observed in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations. Is there a theoretical basis for this? The best available bounds on their metric entropy and associated complexity measures are essentially linear in the number of parameters, which is inadequate to explain this phenomenon. Here we examine the statistical risk (mean squared predictive error) of multi-layer networks with $\ell^1$-type controls on their parameters and with ramp activation functions (also called lower-rectified linear units). In this setting, the risk is shown to be upper bounded by $[(L^3 \log d)/n]^{1/2}$, where $d$ is the input dimension to each layer, $L$ is the number of layers, and $n$ is the sample size. In this way, the input dimension can be much larger than the sample size and the estimator can still be accurate, provided the target function has such $\ell^1$ controls and that the sample size is at least moderately large compared to $L^3\log d$. The heart of the analysis is the development of a sampling strategy that demonstrates the accuracy of a sparse covering of deep ramp networks. Lower bounds show that the identified risk is close to being optimal.

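Motivation and objectives

Motivate and quantify why deep networks generalize well in high-dimensional settings with more parameters than samples.
Introduce and formalize the notions of variation and average variation to capture the complexity of multi-layer networks.
Construct sparse approximants and covering-number bounds to balance estimation error against model complexity.
Establish risk bounds for networks with $\ell^1$-type weight controls and ramp activations.
Show nearly optimal minimax rates within the proposed framework.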

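Proposed method

Model deep networks with ramp activations and non-negative (or sign-handled) weights.
Define the network variation $V_L$, the subnetwork variations $V_j^{\mathrm{out}}$ and $V_j^{\mathrm{in}}$, and the average variation $\overline{V}$ to quantify network size.
Express $f(W,x)$ through a product-structured weight representation and introduce a Markov-like decomposition $a_{j_1,\ldots,j_L}$ of the weights.
Construct sparse approximants via a randomized representative cover of fixed cardinality $M$ and derive covering-number bounds.
Prove the main risk bound: for the composite variation $v = \overline{V}\sqrt{V}$, the squared error under a suitable probability measure scales in proportion to $(L v/\sqrt{M})^2$.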
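The steps above can be illustrated with a minimal Python sketch. It is not the paper's construction: `path_variation` assumes that the network variation can be read as the sum over input-to-output paths of products of absolute weights, and `sample_sparse_approx` is a single-hidden-layer simplification of the sampling idea rather than the multi-layer Markov-like scheme. All function names, sizes, and the random weights are hypothetical.

```python
import numpy as np


def ramp(z):
    """Ramp (lower-rectified linear) activation: max(z, 0)."""
    return np.maximum(z, 0.0)


def forward(weights, x):
    """Evaluate a bias-free deep ramp network at input x.

    weights[0] is the outer weight vector; weights[1:] are the inner
    layer matrices, ordered from the output side to the input side.
    """
    h = x
    for W in reversed(weights[1:]):
        h = ramp(W @ h)
    return float(weights[0] @ h)


def path_variation(weights):
    """Sum, over all input-to-output paths, of the product of |weights|.

    Assumed stand-in for the network variation V described in the review.
    """
    total = np.abs(weights[0])
    for W in weights[1:]:
        total = total @ np.abs(W)
    return float(total.sum())


def sample_sparse_approx(c, W, M, rng):
    """M-term random approximation of f(x) = sum_j c_j * ramp(W[j] @ x).

    Single-hidden-layer simplification: unit j is drawn with probability
    proportional to |c_j| * ||W[j]||_1, so the M-term average is unbiased.
    """
    row_norms = np.abs(W).sum(axis=1)
    path_weights = np.abs(c) * row_norms
    V = path_weights.sum()
    idx = rng.choice(len(c), size=M, p=path_weights / V)
    signs = np.sign(c[idx])
    units = W[idx] / row_norms[idx, None]  # rows rescaled to unit l1 norm

    def f_M(x):
        return float((V / M) * np.sum(signs * ramp(units @ x)))

    return f_M


# Hypothetical sizes: 500 hidden units approximated by M = 50 sampled units.
rng = np.random.default_rng(0)
d, width, M = 20, 500, 50
c = rng.normal(size=width)
W = rng.normal(size=(width, d))
V = path_variation([c, W])
f_M = sample_sparse_approx(c, W, M, rng)

# Compare the sparse approximant with the full network on inputs in [-1, 1]^d.
xs = rng.uniform(-1.0, 1.0, size=(200, d))
mse = np.mean([(f_M(x) - forward([c, W], x)) ** 2 for x in xs])
print(f"V = {V:.1f}, empirical MSE = {mse:.3g}, V^2/M = {V**2 / M:.3g}")
```

Because each sampled term is bounded by $V$ in absolute value on $[-1,1]^d$, the expected squared error of the $M$-term average is at most $V^2/M$. This single-layer picture is what the $1/\sqrt{M}$ factor in the bound above is built on, and it involves the variation rather than the number of hidden units.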

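Experimental results

Research questions

RQ1: What theoretical risk guarantees hold for deep networks with ramp activations when the parameter norms are controlled?
RQ2: How can the variation of a network be quantified and exploited to obtain sparse approximations and favorable generalization bounds?
RQ3: Can sparse network approximations with provable covering-number bounds be constructed, and do they yield minimax-like rates?
RQ4: How do the depth $L$ and the input dimension $d$ affect the learning risk under $\ell^1$-type penalization?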

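Main findings

The risk for the class under study is upper bounded by $[(L^3 \log d)/n]^{1/2}$, so accurate estimation remains possible even when $d$ is large relative to $n$, provided $n$ is at least moderately large compared with $L^3 \log d$.
The sparse-cover argument yields a subset whose log-cardinality is at most $(L-2)M \log(\min\{\bar{d}, 2M\}) + M \log(8e\,d_{\mathrm{in}})$.
The main theorem bounds the error for any $f(W,x)$ in the class in terms of the composite variation $v = \overline{V}\sqrt{V}$, giving nearly minimax rates within the proposed framework.
Lower bounds show that the identified risk is close to optimal within the defined model class.
Representability and a conservation-like canonical form balance the flow of weights across layers, which simplifies the analysis and tightens the bounds.
The approach emphasizes variation-based complexity control over parameter-count-based measures, addressing the generalization phenomenon in high dimensions.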
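To get a feel for the rate $[(L^3 \log d)/n]^{1/2}$, the following sketch evaluates it numerically. It only tracks the order of the bound: constants and the variation factors are omitted, and the helper names and example values are hypothetical.

```python
import math


def risk_rate(L, d, n):
    """Order of the risk bound [(L^3 log d) / n]^(1/2); constants omitted."""
    return math.sqrt(L**3 * math.log(d) / n)


def samples_needed(L, d, eps):
    """Smallest n with risk_rate(L, d, n) <= eps, again ignoring constants."""
    return math.ceil(L**3 * math.log(d) / eps**2)


# Hypothetical setting: one million inputs per layer, five layers.
L, d, n = 5, 10**6, 10**5
print("rate at n = 1e5:", risk_rate(L, d, n))
print("rate with d squared:", risk_rate(L, d**2, n))
print("n needed for rate 0.1:", samples_needed(L, d, 0.1))
```

Squaring $d$ multiplies the rate by only $\sqrt{2}$, while quadrupling $n$ halves it. This is the sense in which the input dimension can far exceed the sample size as long as $n$ is large relative to $L^3 \log d$.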

This review was generated by AI and checked by a human editor.