QUICK REVIEW

[Paper Review] Identification of Shallow Neural Networks by Fewest Samples

Massimo Fornasier, Jan Vybíral|arXiv (Cornell University)|Apr 4, 2018

Advanced Neural Network Applications | Cited by: 2

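One-sentence summary

This paper proposes a method for identifying shallow neural networks (sums of ridge functions) from a minimal number of random samples, under mild smoothness and near-orthogonality assumptions. The method proceeds in stages: it approximates the subspace spanned by the ridge directions, reduces the dimension by a substitution of variables, and identifies rank-1 matrices by maximizing the spectral norm. This yields a uniform approximation with high probability and, through second-order differentiation, links weight recovery to the decomposition of matrices (tensors of order two).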

ABSTRACT

We address the uniform approximation of sums of ridge functions $\sum_{i=1}^m g_i(a_i\cdot x)$ on ${\mathbb R}^d$, representing the shallowest form of feed-forward neural network, from a small number of query samples, under mild smoothness assumptions on the functions $g_i$'s and near-orthogonality of the ridge directions $a_i$'s. The sample points are randomly generated and are universal, in the sense that the sampled queries on those points will allow the proposed recovery algorithms to perform a uniform approximation of any sum of ridge functions with high-probability. Our general approximation strategy is developed as a sequence of algorithms to perform individual sub-tasks. We first approximate the span of the ridge directions. Then we use a straightforward substitution, which reduces the dimensionality of the problem from $d$ to $m$. The core of the construction is then the approximation of ridge directions expressed in terms of rank-$1$ matrices $a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program, maximizing the spectral norm of certain competitors constrained over the unit Frobenius sphere. The final step is then to approximate the functions $g_1,\dots,g_m$ by $\hat g_1,\dots,\hat g_m$. Higher order differentiation, as used in our construction, of sums of ridge functions or of their compositions, as in deeper neural network, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, we show that second order differentiation and tensors of order two (i.e., matrices) suffice.

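Motivation and objectives

Enable uniform approximation of shallow neural networks (sums of ridge functions) from a minimal number of random samples.
Identify the ridge directions and the corresponding functions $g_i$ with high probability using only a small number of queries.
Establish, via second-order differentiation, the relationship between weight identification in shallow neural networks and the decomposition of matrices (tensors of order two).
Construct a universal sampling strategy that applies to any sum of ridge functions.
Reduce the original $d$-dimensional problem to an $m$-dimensional one by approximating the span of the ridge directions and applying a substitution.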
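
The link between derivatives and the ridge directions follows from the chain rule. As a short worked derivation, assuming only that each $g_i$ is twice differentiable:

$$
f(x) = \sum_{i=1}^m g_i(a_i \cdot x), \qquad
\nabla f(x) = \sum_{i=1}^m g_i'(a_i \cdot x)\, a_i, \qquad
\nabla^2 f(x) = \sum_{i=1}^m g_i''(a_i \cdot x)\, a_i \otimes a_i .
$$

Every gradient therefore lies in $\mathrm{span}\{a_1,\dots,a_m\}$, and every Hessian lies in $\mathrm{span}\{a_1 \otimes a_1,\dots,a_m \otimes a_m\}$. This is why derivative information sampled at a few points can reveal the span of the directions, and why recovering the individual $a_i$ becomes a question about rank-1 matrices inside a matrix subspace.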

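Proposed method

Approximate the subspace spanned by the ridge directions $a_i$ using random samples and linear-algebraic techniques.
Reduce the dimension of the problem from $d$ to $m$ through a substitution that exploits the approximated subspace.
Formulate the identification of each ridge direction as a nonlinear program that maximizes the spectral norm of competitor matrices constrained to the unit Frobenius sphere.
Represent each ridge direction as the rank-1 matrix $a_i \otimes a_i$, which makes a matrix-based optimization possible.
Use second-order differentiation to tie weight identification to the decomposition of tensors of order two (i.e., matrices).
After the directions are recovered, reconstruct the functions $g_1, \dots, g_m$ by approximating them with $\hat g_1, \dots, \hat g_m$.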
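
The first two steps above (approximating the span of the ridge directions, then substituting to reduce the dimension) can be illustrated with a small NumPy sketch. This is a toy illustration under stated assumptions, not the authors' algorithm: the directions are taken to be exactly orthonormal, the functions `gs` are arbitrary smooth examples, gradients are estimated by coordinate-wise central differences, and the helper names (`make_ridge_sum`, `fd_gradient`, `approximate_span`) are hypothetical.

```python
import numpy as np

def make_ridge_sum(A_true, gs):
    """Return f(x) = sum_i g_i(a_i . x), with the a_i stored as columns of A_true."""
    def f(x):
        t = A_true.T @ x
        return sum(g(ti) for g, ti in zip(gs, t))
    return f

def fd_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x (uses 2*d point queries)."""
    d = x.size
    grad = np.empty(d)
    for k in range(d):
        e = np.zeros(d)
        e[k] = eps
        grad[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

def approximate_span(f, d, m, n_points, rng):
    """Estimate an orthonormal basis of span{a_1, ..., a_m}.

    Each gradient is a linear combination of the a_i, so the leading
    left singular vectors of the stacked gradients span the same space.
    """
    X = rng.standard_normal((n_points, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)        # random points on the unit sphere
    G = np.column_stack([fd_gradient(f, x) for x in X])  # shape (d, n_points)
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :m]

rng = np.random.default_rng(0)
d, m = 20, 3
# Toy assumption: exactly orthonormal ridge directions.
A_true, _ = np.linalg.qr(rng.standard_normal((d, m)))
# Toy assumption: three arbitrary smooth profiles g_i.
gs = [np.tanh, np.sin, lambda t: t + 0.5 * t**2]
f = make_ridge_sum(A_true, gs)

A_hat = approximate_span(f, d, m, n_points=30, rng=rng)

# Substitution x = A_hat @ y turns the d-variate f into an m-variate function.
f_reduced = lambda y: f(A_hat @ y)

# Subspace error: spectral norm of the difference of the two projectors.
err = np.linalg.norm(A_hat @ A_hat.T - A_true @ A_true.T, 2)
print("subspace error:", err)
```

Because `f` depends on `x` only through its projection onto the span of the directions, `f_reduced(A_hat.T @ x)` should agree with `f(x)` up to the subspace error, which is what makes the reduction from $d$ to $m$ variables harmless.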

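Experimental results

Research questions

RQ1: Under mild smoothness and near-orthogonality assumptions, can a shallow neural network be identified with high probability from only a small number of random samples?
RQ2: Under the same assumptions, how can the ridge directions $a_i$ be recovered from very little data?
RQ3: Does second-order differentiation serve to connect neural network weight identification with the decomposition of matrices (tensors of order two)?
RQ4: How far can the dimension of the problem be reduced while preserving approximation accuracy?
RQ5: Can a universal sampling strategy be constructed that applies uniformly to all sums of ridge functions?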

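Key findings

The proposed method achieves, with high probability, a uniform approximation of any sum of ridge functions from only a small number of random samples.
The subspace spanned by the ridge directions is approximated well from the samples, giving an effective dimension reduction from $d$ to $m$.
The identification of each ridge direction $a_i$ is formulated as maximizing the spectral norm over the unit Frobenius sphere, with guarantees of robustness and convergence.
Second-order differentiation suffices: weight recovery for shallow neural networks can be tied to the decomposition of tensors of order two (i.e., matrices).
The final approximation of the functions $g_i$ is achieved through $\hat g_i$, and the full pipeline guarantees recovery with high probability under the stated assumptions.
The method is universal: the same sampling strategy applies to every network, with no prior knowledge of the functions or directions.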
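
The spectral-norm maximization over the unit Frobenius sphere mentioned above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions rather than the paper's procedure: the directions are exactly orthonormal, the matrix subspace spanned by the $a_i \otimes a_i$ is assumed to be known exactly (given through a scrambled Frobenius-orthonormal basis), and the optimizer is a plain projected subgradient ascent chosen here for simplicity. The function names are hypothetical.

```python
import numpy as np

def project_onto_span(M, basis):
    """Frobenius-orthogonal projection of M onto span(basis).

    `basis` must be a list of Frobenius-orthonormal matrices.
    """
    return sum(np.sum(B * M) * B for B in basis)

def maximize_spectral_norm(basis, rng, step=1.0, n_iter=200):
    """Projected subgradient ascent for: max ||M||_2 s.t. ||M||_F = 1, M in span(basis)."""
    m = basis[0].shape[0]
    M = project_onto_span(rng.standard_normal((m, m)), basis)
    M /= np.linalg.norm(M, "fro")
    for _ in range(n_iter):
        w, V = np.linalg.eigh(M)
        k = np.argmax(np.abs(w))            # eigenvalue of largest magnitude
        u = V[:, k]
        G = np.sign(w[k]) * np.outer(u, u)  # subgradient of the spectral norm at M
        M = project_onto_span(M + step * G, basis)
        M /= np.linalg.norm(M, "fro")       # return to the unit Frobenius sphere
    return M

rng = np.random.default_rng(1)
m = 3
# Toy assumption: exactly orthonormal directions a_i (columns of A).
A, _ = np.linalg.qr(rng.standard_normal((m, m)))
rank_one = [np.outer(A[:, i], A[:, i]) for i in range(m)]

# Scramble the rank-1 matrices with an orthogonal mixing so that the basis
# handed to the optimizer does not reveal the a_i directly.
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
basis = [sum(Q[k, i] * rank_one[i] for i in range(m)) for k in range(m)]

M = maximize_spectral_norm(basis, rng)
w, V = np.linalg.eigh(M)
a_hat = V[:, np.argmax(np.abs(w))]          # candidate ridge direction (up to sign)

# Alignment with the closest true direction; 1.0 means exact recovery up to sign.
alignment = np.max(np.abs(A.T @ a_hat))
print("alignment:", alignment)
```

In this orthonormal toy setting, any $M = \sum_i c_i\, a_i \otimes a_i$ with $\|M\|_F = 1$ has spectral norm $\max_i |c_i| \le 1$, with equality only at $\pm a_i \otimes a_i$, so maximizers of the program are exactly the rank-1 matrices one wants. Different random initializations would be needed to find all $m$ directions.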


This review was generated by AI and checked by a human editor.