QUICK REVIEW

[Paper Review] Identification of Shallow Neural Networks by Fewest Samples.

Massimo Fornasier, Jan Vybíral | arXiv (Cornell University) | Apr 4, 2018

Advanced Neural Network Applications | Cited by 2

One-Sentence Summary

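Under mild smoothness and near-orthogonality assumptions, the paper proposes a method that identifies shallow neural networks (that is, sums of ridge functions) from a small number of random samples. By successively approximating the span of the ridge directions, reducing the dimension through substitution, and identifying rank-1 matrices via spectral-norm maximization, the method achieves uniform approximation with high probability, and second-order differentiation links weight recovery to the decomposition of matrices (tensors of order two).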

ABSTRACT

We address the uniform approximation of sums of ridge functions $\sum_{i=1}^m g_i(a_i\cdot x)$ on ${\mathbb R}^d$, representing the shallowest form of feed-forward neural network, from a small number of query samples, under mild smoothness assumptions on the functions $g_i$'s and near-orthogonality of the ridge directions $a_i$'s. The sample points are randomly generated and are universal, in the sense that the sampled queries on those points will allow the proposed recovery algorithms to perform a uniform approximation of any sum of ridge functions with high-probability. Our general approximation strategy is developed as a sequence of algorithms to perform individual sub-tasks. We first approximate the span of the ridge directions. Then we use a straightforward substitution, which reduces the dimensionality of the problem from $d$ to $m$. The core of the construction is then the approximation of ridge directions expressed in terms of rank-$1$ matrices $a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program, maximizing the spectral norm of certain competitors constrained over the unit Frobenius sphere. The final step is then to approximate the functions $g_1,\dots,g_m$ by $\hat g_1,\dots,\hat g_m$. Higher order differentiation, as used in our construction, of sums of ridge functions or of their compositions, as in deeper neural network, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, we show that second order differentiation and tensors of order two (i.e., matrices) suffice.

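Research Motivation and Objectives

Achieve uniform approximation of shallow neural networks (sums of ridge functions) from a small number of random samples.
Identify the ridge directions $a_i$ and their associated functions $g_i$ with high probability from only a few queries.
Use second-order differentiation to connect weight identification in shallow neural networks with the decomposition of matrices (tensors of order two).
Develop a universal sampling strategy that works for any sum of ridge functions satisfying mild regularity and geometric conditions.
Reduce the original $d$-dimensional problem to an $m$-dimensional one by approximating the span of the ridge directions and substituting.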
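
The role of second-order differentiation and of the span can be made explicit with a short calculation. This is a sketch that assumes each $g_i$ is twice differentiable; the matrix $U$ below is notation introduced here for illustration, not taken from the paper. Writing $f(x) = \sum_{i=1}^m g_i(a_i\cdot x)$, the chain rule gives

$$
\nabla f(x) = \sum_{i=1}^m g_i'(a_i\cdot x)\, a_i,
\qquad
\nabla^2 f(x) = \sum_{i=1}^m g_i''(a_i\cdot x)\, a_i\otimes a_i .
$$

Every Hessian therefore lies in $\operatorname{span}\{a_i\otimes a_i\}$, a subspace of dimension at most $m$ inside the symmetric $d\times d$ matrices, which is where matrix decomposition enters. For the dimension reduction, let $U\in\mathbb{R}^{d\times m}$ have orthonormal columns spanning $\operatorname{span}\{a_1,\dots,a_m\}$. Then $a_i = UU^\top a_i$, so

$$
f(x) = \sum_{i=1}^m g_i\big((U^\top a_i)\cdot(U^\top x)\big) = \tilde f(U^\top x),
\qquad
\tilde f(y) = \sum_{i=1}^m g_i\big((U^\top a_i)\cdot y\big),\quad y\in\mathbb{R}^m ,
$$

and $\tilde f$ is again a sum of ridge functions, now on $\mathbb{R}^m$. In practice only an approximation of the span is available, so the identity holds up to an approximation error.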

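Proposed Method

Approximate the span of the ridge directions $a_i$ using random samples and linear-algebra techniques.
Reduce the problem dimension from $d$ to $m$ by a substitution that uses the approximated span.
Formulate the identification of each ridge direction as a nonlinear program that maximizes the spectral norm of competitor matrices over the unit Frobenius sphere.
Represent each ridge direction as the rank-1 matrix $a_i \otimes a_i$, which allows a matrix-based optimization.
Use second-order differentiation to connect weight identification with tensor decomposition, specifically for tensors of order two (matrices).
After the directions are recovered, reconstruct the functions $g_1, \dots, g_m$ through approximations $\hat g_1, \dots, \hat g_m$.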
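
The first step above can be illustrated numerically. The following is a minimal sketch, not the paper's algorithm: it assumes exactly orthonormal directions, approximates Hessians by central finite differences at random points on the unit sphere, and extracts the dominant subspace with an SVD. The function names, the test functions `tanh`, `sin`, `cos`, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def fd_hessian(f, x, h=1e-3):
    """Central finite-difference approximation of the Hessian of f at x."""
    d = x.size
    E = np.eye(d) * h  # row j is h times the j-th unit vector
    H = np.empty((d, d))
    for j in range(d):
        for k in range(j, d):
            H[j, k] = H[k, j] = (
                f(x + E[j] + E[k]) - f(x + E[j] - E[k])
                - f(x - E[j] + E[k]) + f(x - E[j] - E[k])
            ) / (4 * h * h)
    return H

def approximate_hessian_span(f, d, m, n_points, rng, h=1e-3):
    """Return an orthonormal basis (columns) for the dominant m-dimensional
    subspace of the vectorized Hessians, plus all singular values."""
    X = rng.standard_normal((n_points, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # points on the unit sphere
    H = np.stack([fd_hessian(f, x, h).ravel() for x in X], axis=1)
    U, s, _ = np.linalg.svd(H, full_matrices=False)
    return U[:, :m], s

# Synthetic example (assumed setup): d = 6, m = 3, orthonormal directions.
rng = np.random.default_rng(0)
d, m = 6, 3
A = np.linalg.qr(rng.standard_normal((d, m)))[0].T  # rows are the a_i
gs = [np.tanh, np.sin, np.cos]

def f(x):
    return sum(g(a @ x) for a, g in zip(A, gs))

U, s = approximate_hessian_span(f, d, m, n_points=20, rng=rng)

# Distance of each true rank-1 matrix a_i (x) a_i from the estimated span.
residual = max(
    np.linalg.norm(v - U @ (U.T @ v))
    for v in (np.outer(a, a).ravel() for a in A)
)
print("largest residual:", residual)
print("singular values:", s[: m + 1])
```

In this setup the singular values should drop sharply after the first $m$, which reflects that the Hessians of a sum of $m$ ridge functions span at most $m$ dimensions. The sketch spends many function evaluations per Hessian and makes no attempt to match the sample counts analyzed in the paper.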

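Experimental Results

Research Questions

RQ1: Can a shallow neural network be identified with high probability from only a small number of random samples?
RQ2: Under mild smoothness and near-orthogonality assumptions, how can the ridge directions $a_i$ be recovered from limited data?
RQ3: What role does second-order differentiation play in connecting neural network weight identification with the decomposition of matrices (tensors of order two)?
RQ4: How far can the dimension of the problem be reduced while preserving approximation accuracy?
RQ5: Can a universal sampling strategy be constructed that works for all sums of ridge functions?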

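Key Findings

Using only a small number of randomly generated samples, the proposed algorithms achieve uniform approximation of any sum of ridge functions with high probability.
The span of the ridge directions is approximated from the samples, which enables an effective reduction from dimension $d$ to dimension $m$.
The identification of each ridge direction $a_i$ is formulated as spectral-norm maximization over the unit Frobenius sphere, a formulation aimed at robust and convergent recovery.
Second-order differentiation suffices to connect weight recovery in shallow neural networks with the decomposition of tensors of order two (matrices).
The functions $g_i$ are finally approximated by $\hat g_i$, and the full pipeline guarantees recovery with high probability under the stated assumptions.
The method is universal: the same sampling strategy applies to all such networks, with no prior knowledge of the functions or directions required.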
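
To see why maximizing the spectral norm over the unit Frobenius sphere singles out rank-1 matrices, consider the idealized case of exactly orthonormal directions. Any $M = \sum_i c_i\, a_i \otimes a_i$ with $\|M\|_F = 1$ has $\sum_i c_i^2 = 1$ and spectral norm $\max_i |c_i|$, which reaches 1 only at $M = \pm a_i \otimes a_i$. The sketch below runs a simple projected-ascent heuristic on this problem. It is an illustration under these assumptions and not necessarily the iteration analyzed in the paper; `project_onto_span`, `maximize_spectral_norm`, the step size, and the iteration count are hypothetical.

```python
import numpy as np

def project_onto_span(M, basis):
    """Frobenius-orthogonal projection of M onto span(basis).
    basis: array of shape (m, d, d) with Frobenius-orthonormal matrices."""
    coeffs = np.tensordot(basis, M, axes=([1, 2], [0, 1]))  # shape (m,)
    return np.tensordot(coeffs, basis, axes=1)               # shape (d, d)

def maximize_spectral_norm(basis, rng, step=1.0, n_iter=100):
    """Projected ascent on the spectral norm over the unit Frobenius
    sphere of span(basis), started from a random point."""
    M = np.tensordot(rng.standard_normal(basis.shape[0]), basis, axes=1)
    M /= np.linalg.norm(M)  # Frobenius normalization
    for _ in range(n_iter):
        w, V = np.linalg.eigh(M)
        j = np.argmax(np.abs(w))           # eigenvalue of largest magnitude
        u = V[:, j]
        ascent = np.sign(w[j]) * np.outer(u, u)  # (sub)gradient direction
        M = project_onto_span(M + step * ascent, basis)
        M /= np.linalg.norm(M)
    return M

# Synthetic example (assumed setup): orthonormal directions, hidden behind
# a random rotation of the basis of span{a_i (x) a_i}.
rng = np.random.default_rng(0)
d, m = 6, 3
A = np.linalg.qr(rng.standard_normal((d, m)))[0].T   # rows are the a_i
rank_one = np.einsum("ij,ik->ijk", A, A)             # a_i (x) a_i
Q = np.linalg.qr(rng.standard_normal((m, m)))[0]     # random rotation
basis = np.tensordot(Q, rank_one, axes=1)            # still orthonormal

M_hat = maximize_spectral_norm(basis, rng)
w, V = np.linalg.eigh(M_hat)
a_hat = V[:, np.argmax(np.abs(w))]                   # recovered direction
alignment = np.max(np.abs(A @ a_hat))                # |cos| to closest a_i
spectrum = np.sort(np.abs(w))
print("alignment with closest true direction:", alignment)
```

Each run returns one direction up to sign. Which $a_i$ is found depends on the random starting point, so recovering all $m$ directions would need several starts or a deflation step, neither of which is shown here.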

This review was generated by AI and reviewed by a human editor.