QUICK REVIEW

[Paper Review] Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient

Ankit Pensia, Shashank Rajput|arXiv (Cornell University)|Jun 14, 2020

Artificial Intelligence in Games | References: 37 | Cited by: 28

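One-Sentence Summary

The paper proves that a random neural network only a logarithmic factor wider than the target, specifically a factor of $ O(\log(dl)) $, can be pruned to approximate any fully connected ReLU network of width $ d $ and depth $ l $, closing the gap between earlier polynomial over-parameterization bounds and what is observed in practice. The key insight is to connect network pruning to the random SubsetSum problem, which shows that logarithmic over-parameterization is not only sufficient but also essentially optimal for constant-depth networks.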

ABSTRACT

The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. [MalachEtAl20] establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width $d$ and depth $l$, by pruning a random one that is a factor $O(d^4l^2)$ wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep. Our analysis heavily relies on connecting pruning random ReLU networks to random instances of the SubsetSum problem. We then show that this logarithmic over-parameterization is essentially optimal for constant depth networks. Finally, we verify several of our theoretical insights with experiments.

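Motivation and Objectives

Close the gap between the over-parameterization that theory requires for the strong lottery ticket hypothesis and what is observed in practice.
Determine the smallest over-parameterization factor that guarantees a lottery ticket exists inside a randomly initialized network.
Prove that logarithmic over-parameterization suffices to approximate any target ReLU network by pruning, and that it is nearly optimal.
Connect the problem of pruning random ReLU networks to the random SubsetSum problem so that it can be analyzed theoretically.
Provide a theoretical foundation consistent with experimental findings that a network only a small factor wider than the target already contains high-performing lottery tickets.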

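Proposed Method

The authors model the pruning of a ReLU network as a random SubsetSum problem, in which each weight corresponds to one of the numbers being summed.
They use Lueker's (1998) result on random SubsetSum to show that $ O(d \log(dl/\epsilon)) $ random coefficients suffice, with high probability, to approximate any target linear function to within error $ \epsilon $.
The analysis is extended from linear functions to deep ReLU networks by decomposing the network into linear transformations and ReLU nonlinearities.
They prove that a random network whose width is $ O(\log(dl)) $ times the target width and whose depth is $ 2l $ contains a subnetwork that approximates the output of the target network to within error $ \epsilon $.
The proof relies on concentration inequalities and SubsetSum probability bounds to guarantee success with high probability across all layers simultaneously.
They further construct a lower bound showing that the logarithmic factor is asymptotically optimal for constant-depth networks.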
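The SubsetSum view of pruning described above can be illustrated with a toy experiment. The sketch below is not the authors' construction or code: it assumes weights drawn uniformly from [-1, 1], uses a hypothetical helper `best_subset_sum`, and finds the best subset by brute force, which is only feasible for a handful of weights. It shows that keeping or pruning each of 14 random weights is already enough to hit several target values in [-0.5, 0.5] closely.

```python
import itertools
import random


def best_subset_sum(values, target):
    """Brute-force search for the subset of `values` whose sum is closest to `target`.

    Returns (mask, error), where mask[i] is 1 if values[i] is kept and 0 if it is pruned.
    Illustrative helper only; the cost grows as 2 ** len(values).
    """
    best_mask, best_err = None, float("inf")
    for mask in itertools.product((0, 1), repeat=len(values)):
        total = sum(v for v, keep in zip(values, mask) if keep)
        err = abs(total - target)
        if err < best_err:
            best_mask, best_err = mask, err
    return best_mask, best_err


# Assumption: random weights are i.i.d. uniform on [-1, 1]; the seed is arbitrary.
rng = random.Random(0)
n = 14
values = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Each target plays the role of one weight of the target network.
targets = [-0.5, -0.1, 0.0, 0.3, 0.5]
results = {t: best_subset_sum(values, t) for t in targets}

for t, (mask, err) in results.items():
    print(f"target={t:+.2f}  kept={sum(mask):2d}/{n}  error={err:.2e}")
```

Because the number of subsets doubles with every added random weight, the achievable error shrinks quickly as `n` grows, which is the intuition behind needing only logarithmically many random weights per target weight.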

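Experimental Results

Research Questions

RQ1: Can the over-parameterization required by the strong lottery ticket hypothesis be reduced from polynomial to logarithmic?
RQ2: Is there a theoretical connection between neural network pruning and the random SubsetSum problem?
RQ3: Is logarithmic over-parameterization sufficient to guarantee that a lottery ticket exists for any target ReLU network?
RQ4: Is the logarithmic over-parameterization factor optimal for constant-depth networks?
RQ5: How does the theoretical analysis agree with the practical observation that high-accuracy lottery tickets can be found in networks that are only slightly over-parameterized?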

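Key Findings

The paper proves that a random network only a factor $ O(\log(dl)) $ wider than the target can be pruned to approximate, to within error $ \epsilon $, any fully connected ReLU network of width $ d $ and depth $ l $.
This logarithmic over-parameterization is shown to be essentially optimal for constant-depth networks, since the lower bound differs from the upper bound by only a constant factor.
The analysis establishes a direct theoretical connection between pruning ReLU networks and the random SubsetSum problem, building on Lueker's (1998) result.
The authors show that the required over-parameterization is exponentially smaller than the $ O(d^4 l^2) $ bound of Malach et al., closing the gap between theory and practice.
Experiments show that the approximation ability of pruning algorithms depends strongly on network topology, and that ReLU activations degrade performance in already-sparse settings.
The work suggests that existing pruning algorithms could benefit from theoretical insights drawn from the SubsetSum problem, potentially leading to pruning strategies that are more efficient and provably effective.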
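To give a feel for the size of the improvement from $ O(d^4 l^2) $ to $ O(\log(dl)) $, the following sketch evaluates both expressions for a few example sizes. It is only an order-of-magnitude illustration: the hidden constants and the dependence on $ \epsilon $ are dropped, the natural logarithm is an arbitrary choice, and the function names and the sample values of `d` and `l` are hypothetical.

```python
import math


def polynomial_width_factor(d, l):
    """d^4 * l^2, the polynomial factor with constants dropped."""
    return d**4 * l**2


def logarithmic_width_factor(d, l):
    """log(d * l), the logarithmic factor with constants and epsilon dropped."""
    return math.log(d * l)


# Hypothetical target sizes chosen only for illustration.
sizes = [(10, 2), (100, 5), (1000, 10)]

for d, l in sizes:
    poly = polynomial_width_factor(d, l)
    log = logarithmic_width_factor(d, l)
    print(f"d={d:>5} l={l:>3}  d^4 l^2={poly:.3e}  log(dl)={log:.2f}")
```

For example, with `d = 100` and `l = 5` the polynomial expression is 2.5e9, while the logarithmic one is about 6.2.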


This review was generated by AI and checked by a human editor.