QUICK REVIEW

[Paper Review] Stochastic Weighted Function Norm Regularization.

Amal Rannen Triki, Maxim Berman|arXiv (Cornell University)|Oct 18, 2017

Stochastic Gradient Optimization Techniques|References: 24|Cited by: 1

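One-Sentence Summary

This paper proposes a new stochastic regularizer for deep neural networks built on sampling-based approximations of weighted function norms, proves that computing such norms exactly is NP-hard, and establishes an $\mathcal{O}(N^{-1/2})$ generalization bound for convex function sets. The resulting objective can be optimized with stochastic gradient descent, with convergence shown under broad conditions, and it improves performance on real-world classification and segmentation tasks.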

ABSTRACT

Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of functions defined by a network and the difficulty in measuring function complexity. There exists no method in the literature for additive regularization based on a norm of the function, as is classically considered in statistical learning theory. In this work, we propose sampling-based approximations to weighted function norms as regularizers for deep neural networks. We provide, to the best of our knowledge, the first proof in the literature of the NP-hardness of computing function norms of DNNs, motivating the necessity of a stochastic optimization strategy. Based on our proposed regularization scheme, stability-based bounds yield a $\mathcal{O}(N^{-\frac{1}{2}})$ generalization error for our proposed regularizer when applied to convex function sets. We demonstrate broad conditions for the convergence of stochastic gradient descent on our objective, including for non-convex function sets such as those defined by DNNs. Finally, we empirically validate the improved performance of the proposed regularization strategy for both convex function sets as well as DNNs on real-world classification and segmentation tasks.

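Motivation and Objectives

Address the lack of direct, additive regularization based on a function norm in deep neural networks, even though such regularizers are the classical choice in statistical learning theory.
Overcome the computational intractability of measuring function complexity in deep neural networks by proposing sampling-based approximations to weighted function norms.
Establish theoretical guarantees: a generalization error bound for convex function sets, and convergence of stochastic optimization in both convex and non-convex settings.
Provide a practical regularization framework that improves generalization on real-world deep learning tasks.
Demonstrate empirically that function-norm regularization is feasible and effective in deep learning.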

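Proposed Method

Proposes sampling-based approximations to weighted function norms as regularization terms, making the optimization tractable for deep networks.
Provides what the authors describe as the first proof that computing function norms of deep neural networks exactly is NP-hard, which motivates the use of stochastic approximations.
Derives a stability-based $\mathcal{O}(N^{-1/2})$ generalization bound for convex function sets under the proposed regularizer.
Designs a stochastic optimization scheme compatible with stochastic gradient descent, with convergence shown under broad conditions for both convex and non-convex function sets.
Integrates the regularization term into standard deep learning training pipelines for classification and segmentation tasks.
Uses Monte Carlo sampling to estimate the weighted function norm, giving a scalable and differentiable regularizer.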
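The Monte Carlo estimate in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the weighted function norm is the squared L2 norm of the network output under a sampling distribution Q, that is, E_{x~Q}[||f(x)||^2], and the names `mc_function_norm_sq`, `sample_q`, and `make_tiny_relu_net` are hypothetical.

```python
import numpy as np


def mc_function_norm_sq(f, sample_q, num_samples, rng):
    """Monte Carlo estimate of E_{x~Q}[||f(x)||^2] (assumed form of the norm).

    f          : maps an array of shape (n, d) to an array of shape (n, k)
    sample_q   : hypothetical sampler, sample_q(n, rng) -> array of shape (n, d)
    num_samples: number of points drawn from Q
    """
    xs = sample_q(num_samples, rng)
    outputs = f(xs)
    # Squared Euclidean norm of each output, averaged over the samples.
    return float(np.mean(np.sum(outputs ** 2, axis=1)))


def make_tiny_relu_net(rng, d_in=2, d_hidden=8, d_out=1):
    """Hypothetical two-layer ReLU network with random weights, for illustration."""
    w1 = rng.normal(size=(d_in, d_hidden))
    w2 = rng.normal(size=(d_hidden, d_out))

    def f(x):
        return np.maximum(x @ w1, 0.0) @ w2

    return f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    net = make_tiny_relu_net(rng)
    # Q is taken to be a standard normal over inputs (an assumption).
    estimate = mc_function_norm_sq(
        net, lambda n, r: r.normal(size=(n, 2)), 1000, rng
    )
    print("estimated squared function norm:", estimate)
```

Because the estimate is an average of differentiable terms, it can be added to a training loss and differentiated with respect to the network weights like any other term; redrawing the samples at each step is what makes the regularizer stochastic.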

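Experimental Results

Research Questions

RQ1: Can function-norm regularization be applied effectively to deep neural networks despite the computational difficulty of measuring function complexity?
RQ2: Is computing the function norm of a deep neural network NP-hard, and does this justify sampling-based approximations?
RQ3: What generalization error bound can be derived for the proposed regularizer on convex function sets?
RQ4: Does stochastic gradient descent converge on the proposed objective for the non-convex function sets defined by deep neural networks?
RQ5: Does the proposed regularizer improve generalization on real-world classification and segmentation tasks?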

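Key Findings

Computing function norms of deep neural networks is proven NP-hard, which supports the use of sampling-based approximations.
The proposed regularizer achieves an $\mathcal{O}(N^{-1/2})$ generalization error bound on convex function sets, giving it a theoretical footing.
With the proposed regularizer, stochastic gradient descent converges under broad conditions for both convex and non-convex function sets.
Empirical results show improved performance on real-world classification and segmentation tasks, supporting the effectiveness of the regularization strategy.
Sampling-based approximations to weighted function norms make scalable, differentiable regularization possible in deep learning.
The method provides direct, additive regularization based on a function norm, filling a gap in the deep learning regularization literature.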
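The interaction between stochastic gradient descent and a resampled function-norm penalty can be illustrated on a convex function set. The sketch below is an assumption-laden toy, not the paper's experimental setup: it uses a linear model f_w(x) = x·w, a squared loss, a standard normal sampling distribution Q, and a hypothetical function name and step-size schedule. Under these assumptions E_{z~Q}[(z·w)^2] = ||w||^2, so the expected objective coincides with ridge regression, which gives a closed-form solution to compare against.

```python
import numpy as np


def sgd_function_norm_regularized(X, y, lam, num_steps, num_reg_samples, rng):
    """SGD on 0.5*(x_i.w - y_i)^2 + lam * mean_j (z_j.w)^2 with z_j ~ N(0, I).

    Fresh regularization samples z_j are drawn at every step, so both the
    data term and the penalty are estimated stochastically.
    """
    n, d = X.shape
    w = np.zeros(d)
    for t in range(num_steps):
        # Stochastic gradient of the data term from one training example.
        i = rng.integers(n)
        residual = X[i] @ w - y[i]
        grad_loss = residual * X[i]
        # Stochastic gradient of the penalty: 2 * mean_j (z_j.w) z_j.
        z = rng.normal(size=(num_reg_samples, d))
        grad_reg = 2.0 * (z * (z @ w)[:, None]).mean(axis=0)
        # Decaying step size (an illustrative choice, not from the paper).
        step = 2.0 / (t + 20.0)
        w -= step * (grad_loss + lam * grad_reg)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
    lam = 0.1
    w_sgd = sgd_function_norm_regularized(X, y, lam, 20000, 8, rng)
    # Minimizer of the expected objective (ridge regression in this special case).
    n = X.shape[0]
    w_ridge = np.linalg.solve(X.T @ X / n + 2 * lam * np.eye(3), X.T @ y / n)
    print("SGD solution:  ", w_sgd)
    print("ridge solution:", w_ridge)
```

For a deep network the penalty no longer reduces to a weight norm, which is where a sampled function-norm regularizer differs from ordinary weight decay.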

This review was generated by AI and reviewed by human editors.